Wiktionary:Beer parlour


Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date the discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!


September 2021

Quicker deletion/removal of uncited RFVs

Voting on: Reducing the time period for closing/removing/deleting an RFV which is not cited. To be clear, this will apply to RFV-sense as well as non-English RFV nominations.

Rationale: To quickly close RFVs for terms that clearly don't exist. For example, a user may send to RFV an entry that clearly doesn't exist, like Hindi ओगाज़्म (ogāzm)[RFV Discussion], but that cannot be speedied per the rules. In such a case, waiting a month should not be required.

As the wording says the discussion "may be closed", this doesn't mean that an ongoing discussion has to be stopped. The entry can stay in RFV for longer if any important discussion is going on.

Disadvantages and solutions: A possible disadvantage of this is that, with less time, the chance of the entry going unnoticed is higher. This is not a problem if the entry creator keeps looking at the pages created by them in the past through their watchlist and if the editors of the language keep checking the categories to see which entries are in RFV. Even if this doesn't happen, and a valid entry is deleted before a knowledgeable editor could intervene, its undeletion can be requested in the future with valid citations. The entry creator and/or the editor(s) of that language might also be notified at the time of nominating or before closing.

If both options pass, the one with the larger Support:Oppose ratio will be executed.


  • Vote starts: 06:12, 1 September 2021 (UTC)
  • Vote ends: 06:12, 21 September 2021 (UTC)
  • Vote created: —Svārtava2 • 06:12, 1 September 2021 (UTC)

Option 1: 2 weeks


  1. Support as proposer —Svārtava2 • 13:50, 1 September 2021 (UTC)
  2. Support. I guess we're saying this is a speedy-delete option, just as we have for RFD, that can be used from time to time when an entry's unattestability is beyond debate. I support it in that very limited capacity, but this option should be used sparsely to avoid the concerns DCDuring poses (and which I share). Imetsia (talk) 15:27, 1 September 2021 (UTC)


  1. Oppose Even many English terms need time to allow their being cited, especially during months when few are here to do so. To rush the process will lead to further degradation of the quality of citations, which often do not unambiguously support the definitions they supposedly support. DCDuring (talk) 12:34, 1 September 2021 (UTC)
  2. Oppose Equinox 15:08, 1 September 2021 (UTC)
  3. Oppose To my mind, the reasons offered are not sufficient to warrant making such a change and the timing should be left as is. Geographyinitiative (talk) 16:16, 1 September 2021 (UTC)
  4. Oppose. We already speedy-delete obvious cases, but most are not that easy. —Μετάknowledgediscuss/deeds 16:51, 1 September 2021 (UTC)
    Not always. ·~ dictátor·mundꟾ 19:47, 2 September 2021 (UTC)
  5. Oppose. As DCDuring says above. I've been trying to work out what stops me slapping RfV on a thousand oik-deletable Thai senses for an enthusiast like svartava2 to then delete. The best hope would be for me to be blocked as being disruptive, but I'm not sure I would be. Even the example above seems a bad one. Treating the nukta as optional, I found 6 examples of the word being used in what looked like Hindi to me. I think the word exists, it's just that the examples didn't seem durably archived. --RichardW57 (talk) 17:38, 2 September 2021 (UTC)
  6. Oppose. There is no need to rush to delete things (assuming that existing speedy-delete rules for patent rubbish are working successfully). (By the way, on a somewhat different point, did someone comment somewhere that RFV'd entries may be deleted simply because no one has properly attempted to cite them? This is more of a concern to me. Not that I am demanding that there should always be someone available and willing to do this work, just that there should be a mechanism to show when reasonable effort/attention has been applied.) Mihia (talk) 22:00, 3 September 2021 (UTC)
  7. Oppose. The proposer has many virtues, but patience isn't one of them. Not everyone checks in every day, or even every week, and verification can be quite time-consuming. We shouldn't be increasing the risk of deleting real words just because someone finds it painful to go through all the steps and wait for the results. Chuck Entz (talk) 23:19, 3 September 2021 (UTC)
  8. Oppose Why is this in the form of a vote? --{{victar|talk}} 18:38, 7 September 2021 (UTC)


  1. Abstain. ·~ dictátor·mundꟾ 19:50, 2 September 2021 (UTC)

Option 2: 3 weeks


  1. Support Svārtava2 • 13:50, 1 September 2021 (UTC)
  2. Support - seems balanced enough. Rishabhbhat (talk) 13:53, 1 September 2021 (UTC)
  3. Support. See above. Although at this point, how much time are we really saving with the deletions? Just seven days? Imetsia (talk) 15:27, 1 September 2021 (UTC)


  1. Oppose See above. DCDuring (talk) 15:15, 1 September 2021 (UTC)
  2. Oppose To my mind, the reasons offered are not sufficient to warrant making such a change and the timing should be left as is. Geographyinitiative (talk) 16:16, 1 September 2021 (UTC)
  3. Oppose. We already speedy-delete obvious cases, but most are not that easy. —Μετάknowledgediscuss/deeds 16:51, 1 September 2021 (UTC)
  4. Oppose. I still haven't got round to writing up the quotation for RfV'd कदाय (kadāya). Maybe I should just mistype the quotation and leave someone else to do the translation. --RichardW57 (talk) 17:38, 2 September 2021 (UTC)
  5. Oppose. See no need for this marginal change. Mihia (talk) 22:28, 3 September 2021 (UTC)
  6. Oppose weakly. This wouldn't make much difference and I don't really see a problem with the status quo. Equinox 22:30, 3 September 2021 (UTC)
  7. Oppose per my comments above. Chuck Entz (talk) 23:20, 3 September 2021 (UTC)
  8. Oppose --{{victar|talk}} 18:38, 7 September 2021 (UTC)



Both options failed.
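
The Support:Oppose comparison described in the proposal can be checked with a short tally. This is only a sketch: the counts are read off the signed votes above, the dictionary layout and the function name are made up for illustration, and no passing threshold is applied because the proposal does not state one here.

```python
from fractions import Fraction

# Vote counts read off the signed votes above (labels are illustrative).
tallies = {
    "Option 1 (2 weeks)": {"support": 2, "oppose": 8, "abstain": 1},
    "Option 2 (3 weeks)": {"support": 3, "oppose": 8, "abstain": 0},
}


def support_oppose_ratio(tally):
    """Return the Support:Oppose ratio as an exact fraction (abstentions ignored)."""
    return Fraction(tally["support"], tally["oppose"])


ratios = {name: support_oppose_ratio(t) for name, t in tallies.items()}

# The proposal's tie-break: the option with the larger ratio would be executed
# if both had passed.
leader = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: support:oppose = {ratio} ({float(ratio):.3f})")
print("Larger ratio:", leader)
```

With these counts Option 2 has the larger ratio (3:8 against 2:8), but opposes outnumber supports in both options, which is consistent with both failing.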


Wiktionary:Votes/2020-07/Removing letter entries except Translingual

I would like to be able to finally launch this vote (it seems long past due), so if anyone wants to give any last feedback on it, now's your perfect chance. Thadh (talk) 11:40, 1 September 2021 (UTC)

I am very much Strongly opposed to this proposal just because letters are words just like anything else, though the second option is better. Moving letters to translingual entries rids the dictionary of important information that the letter entries bring, such as:
  • Knowing the alphabetical order in a language
  • Connecting the letters to their letter names
  • Seeing the pronunciations of each letter, whether the phonemic pronunciation or the pronunciation of the letter name
Among other issues. Also, it's important to note the languages that don't follow the typical a~z 26-letter order; those languages having their own entries is important because a translingual entry (if it even exists) wouldn't tell me that gb is a letter in Yorùbá, or that ʻ is a letter in Hawaiian, without some serious modification. Would we lose the mutation tables for Welsh rh, the inflection tables for Hungarian dzs, the important etymological & historical information for Jeju & Korean (yaw) (let alone all the other Hangul letter entries), the pronunciation information for Thai (kɔɔ), and more? Some of these entries do not belong in translingual, and we'd be deleting so much important information. The proposal seems to be partially related to combatting the Lua memory errors that are happening in a, but I do not think that it's the best solution, and seems very much Latin-script-centric without considering the languages that don't use the script. Even moving the entries to the Appendix would cause issues, as trying to fit them on one alphabetical page would be a nightmare for a bunch of languages. I would seriously make sure to talk with editors from other not-as-represented languages before launching such a proposal. AG202 (talk) 14:24, 1 September 2021 (UTC)
That's why I'm posting it here! ;) Now, the main question is, is most of this information something we want in a dictionary? You see, from where I'm standing, alphabetical orders are rarely set, and even when they are, it seems more the domain of Wikipedia, rather than Wiktionary. Sure, the etymological information of certain letters may be interesting, but it's not something we can't place under a Translingual section, and pronunciations aren't often useful, because either the language is phonemic (so the pronunciation will be present wherever, including the About: page and Wikipedia) or it isn't (in which case there'll be a number of different pronunciations, so the pronunciation section will be unusable). I want to make clear though that reducing memory usage isn't the goal of the vote, it's a benefit. Thadh (talk) 16:58, 1 September 2021 (UTC)
Yes I'd say that that's information that we'd want in a multilingual dictionary if we're truly aiming for "all words in all languages". Etymological information would not work in translingual for issues such as Jeju & Korean (yaw), as you can see that the history is different for both Jeju & Korean, unless you really want to have multiple etymology sections for translingual. Putting all these entries in translingual sounds like more of a mess than what we have right now, if I'm being completely honest. I'm not sure what you mean by alphabetical orders are rarely set, as most languages do have an alphabetical order. In regards to Wikipedia, it's lacking a bunch of information that's already listed here, and I'd rather not have a hypothetical of deleting important info here, just because it might appear on Wikipedia. It's also just generally interesting to see how letters have changed and developed between languages, such as seeing how c changes completely between languages, giving users those comparisons up front. Regardless though, that still doesn't address the issues brought up with Welsh & Hungarian about inflection & mutation tables which are important enough to be included with those entries. If the Lua Memory Error problem is not the goal of the vote, then what exactly is the goal? It's really hard to see how much else is worth deleting such important information for a bunch of languages on this website. If anything, Rua's proposal was the best, but I'm still very wary of that one as well. AG202 (talk) 17:28, 1 September 2021 (UTC)
“letters are words just like anything else” … “gb is a letter in Yorùbá”—essentialist delusions, nothing follows from it. And most is beside the point: that “we'd be deleting (so much) important information” would have to be illustrated since it is intended to move information instead of deleting it.
Maybe we should split the vote to do it for Latin letters first and have the option to go back? Just to see how it works? Would it make it more or less Latin-script-centric? 🤷 Fay Freak (talk) 17:32, 1 September 2021 (UTC)
@AG202 I've created a set of examples to illustrate a possible implementation of option 2: User:Thadh/Translingual/a, User:Thadh/Appendix:List of languages using the letter "a", User:Thadh/Appendix:List of Afar letters. Any history on the alphabet could be placed at the second appendix, but since orthographical history isn't really my domain, I omitted it for Afar. Mutations? Just as simply. What the appendix will look like is mostly up to the editors that make them (i.e. members of the language's community).
The point of this vote is: Why do we have this information clogging up in the mainspace? We could have over 3624 entries of the letter a alone, and that would mean a lot of entries to sift through just to get to the one you want. Thadh (talk) 20:34, 1 September 2021 (UTC)
It's hard to see how an Appendix like that would work for etymologies, mutations, inflections, & more complex pronunciation systems, without it turning into normal entries for each letter on the Appendix. It just doesn't cleanly line up like that if it's a more complicated letter system. And re: clogging up the namespace, I think there really needs to be a better solution. Mi has the same Lua memory errors and has more languages listed than o & e individually. If you look at a more closely, most of the entries aren't even about the letter (50 letter entries compared to 134 languages with even more etymologies), with for example Scottish Gaelic a having nine etymologies, none of which are about the letter, so regardless we'd still have that issue of Lua errors to fix (the Scottish Gaelic entries should remain obviously, just an example). managed to fix its error issue, so I think more ideas could be brought up before removing letters in general. Also, I wouldn't consider some of the more pertinent information to be clogging up the mainspace. If the goal is to fix clogged entries and Lua memory errors and the like, then a more encompassing solution should be found. What's being proposed right now is a damning short-term solution to a much longer-term problem. AG202 (talk) 21:38, 1 September 2021 (UTC)
@AG202: See User:Thadh/Appendix:List of Welsh letters: There is every possibility to host any amount of information in a dedicated appendix without it having to turn into full-fledged entries. So if your main concern is only that information will be lost, I hope you're satisfactorily convinced it won't with an adequately executed option 2. Thadh (talk) 22:43, 1 September 2021 (UTC)
Thank you, but I’m still wary about it, especially when it comes to non-Latin script-based letters. If we could have more examples in the vote beforehand, possibly, but I’m iffy about voting for a proposal like that first before we know how it’ll look. I’m still more in favor of fixing the long term issue of Lua errors before deleting all letter entries. After all, there are many reasons why the vote was cancelled originally, let alone the ones I’ve mentioned. I do appreciate the dialogue though. AG202 (talk) 22:57, 1 September 2021 (UTC)
The vote wasn't ever 'cancelled', I just didn't have the time or motivation to find out how to officially start it XD. I wouldn't oppose more discussion on the implementation beforehand, but I'm not optimistic it's going to happen, since it'll require quite a bit of thought and participation from many community members of the concerned languages, while the result of the vote isn't even known yet! Anyway, I'm happy I've cleared the air a little. Thadh (talk) 23:19, 1 September 2021 (UTC)
Oops sorry the vote originally put by Metaknowledge was cancelled iirc. But yes regardless, I look forward to more discussion and hopefully more editors can get involved! AG202 (talk) 23:29, 1 September 2021 (UTC)
@Thadh Quick followup, for cases such as English A where there are derived terms, or usage notes such as in Hungarian y, or references as in Latin v, or multiple meanings as in English Y, how would those be best addressed concisely? The more I look at different languages' letter entries, the more information I see that would make it hard to concisely put things on a single appendix page without having aforementioned entries or very long pages. AG202 (talk) 01:32, 2 September 2021 (UTC)
@AG202: I'll go example by example on this:
  • The derived terms given at A are actually derivatives of the symbol, and the symbol should of course stay; there's no way of knowing "A" means a letter grade.
  • The usage examples given at y seem a bit wordy to me, but if we want to keep them, it can be given in an appendix under an L2 depicting this issue, or under an L3 within History or something like that.
  • Y being an upsilon is a good one, but not something we can't host at the translingual section, since I don't think it is language-specific
  • Finally, v having references can be resolved by either giving these references at the appendix (since it's monolingual, the length of the references shouldn't be an issue, just like on Wikipedia) or just deleted as unimportant, on the community's discretion.
I hope these solutions are okay with you. Thadh (talk) 09:51, 2 September 2021 (UTC)
@Thadh Apologies for the late response, but those solutions, while I still think more people should be involved from other communities, especially non-Latin-script-based ones, are alright for now. AG202 (talk) 15:16, 13 September 2021 (UTC)
And then I also wanted to point out the case of archaic/obsolete letters or letters that aren't used in an alphabet and what the case would be for those, example being at for Korean. AG202 (talk) 15:18, 13 September 2021 (UTC)
Those, too, can be depicted in a history section or something like that. Thadh (talk) 15:20, 13 September 2021 (UTC)
Alright, I've been going through Category:Letters by language and there's quite a bit that still needs to be addressed especially from those communities, such as sign language letters, braille, language-specific morse code, and more, so I really hope that more people get involved. AG202 (talk) 15:24, 13 September 2021 (UTC)
If your intent is to foster discussion, then don't call genuine concerns delusions, or else I will not engage further. Gb is a letter in Yoruba, if you knew a single thing about the language. Regarding, "would have to be illustrated since it is intended to move information instead of deleting it.", if you look at the actual vote, one of the options is literally "Option 1: Remove all these entries. Update the CFI to not include these.", which would lead to mass deletion, and then the second option could still lead to the loss of information. AG202 (talk) 21:22, 1 September 2021 (UTC)
I can't imagine option 1 passing, and I think Thadh should just remove it from the vote altogether. —Μετάknowledgediscuss/deeds 00:35, 2 September 2021 (UTC)
Good point. At this point even I'm not in favour of it, so I'll remove it right now. Even so, I think the discussions concerning what the appendices would look like should be left to the individual communities and that we shouldn't wait for these discussions to be finished before voting. Thadh (talk) 01:03, 2 September 2021 (UTC)
So the merits of the project depend on whither we move the content and how we point to it. Of course we would then have to account for the fact that some letters are two letters, so to say. So just this circumstance and that “letters are words” of course did not evoke the supposition that nothing could or should be done.
Of course I did not assume that “remove” could actually pass separately from “move”. In the end editors would still decide to write something to an appendix or somewhere unless it was specifically excluded, which it wasn’t.
The question arises how to move stuff while 1. not causing new module errors by the copy of content 2. not causing too much work either due to restructuring and reformulating 3. not duplicating Wikipedia by kind of writing articles on writing systems. Creating overview indexes linking to subpages housing letter entries? This sounds lame enough.
Of course, moving the Chinese characters would be a lot of work for no obvious benefit. It is hard to imagine how all the information on or could be ported somewhither. Are we intending to separate the meanings of the words described from the history of character usage? I can imagine one can heavily disagree on what goes where and better lives on with the module error because there one knows what works.
In the end, if we can only lump or split else, dynamic content fetching is unavoidable. Perhaps we want a “glyph information” tool that only fetches information requested for a language. With correct TAB key implementation that would be faster to use than A with 200+ sections. It would be like those other Unicode sites but with language-specific information, and information valid for sundry languages.
(Actually, what we want is more Lua memory. Let’s just create more three-letter entries for obscure languages to show by way of more module errors that the issue is pressing. Tiffs about letters won’t help then. I mean man, instead of creating wahtuh for every language, pick out them short words. Would be so epic you already regret the letter topic. Headline “Wikimedia does not afford enough RAM for the simples!”) Fay Freak (talk) 02:03, 2 September 2021 (UTC)
@Fay Freak: before this goes outta hand, I want to stress: CJKV CHARACTERS ARE NOT PART OF THE VOTE (just like any hieroglyph). They don't have the header "letter" for the simple reason they aren't one. Now, concerning what the appendices would look like, see my examples posted above. Thadh (talk) 09:51, 2 September 2021 (UTC)
I note that for English, 'a' includes entries as both a letter ('The first letter of the English alphabet, written in the Latin script') and a noun ('The name of the Latin script letter A/a'). Are both of these to be moved? Basque does much the same, and I didn't look beyond that. Entries like English cee, and Pali ra and rassa escape the removal from mainspace as worded because they are currently recorded as nouns. How great is the risk that they will be reclassified as 'letters' or even 'letter forms'? I very much want a simple search for 'rassa' to find the entry for the case form of a name of a letter. --RichardW57m (talk) 12:17, 2 September 2021 (UTC)
@RichardW57m: See the discussion page of the vote. These nouns are not part of this vote. Thadh (talk) 13:03, 2 September 2021 (UTC)
@Thadh: Does "L3-header" include "L4-header" when parts of speech are demoted to level 4 by the inclusion of a grouping L3-header such as "===Etymology 1==="? An example of this is a. Should entries with noun headwords be split from letter 'L3-headers'? Same example. --RichardW57 (talk) 14:33, 3 October 2021 (UTC)
That's why it's L3 and/or the headword template. I think it's pretty straightforward what the vote's objective is (delete all non-translingual letter entries), we don't need to set everything in stone. Thadh (talk) 15:51, 3 October 2021 (UTC)
Huh? This one is L4 and {{head|LANG|noun}}, so you don't catch it at all! --RichardW57 (talk) 20:47, 3 October 2021 (UTC)
The noun isn't part of the vote. The letter has the header "letter". Thadh (talk) 21:01, 3 October 2021 (UTC)
The nesting of headers and calls of {{head}} is: a, Welsh, Etymology 1, Letter, {{cy-noun}}. There is only that one call of {{head}} or equivalents. As I understand you, you're saying we remove nothing of the letter, because it is all formally serving the noun entry. I think the item has to be split into letter and noun. --08:05, 4 October 2021 (UTC)
@RichardW57: Nouns have to abide by the nouns' Criteria for inclusion, so at least one (LDL) or three (WDL) use(s) of the letter as a noun denoting the letter, which is not the case with letter entries (since you could use any noun using the letter as a verification). Thadh (talk) 12:35, 4 October 2021 (UTC)
@Thadh: That's not an issue in this case, as the letter and noun senses are already separate senses. Demonstrating letters is not as easy as you suggest - the surname Lhuyd doesn't make 'Lh', let alone 'uy', a modern Welsh letter. RichardW57m (talk) 14:05, 4 October 2021 (UTC)
Pinging the Korean workgroup since this would involve the deletion and move of Hangul entries such as (g), (aw), (f), and more, and KSL entries like 𝠀𝪜, and I'm still wary of putting everything cleanly in an Appendix. (Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, B2V22BHARAT, Quadmix77, Kaepoong): AG202 (talk) 15:31, 13 September 2021 (UTC)
The first of them looks as though it should actually be translingual rather than Korean! Sign language may be messier. --RichardW57m (talk) 16:43, 13 September 2021 (UTC)
To be honest, sign languages are a whole foreign world to me, so I have no idea what the practices are, what a good solution is for them and whether their letters are at all comparable to those of written languages. Thadh (talk) 16:55, 13 September 2021 (UTC)
We should probably have people from sign languages weigh in and/or exclude them from the policy until they do. @RichardW57m I disagree as a lot of the information there is specific to Korean and may not apply to other languages that have used the script & letter such as Cia-Cia. AG202 (talk) 19:19, 13 September 2021 (UTC)

New Android app based on Wiktionary - Vedaist

Hello everyone. I've just released a new Wiktionary-based English dictionary app for Android users called Vedaist. For those interested, give it a try at the Play store.

The app has a minimal interface compared to Wiktionary and IMHO is a better experience on a mobile browser. Currently the app has around 750,000 words with meanings and images where possible. There are also no ads in the app. I did release an iOS version in late June and this is the second platform. The Android version currently lacks features like setting personal goals, but those will be added soon.

Thanks again for building Wiktionary. If there is any feedback for me, please reach out.

Toucanvs (talk) 05:23, 2 September 2021 (UTC)

Hey @Kiril kovachev. Wanted to cc you in case you are interested in the Android version. Toucanvs (talk) 05:24, 2 September 2021 (UTC)
You mention "Vedaist is powered by Wikipedia sites" and sadly not "Vedaist is powered by Wiktionary". I guess it's because few people in the real world give a shit about Wiktionary. Want to correct this? TVdinnerless (talk) 00:37, 3 September 2021 (UTC)

Modifying the page WT:WDL

I had proposed a cleanup of the arrangement of the languages at Wiktionary talk:Criteria for inclusion/Well documented languages#Request for cleaning up some while ago. Someone might want to edit the page accordingly or continue the discussion. ·~ dictátor·mundꟾ 19:32, 2 September 2021 (UTC)

@Kutchkutch: Hi. I think no one cares about making minor changes to the list; you can go ahead and change it. ·~ dictátor·mundꟾ 10:45, 8 September 2021 (UTC)

Treatment of Early Modern Korean?

(Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, B2V22BHARAT, Quadmix77, Kaepoong): @LoutK, Mujjingun

What is the best way to deal with the lemmatization of Early Modern Korean? There are some unresolved questions regarding the status quo merger of EMK and Contemporary Korean:

  • Should Sino-Korean words attested only in EMK be lemmatized in their modern readings, or in the Hangul form of the time?
For instance, 『이언언해/易言言解』 has 긔긔션 (guiguisyeon), which is obviously 기기 (機器, gigiseon). Should this be lemmatized as 긔긔션 (guiguisyeon), or as 기기선 (gigiseon) with 긔긔션 (guiguisyeon) being a soft redirect?
Alternatively, should EMK words attested only in hanja form be lemmatized at their EMK readings at the time, or in the modern reading?
I currently favor using modern Sino-Korean readings for all EMK words.
  • Should non-SK words with obsolete orthographies be modernized, even if the modernized spelling is not attested, for consistency with modern words?
For example, 졀ᄯᅡ빗 (Yale: cyelstapis) in 『한청문감/漢淸文鑑』 would be 절따빛 (jeolttabit) today, but only 절따말 (jeolttamal) is found in dictionaries of contemporary Korean. Should the lemmatization be consistent with 절따말 (jeolttamal), or faithful to the EMK spelling?
I note that the Oxford English Dictionary would artificially modernize Middle English spellings in cases corresponding to this.
  • Should EMK words whose regular reflexes are now dialectal be redirected to modern dialectal forms, or the standard forms? For example, should 짐츼 (jimchui) be an Early Modern form of 김치 (gimchi) or of 짐치 (jimchi)?

A definitive solution to these issues would be to spin off Early Modern Korean as another L2. The big downside is that this would lead to immense duplication of content across Korean and EMK. Or, more realistically, the Korean entries will have all the relevant information and the EMK entry will have a neglected single-word gloss. We can already see this by comparing French and Middle French entries, e.g. Middle French faire is in quite a pitiful state compared to French faire. Polysemous or tricky EMK words like ᄒᆞ다 (hᆞda) might be much better served by just being soft redirects to the comprehensive entry at 하다 (hada).

What should be done?--Tibidibi (talk) 15:34, 3 September 2021 (UTC)

Edit: in conventional monolingual sources, EMK is not treated as part of the contemporary language.--Tibidibi (talk) 16:01, 3 September 2021 (UTC)
I'm in Support of splitting EMK. Since it's said that even educated natives not trained in EMK struggle with the language, it should be split, let alone the large number of obsolete characters, Hanja, & more used in the language. Also, it'll be easier with connecting etymologies since Modern Korean or even dialectal words that derived from EMK forms will be able to point to a specific entry under the EMK header. EMK words attested only in Hanja form should be lemmatized at the EMK readings imho as that's what was attested, and we shouldn't be putting lemmas at unattested or modernized forms (unlike what Oxford does). AG202 (talk) 23:40, 3 September 2021 (UTC)

I think Early Modern Korean should be treated as part of "Korean" and there should be redirects to the entry for the Modern 표준어 form whenever possible. If there is no corresponding modern reflex, then one of the variant forms should be chosen arbitrarily and all the variant forms should redirect to that. Also, the original orthography should be preserved, and Hanja terms should not be transcribed into Hangul. English Wiktionary seems to do something similar, redirecting the archaic spelling of speake from Early Modern English to the modern standard form speak.--Mujjingun

@Mujjingun I disagree that "Hanja terms should not be transcribed into Hangul". I'm not sure to what extent you mean this, but while the majority of EMK texts are written in pure Hangul, there are also ones that use Hanja. If I'm understanding you correctly, you mean that terms attested only in the latter should be lemmatized at their Hanja form, while terms attested in the former should be lemmatized at Hangul. Variably lemmatizing according only to the source in which they are found will severely inconvenience the reader, especially when this is the result of inconsistent orthographic practice, not some ulterior logic behind it.
Regarding the original orthography for words with no modern reflex, conventional sources like the 우리말샘 dictionary do not have the problem of consistency because EMK is not actually treated as part of Korean proper; they are treated as 옛말 (yenmal) and do not have full definitions, all being soft redirects. But if we consider EMK to be akin to one of the modern dialects, I do suppose that preserving the original orthography is valuable. For example, it would be dumb to redirect Yukjin Korean 아심탢다 (asimtaenta) to the theoretical cognate *아심찮다 (*asimchanta).
I still believe that all (obvious) EMK Sino-Korean terms should be lemmatized at their present-day readings, not at the actual EMK forms. SK words are particularly likely to be rewritten in modernized form. When works like the 『청구영언/靑丘永言』, which are written in mixed script, are republished, the Sino-Korean words are given with modernized readings. Extending from this principle, 기기선 (gigiseon) would be preferred to 긔긔션 (guiguisyeon).--Tibidibi (talk) 02:06, 4 September 2021 (UTC)[]

Etruscan topic

Is it possible to add "Gentes" to the topic categories, for the notable families in Etruscan culture?--BandiniRaffaele2 (talk) 15:46, 3 September 2021 (UTC)

I need this for my Category:ett:Gentes.--BandiniRaffaele2 (talk) 06:10, 4 September 2021 (UTC)

We have Category:Latin nomina gentilia; presumably the Etruscan category should follow the same naming scheme? —Μετάknowledgediscuss/deeds 07:15, 4 September 2021 (UTC)
@Metaknowledge: I think yes.--BandiniRaffaele2 (talk) 09:01, 4 September 2021 (UTC)

Hard redirect: ꝛ and ſ

Words spelt with the r rotunda and the long s generally get redirected to the normal spelling with r and s, but not always. For example, ‘noꝛ’ does not get redirected to ‘nor’, nor ‘Iſrael’ to ‘Israel’. Any explanation for this? ·~ dictátor·mundꟾ 16:24, 3 September 2021 (UTC)

The automatic redirection only works if there is a unique page with the r or s. noꝛ doesn't automatically redirect to nor, because the software doesn't know whether you want nor, Nor, NOR, or what. But noꝛꝛ does automatically redirect to norr, because there is only one page name with those letters. —Mahāgaja · talk 17:39, 3 September 2021 (UTC)
Oh, that makes sense. But, ideally, a lowercase word should redirect only to the corresponding lowercase word; or, in the case of ‘Iſrael’, the software should only consider the letter I sans diacritics. I personally think that would be better, though there’s not much advantage to that otherwise. ·~ dictátor·mundꟾ 17:57, 3 September 2021 (UTC)
This is why I think we should use actual #REDIRECT pages instead of relying on the software to redirect for us. —Mahāgaja · talk 18:05, 3 September 2021 (UTC)
@Mahagaja: Are we actually allowed to redirect such spellings (of course, if they are attested, as in the KJV, etc.)? If it is noncontroversial, then I myself am willing to do that. ·~ dictátor·mundꟾ 10:23, 5 September 2021 (UTC)
I am not hugely enthusiastic about obscure- or obsolete-character versions of ordinary words redirecting to the normal-character versions with no explanation. Generally speaking, although I create them myself, I dislike automatic redirects altogether. The reader is thrown to a different entry to what they typed with usually no indication of why, and only a very missable indication that it has happened at all. Mihia (talk) 21:09, 5 September 2021 (UTC)
We have a template {{obsolete typography of}}. For example, English haue is defined as “obsolete typography of have”. I think we can likewise define againſt as “obsolete typography of against”, and so on and so forth. If someone is willing to go through an heroic effort of creating redirects to normalized spellings, with not too much extra effort they can use this method. It is IMO a bit awkward though when the same obsolete typographic form applies to a word with several valid part-of-speech assignments, an issue also present for misspellings; why is acount not listed as a verb so as to acount for such occurrences as found here?  --Lambiam 11:42, 6 September 2021 (UTC)[]
@Lambiam: Should we then create entries for words spelt with ꝛ and ſ, using that template? ·~ dictátor·mundꟾ 13:37, 7 September 2021 (UTC)
My opinion does not carry more weight than that of others, but for the terms I can think of that appears (to me) the best currently available option. It has the additional advantage that it does not stand in the way of regular entries that happen to have the same spelling, such as German haue.  --Lambiam 15:13, 7 September 2021 (UTC)
Another issue to consider on this subject: long ſ is not always equivalent to short s. In old Serbo-Croatian texts one common orthography kept them totally distinct, using ſ for /z/ and s for /s/, and similarly ſc for /ʒ/ and sc for /ʃ/. This also suggests manually-created pages are a better option than automatic software redirects, at least as far as long ſ is concerned. — Vorziblix (talk · contribs) 09:21, 10 September 2021 (UTC)
What's the name of that orthography? When was it used? The w:Long s link helps a bit more than a link to one random work, as it does point out several minor, historical orthographies that used it.
YILDIZ redirects to Yildiz, not yıldız. Given that Turkish is spoken by 80 million people and uses that spelling today in standard Turkish, I think that's a bigger problem. In either case, but especially in those obscure long-s-using orthographies, I think reflecting the needs of English speakers is more important. If you do have to create an entry with ſ in it, it won't automatically redirect away from it. Creating manual redirects for every word used in every European language until about 1800 would be a crazy amount of work. Looking at the Unix words list, about one-third of English words have an s not in final position, with that list alone containing 24,000 words that would need manually created redirects.--Prosfilaes (talk) 09:13, 17 September 2021 (UTC)
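
To make the uniqueness rule Mahāgaja describes above concrete, here is a minimal Python sketch. It is not the actual MediaWiki or Wiktionary code: the folding table, the function names, the sample titles, and the `keep_case` switch (which models the lowercase-only matching suggested earlier in the thread) are all assumptions made for illustration.

```python
import unicodedata

# Assumed folding table: obsolete letterforms mapped to their modern equivalents.
FOLD_OBSOLETE = str.maketrans({"ꝛ": "r", "ſ": "s"})


def fold_key(title, keep_case=False):
    """Reduce a title to a comparison key: modern letters, no diacritics."""
    modern = title.translate(FOLD_OBSOLETE)
    decomposed = unicodedata.normalize("NFD", modern)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # By default case is ignored as well; keep_case=True keeps it significant.
    return stripped if keep_case else stripped.casefold()


def resolve_redirect(typed, existing_titles, keep_case=False):
    """Return the page to show, or None when no unambiguous target exists."""
    if typed in existing_titles:
        return typed  # the typed title has its own page, so no redirect
    key = fold_key(typed, keep_case)
    candidates = [t for t in existing_titles if fold_key(t, keep_case) == key]
    # Redirect only when exactly one existing title matches the folded key.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical set of existing page titles, for illustration only.
titles = {"nor", "Nor", "NOR", "norr", "Israel", "Israël"}
print(resolve_redirect("noꝛ", titles))    # ambiguous: nor, Nor, NOR
print(resolve_redirect("noꝛꝛ", titles))  # only norr matches
```

Under these assumptions noꝛ finds three candidates and gives up, while noꝛꝛ has only norr to land on. Passing `keep_case=True` makes noꝛ resolve to nor, but it does nothing for orthographies where ſ and s are distinct letters, which is the case raised for old Serbo-Croatian texts.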

Past participles - lemmas or not

I was wondering whether Macedonian adjectival past participles, which can modify nouns attributively and decline as adjectives, should be listed as adjective lemmas or non-lemma forms of verbs. On the whole, they have the same properties as English participles, which are adjectival in contexts such as "a shattered vase". I see that for shattered, there is an adjective section, but the treatment of such participles seems to be inconsistent across languages:

  • Russian покрашенный (pokrašennyj) - only non-lemma
  • Italian dipinto - adjective and non-lemma
  • Spanish pintado - only non-lemma (the noun lemma is immaterial to the discussion)
  • Romanian vopsit - only adjective (common practice in Bogdan's entries based on what I've seen so far)
  • French peint - only non-lemma
  • German gemalt - only non-lemma
  • Dutch gemaald - only non-lemma
  • Hungarian festett - adjective and non-lemma
  • Bulgarian искан (iskan) - only non-lemma (closest relative of Macedonian)

In all these languages, the forms in question are both verbal (e.g. used in compound tenses, except the Hungarian form, which is a simple tense when verbal) and adjectival (i.e. used to modify nouns and declined as plain adjectives like "happy"). Are the differences in treatment due to compliance with different lexicographical traditions in the countries where the languages are spoken? Why are there inconsistencies even among English entries, e.g. repaired has no adjective section although shattered does? Martin123xyz (talk) 09:53, 6 September 2021 (UTC)

Dutch malen has the more common past participle gemalen, which will always be the form used adjectivally (gemalen koffiebonen, not gemaalde koffiebonen). This is also listed only as a non-lemma.  --Lambiam 11:57, 6 September 2021 (UTC)
Thank you for the clarification. Martin123xyz (talk) 12:41, 6 September 2021 (UTC)
When the term has an adjectival meaning not fully explained by the semantics of the verb whose past participle it is, it should IMO definitely have an adjectival entry. For example, we now list German gewichst only as the past participle of wichsen, but that cannot explain the sense “clever, cunning”.[1] One test of adjectivality is whether the term can be the complement of the usual copula and can be graded with adverbs like very, or with a comparative and superlative. “The plane has just taken off” is fine; “The plane is taken off” and “This plane is very taken off” are not possible. But “He has been very depressed for a long time” makes a perfect sentence.  --Lambiam 12:19, 6 September 2021 (UTC)[]
I agree that participles should be treated as adjectives when they have some additional meaning that cannot be predicted from the verb. However, the other tests are not so reliable. In Macedonian, "the plane is taken off" is grammatical, and the same goes for many intransitive verbs indicating a change of state. These constructions could arguably be treated as perfect tenses with the copula as an auxiliary of the "Ich bin gekommen" type. However, one could also say "a taken-off airplane" in Macedonian, where the participle is clearly not part of a tense. As for the possibility of using "very", it seems to correlate with the possibility of quantifying the underlying verb. We can say "he saddened him greatly" but not "the plane took off greatly" because taking-off is construed as binary, whereas emotional changes are construed as gradual, regardless of whether a verb or a participle is involved. A more reliable test is whether we can add an explicit agent in a passive construction. "Mary was depressed" cannot normally be expanded into something like "Mary was depressed by John", whereas "Mary was killed" is much more easily expanded into "Mary was killed by John". This naturally poses problems for participles derived from intransitive verbs. Martin123xyz (talk) 12:41, 6 September 2021 (UTC)[]
But "Mary was depressed by the death of her brother" is quite natural - 'depress' does not always have a personal agent. --RichardW57m (talk) 16:16, 6 September 2021 (UTC)
It can also be a matter of convenience. In Pali, the tendency of past participles to have meanings beyond that of the verb, plus the fact that participles, of which there are several, need a 48-cell declension table, prompted me to treat them as lemmas. --RichardW57m (talk) 16:16, 6 September 2021 (UTC)
In Arabic there should not be “participle” headers, because there aren’t undeclined participles and the participles aren’t used for periphrastic tenses. Hence they have the headers of adjectives, but the definition line is “active participle of …” or “passive participle of …”, insofar as there aren’t additional meanings. These aren’t even present in مولد, where only the etymology sections tell that they are active and passive participles. Fay Freak (talk) 16:25, 6 September 2021 (UTC)
A preliminary conclusion may be that the best practice will differ across languages. If all past participles of some language can practically always be used as adjectives, listing them separately as adjectives is pointless. Compare how German and Turkish adjectives are usually not also listed separately as adverbs (as seen in “es hat gut geschmeckt”), and English adjectives that can apply to people not also as (collective) nouns (as seen in “the unaware may fall for this scam”).  --Lambiam 09:01, 7 September 2021 (UTC)[]
A participle is of course derived from a verb, but most often not "merely" a derivative of the verb, even without any semantic extension or shift from the verb's original sense. It's possible, but tedious, to explain the word silenced in a sentence such as "You speak for the silenced." purely as a derivation from the verb to silence. Here it's more pragmatic to treat it as an adjective that is used nominally, which is also the conventional treatment in grammar books, I think.
I suspect the inconsistency with English terms is mostly about whether a usage is widespread. I feel there is something less "common" in expressions such as "a repaired car" as compared to ones like "a disgraced politician", although we can surely find examples like "sequencing of repaired DNA damage regions" in technical writing. --Frigoris (talk) 08:31, 18 September 2021 (UTC)

Using syn template for alternate plurals

I have inserted {{syn}} at some entries where a word has multiple plurals. E.g. cactus with cacti, cactuses, and cactusses. I didn't want to proceed with this without getting feedback from others: does this seem like a good idea? If not, is there a better way to say, "Sometimes, cactus is pluralized as cacti but sometimes it's cactuses or cactusses"? —Justin (koavf)TCM 18:10, 6 September 2021 (UTC)

Another approach in use is to list them as “alternative forms”, as seen e.g. at formulae. I’ve not been able to think of a reason why the use of {{syn}} for this purpose should be ill-advised; these alternative plurals are indeed synonyms in the strict sense of the word. If I had to devise a term for such alternative plurals, I’d suggest the neologism synenic, from συν- (sun-, same) + ἑνικός (henikós, singular).  --Lambiam 09:22, 7 September 2021 (UTC)
I still don’t know why we don’t have the same template for “alternative forms” as we have for synonyms. Fay Freak (talk) 11:54, 7 September 2021 (UTC)
Perhaps because you did not create it? Or do we need a Wiktionary:Requested templates page?  --Lambiam 16:15, 10 September 2021 (UTC)
I think syn is definitely wrong for alternate word forms. I think it's better to put them in the POS declaration if the language you're working in has accelerators for multiple plurals: for example, English seraph uses {{en-noun|s|seraphim|seraphims}} and Spanish ananás uses {{es-noun|m|ananás|pl2=+}}. JeffDoozan (talk) 15:50, 13 September 2021 (UTC)

Template for original research in reconstructed entries

I have created {{original research}} to place at the bottom of the reconstruction entries that result from original research. There is consensus in favour of keeping such entries, but it remains unclear to our readers that some of our reconstructed entries are copied from referenced sources, whereas others are novel content produced by Wiktionarians (and whose references may support individual forms or sound changes). Note: this template existed in 2013 with unnecessarily aggressive wording, and was deleted summarily by Rua. —Μετάknowledgediscuss/deeds 21:28, 6 September 2021 (UTC)

This is the warning I have imagined for a few places. Shouldn’t it be, though you aimed at continuing the tradition of a historical title, called {{original reconstruction}}? Regarding namespace conflicts, its title looks like a warning of broader application that could go into mainspace, while being restricted to the reconstruction space in the beginning. Unless of course you relegate the current text to a parameter, so we can use other parameters for other texts (some raw examples: |1=newetymon: The etymon of this entry is original to Wiktionary and has not been proposed previously. |2=neworganism: The identification of the organism referred to by this vernacular name is original research.) Fay Freak (talk) 22:07, 6 September 2021 (UTC)
I imagine this as being restricted to reconstructions. The reason is that all of Wiktionary incorporates original research to a degree; our definitions are based on quotations that we find, and tweaked to fit what we observe. Labelling all of that would be absurd, and unnecessary. Reconstructions are unique because the word itself does not exist, and this template is intended to indicate that not only the details but the headword itself is our work. —Μετάknowledgediscuss/deeds 00:29, 7 September 2021 (UTC)
I'd be all for restricting using this template to situations where the reconstructed headword is something Wiktionarians concocted themselves rather than derived from a reference. A rephrasing of the template for this end may be in order. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:32, 7 September 2021 (UTC)
That's what I intended, to be honest. How can I rephrase it? Would that eliminate the need to put it on PWG entries? —Μετάknowledgediscuss/deeds 01:39, 7 September 2021 (UTC)
I find this template both silly and unnecessary. @Mellohi! is starting to add it to West Germanic entries but 90% of PWG are reconstructions based on Proto-Germanic reconstructions and aren't directly attestable. --{{victar|talk}} 01:04, 7 September 2021 (UTC)
(Notifying Rua, Wikitiki89, Benwing2, Mnemosientje, The Editor's Apprentice, Hazarasp): Pinging other Germanic editors and also @Leasnam, Kwékwlos to this discussion. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:32, 7 September 2021 (UTC)
Though this isn't an issue I feel strongly about, I can't say I'm fond of this idea. Users should be able to implicitly detect entries based on original research, as they'll lack sources; explicitly specifying this is unnecessary, unprofessional, and inconsistent with practice in normal entries. Additionally, the combination of {{reconstructed}} and {{original research}} is unsightly. If people are dead-set on explicitly indicating this, it's better to add a parameter to {{reconstructed}} which adds extra text rather than having a separate {{original research}}. Hazarasp (parlement · werkis) 03:07, 7 September 2021 (UTC)
@Hazarasp: I actually prefer an extra parameter to {{reconstructed}}, that would modify the message to convey what the OR template is doing right now. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:10, 7 September 2021 (UTC)
@Victar: Now clearly I was decided that this should not be added to Proto-West Germanic entries and it would be a caricature to add this template, yet @Mellohi! surprises me with this exact application of the template which I dismissed as a crank sport of mine. They aren’t concocted enough. The forms are kind of implicit in published Proto-Germanic reconstructions, under the assumption that Proto-West Germanic was in between, in any case the forms are “derived from the reference” regularly as my man itself distinguishes. When there is a Proto-Slavic reconstruction with a particular ending chosen instead of an other, the others being given as alternative forms, I did not see there enough distinction either. No five entries I have found remarkable enough for a whole banner. Fay Freak (talk) 11:49, 7 September 2021 (UTC)[]
The obvious issue with this template is that we don't have a No original research policy like Wikipedia does, and that especially applies to reconstructions. What's the point of this template, to warn readers that this entry is unreliable? If an entry is exceptionally unreliable, it shouldn't exist in the first place, right? I can see this being useful for etymology sections, but perhaps an inline note would be better, like [original research]. --{{victar|talk}} 18:37, 7 September 2021 (UTC)
It's not about reliability. It's about telling readers we didn't just omit references, but that we have gone beyond them. —Μετάknowledgediscuss/deeds 20:19, 7 September 2021 (UTC)
Even without a No original research policy, our readers should always get an idea where a reconstruction comes from. Some kind of non-stigmatizing flagging that the lack of mention of an external source in an entry is not just negligence is a good idea. For the reconstructed entry itself, the template (ideally called {{original reconstruction}}) looks good. An inline note in etymology sections as proposed by @Victar also makes sense, but only with redlinks. If the reconstructed entry exists, the reader can get the information about the origin of the reconstruction in the entry itself. There is no need to "stigmatize" well-thought (and ideally community-vetted) OR entries against externally-sourced material. –Austronesier (talk) 07:56, 8 September 2021 (UTC)[]
I support Victar’s proposal of an inline note so that we can mark specific portions of the entry as original research material. To give an example, I have to reconstruct the definitions of sourced reconstructed Prakrit and reconstructed Ashokan Prakrit terms, and such definitions begotten through original research can be tagged with such a note. ·~ dictátor·mundꟾ 14:13, 8 September 2021 (UTC)
If all we have is an inline note, where would we put it if the form of the headword itself is OR? I see this in Proto-Japonic, for instance. ‑‑ Eiríkr Útlendi │Tala við mig 18:03, 8 September 2021 (UTC)
If you really need to, I guess you could put it after {{head}}. --{{victar|talk}} 23:10, 8 September 2021 (UTC)
It would suffice to add a parameter to {{reconstructed|R=WT:$link2discussion}} for example per one above comment. Add the same for {{der|R=|sign=*}}, that is restricted to etymology sections, and then create a footnote {{rfe}} and/or usage-note much like conjugations have.
Or would tags in edit summaries to policy decision suffice on occasion? Like this thread for PWG?!
It would be nice anyway if {{rfe}} would implement dedicated links to the discussions as good citation practice, with some automation wizardry from WT:ES-titles to add a marked-up citation if one is found out later. ApisAzuli (talk) 18:17, 15 October 2021 (UTC)

The 2022 Community Wishlist Survey will happen in January

SGrabarczuk (WMF) (talk) 00:24, 7 September 2021 (UTC)

Results for the most contended Wikimedia Foundation Board of Trustees election

Thank you to everyone who participated in the 2021 Board election. The Elections Committee has reviewed the votes of the 2021 Wikimedia Foundation Board of Trustees election, organized to select four new trustees. A record 6,873 people from across 214 projects cast their valid votes. The following four candidates received the most support:

  1. Rosie Stephenson-Goodknight
  2. Victoria Doronina
  3. Dariusz Jemielniak
  4. Lorenzo Losa

While these candidates have been ranked through the community vote, they are not yet appointed to the Board of Trustees. They still need to pass a successful background check and meet the qualifications outlined in the Bylaws. The Board has set a tentative date to appoint new trustees at the end of this month.

Read the full announcement here. Xeno (WMF) (talk) 01:54, 9 September 2021 (UTC)

lol I tried. Equinox 14:58, 9 September 2021 (UTC)

Call for Candidates for the Movement Charter Drafting Committee ending 14 September 2021

Movement Strategy announces the Call for Candidates for the Movement Charter Drafting Committee. The Call opens August 2, 2021 and closes September 14, 2021.

The Committee is expected to represent diversity in the Movement. Diversity includes gender, language, geography, and experience. This comprises participation in projects, affiliates, and the Wikimedia Foundation.

English fluency is not required to become a member. If needed, translation and interpretation support is provided. Members will receive an allowance to offset participation costs. It is US$100 every two months.

We are looking for people who have some of the following skills:

  • Know how to write collaboratively (demonstrated experience is a plus).
  • Are ready to find compromises.
  • Focus on inclusion and diversity.
  • Have knowledge of community consultations.
  • Have intercultural communication experience.
  • Have governance or organization experience in non-profits or communities.
  • Have experience negotiating with different parties.

The Committee is expected to start with 15 people. If there are 20 or more candidates, a mixed election and selection process will happen. If there are 19 or fewer candidates, then the process of selection without election takes place.

Will you help move Wikimedia forward in this important role? Submit your candidacy here. Please contact strategy2030@wikimedia.org with questions.

This message may have been sent previously - please note that the deadline for candidate submissions was extended and candidacies are still being accepted until 14 September 2021. Xeno (WMF) 17:16, 10 September 2021 (UTC)

Server switch

SGrabarczuk (WMF) (talk) 00:46, 11 September 2021 (UTC)

Talk to the Community Tech


As we have recently announced, we, the team working on the Community Wishlist Survey, would like to invite you to an online meeting with us. It will take place on September 15th, 23:00 UTC on Zoom, and will last an hour. Click here to join.



The meeting will not be recorded or streamed. Notes without attribution will be taken and published on Meta-Wiki. The presentation (first three points in the agenda) will be given in English.

We can answer questions asked in English, French, Polish, and Spanish. If you would like to ask questions in advance, add them on the Community Wishlist Survey talk page or send to sgrabarczuk@wikimedia.org.

Natalia Rodriguez (the Community Tech manager) will be hosting this meeting.

Invitation link

See you! SGrabarczuk (WMF) (talk) 03:04, 11 September 2021 (UTC)

Request about a new transliteration of Hainanese

Why don't we apply the Hainanese Transliteration Scheme to the category for Hainanese? Since Hainanese has no available pronunciation module now, I recommend introducing w:Hainanese Transliteration Scheme into Wiktionary. I am a native speaker and able to help set it up. dia5 dia5!--洗腳盆收購站長 (talk) 11:34, 11 September 2021 (UTC)

I just found this: Module:nan-pron-Hainan. @Justinrleung Is this module ready to be applied to Hainanese articles? --TongcyDai (talk) 11:46, 11 September 2021 (UTC)
@洗腳盆收購站長, TongcyDai: It's still experimental because it can only handle two-character tone sandhi. This is technically the same level that our Taiyuan Jin can handle, so I guess it could be incorporated into {{zh-pron}} soon. 洗腳盆收購站長, are you a speaker of the Wenchang dialect? If so, do you have any insight on multi-character tone sandhi? — justin(r)leung (t...) | c=› } 14:40, 11 September 2021 (UTC)

Definitions for semantically straightforward inflected forms in subsidiary Pali script.

We have a potential style war for definitions of inflected forms in subsidiary scripts.

I have been writing, for example:

# {{pi-sc|Brah|kaṇṇo}}, ''which is'' {{pi-nr-inflection of|𑀓𑀡𑁆𑀡||nom|s|t=ear}}, which yields
  1. Latin script form of kaṇṇo, which is nominative singular of 𑀓𑀡𑁆𑀡 (kaṇṇa, ear)

@Svartava2 prefers:

# {{pi-sc|Brah|kaṇṇo|pos={{pi-nr-inflection of|𑀓𑀡𑁆𑀡||nom|s|t=ear}}}}, which yields
  1. Latin script form of kaṇṇo (nominative singular of 𑀓𑀡𑁆𑀡 (kaṇṇa, ear))

I dislike the latter form for several reasons:

  1. It results in nested parentheses, which are generally considered bad style
  2. It looks ugly - even a dash would be a better connective. The aim is to combine simultaneously applicable definitions.
  3. It does not lend itself to more complicated chains, such as case forms of participles of causatives, where the generative lexical information beyond the meaning of the finite verb from which the causative is derived is at most the form of the participle itself.
  4. It abuses the 'part of speech' field of {{pi-sc}}.

What style do editors think we should be using? --RichardW57 (talk) 11:38, 11 September 2021 (UTC)

@RichardW57: I have no strong opinion on this, and it isn't any kind of "war" as you say. However, I do not like "which is" but would be OK with a colon or semicolon, e.g. {{pi-sc|Brah|kaṇṇo}}: {{pi-nr-inflection of|𑀓𑀡𑁆𑀡||nom|s|t=ear}} or {{pi-sc|Brah|kaṇṇo}}; {{pi-nr-inflection of|𑀓𑀡𑁆𑀡||nom|s|t=ear}}. Regarding the abuse/misuse of |pos=, it is very common; technically it shouldn't be there at all in form-of templates, but it is there. I don't know how to, but I'd surely like to show you the different uses of the pos parameter in these form-of templates... Svārtava2 • 13:46, 11 September 2021 (UTC)
@Svartava2: I was about to revert your change to the Pali entry given as an example above, but then I thought you were likely to change it back again, so I decided some public discussion would improve matters. A semicolon has the wrong semantics in a dictionary. It implies an alternative meaning, whereas these are simultaneous meanings. A colon seems odd. For simple cases like the example above, 'and' might work, and I've taken to using it in more complicated examples, such as ອະພິໂລປິໂຕ (abilopito):
# {{pi-sc|Lao|abhiropito}}, ''which is'' {{pi-nr-inflection of|ອະພິໂລປິຕະ|eqv=abhiropita||nom|s|m}} ''and is'' {{pi-nr-inflection of|ອະພິໂລເປຕິ|eqv=abhiropeti||past|part|t=to concentrate on}}, which yields
  1. Lao script form of abhiropito, which is nominative singular masculine of ອະພິໂລປິຕະ (abilopita), which is a form of abhiropita and is past participle of ອະພິໂລເປຕິ (abilopeti), which is a form of abhiropeti (to concentrate on)
Note that I've just taken a surplus comma out of that, which shows that varying the punctuation is tricky. Adding an article might improve the flow, but choosing the right one could be difficult. --RichardW57 (talk) 15:41, 11 September 2021 (UTC)
You're the one who added |pos= to {{pi-sc}}! You added it on 24 May 2021. --RichardW57 (talk) 15:41, 11 September 2021 (UTC)
@RichardW57 feel free to revert, I won't edit war on this. Svārtava2 • 15:45, 11 September 2021 (UTC)

In context, the first two positional arguments to {{pi-sc}} are usually redundant, though @Octahedron80 would like to use the first to identify the writing system rather than just the script. The second argument gives the usual Roman script form, which is not always the same as the transliteration. These can differ because of the way a nasal before a consonant is written, and some writing systems drop distinctions made in more formal writing systems. --RichardW57 (talk) 11:38, 11 September 2021 (UTC)

Nominative singulars are being automatically recorded if different from the lemma form, for some dictionaries use the nominative singular as the citation form. --RichardW57 (talk) 11:38, 11 September 2021 (UTC)

Policy on deletion consensus

It appears that there is no formal policy on the deletion of RFDs, and what consensus is needed. What is regarded as no consensus[1], and what is considered to be enough for deleting the nominated page? Thoughts? Svārtava2 • 09:13, 12 September 2021 (UTC)

As I've said before, 3/5 should be a pretty strong "consensus" (lowercase-c) to keep/delete an entry. But I think the matter is more time-sensitive than our other votes. If the rfd has been sitting around for months and months and it doesn't look like 3/5 will be reached, then at that point a simple bare-majority (50% +1) should be enough to keep/delete the entry.
If (a) a lot of users have voted in the deletion request, (b) there is a technical minority/majority, (c) but it's on a thin margin, and (d) a lot of time has gone by; I would close that as RFD-kept by no consensus. Likewise if the vote is cleanly cut-down-the-middle as 50-50.
Lastly, if the vote has been in operation for several months (3 months+ should be a good rule of thumb) and absolutely no one has weighed in on the deletion request, it's safe to close as RFD-deleted by no objection. At that point, no single user has been interested enough to express an opinion on the entry's standing; so the entry can be deleted safely. If someone after the fact is alarmed at that, they can contest the decision a full seven days after the entry has been deleted. Per Chuck, "[e]ntries can always be undeleted, if anyone objects. It's more important to refrain from archiving too fast, so people have a chance to see what's happened and raise an objection, if necessary." Imetsia (talk) 15:03, 12 September 2021 (UTC)[]
Imetsia closed that RFD appropriately. Sodhak/Svartava/whatever, you need to stop looking for legislative solutions to personal quarrels. You are squandering whatever meagre goodwill you still have. —Μετάknowledgediscuss/deeds 17:32, 12 September 2021 (UTC)
@Imetsia: thanks for the explanation. @Metaknowledge: may I ask what personal quarrel you think I started this discussion over? If you think it was for {{bor+}}, then you are wrong. Svārtava2 • 05:27, 13 September 2021 (UTC)
@Metaknowledge: what is "that RFD"? Svārtava2 • 05:35, 13 September 2021 (UTC)
Obviously (see the link you provided above) this.  --Lambiam 08:37, 15 September 2021 (UTC)
Anyway, I asked that for t:t-ws, no "personal quarrel". "Squandering whatever meagre goodwill you still have" is a totally inappropriate comment directed at me for no reason at all. Svārtava2 • 06:02, 17 September 2021 (UTC)
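
For anyone who finds rules of thumb easier to read as code, the thresholds Imetsia lists above could be sketched roughly as follows. This is only one reading of that comment, not policy: the thread gives no numbers for "a lot of users" or "a thin margin", so `many_voters` and `thin_margin` are placeholder assumptions, as is treating 3/5 as "at least 60%" and the three-month mark as the point where a discussion counts as stale.

```python
from fractions import Fraction


def close_rfd(delete_votes, keep_votes, months_open,
              stale_after=3, many_voters=10, thin_margin=1):
    """Suggest an outcome for a deletion request (illustrative only)."""
    total = delete_votes + keep_votes
    if total == 0:
        # Nobody has weighed in: delete by no objection once the RFD is stale.
        return "deleted by no objection" if months_open >= stale_after else "open"

    share = Fraction(delete_votes, total)
    if share >= Fraction(3, 5):
        return "deleted"          # 3/5 in favour of deletion
    if share <= Fraction(2, 5):
        return "kept"             # 3/5 in favour of keeping
    if months_open < stale_after:
        return "open"             # no 3/5 yet, and still time to reach it

    margin = abs(delete_votes - keep_votes)
    if margin == 0 or (total >= many_voters and margin <= thin_margin):
        return "kept by no consensus"   # even split, or busy vote on a thin margin
    # Stale discussion with a clear side: a bare majority decides.
    return "deleted" if delete_votes > keep_votes else "kept"


print(close_rfd(3, 2, months_open=1))
print(close_rfd(5, 5, months_open=6))
print(close_rfd(0, 0, months_open=4))
```

The order of the checks is itself a choice: the 3/5 test comes first so that a clear result can be closed at any time, and the fallback to a bare majority only applies once the discussion has gone stale.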

Does anyone strongly support keeping anagrams?

I'm mostly just a (very happy) user of Wiktionary, but have watched the debate on anagrams with interest. See the Anagrams talk page, where I've tried to add links to all relevant Beer Parlour threads. Anagrams have a vocal advocate in Equinox. Is there anyone else who feels strongly that anagrams should be kept on Wiktionary? Khromegnome (talk) 09:50, 14 September 2021 (UTC)[]

Yeah, I strongly agree with keeping anagrams, though I'll say that I've never put any work or thought into this area. --Geographyinitiative (talk) 14:06, 14 September 2021 (UTC)

Why do you feel strongly about keeping them, if you haven't thought about them specifically? Do you generally oppose removing content? Khromegnome (talk) 09:52, 15 September 2021 (UTC)[]
Strongly? Probably not but moderately, yes. Wiktionary is an all-purpose dictionary, including a rhyming dictionary and an etymological dictionary and a multilingual dictionary (as well as a thesaurus, etc.) A "Scrabble dictionary" is a valid dictionary in my eyes. —Justin (koavf)TCM 04:30, 15 September 2021 (UTC)[]
Would you support the addition of other related lists (e.g. a list of all words that can be formed by adding letters, also useful for Scrabble)? Khromegnome (talk) 09:52, 15 September 2021 (UTC)[]
At the level of a particular entry, no, but in the Appendix namespace? Sure. Virtually anything can go in there, including fairly trivial and fun word lists. —Justin (koavf)TCM 21:11, 15 September 2021 (UTC)
Not passionately, but I'd be strongly opposed to getting rid of them, mostly because they are useful for word games and it makes little sense to get rid of them when we have so many and it is so easy to add them automatically. Andrew Sheedy (talk) 05:21, 15 September 2021 (UTC)[]
They’re useless though for crypto clues such as “anagram of margana”.  --Lambiam 08:29, 15 September 2021 (UTC)[]
I'm strongly in favor of removing them. They're only useful for 1% of the users and even for them only 1% of the time. Apart from that, they're nothing but browser delay and useless clutter (both while scrolling as well as in the edit history). Additionally, seeing as they can be generated by a program anyway, why should we essentially tabulate the output of a computer program into every article? Oh wait, it's not even every article, it's really a hit or miss; and where's the option to input a random collection of letters and get all anagrams of that? A Wiktionary-based anagram tool seems like a decent idea but it should definitely not be incorporated into the source of the articles. --Fytcha (talk) 17:21, 15 September 2021 (UTC)[]
Are those real statistics? I could claim that the audio link causes browser delay and useless clutter while being only useful for 1% of the users 1% of the time. I find anagrams interesting, and don't see much point in proactively removing them; certainly "browser delay" is silly. I seriously doubt that the time difference with and without is noticeable or easily measurable; the variation on browser load times on any one setup would likely be far larger than the difference between the average load times of the pages.--Prosfilaes (talk) 09:22, 17 September 2021 (UTC)[]
Most of our entries are useful for fewer than 1% of users. So what? Equinox 15:54, 19 September 2021 (UTC)
There is potentially a case for changing anagrams into a real-time search feature, like "find words that begin or end with". Equinox 23:29, 15 September 2021 (UTC)
I have never liked the inclusion of anagrams. DonnanZ (talk) 09:48, 1 October 2021 (UTC)
I found a really weird one this morning: %iles for Elis, which shouldn't happen. TBH, I prefer entries which are anagram-proof. DonnanZ (talk) 09:06, 4 October 2021 (UTC)
  • I have no strong feelings about them either way. I don't mind if we have them. — SGconlaw (talk) 11:05, 1 October 2021 (UTC)
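The "real-time search" idea raised in this thread (type any collection of letters and get every matching word) can be sketched as an index keyed on sorted letters. This is a minimal illustration, not an existing Wiktionary tool: the names `anagram_key`, `build_index` and `find_anagrams` and the tiny word list are assumptions.

```python
from collections import defaultdict


def anagram_key(text):
    """Canonical key: the letters of the text, case-folded and sorted."""
    return "".join(sorted(ch for ch in text.casefold() if ch.isalpha()))


def build_index(words):
    """Group entry titles by their anagram key."""
    index = defaultdict(set)
    for word in words:
        index[anagram_key(word)].add(word)
    return index


def find_anagrams(letters, index):
    """Return all indexed words using exactly these letters, minus the query."""
    matches = index.get(anagram_key(letters), set())
    return sorted(word for word in matches if word != letters)


# Hypothetical sample of entry titles.
INDEX = build_index(["anagram", "Elis", "isle", "lies", "beer"])
```

With this sample, `find_anagrams("margana", INDEX)` finds `anagram` even though "margana" is not an entry. Because the key keeps only letters, a title such as "%iles" would still land in the same group as "Elis"; whether that is desirable is exactly the kind of choice raised above.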

Change to Transliteration of Sinhala

Has there been any discussion of a change to the transliteration of Sinhalese? Or was today's (Tuesday's) change of Module:si-translit just unilateral vandalism of Sinhalese (si), Pali (pi) and possibly some Sanskrit (sa) transliteration by @Inqilābī? The transliteration of niggahita was changed to be the same as that of the velar nasal. Alerting @InsularAdam, Atitarev.--RichardW57 (talk) 23:21, 14 September 2021 (UTC)[]

@RichardW57: see Wiktionary:Information_desk/2021/September#Wrong_pronunciation_with_Sinhalese. Not exactly a broadbased consensus. Chuck Entz (talk) 04:16, 15 September 2021 (UTC)[]
My goodness. @RichardW57: I am amazed to see Sinhalese is using the exact transliteration as that of Pali and Sanskrit— unlike other NIA languages. Sinhalese should definitely have its unique module! ·~ dictátor·mundꟾ 18:53, 15 September 2021 (UTC)[]
The natural reply to that disruptive proposition is that the only good Inqilābī is a dead Inqilābī. Actually, Sinhalese has some unique consonants, so these get their unique transliterations. Just how many different transliteration modules can a translation page support? I think the limit is about 100, so kindly bound your profligacy. --RichardW57 (talk) 19:33, 15 September 2021 (UTC)[]
To be more precise in Sinhala script transliteration, it proceeds in two stages. First the source is transliterated into a common system. Then the outputs in that common system are tweaked to the particular language, though Pali and Sanskrit currently use consistent flavours of IAST. Thus, as "ṁ" was the transliteration chosen for anusvara in the original Sinhalese transliteration scheme, it gets systematically replaced by "ṃ" for Pali and Sanskrit. Similarly, for the syllabic consonants, the module accommodates ring below for Sinhalese and dot below for Sanskrit (and the logic for Pali follows the path for Sanskrit).
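The two-stage process described above can be sketched as follows. This is a toy illustration in Python, not the actual Lua code of Module:si-translit: the three-character table, the function name `to_latin` and the table names are assumptions, and a real transliterator must also handle inherent vowels, the virama and the rest of the script.

```python
import unicodedata

# Stage 1 (toy table): Sinhala script -> common system.
COMMON = {
    "\u0D85": "a",        # SINHALA LETTER AYANNA (independent a)
    "\u0D82": "\u1E41",   # SINHALA SIGN ANUSVARAYA -> m with dot above
    "\u0D8D": "r\u0325",  # SINHALA LETTER IRUYANNA -> r + combining ring below
}

# Stage 2: per-language tweaks applied to the common output.
SANSKRIT_TWEAKS = {
    "\u1E41": "\u1E43",   # m with dot above -> m with dot below
    "\u0325": "\u0323",   # combining ring below -> combining dot below
}
LANGUAGE_TWEAKS = {
    "si": {},                # Sinhalese keeps the common system as is
    "sa": SANSKRIT_TWEAKS,
    "pi": SANSKRIT_TWEAKS,   # Pali follows the Sanskrit path
}


def to_latin(text, lang):
    """Transliterate text for the given language code (si, sa or pi)."""
    # Stage 1: character-by-character mapping; unknown characters pass through.
    common = "".join(COMMON.get(ch, ch) for ch in text)
    # Stage 2: language-specific replacements.
    for old, new in LANGUAGE_TWEAKS[lang].items():
        common = common.replace(old, new)
    # Compose base letter + combining mark where a precomposed form exists.
    return unicodedata.normalize("NFC", common)
```

Keeping the language-specific changes in a separate table means that a change intended for one language, such as the treatment of niggahita, does not silently alter the others.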

Can somebody explain WT:SOP to me?

To me it seems that in orthographic systems where compounds are written without spaces and hyphens, people are much less likely to call something out as SOP. To give an example: puré de batata has just been nominated for deletion on the grounds of SOP, however nobody would ever dare propose the same for Kartoffelbrei. The only thing separating the two entries is an arbitrary peculiarity of their respective orthographies, apart from that they're identical in every regard. --Fytcha (talk) 17:05, 15 September 2021 (UTC)[]

We do have a rule that spaces and hyphens inside compound words affect whether we keep the word. Our policy is influenced by English spelling conventions and the fact that English speakers looking up foreign words are less likely to know where to break unspaced compounds. Can you propose an easily applied rule that would allow us to decide whether to delete Kartoffelbrei as a sum of parts? Vox Sciurorum (talk) 17:17, 15 September 2021 (UTC)[]
In some compounding languages, including German, the issue is compounded by the insertion of interfixes in compounds, following somewhat unpredictable rules: Eiweiß = Ei + ∅ + Weiß, but Eierschale = Ei +‎ -er- +‎ Schale; also Dutch kalfsvlees ~ kalf +‎ -s- +‎ vlees, but kalverkop = kalf +‎ -er- +‎ kop. Dutch has the specific issue of the orthographic choice between the interfixes -e- and -en-, which has historically vacillated quite a bit: ruggengraat = rug +‎ -en- +‎ graat (since 1996), but ruggespraak = rug +‎ -e- +‎ spraak. If Wiktionary is meant to be also usable as a reference for the spelling of terms, these should be included even in those cases when their meaning can be understood from the parts. Another issue to consider is ambiguity; for example, valkuil = val +‎ kuil and valkuil = valk +‎ uil coexist in Dutch.  --Lambiam 15:28, 16 September 2021 (UTC)[]
Kar + Toffel + Brei is edible hiking shoes, which is entirely different from puré + de + batata. In any case, it's an English rule that works well enough for most other languages normally written with spaces that we keep it. If enough people spoke polysynthetic languages we might have to rethink it, but most of those languages are from the Americas or Australia, not good places for a language to be from if it wants to have speakers.--Prosfilaes (talk) 09:47, 17 September 2021 (UTC)[]
I think continental Germanic and Indian (not just Indic) languages could give us a flood of terms. If that matters, we are relying on the self-constraint of the German, Sanskrit and Pali editors. I'm not sure what's stopping Thai acting like a polysynthetic language - most short sentences lack any word-separating punctuation, and hyphenation when words are split between lines is far from universal. --RichardW57m (talk) 14:52, 17 September 2021 (UTC)[]
Making solutions for future problems often just makes bad solutions that need fixing in the future. Polysynthetic is not about the writing system; it's about how words are composed in the underlying language. We don't use the space-free rules on languages that don't use spaces between words. Frankly, I'd be pushing for more cites earlier; how many German compound words actually have three cites? Even if the answer is still too many, people won't be rushing them into Wiktionary willy-nilly.--Prosfilaes (talk) 06:45, 18 September 2021 (UTC)[]
So what rules do we use for Thai? Thai word lists used for linebreaking and spellchecking often look to me as though they have a rule that a single English word translates to a single Thai word. I think we're currently being well-served by the judgement of the initial editors. --RichardW57 (talk) 11:07, 18 September 2021 (UTC)[]
‘Edible hiking shoes’??? —Caoimhin ceallach (talk) 03:10, 18 September 2021 (UTC)[]
Kar = curved depression in a mountainside; Toffel = slipper; Brei = mash. Maybe an Alpine dish mashed by slippers? You go out and walk in the flowers and bring the slippers in to mash the food?
For an actual example, I was actually pounding my head yesterday over sentebrio in Esperanto; is it sen (without) + *tebrio? sente (feelingly) + *brio? No, obviously it's sent' (feeling with the grammatical marker chopped off) + ebrio (drunkenness), but the facts that (a) sen is an incredibly common prefix, (b) sente is a valid Esperanto word and sent isn't, and (c) I wasn't familiar with ebrio, kept me backing up and looking up words that didn't exist in multiple dictionaries. (Note that the rule didn't really matter for Esperanto, since we're missing most of those words anyway.)--Prosfilaes (talk) 06:45, 18 September 2021 (UTC)
You mean to say that a German compound is in theory analysable in more ways because it's written together and therefore it deserves to be an entry more? —Caoimhin ceallach (talk) 19:00, 18 September 2021 (UTC)[]
I'm saying that we should have an entry because users are going to turn to the dictionary to try and figure out the meaning of those phrases, unlike space-separated phrases, where users are going to look up the (obviously) separate parts separately. "In theory analysable" is irrelevant; it's unlikely a user would actually conclude that it's edible hiking shoes, but they could easily spend a lot of time looking up Kar or Kart or Toffelbrei or Offelbrei before they realized where the split was. I was actually a little surprised that Kartoffel wasn't a compound.--Prosfilaes (talk) 02:09, 19 September 2021 (UTC)
Ok, now I get it. That actually makes sense. Sorry, I was puzzling over your example for ages and couldn't fathom how you could construe that meaning or what you meant to say by it. —Caoimhin ceallach (talk) 12:26, 19 September 2021 (UTC)[]
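The difficulty discussed in this thread, that a reader cannot tell where an unspaced compound splits, can be illustrated by enumerating every way to cut a word into known parts. This is a sketch only: the function name `segmentations` and the eight-word lexicon are assumptions, and interfixes such as -er- or -s- are ignored.

```python
def segmentations(word, lexicon):
    """Return every way to split word into a sequence of lexicon entries."""
    word = word.casefold()
    if not word:
        return [[]]  # one way to split the empty string: no parts at all
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Keep this head and split the remainder recursively.
            for tail in segmentations(word[i:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical miniature lexicon built from the examples in this thread.
LEXICON = {"val", "kuil", "valk", "uil", "kar", "toffel", "brei", "kartoffel"}
```

With this lexicon, Dutch valkuil comes back as both val + kuil and valk + uil, and Kartoffelbrei as both Kar + Toffel + Brei and Kartoffel + Brei. A reader without an entry for the whole compound faces the same choice.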

Turkish -ma forms: gerunds or verbal nouns?

In Turkish one may form a noun by replacing the infinitive suffix -mak with -ma. On the templatized conjugation table this form is called a gerund. In some references it is called a verbal noun. I don't know or care much which it is, but I would like a decision because we have distinct templates {{gerund of}} and {{verbal noun of}}. Category:Turkish gerunds and Category:Turkish verbal nouns both exist. Vox Sciurorum (talk) 20:43, 15 September 2021 (UTC)[]

@Allahverdi Verdizade --Fenakhay (تكلم معاي · ما ساهمت) 20:56, 15 September 2021 (UTC)
Verbal nouns. See here. Allahverdi Verdizade (talk) 21:22, 15 September 2021 (UTC)[]
Since gerunds in various languages are generally verbal nouns, it is not a simple either–or issue. Some Turkish grammar books call these -ma/-me forms “gerunds”,[2] but Lewis (Turkish Grammar) reserves the term for suffixes that form adverbial clauses, such as repeated -e, -erek and -ken, and this use is widely followed. Thus, using the term “gerund” for -ma/-me forms is not per se wrong but potentially confusing, whereas calling them “verbal nouns” is unambiguous.  --Lambiam 14:48, 16 September 2021 (UTC)[]
In German, they are commonly known as "Kurzinfinitiv" (short infinitive) and are categorized as a subset of verbal nouns. Some sources: [3], [4], [5], [6], [7] --Fytcha (talk) 15:17, 16 September 2021 (UTC)[]

Announcing ilscripto 0.0.1: pure Lua Scribunto engine

To celebrate the 30th birthday of the Linux kernel, I present to you ilscripto 0.0.1.

This engine is >85% identical to the PHP engine; see START.md. As you can see, getContent is powered by a local backup. Since mw.language:formatDate is not fully implemented, there are some issues with w:Module:CS1. For Wiktionaries, saving files like US, Us and us on Windows may cause issues.

This engine is inspired by bliki and partial-MediaWiki-lua-environment, also pure Lua. I hope this project will attract 100 users globally, and provide new chances for bot editing. Crowley666 (talk) 10:20, 17 September 2021 (UTC)[]

By the way, I hope my bot will also be useful. Crowley666 (talk) 10:20, 17 September 2021 (UTC)[]
Excellent work. Sounds much more complete than my current local Lua environment. I'll be sure to try it out when I have some time. — Eru·tuon 21:34, 17 September 2021 (UTC)[]

User:Donnanz’s etyl clean-up methods

Previous discussions:

If I said I didn't vote in favour, I didn't. DonnanZ (talk) 12:08, 20 September 2021 (UTC)[]
There is always a listing for derived terms, by default. DonnanZ (talk) 12:08, 20 September 2021 (UTC)[]
Nonsense, I don't have a bot. DonnanZ (talk) 12:08, 20 September 2021 (UTC)[]

We are all familiar with the disruptive edits of Donnanz (talk • contribs). For a few years, Donnanz has been indiscriminately replacing all {{etyl}}s with {{der}}, even though the replacement ought to use more specific etymology templates. This is not only useless but also harmful, for “there is nothing to signal that many of the uses of {{der}} are inappropriate”. Indeed, it has made fixing etymologies harder, because an editor has to stay alert to whether the source code has {{etyl}} or a misplaced and invalid {{der}}. Donnanz has taken it upon himself to clear up the entire list in Category:etyl cleanup, disregarding the fact that etyl clean-up is meant to be done diligently. All entries using the deprecated template get properly categorized, as it behaves like {{der}}, and as such there is no point in mechanically performing a fake clean-up that does not change categorization. He has been asked multiple times before (see the linked discussions) to abandon this practice, but he has always refused to listen. This is probably the greatest scandal in the history of this project, and Donnanz has somehow managed to continue doing the unwanted edits unchecked.

In the interest of preserving the legitimacy of this project, Donnanz should be legally banned from doing etyl clean-up. How that could be achieved is a different matter, but I urge the community to consider this issue in earnest. Thank you. ·~ dictátor·mundꟾ 17:21, 18 September 2021 (UTC)[]

As a point of clarification, Donnanz sometimes resorts to unhelpful and provocative language, so I think that his "automaton" reference is just a rude and inappropriate joke rather than an admission that he is actually using a bot to edit. That said, your proposal seems sound to me. —Justin (koavf)TCM 18:03, 18 September 2021 (UTC)[]
  Support. "This is probably the greatest scandal in the history of this project" is unnecessarily inflammatory, but yes, I would like Donnanz to stop doing this. PUC – 18:28, 18 September 2021 (UTC)[]
I said ‘probably’… ·~ dictátor·mundꟾ 18:31, 18 September 2021 (UTC)[]
Ok. Let's not argue about that, it doesn't matter. PUC – 18:35, 18 September 2021 (UTC)[]
Why not just create some hidden categories or dumps, like French terms derived but not inherited from Middle French and use those as a tool for cleanup? --{{victar|talk}} 20:29, 18 September 2021 (UTC)[]
I did suggest to User:PUC a long time ago that he could clean up French, but judging by the lack of progress, it would appear that he can't be bothered. DonnanZ (talk) 20:59, 18 September 2021 (UTC)[]
I have no inkling ’bout your uncanny fascination with etyl clean-up, but how do you expect other editors to clear up etyl usages overnight? You might be a retired person (per your userpage; no offense intended) so you have time aplenty fiddling with etymologies, but such a demand from other editors is so unwholesome. ·~ dictátor·mundꟾ 00:04, 19 September 2021 (UTC)[]
As I said, a long time ago. No one is suggesting the job can be done overnight, that is an exaggeration.
I don't see how such a ban is workable, would it prevent me from adding new etymology? DonnanZ (talk) 08:40, 19 September 2021 (UTC)[]
I have also seen misuse of {{bor}} (e.g. Piffard). If a French person migrates to another country and keeps their name, they are introducing their name to the country, and thus into the language of that country - it is not a borrowing. As far as I know, there is no template which caters for such introductions, but use of {{bor}} is misleading, and should be avoided. DonnanZ (talk) 09:07, 19 September 2021 (UTC)[]
That is a good point. Such words are actually translingualisms, but people do not care much. ·~ dictátor·mundꟾ 14:52, 19 September 2021 (UTC)
Well, this proposal is about banning the disruptive edits, not about how to deal with the mess. ·~ dictátor·mundꟾ 23:17, 18 September 2021 (UTC)[]
What is meant by "disruptive"? DonnanZ (talk) 08:40, 19 September 2021 (UTC)[]
Is it still disruptive if there's an easy solution to it? --{{victar|talk}} 18:16, 19 September 2021 (UTC)
  • Thanks to Inqilabi for bringing this to BP with the relevant discussions listed. Many Hindi etymologies are still suffering from Donnanz's fake etyl cleanup. I wholeheartedly support a ban on his etyl cleanup, or even a ban on any editing at all until he has cleaned up his fake cleanups: let his karma return. Svārtava2 • 07:09, 19 September 2021 (UTC)
    I think banning Donnanz from editing entirely is a step too far; However, I do think they ought to stop - an {{etyl}} template on itself signifies a page up for cleanup, a bare {{der}} doesn't, and is less likely to be found. Thadh (talk) 10:22, 19 September 2021 (UTC)[]
    editing entirely until he has completely cleaned up his fake etyl cleanup Svārtava2 • 11:27, 19 September 2021 (UTC)[]
    It's still disproportionate. PUC – 11:31, 19 September 2021 (UTC)[]
I don't think that idea has been thought through. If I were banned from editing entirely I would be unable to "clean up" your so-called "fakes", even if I wanted to; you would have to do the job yourself to your satisfaction. A lot of fuss about nothing. DonnanZ (talk) 11:47, 19 September 2021 (UTC)[]
They meant non-cleanup editing. Anyway, like PUC says, it's disproportionate. Thadh (talk) 12:26, 19 September 2021 (UTC)[]
  • I want to make it very clear that this proposal does not seek any retributive justice. This is just a formal request to ban Donnanz from doing etyl clean-up. Yes, any etyl clean-up, for he is really not interested in using specific etymology templates. That said, I have not made the slightest expression that he would be forbidden from editing the Etymology section altogether. He is even free to add new etymologies indiscriminately using {{der}}, for that matter. Any new proposals should be brought forth elsewhere, not here, because the matter under consideration is only the fake etyl clean-up. ·~ dictátor·mundꟾ 12:23, 19 September 2021 (UTC)[]
Well, with that concession (for other editors too) you have unwittingly made a mockery of your own proposal. There are many instances where {{der}} should be used, such as for Proto-Indo-European, an unwritten, reconstructed and probably largely theoretical language. You can't do away with {{der}} as some would like. So it may not be only me contributing to your so-called "mess", there might be other "guilty" parties. I am merely the easiest one to blame. DonnanZ (talk) 14:01, 19 September 2021 (UTC)[]
The discussion is not about fixing the wrong etymologies that abound in the etymology sections; I have already stated clearly what the only objective of this proposal is. But for your information, there are terms in languages that are directly inherited from a PIE (or any other reconstructed language) term, for which {{inh}} should be used, e.g., the descendants of *ph₂tḗr. {{der}} is used for other cases, such as roots. Read the description at Template:inherited. And of course, there are many, many misuses of etymology templates, either through carelessness or wrong conviction. ·~ dictátor·mundꟾ 14:52, 19 September 2021 (UTC)
You can't blame any editor for having different convictions, as I don't think reputable dictionaries include Proto- languages as etymology. It's all very murky. DonnanZ (talk) 16:04, 19 September 2021 (UTC)[]
Not some of them, but the OED sometimes does. For instance, it mentions "Old Germanic *hafjan" in the entry for heave (apparently that's what they reconstructed for Proto-Germanic *habjaną at the time) and "Germanic *swōtja-" in the entry for sweet (adjective and adverb). Merriam-Webster didn't mention proto-languages for these words, but it also had much shorter etymologies. I think on etymology we'd do better to emulate the OED. — Eru·tuon 18:45, 19 September 2021 (UTC)[]
This isn't really about proto-languages. Let's look at a few scenarios:
  1. A word is borrowed from some stage of Low German into Old Norse, and eventually ends up as Norwegian Nynorsk
  2. A modern English word that can be traced back to an Old French descendant of a Latin word is borrowed directly into modern Norwegian Bokmål.
  3. A modern English word that can be traced back to a Norman French borrowing from Old Norse is borrowed directly into modern Norwegian Bokmål.
  4. An Old Norse word is present as part of the basic vocabulary and is obviously not borrowed from anywhere. It then is passed down in an unbroken line to Norwegian Nynorsk
These are all plausible scenarios, and you want to be able to distinguish, for instance, between Old Norse that was borrowed into another language and later borrowed back into modern Norwegian and Old Norse that became Norwegian by way of regular language change. Also, a word that was borrowed into Old Norse should be marked as inherited from Old Norse, but derived from Low German. Chuck Entz (talk) 20:03, 19 September 2021 (UTC)
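The distinction drawn in these scenarios can be sketched as a small rule for choosing between {{inh}}, {{bor}} and {{der}}. This is an illustration under assumptions, not how any Wiktionary module works: the function name `choose_templates` and the chain format (ancestors listed from nearest to most remote, each tagged with how the next-younger stage acquired the word) are made up for the example.

```python
def choose_templates(chain):
    """Pick an etymology template for each ancestor in the chain.

    chain: list of (language, mode) pairs, nearest ancestor first, where
    mode is "inherited" or "borrowed" and describes how the next-younger
    stage got the word from that language.
    """
    result = []
    unbroken = True  # True while every step so far has been inheritance
    for position, (language, mode) in enumerate(chain):
        if mode == "inherited" and unbroken:
            template = "inh"   # unbroken line of regular language change
        elif mode == "borrowed" and position == 0:
            template = "bor"   # borrowed directly into the entry's language
        else:
            template = "der"   # catch-all for anything more remote
        if mode != "inherited":
            unbroken = False
        result.append((language, template))
    return result


# Scenario 1: Nynorsk inherited the word from Old Norse, which had
# borrowed it from Low German.
SCENARIO_1 = [("Old Norse", "inherited"), ("Low German", "borrowed")]
```

For scenario 1 this yields `inh` for Old Norse and `der` for Low German, matching the last sentence of the comment above. The sketch also shows why a blanket {{der}} loses information: every ancestor would receive the same template regardless of the path.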
Just to show the whole picture: there are benefits to clearing an entire language of {{etyl}}: languages that have no etyl templates can be added to an exclusion list that prevents anyone from adding new etyl templates for that language (Adding it to the list before then would cause a module error in every existing etyl template for that language). It's not just a matter of saving the cleanup for later: as long as there are etyl templates for a language, people will continue to add new ones and the job will get harder. There's an abuse filter that tags edits which increase the number of etyl templates in an entry. It's been in place for 4 years, and already has 8917 hits. Chuck Entz (talk) 20:24, 19 September 2021 (UTC)[]
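The check performed by the abuse filter mentioned above, tagging edits that increase the number of etyl templates in an entry, can be approximated as below. This is a Python sketch for illustration only: the real filter is written in the AbuseFilter rule language, and the regular expression and function names here are assumptions.

```python
import re

# Matches the opening of an etyl template call, e.g. "{{etyl|non|nb}}".
ETYL_CALL = re.compile(r"\{\{\s*etyl\s*\|")


def count_etyl(wikitext):
    """Count etyl template calls in a page's wikitext."""
    return len(ETYL_CALL.findall(wikitext))


def adds_etyl(old_wikitext, new_wikitext):
    """True if the edit increases the number of etyl templates."""
    return count_etyl(new_wikitext) > count_etyl(old_wikitext)
```

Note that such a check only counts templates. An edit that swaps {{etyl}} for {{der}} lowers the count and passes, whether or not {{der}} is the right template.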
While I'm at it: pinging @Mahagaja, who probably knows as much about the issues involved as anyone. Chuck Entz (talk) 20:35, 19 September 2021 (UTC)[]
All I have to say is I too get annoyed when people clean up {{etyl}} by blindly replacing it with {{der}} rather than by paying attention to what kind of derivation it is and using the various templates correctly. I correct these as I encounter them, but of course usually I don't notice them. I have no idea whether Donnanz is the only person who does this or even the person who does it the most. I don't bother checking the page histories of the pages affected. —Mahāgaja · talk 20:42, 19 September 2021 (UTC)[]
I think @Ultimateria does it too sometimes. PUC – 21:49, 19 September 2021 (UTC)[]
I don't think I've done that in at least two years; as soon as I read the public discourse about it I stopped. However I add new etymologies with {{der}} (unless it's obviously a modern borrowing) because I don't feel qualified to make the distinction, and it's certainly better than having no etymology. Ultimateria (talk) 22:18, 19 September 2021 (UTC)[]
@Ultimateria: Sorry for the false accusation then ("accusation" is a bit strong, but you get my point), and yes I agree that adding an etymology with an imprecise template is better than having no etymology at all. PUC – 18:52, 23 September 2021 (UTC)[]
Despite your acute observations, I doubt that any more languages will be free of {{etyl}} any time soon. There is a lot of apathy around, which is not helped by User:Inqilābī. There is virtually no improvement in Category:etyl cleanup for today. DonnanZ (talk) 22:34, 19 September 2021 (UTC)[]
I think it is actually useful to have a template for signalling an etymological connection that can be used when it is unknown whether, for example, an Old French term is inherited from Latin or a borrowing.  --Lambiam 08:19, 20 September 2021 (UTC)[]
  • I have been threatened with a non-specific block by User:Benwing2 on my talk page. This is very heavy-handed so I am reporting it here. DonnanZ (talk) 08:59, 20 September 2021 (UTC)[]
There's nothing heavy-handed about it. Several users asked you to stop these edits and you dismissed them all. I don't see anyone who agrees with your edits; you are the only "noisy minority" here. Ultimateria (talk) 15:02, 20 September 2021 (UTC)[]
If you were on the receiving end, you would think differently. DonnanZ (talk) 15:14, 20 September 2021 (UTC)[]

  Support. Donnanz's edits border on the disruptive. {{etyl}} is not bad in itself, insofar as it does the same work as {{der}}; the reason it is subpar and needs to be cleaned up is because it is not as specific as {{bor}}, {{inh}}, and so forth. This lack of specificity is the problem with {{etyl}}. When Donnanz "fixes" {{etyl}} to {{der}}, nothing has been cleaned up. The problem is still there; the only thing that has changed is that the problem is no longer detectible (not being categorized in Category:etyl cleanup), and hence really much worse.--Tibidibi (talk) 12:02, 20 September 2021 (UTC)[]

  • I am washing my hands of Category:etyl cleanup, as satisfying a noisy minority of users is not worth the trouble, and my efforts are not appreciated. I will still monitor the situation, checking on progress or the lack of it, and carry on with other work. Nobody has won, "match abandoned".
Special thanks go to User:Chuck Entz, for content hidden by User:Inqilābī. DonnanZ (talk) 14:28, 20 September 2021 (UTC)[]
I do not get what is the point of repeatedly pinging me or other editors, but I hid the content due to its irrelevance, especially when it was in the lead. ·~ dictátor·mundꟾ 12:00, 21 September 2021 (UTC)
I guess he means he didn't like being overruled - twice. DonnanZ (talk) 12:25, 21 September 2021 (UTC)
?? ·~ dictátor·mundꟾ 12:30, 21 September 2021 (UTC)

Does Hawaiian have adjectives?

Category:Hawaiian_adjectives has been created and deleted repeatedly, and now an IP keeps on adding entries such as huluhulu and mahū. I am trying to clear out Special:WantedCategories and want to know what to do with these entries. I see some conflicting information about Hawaiian having adjectives (I even started a Duolingo class on it to see what they said). Hawaiian grammar says that they exist, and Wiktionary:About Hawaiian is pretty underdeveloped and does not discuss the topic. Can anyone weigh in on this and add it to WT:AHAW so that we don't have repeated adding and deleting? —Justin (koavf)TCM 18:01, 18 September 2021 (UTC)

Those are stative verbs, according to the standard grammars and dictionaries. —Μετάknowledgediscuss/deeds 18:05, 18 September 2021 (UTC)[]
I added some language to WT:AHAW. —Justin (koavf)TCM 18:17, 18 September 2021 (UTC)[]
Out of curiosity, would i ʻula ia mean “it was red” (but ceased to be so) or “it became red”? Or is this an ungrammatical sentence?  --Lambiam 07:54, 20 September 2021 (UTC)[]
@Lambiam: As far as I can tell, although I'm not very good at Hawaiian, so I'm not sure, what you're looking for is Ua ʻula ia (or simply ʻula ia) - "It became red". I ʻula ia is simply "It was red", it doesn't even say anything about the present state of affairs. Thadh (talk) 15:36, 20 September 2021 (UTC)[]
Thanks. I was not wondering how to say “it became red” in Hawai’ian, merely what the interpretation would be of “i ʻula ia”.  --Lambiam 21:56, 20 September 2021 (UTC)[]
  • Serious point: it depends on how we decide to define "adjective".
Arguably, Japanese has a category of words that are, strictly speaking, classable as "stative verbs": these words (commonly called "-i adjectives") describe the quality of a thing, and can directly modify a noun or noun phrase, and thus function adjectivally, but they can also form the predicate of a statement, inflect for tense, and include an inherent "to be" sense, and thus function verbally.
For good or ill, English-language materials describing the Japanese language tend to call these "adjectives", outside of academia.
It's been a while since I was working much on my Hawaiian knowledge, but what I recall is that Hawaiian has a similar category of words that have similar qualities: they describe the quality of a thing, can be used to modify a noun or noun phrase, but also take markers for aspect and tense, and can be used predicatively.
Arguably, for an audience of English-language readers, I think a case can be made that these words should be called "adjectives", with the WT:AHAW page providing a fuller explanation of how these are also stative verbs.
Conversely, if a preponderance of existing English-language materials describing the Hawaiian language tend to call these "stative verbs", outside of academia, then presumably we should follow suit.
On the third hand, Hawaiian is somewhat fluid when it comes to parts of speech, a bit similar to the way things work in Chinese, another notably analytic language -- a given word could be used as a verb, noun, adjective, adverb, etc. In the Wehewehe entry for ʻula, for instance, I see that they group the noun, verb, and adjectival senses together. Probably not an approach we would adopt here, but food for thought. ‑‑ Eiríkr Útlendi │Tala við mig 23:04, 24 September 2021 (UTC)[]

English prepositions

It appears so far that there is a disappointingly low level of interest in this, but may I nevertheless encourage anyone who does have a view to participate at Wiktionary:Votes/2021-08/Scope_of_English_prepositions. Mihia (talk) 22:00, 18 September 2021 (UTC)[]

I did not even understand a shred of what the vote is about. Was it some school grammar book prescription? Sorry, my IQ is kinda low. ·~ dictátor·mundꟾ 23:14, 18 September 2021 (UTC)
I would not expect anyone to participate in the vote if the issue did not mean anything to them. Regarding this edit, may I ask what you mean by "collapsible option sections"? I did not see any collapsibility, and the heading levels were to my mind logical before your change, and illogical after. Is this some special setting that you have? (Also, may I ask why you, or anyone, would have a signature completely different to your username? I had no idea until I accidentally discovered it just now that " ·~ dictátor·mund" was the same person as "Inqilābī". Isn't this just confusing?) Mihia (talk) 01:15, 19 September 2021 (UTC)[]
@Mihia: I thought you would attempt to explain the vote to me, but never mind, your reply was as intriguing as your vote. You should create vote pages with the correct layout; I have fixed the page further; if you have no idea then just check out other vote pages before bickering. Using a level 3 heading causes collapsible sections on mobile. Congratulations on your discovery; that’s nothing to fuss over, but I guess it’s because of less interaction ’twixt us. And you misspelt my signature. Anyway, good luck with your ‘vote’ (if it is really one). ·~ dictátor·mundꟾ 11:56, 19 September 2021 (UTC)
The heading levels that you have created are illogical. If other votes do the same then they are illogical too. Mihia (talk) 19:24, 19 September 2021 (UTC)
Logicality is a subjective thing. We act according to the norm. If you dislike the norm, you have to make a proposal first, to change the norm. ·~ dictátor·mundꟾ 19:59, 19 September 2021 (UTC)
If there is one thing that should not be subjective, it is logicality. Mihia (talk) 22:50, 19 September 2021 (UTC)
The vote is about the best assignment of a POS descriptor in some murky cases.  --Lambiam 07:42, 20 September 2021 (UTC)

What the vote really means

The section WT:CFI § Idiomaticity contains the following clause:

In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.

It was added with the argument (in the edit summary)

This is what the vote really means.

Here, “the vote” refers to Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep, which failed with 7 in favour, 9 against. However, in no way does the discussion around the vote refer to a judgement about a term’s being “useful to readers”; this interpretation appears to have sprouted from the imagination of one editor. (Note that Wiktionary:Idioms that survived RFD also does not refer to “usefulness”.) I also think that the meaning of the vote was that a consensus-to-keep overrides the CFI in general, not specifically for the application of the idiomaticity criterion. Such exceptional overriding applies likewise (IMO) to names of specific entities, or terms originating in fictional universes. Thus, it is misplaced in the section Idiomaticity. But does it need to be mentioned at all? Surely, this is not meant to be a free pass for ignoring the criteria, as some appear inclined to see it. I think it is better if its addition is undone.  --Lambiam 07:31, 20 September 2021 (UTC)[]

Delete the sentence. Enough people already vote keep in RFD without a valid policy rationale that we don't need to enshrine that "keep just because" attitude in the CFI. PUC – 13:26, 20 September 2021 (UTC)[]
Keep something of the sort: Terms like gagana Tokelau and ižoran keeli are useful for the readers (for instance, they portray the standard formula for a language name), but strictly speaking SOP and not subject to the phrasebook. I would be okay with some rewriting though, if that's necessary. Thadh (talk) 13:36, 20 September 2021 (UTC)[]
@Thadh: There have been several RFD's about this (Talk:bulgarian kieli, Talk:afrikanų kalba, and others), and the recent consensus has been to delete these combinations (while they were kept in older RFD's). The pattern can be described at keeli and gagana, and I think the entries you've mentioned should be deleted. PUC – 09:49, 22 September 2021 (UTC)[]
I don't agree. Moreover, I'm not really sure why language names are SOP anyway, because a language is not equal to its place of origin, and the current rule contradicts our goal of being descriptive, not prescriptive. For English, it's another story, because "English" or "French" are already nouns denoting the language on their own. However, that "Tokelau language" means "Tokelauan", rather than English is not as evident as it seems from our Euroamerican point of view. Thadh (talk) 12:53, 22 September 2021 (UTC)[]
Delete because it is deceptive, being included on a policy page, when no vote explicitly endorsed it, AFAICT. If we keep things against our policy, we can continue to do so OR we can have a vote to enshrine some more careful version of the language in question. DCDuring (talk) 15:48, 20 September 2021 (UTC)[]

Does this need a vote for it to be removed? It was IMO inserted out of process. The sentence may be a valid observation of voting behaviour at RfD’s, but apart from one’s view on the usefulness of terms like lingua corsa, the sentence is not itself useful, as it does not give any guidance, one way or another.  --Lambiam 08:57, 22 September 2021 (UTC)[]

I think that it should be removed without a vote, and that if people want to see a form of that argument appear in the CFI, it's that proposal that should be put to the vote: "usefulness as a criterion for keeping" or something. But this is likely to raise some objections by the person who added the sentence.
Moreover, it could be interesting to run the original vote a second time. The editor base has changed quite a bit since 2014, and there have been additions to the CFI (for example the translation hub policy) that could make certain people reconsider their position. Also, the scope of the vote could be broadened a bit to make it less partisan: it's not only that entries which don't meet CFI should be deleted despite a consensus to keep, but also that entries which do meet CFI should be kept even if there's a consensus to delete (even though I think that scenario never happens in practice). I suspect this proposal would fail again, but it would be interesting to see what arguments are raised this time. PUC – 09:42, 22 September 2021 (UTC)
I think it is problematic to think that the criteria for inclusion of terms can readily shift back and forth based on the changing editor base. With respect to "usefulness" as a criterion, what exactly is the point of adding another dictionary to the world at all, other than usefulness? bd2412 T 21:39, 22 September 2021 (UTC)
It is not without reason that Wikipedia explicitly mentions “it’s useful” as an argument to avoid in deletion discussions. By adding all material that may be useful to some users, the dictionary as a whole may become less useful to most users. As I see it, the CFI are designed to provide guidance for striking a balance between the incidental and the universal so as to promote general usefulness as a dictionary. The judgement of usefulness on the basis of individual entries is bound to be very subjective and will predictably result in uneven application depending on which small selection of editors happens to weigh in.  --Lambiam 08:24, 23 September 2021 (UTC)[]
"it is problematic to think that the criteria for inclusion of terms can readily shift back and forth based on the changing editor base": you're putting words in my mouth, though admittedly it wasn't very clear what I had in mind by mentioning the changing editor base. What I meant is that since the players aren't the same, a new vote wouldn't just be a rehash of the old one, but would give us the opportunity to see if the arguments that prevailed at the time are still found to be relevant by current editors; to see if more recent editors, who arrived after the introduction of the translation hub policy, still find CFI too imperfect to consent to its being binding; to perhaps read new arguments, etc.
Also, I don't know what you're getting at with your second sentence, and I don't see how it pertains to the matter at hand. In fact, it is not the first time I have trouble following your argumentation; I'm sorry to say this, but when I read your answers around here, I often feel like I'm being gaslighted: you're kicking in touch by resorting to vague general statements that are impossible to refute. I'd be grateful if you could spell things out a bit more in debates. PUC – 18:38, 23 September 2021 (UTC)[]
Control over what words mean. Equinox 20:04, 23 September 2021 (UTC)[]
"Gaslighted" is offensive, and feels like an attack. Reference to a changing editor base certainly sounds like a hope that different people will mean a different outcome. With respect to the addition to the policy, if anything, take out the "based on the determination..." portion, and leave it at "In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community". That is the outcome of a discussion clearly rejecting the opposing position. We need something to counter what I can only describe as a creeping parochial prescriptivism, wherein editors ignore the existence of set phrases and various other tests for the utility of a collocation, and push for deletion because it's obvious in their corner of the world that a phrase intends sense 5 of one word and sense 7 of the other, or the like. In a recent discussion, someone asserted that a phrase was SOP by comparing it to "fairies at the bottom of the garden", oblivious to the fact that in American English we don't refer to gardens as having a "bottom", so the phrase is gibberish. bd2412 T 15:40, 24 September 2021 (UTC)[]
What you are just conveying is the languagist position that the vulgar Usonian usage you are familiar with is the measure of all things, even though it is well possible that idioms are gibberish and your feelings too. Absent erudition, gibberish becomes laws. Isn’t it more and more fashionable now in the US, that even the laws make no sense and estrange you? Similar it is with common language usage, it might be outright absurdity that rejects systematization. Or with “art” …
So the mere “idiomaticity” or “utility of a collocation” is not the essence of what makes a word-combination demand inclusion, although opinion-makers must know such combinations covered by no dictionary, to manipulate the masses that is.
We might have {{&lit}} linking specific senses via {{senseid}} or usage notes under a {{&lit}} entry.
So delete, we shan’t have statutes contradicting themselves. Fay Freak (talk) 16:25, 24 September 2021 (UTC)[]
Perhaps those editors who want a dictionary based exclusively on British English usage (which amounts to 6% of the English-speaking world) should go write one under their own banner. bd2412 T 16:50, 24 September 2021 (UTC)[]
Ignoring the unnecessarily inflammatory part, I think the key here is that there's a difference between the practice of how the rules are applied and the rules themselves. The CFI are the codification of community consensus, the rules that the community has decided to follow. The community's occasional practice of setting aside the letter of the rules in rare cases is part of the way Wiktionary works, but should not be part of the rules. An analogy in US law is the principle of jury nullification: the fact that a jury may choose not to convict someone who is guilty according to the letter of the law doesn't mean that there should be a clause in the statutes saying "in rare cases, a jury may decide not to convict".
Besides which, it's not specific to the context where it's mentioned, though there may not have been occasion for it to occur elsewhere. Chuck Entz (talk) 17:02, 24 September 2021 (UTC)[]

CAT:English terms containing 'n', CAT:English terms spelled with &

I think ’twould be better to not have these redundant categories; instead we can relocate all terms in the Derived terms section of the entries 'n' and & to avoid unnecessary duplication. See previous discussion. (I did not take to rfd for I thought ’tis better to have a consensus on this first.) ·~ dictátor·mundꟾ 12:25, 21 September 2021 (UTC)[]

How are the categories populated and how will the derived term section be maintained? I'm not convinced that A & E is derived from A, & and E; it is etymologically derived from abbreviating the three words, so it is unnatural to record these abbreviations as derivatives of &. If you rely on normal editor action, it ain't gonna happen. The category seems to rely on a wetware bot by the name of @Wikihistorian. --RichardW57m (talk) 15:25, 21 September 2021 (UTC)[]
Of course the cats are added manually to each page, but I am convinced by your suggestion that A & E is not etymologically derived from &. Hence, can we tweak this proposal, that is: delete the Derived terms section at &? ·~ dictátor·mundꟾ 15:40, 21 September 2021 (UTC)

How to cover a topic

Has anybody written an essay or guideline for how to cover the terms of a particular topic? One could start by locating the closest topic category (say, en:Electronics) and see which words are already there, and then brainstorm a list of missing entries, adding existing entries to the category and creating missing entries. But maybe there are more structured approaches? If you have done this, how did you do it? --LA2 (talk) 21:34, 21 September 2021 (UTC)[]

You could look for existing technical dictionaries for your chosen topic. Or extract words from relevant Wikipedia articles. DTLHS (talk) 21:42, 21 September 2021 (UTC)

Notability requirement for names of people

In addition to the regular criteria for inclusion, I propose a notability or popularity requirement for names of people. This will reduce the incentive to argue about whether a name should be treated as belonging to a particular language. And as important, it will reduce clutter. Clutter is bad because even with infinite storage space the crap still gets in the way. (Recall the angst of the librarian in A Fire Upon the Deep over adding another level of indexing to the vast library.) Names these days are tags. Carter is not limited to carters. One of the Dutch editors has a policy of not adding names with less than a thousand name-bearers. That seems like a reasonable threshold, so I propose the following rules for names of people in well-documented languages:

  • To be included a name must have had at least a thousand name bearers at the same time, including alternative spellings but not variants like diminutives. Each individual only counts towards one language (L2 header).
  • Exceptions may be made for less common names clearly in widespread use (not merely with three citable uses) such as
    • A name of a famous person known by a one word name. So, Prince: A name of men, especially the name of a former musician.
    • A well known metaphorical use, like Faustian.
    • A translation hub when cognates of a rare English name are common in other languages.

The only value we add to the countless online lists of names or baby names is etymology, so we could make the threshold different for names with and without etymology.

Thoughts? Vox Sciurorum (talk) 13:17, 22 September 2021 (UTC)

I don't think we should have stage names like "Prince". Oh, it's his real name! Well, he shouldn't have a sense line anyway. He's just an individual person with that name; something for Wikipedia. Equinox 15:25, 22 September 2021 (UTC)[]
We have a history of including notable people as senses. We do not need to continue that tradition if it clutters our pages. Xi is a better example because the real name is not an English word. Vox Sciurorum (talk) 16:23, 22 September 2021 (UTC)[]
The required number of bearers should be by percentage of native speakers. A thousand bearers of Hawaiian or Icelandic names would be impossible. Also data is missing from many countries, or the native tongue of name-bearers is not known. Basically I would welcome such rules. Good luck to Vox Sciurorum for setting up a vote. Discussions about given names tend to become very emotional ("This is aimed against MY name or MY mother tongue"). --Makaokalani (talk) 20:07, 24 September 2021 (UTC)[]
Hawaiian is not a well-documented language and would not be subject to the rule. For Icelandic, we can say that inclusion in a government's list of acceptable names takes precedence over the notability rule. Vox Sciurorum (talk) 20:21, 24 September 2021 (UTC)[]

enwiki transwikis

Hello Wiktionarians, a discussion on enwiki is open about how to handle cases where one of our articles is actually a definition that likely belongs over here instead. If you have any tips (hopefully a page you could link to?) please feel free to drop us a line at: w:en:Wikipedia:Village_pump_(policy)#Transwiki_to_enwikt Thank you, Xaosflux (talk) 18:57, 24 September 2021 (UTC)[]

We don't want transwikis. I have commented there. —Μετάknowledgediscuss/deeds 19:04, 24 September 2021 (UTC)

Issues with rhymes

User:Surjection implemented automatic categorization of rhymes. This generated maybe 50,000 new categories (over 30,000 from Finnish alone); I've finally cleared them. In the process I noticed a ton of inconsistencies and I'd like some comments on them:

  1. One thing in particular I notice repeatedly are rhymes beginning with a consonant, e.g. Category:Rhymes:Zazaki/ma, Category:Rhymes:Turkish/si, Category:Rhymes:Indonesian/tal, Category:Rhymes:Czech/r̝aːp, Category:Rhymes:French/tif, Category:Rhymes:Nepali/t̪e, Category:Rhymes:Neapolitan/tʃa, Category:Rhymes:Malay/ŋunan, etc. etc. Presumably these are all mistakes? Do any languages actually have rhymes beginning with a consonant? Hungarian in particular has a lot of them: Category:Rhymes:Hungarian/riː, Category:Rhymes:Hungarian/t͡su, Category:Rhymes:Hungarian/kːaː; this seems consistently to be the case for rhymes ending in a vowel. @Adam78, Panda10 Are these correct?
  2. Telugu rhymes like Category:Rhymes:Telugu/స. All Telugu rhymes are like this. Who are the native Telugu speakers at Wiktionary who could help clean this up?
  3. Translingual rhymes: There are lots of them and they appear to use British English pronunciation. Should Translingual rhymes exist at all? It seems questionable to me.
  4. Issues for English rhymes:
    1. English rhymes randomly use /r/ or /ɹ/ (lots of examples; cf. Category:Rhymes:English/ɛdə(r) vs. Category:Rhymes:English/ɒdə(ɹ)), cf. also Category:Rhymes:English/ɪntɚ vs. Category:Rhymes:English/ʌntə(r) vs. Category:Rhymes:English/ʌntə(ɹ). What should be the convention here?
    2. English rhymes are inconsistent in schwas vs. syllabic resonants: Category:Rhymes:English/ɪlɪkəl vs. Category:Rhymes:English/ɛlədʒəbl̩.
    3. Most English rhymes only use British English pronunciation, including examples like Category:Rhymes:English/ɔː(ɹ)dʒi, Category:Rhymes:English/ɒɡɹəfi, Category:Rhymes:English/əʊ, Category:Rhymes:English/ɪə(ɹ). Shouldn't we include both British and American rhymes, at the least?
    4. Pretending Middle English is pronounced according to Modern English rules: Category:Rhymes:Middle English/ɪə(ɹ) for eglatere.
  5. Issues for French rhymes:
    1. French rhymes beginning with a semivowel e.g. Category:Rhymes:French/wœʁ, Category:Rhymes:French/jy
    2. French rhymes with multiple syllables: Category:Rhymes:French/uldɔɡ
    3. French rhymes with schwa at the end: Category:Rhymes:French/ɑ̃dʁə (vs. Category:Rhymes:French/ɑvʁ)
  6. Issues for Spanish rhymes:
    1. Spanish rhymes inconsistent about mid vowels: Category:Rhymes:Spanish/mo̞s vs. Category:Rhymes:Spanish/mon.
    2. Stray accents in Spanish rhymes e.g. Category:Rhymes:Spanish/édine.
    3. Spanish rhymes potentially with unnecessary phonetic info: Category:Rhymes:Spanish/iŋxe, Category:Rhymes:Spanish/iβtiko, Category:Rhymes:Spanish/aʝa; Category:Rhymes:Spanish/undo vs. Category:Rhymes:Spanish/uðiko with the same phoneme.
    4. Spanish rhymes inconsistent about how much phonetic info to include and how to represent it: Category:Rhymes:Spanish/aɡtiko vs. Category:Rhymes:Spanish/iɣtiko; Category:Rhymes:Spanish/oiko vs. Category:Rhymes:Spanish/ejko; Category:Rhymes:Spanish/eunja vs. Category:Rhymes:Spanish/au̯la; Category:Rhymes:Spanish/avia (BTW /v/ occurs as neither a phoneme nor allophone in Spanish between vowels) vs. Category:Rhymes:Spanish/eθja; Category:Rhymes:Spanish/ar vs. Category:Rhymes:Spanish/aɾ; Category:Rhymes:Spanish/armako vs. Category:Rhymes:Spanish/eɾma; Category:Rhymes:Spanish/adra, Category:Rhymes:Spanish/ada, Category:Rhymes:Spanish/able vs. Category:Rhymes:Spanish/aða, Category:Rhymes:Spanish/aβulo.
  7. Issues for German rhymes:
    1. German rhymes inconsistent about syllabic nasals: Category:Rhymes:German/ɔstn̩ vs. Category:Rhymes:German/ɔstən
    2. German rhymes inconsistent about /j/ vs. /i̯/: Category:Rhymes:German/aljə vs. Category:Rhymes:German/aːli̯ən
  8. Issues for Portuguese and Galician rhymes:
    1. In Portuguese, both Portugal and Brazilian rhymes (maybe this is OK): cf. Category:Rhymes:Portuguese/ɐmɨ vs. Category:Rhymes:Portuguese/ɛk(i) vs. Category:Rhymes:Portuguese/aʎi (but last two probably inconsistent); Category:Rhymes:Portuguese/azɨ vs. Category:Rhymes:Portuguese/azi
    2. In Brazilian Portuguese, rhymes inconsistent about written -te /tʃi/: Category:Rhymes:Portuguese/awtʃi, Category:Rhymes:Portuguese/ɐ̃tʃi vs. Category:Rhymes:Portuguese/ɛt(ʃ)i, Category:Rhymes:Portuguese/et(ʃ)i vs. Category:Rhymes:Portuguese/ɔt͡ʃiku vs. Category:Rhymes:Portuguese/ɔtiku (this latter one from Portugal?)
    3. Galician rhymes inconsistent about final written o: Category:Rhymes:Galician/anʊ vs. Category:Rhymes:Galician/aɲo vs. Category:Rhymes:Galician/aθo̝
  9. Icelandic rhymes inconsistent about voiceless resonants (phonetic detail?): Category:Rhymes:Icelandic/aulkʏr vs. Category:Rhymes:Icelandic/aul̥kʏr; Category:Rhymes:Icelandic/ir̥ka vs. Category:Rhymes:Icelandic/irtna
  10. Russian rhymes inconsistent about /l/ vs. /ɫ/; cf. Category:Rhymes:Russian/al vs. Category:Rhymes:Russian/aɫ (the former much more common)
  11. Malay rhymes: same word will have three possible rhymes like Category:Rhymes:Malay/apar, Category:Rhymes:Malay/par, Category:Rhymes:Malay/ar (one beginning with a consonant). What is going on here?
  12. Catalan rhymes like Category:Rhymes:Catalan/oɾ vs. Category:Rhymes:Catalan/o(r) (no consistency on final -r vs. -ɾ and presence or absence of parens).
  13. Czech and other language rhymes inconsistent in semivowels: Category:Rhymes:Czech/ouʃɛk vs. Category:Rhymes:Czech/ou̯ɦiː, Category:Rhymes:Czech/outɛk vs. Category:Rhymes:Czech/ou̯t; Category:Rhymes:Dutch/ɛin vs. Category:Rhymes:Dutch/ɛi̯nər (in Dutch, the latter is much more common).
  14. Inconsistent whether to write /t͡ʃ/ /t͡s/ or /tʃ/ /ts/ (and similar) cross-linguistically and sometimes within a language; cf. Czech Category:Rhymes:Czech/ɛt͡ʃɛr vs. Category:Rhymes:Czech/ɛtʃɛk
  15. Inconsistent whether to write /mː/ or /mm/ (and similar) cross-linguistically (and sometimes within a language?); the former seems more common.
  16. Stray dots in rhymes; lots of examples in Urdu, e.g. Category:Rhymes:Urdu/iː.mɑː, also in other languages like Category:Rhymes:Malay/ə.ŋ̩ah.
  17. Stray hyphens in rhymes like Tagalog Category:Rhymes:Tagalog/-ot and Category:Rhymes:Tagalog/-is.
  18. IPA stress marks in rhymes, e.g. Category:Rhymes:Danish/ˈʌɪ̯lə, Category:Rhymes:English/aɪˌsɪkəl.
  19. Rhymes using regular /g/ instead of IPA /ɡ/, cf. Category:Rhymes:Polish/ɛgas (very few examples).

Benwing2 (talk) 06:02, 26 September 2021 (UTC)[]
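
As a rough illustration of the purely mechanical items in the list above (IPA stress marks, stray syllable dots, leading hyphens, ASCII g, and optionally tie bars), here is a minimal Python sketch of a rhyme-key normalizer. The function name `normalize_rhyme` and the `keep_tie_bars` flag are hypothetical and not part of any existing Wiktionary module; whether tie bars should be kept or dropped is one of the conventions the discussion still has to settle, so the sketch leaves that as a parameter.

```python
# Hypothetical sketch: clean up the mechanical inconsistencies in rhyme keys.
# None of these names exist in any Wiktionary module; they are illustrative only.

STRESS_MARKS = "\u02c8\u02cc"   # IPA primary and secondary stress marks
SYLLABLE_DOT = "."              # syllable-boundary dot as seen in e.g. Urdu rhymes
TIE_BARS = "\u0361\u035c"       # combining tie bars above and below
ASCII_G = "g"                   # U+0067, often typed instead of the IPA letter
IPA_G = "\u0261"                # IPA voiced velar plosive


def normalize_rhyme(rhyme: str, keep_tie_bars: bool = True) -> str:
    """Return a rhyme key with stray marks removed.

    Only mechanical fixes are applied; choices such as /r/ vs. /ɹ/ or
    schwa vs. syllabic resonant need a per-language decision and are
    deliberately left untouched.
    """
    # Drop leading hyphens, as in the Tagalog examples.
    rhyme = rhyme.lstrip("-")
    # Remove stress marks and syllable dots.
    for ch in STRESS_MARKS + SYLLABLE_DOT:
        rhyme = rhyme.replace(ch, "")
    # Use the IPA letter for the voiced velar plosive.
    rhyme = rhyme.replace(ASCII_G, IPA_G)
    # Optionally drop tie bars so that affricates are written without them.
    if not keep_tie_bars:
        for ch in TIE_BARS:
            rhyme = rhyme.replace(ch, "")
    return rhyme
```

Such a function could be run over existing category names to list candidates for renaming, for example comparing `normalize_rhyme(name)` with `name` and reporting every mismatch for human review rather than renaming automatically.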

Malay /-ə.ŋ̩ah/ is an error. The nasal is not syllabic.
Personally, although I speak US English, I'd prefer the rhymes to be rhotic British. That's because RP makes more distinctions than GA. Just as rhymes can be grouped together for those that conflate for non-rhotic speakers, they can be grouped for ppl who make the marry-merry-Mary merger. It doesn't make sense to me to have separate RP and GA rhymes. For ppl who don't speak either RP or GA, they'd have to conflate some of the rhymes anyway. Better IMO to ask the same thing of everyone. kwami (talk) 08:35, 26 September 2021 (UTC)[]
4) Probably special-casing English somehow to only have categories with one of the r-signs; then categorize by the rhotic pronunciation and contain links forwarding to the corresponding rhyme without /r/ in each rhyme category page with a rhotic consonant.
Strictly the problem comes from categorizing from IPA in the first place. For English we would have to give the lexical set a vowel belongs to. Editors tend to reflect the north–force merger of course, for example, but it does not give the ideal categorization, as thus we don’t get to know, for example, whether the vowel is [ɒːɹ] or [oːɹ] in conservative Irish English: it is the former in stork but the page does not tell. It would also make sense to distinguish, lacking NURSE mergers, the vowel [ʊːɹ] in dirt, from the /ɛr/ and germ. So what you actually want is an abstraction layer from the actual realizations, which likely needs to be based on the Middle English vowels, that would allow a poet to sort the rhyme lists according to his dialect. Since this is practically different owing to the taciturnity of language material about the lexical sets, or the specialist knowledge demanded by this that only Middle English editors are expected to have, one might have concurrent systems, one of which gives more information than the other. (I did not give this view before, not to completely confuse and unsettle you pending your making a first implementation, Benwing.)
7a) I think it should be Category:Rhymes:German/ɔstən since that is the phonematic level, and the pronunciation aimed at by speakers—syllabic nasals are exaggerated by language students. Even when the majority of speakers in the majority of cases realizes a syllabic nasal, express schwa is by far not fictitious.
7b) /i̯/ is not /j/. There is a difference whether it is a part of a diphthong or already a semiconsonant. In careful speech words which have the former may even have the vowel as a whole syllable, perhaps extra-short. Though ultimately the words which have it can all have /j/ and this sounds authentic at least in Northern Germany.
19) I guess IPA /ɡ/ is logically expected. For if we have to use it in {{IPA}} it is confusing to have the opposite in any other place related to pronunciation sections. Fay Freak (talk) 14:16, 26 September 2021 (UTC)[]

@Benwing2, as far as Hungarian is concerned, I may have made a mistake by creating consonant-initial rhyme pages. However, the original concept of rhymes was not workable for Hungarian as it always has the stress on the first syllable, so e.g. we would have had to create Rhymes:Hungarian/ɛkːylømbøstɛthɛtɛtlɛnʃeːɡ for megkülönböztethetetlenség, resulting in an inordinate amount of pages (tens of thousands). For words ending in a vowel, I had practically two choices: either creating very few pages for their last vowel with tens of thousands of entries, or creating (again) tens of thousands of pages for their last -VC(C(C))V sequence with very few entries on most of them. I intended to take the golden mean by creating rhyme pages for their -CV endings. It also had the advantage of lending itself to this table (on the left). If we want to change it, the only alternative I can imagine is splitting them into -VC(C(C))V pages because some -i final rhyme pages are already crowded (e.g. Rhymes:Hungarian/ʃi). Then again, we'd lose the option of having an overview of them all in a single chart. True enough, there are also separate pages for each of the 14 -VC(C(C(C))) rhymes (like Rhymes:Hungarian/ɛ-, available with the "Navigation" bar above). Maybe there's a meaningful way of arranging -VC(C(C))V pages as well. (?) Adam78 (talk) 15:41, 26 September 2021 (UTC)[]

What counts as a rhyme in Hungarian poetry? That's the issue here. In English, a rhyme starts with the vowel of the stressed syllable, but we can't assume that's the case for other languages. E.g. French has no stress, so stress rules obviously aren't going to work. kwami (talk) 22:33, 26 September 2021 (UTC)[]
But doesn't e muet count for French rimes? >:) --RichardW57m (talk) 14:27, 27 September 2021 (UTC)[]

@Kwamikagami I never really thought of it in terms of poetry, but instead in the phonetic sense, like a reverse dictionary (except that the latter gives preference to spelling over pronunciation). In poetry, Fönn az égen ragyogó nap; ¶ Csillanó tükrén a tónak and Királyasszony, néném, ¶ Az egekre kérném may be rhymes, despite the difference in the very last consonant and the consonant (cluster) preceding the last vowel, respectively. (Examples taken from Rím in Hu. WP; you can browse it for other types.)

In fact, I've been wondering for a long time why it's not possible in the English-language Wiktionary to browse categories in the reverse order, as in the Catalan-, German-, Latin-, Polish-, and Russian-language Wiktionaries. I must note that the German version is especially impressive.

Overall, if stress is not accounted for (as in Hungarian or in French), I think rhymes could be taken either (1) as an agreement of the last n sounds (phonemes), with whatever n working best for the arrangement of entries available, or (2) as an agreement of the last one or two syllables, counted from the last or the penultimate vowel. (I can rearrange the vowel-final rhyme pages in Hungarian to include two syllables, from the penultimate vowel.) Adam78 (talk) 14:35, 27 September 2021 (UTC)[]
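
Option (2) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rhyme_from_vowel` is hypothetical, and the vowel inventory `HUNGARIAN_VOWELS` is a simplified set of base vowel symbols (length is carried by a following ː, so long vowels need no separate entry). It takes a phonemic string and returns the tail starting at the last vowel, or at the penultimate one when two syllables are requested.

```python
# Hypothetical sketch of "rhyme = tail of the word from the n-th vowel from the end".
# The vowel set is an assumption for illustration, not an agreed inventory.

HUNGARIAN_VOWELS = frozenset("ɒaɛeioøuy")


def rhyme_from_vowel(ipa: str, syllables: int = 1,
                     vowels: frozenset = HUNGARIAN_VOWELS) -> str:
    """Return the part of `ipa` starting at the `syllables`-th vowel from the end.

    If the word has fewer vowels than requested, the tail from its first
    vowel is returned; a word without any vowel is returned unchanged.
    """
    if syllables < 1:
        raise ValueError("syllables must be at least 1")
    positions = [i for i, ch in enumerate(ipa) if ch in vowels]
    if not positions:
        return ipa
    if len(positions) < syllables:
        start = positions[0]
    else:
        start = positions[-syllables]
    return ipa[start:]


word = "ɛkːylømbøstɛthɛtɛtlɛnʃeːɡ"   # rhyme string quoted earlier in this thread
print(rhyme_from_vowel(word, 1))      # eːɡ
print(rhyme_from_vowel(word, 2))      # ɛnʃeːɡ
```

Counting from the penultimate vowel gives far fewer entries per page than counting from the last vowel alone, which is the trade-off between crowded pages and a very large number of pages described above.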

Does anybody use rhymes categories? If so, who and how? Vox Sciurorum (talk) 14:57, 27 September 2021 (UTC)[]

I don't use them on Wikt, because IMO they're not complete enough to be useful, but I have used rhyming dictionaries. The reasons were (a) finding a rhyme for poetry and (b) verifying claims of refractory rhymes. AFAIK, (a) is what they were created for.
From above, it sounds like it would be difficult to decide what goes into a particular rhyming category for Hungarian. kwami (talk) 18:29, 27 September 2021 (UTC)[]
  • I've deleted ~50 Spanish categories that had one issue or another, plus some egregious characters that don't belong in other languages. Spanish is mostly standardized now, except @Benwing2 isn't Category:Rhymes:Spanish/aʝa correct? It's the module's default and the most common pronunciation, it seems. I'm not sure how to tackle parentheses in Catalan. Do we care if rhymes are separated by spelling? Probably not, right? If that's the case, I'd categorize e.g. agricultor as rhyming with both -o and -oɾ. Ultimateria (talk) 04:28, 28 September 2021 (UTC)[]
As for Catalan see at ca.wiki -oɾ, -o(ɾ) and -o categories. Depending on the dialect, the rhyme is completed with 1, 2 or 3 categories. For a Valencian it is confusing to find agricultor at -o, and for a Central Catalan it is confusing at -oɾ. That's the reason for rhyme -o(ɾ), as in ca:agricultor. Using /r/ or /ɾ/ in a coda is not phonemic. --Vriullop (talk) 08:58, 28 September 2021 (UTC)

English compounds containing spaces?

Are English compounds always single or hyphenated words (for example, heavyweight and blue-collar), or can a term consisting of several words separated by spaces (for example, barking mad, ground zero, and prince regent) be considered a compound? "w:Compound (linguistics)" says: "As a member of the Germanic family of languages, English is unusual in that even simple compounds made since the 18th century tend to be written in separate parts. This would be an error in other Germanic languages such as Norwegian, Swedish, Danish, German and Dutch. However, this is merely an orthographic convention: As in other Germanic languages, arbitrary noun phrases, for example "girl scout troop", "city council member", and "cellar door", can be made up on the spot and used as compound nouns in English too." However, @Metaknowledge doubts the correctness of this. (@Inqilābī.) — SGconlaw (talk) 17:29, 27 September 2021 (UTC)[]

Note that "Category:English compound words" and its subcategories contain numerous entries that are terms made up of two or more words separated with spaces. — SGconlaw (talk) 17:34, 27 September 2021 (UTC)[]

You can generally tell a compound because one of the components loses its stress. E.g. a 'high school' is stressed on 'high', while 'high' + 'school' (a school at a high altitude? on pot?) is stressed on both. Orthography isn't a good guide. kwami (talk) 18:21, 27 September 2021 (UTC)[]
@Sgconlaw is misrepresenting what I said, and I do not appreciate him doing so. Here is what I said: "In a comparative Germanic context, I certainly agree with Wikipedia's statement. However, for the purposes of English lexicography, a distinction is generally drawn between spaced and unspaced compounds, with only the latter being considered compound words." There is no reason to include "high + school" as the etymology of high school when both words are already linked in the headword line. —Μετάknowledgediscuss/deeds 20:00, 27 September 2021 (UTC)[]
@Metaknowledge: sorry, I did not mean to. But then I don't quite understand what you mean – are spaced compounds then not compound words, despite the use of the word compounds? — SGconlaw (talk) 20:13, 27 September 2021 (UTC)[]
If you don't know how to summarise someone else's statements, you should simply quote them. The distinction is that traditionally, English lexicography has drawn a division between lemmata and words; we generally refer to them interchangeably on Wiktionary. —Μετάknowledgediscuss/deeds 20:25, 27 September 2021 (UTC)[]
There’s no reason why English should be receiving a special treatment in a multilingual dictionary. ·~ dictátor·mundꟾ 20:28, 27 September 2021 (UTC)[]
Still it’s preferable to categorize open compounds using {{com}}. I do not even understand why individual words should be linked in the headword line in the first place. ·~ dictátor·mundꟾ 20:08, 27 September 2021 (UTC)[]
These open compounds are more succinctly dealt with in the current way. The multiword form shows that it is a compound (or phrase), and avoids the need for separate linking. Or are you saying that there is an ambiguity between compound words and phrases? --RichardW57 (talk) 20:54, 28 September 2021 (UTC)[]
I would just add that, while Metaknowledge objects to ‘treating multiword terms as compounds in the etymology section’, we actually have a convention of manually categorizing such terms as compound nouns and compound adjectives. Anyways, following the orthography to determine compounds is indeed a backward and ignorant practice, that needs to be eradicated in favour of proper linguistic treatment of the term. ·~ dictátor·mundꟾ 20:02, 27 September 2021 (UTC)[]
That is not a standardised convention, but it is also a separate proposal from adding these excessive etymologies. —Μετάknowledgediscuss/deeds 20:25, 27 September 2021 (UTC)
  • It's inefficient to treat multiword terms as compounds in the etymology section and to link the components individually (which is already done at the headword). The prevalent practice now is just using the etymology section to show the literal meaning or anything other than A + B. I think that this current practice is better. Svārtava2 • 07:19, 29 September 2021 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Apart from what (if anything) should be indicated in the etymology section, are we quite sure that, say, a noun entry consisting of two or more words with spaces between them is not to be considered a compound noun? Thus, even if {{compound}} is not used in the etymology section, the entry should not be manually added to "Category:English compound nouns"? In other words, is it correct to assume that English compound nouns must either be single words, or words separated only by hyphens? (And if so why? I'd love to know the reason, in the light of what the Wikipedia article says.) — SGconlaw (talk) 17:34, 29 September 2021 (UTC)[]

@Sgconlaw: This proposal looks like vote material, as it involves standardising the way we treat compounds. ·~ dictátor·mundꟾ 04:14, 2 October 2021 (UTC)

Why {{inh+}}/{{bor+}}

Following the suggestion of editors like @Sgconlaw, PUC, Erutuon that the underlying issue regarding the templates needs to be resolved first, I am going to state some serious facts that should encourage everyone to use the templates.

  • The linguistic terms ‘inherited’ and ‘borrowed’ need to be linked to Glossary because we have a convention of linking terminologies or any technical jargons. For example the grammatical cases, numbers, tense and aspect, labels like dated, obsolete, literary, etc. etc. When some people bring the argument that the link is really unneeded, they at first have to consider the fact that on Wiktionary we link any key terms, and we do not even spare definitions!
  • The terms ‘inherited’ and ‘borrowed’ are necessary to display because the reader has to check the source code to know if the term is inherited. Because of the prevalence of {{der}} (when in fact more specific etymology templates should be used), we cannot ignore the fact that we have simply failed to give a good presentation of etymology, so using {{inh+}} and {{bor+}} solves the problem.
  • None of the opposers of the new templates work in Indo-Aryan languages in general (here I am not counting dealing with Sanskrit etymologies, but am referring to Middle and New Indo-Aryan languages specifically), and yet they are opposed to new templates that the advocates of the new templates find very useful. So I think the opposers have no right to oppose their usage when in fact they do not, or hardly, work in those languages.
  • Just like other etymology templates like {{lbor}}, {{slbor}}, {{clq}}, {{pclq}}, etc., the new templates help prevent typographical errors. When I was new here, I used to publish edits with typos like borowed, I herited; I am still prone to these but I now avoid doing them by carefully checking the preview. No doubt, new editors would find the templates extremely helpful.
  • The display of the etymological wording is very, very useful to the uninitiated; most people in the world do not know the discipline of Linguistics, and this display helps any to-be amateur linguists to learn about word origin. Helping spreading knowledge is beneficial given that most people have incorrect conceptions about etymologies. For instance, most Indo-Aryan speakers (including lexicographers) think that inherited words are corruptions of tatsamas, they regard learned loans as the correct form, and have no idea what an inheritance is. People do not even know what a Germanic language is, and thus assume English is a Romance language. Thus, the objective of any Wikimedia project is to make knowledge accessible to the whole population: just because 0.01% of the populace is knowledgeable about linguistics does in no wise suggest that we can assume our readers are all-knowing.

So are the opposers willing to acknowledge that the new templates solve existing problems and do no harm to the project?


General recommendation
Both {{inh+}} and {{bor+}} should be freely used in all language entries.
Strong recommendation
The new templates must be used in language entries where the editors favour their usage.

That said, there are of course cases where {{inh+}} and {{bor+}} are unneeded. Protolanguages need not use these new templates. Also, when the etymon is a substrate word, the wording ‘borrowed from’ is unnecessary. ·~ dictátor·mundꟾ 20:57, 27 September 2021 (UTC)[]

Some of your arguments are quite unconvincing, and I hope I'm speaking for most opposers of the templates:
  • "The linguistic terms ‘inherited’ and ‘borrowed’ need to be linked to Glossary because we have a convention of linking terminologies or any technical jargons." - this already implies the terms are needed in the etymology, and thus not an argument if they aren't.
  • "The terms ‘inherited’ and ‘borrowed’ are necessary to display because the reader has to check the source code to know if the term is inherited." - This implies that readers don't assume inheritance, which I'm pretty sure they do, unless we do our best to resist this.
  • "[T]he prevalence of {{der}} (when in fact more specific etymology templates should be used), we cannot ignore the fact that we have simply failed to give a good presentation of etymology" - When more specific etymology templates should be used, the etymology is done badly, and should be fixed. We can't do anything about errors in our dictionary, other than fix them.
  • "[M]ost people in the world do not know the discipline of Linguistics, and this display helps any to-be amateur linguists to learn about word origin." - I know it sounds harsh, but that's simply not our job. If we were to include the terms anyway, I would agree to link them, but I doubt we need to. Furthermore, most of our readers are familiar with linguistics to some degree, and if they aren't, it's doubtful they want and need to know the distinction between inheritance and borrowing
  • "Just like other etymology templates [...] the new templates help prevent typographical errors." - I doubt that makes much of a difference: We still need L3s and L4s spelled properly, and IIRC, ToilBot (talkcontribs) cleans up such misspellings quite well. And, once more, it implies we need the terms.
  • "None of the opposers of the new templates work in Indo-Aryan languages in general" - Sure, but my problem doesn't lie in Indo-Aryan languages, but rather in the templates spreading to other communities, which I do edit.
So, all in all, I do not think the templates solve any real problems in mother-daughter relations where inheritance isn't ambiguous, and that we should rather focus on creating an environment where these templates are not needed. Thadh (talk) 21:45, 27 September 2021 (UTC)[]
@Thadh: I see that you did not really address the concerns. 1) Many people are opposed to the linking of the words ‘inherited’ and ‘borrowed’, hence I justified & explained why the link should be there. 2) If the creation of new templates makes fixing errors easier, we should definitely have them. 3) Making knowledge accessible to the whole of humanity is our foremost job; I myself did not know about linguistics before 2019, and Wiktionary helped me to understand the difference between inheritances and borrowings. Your assumption about readers is not helpful. 4) Toilbot fixes only headings, not any other typos; I think you are mistaken there (correct me if I am wrong).
All I can understand is that you just personally dislike the new templates. ·~ dictátor·mundꟾ 13:32, 28 September 2021 (UTC)[]
@Inqilābī: I feel like we're having the same conversation over and over again. First of all, the creation of the new templates doesn't make fixing errors easier, because "Inherited from Old Hindi, from Sanskrit" is just as 'ambiguous' as a bare "From Sanskrit". Second, the fact you learned through Wiktionary doesn't mean we should be focussing on teaching linguistics on this website - that's not what this wiki is for, you ought to attend classes for that. We use the practices of linguists to write the dictionary, and the fact some people don't know what a noun is doesn't mean we should linkify our headers either! Finally, if misspellings like "b rrowed" or "iherited" are such a big problem, I'm sure @Erutuon wouldn't mind adding these to their code.
And yes, I don't like the templates for multiple reasons. In my opinion, "Inherited from" is cluttering and useless, "Borrowed from" shouldn't be obligatory either and finally, creating these templates after the vote about them didn't pass without any discussion with the opposers and continuing the use of these after multiple pleas to stop is just outright uncollaborative. Thadh (talk) 18:11, 28 September 2021 (UTC)[]
Using only "From" is a prevalent bad practice even for {{bor}}. Not all instances of {{bor|hi|sa}} are properly preceded by "Borrowed from". This can really lead to mistakes. One statement I find 100% true is "most Indo-Aryan speakers (including lexicographers) think that inherited words are corruptions of tatsamas, they regard learned loans as the correct form, and have no idea what an inheritance is". Yes, I regarded the inherited words as corruptions. See also User_talk:Bhagadatta/2020#Borrowings,_Learned_borrowings_and_inherited_words. I did not have as much linguistic knowledge back then. Strangely, I considered "derived" terms to be the corrupted ones (tadbhavas) and "borrowed" terms to be tatsamas. From seeing जाल (jāl), which at that time I believed was a tatsama because of the similar spelling, I got the ridiculous idea that the inherited words were actually tatsamas! Once I actually used {{inh}} for a tatsama. I can only say that had these templates been in use then, none of this would have happened. These templates are especially beneficial in IA languages for newcomers and readers alike. Svārtava2 • 06:41, 28 September 2021 (UTC)
As someone who works in Indo-Iranian languages, I can say the above is total nonsense with the purpose of creating a false dilemma. Tatsamas are no different than any other borrowings from Latin or Greek. --{{victar|talk}} 06:58, 28 September 2021 (UTC)[]
it isn't. Svārtava2 • 07:19, 29 September 2021 (UTC)[]
@Victar: You deal with protolanguages, and I said there is no need to use those templates for protolanguages— so why do the templates bother you? ·~ dictátor·mundꟾ 13:00, 28 September 2021 (UTC)[]
You think because I create proto entries means I don't understand and work in the languages below them? SodhakSH/Svartava is also adding the templates to every page he edits, whether it be Pali, English, French, whathaveyou. --{{victar|talk}} 15:44, 28 September 2021 (UTC)[]
There is no problem. As it is I don't regularly edit Pali, English, French. Svārtava2 • 03:19, 29 September 2021 (UTC)[]

──────────────────────────────────────────────────────────────────────────────────────────────────── To resolve the underlying issues, I feel what is needed is a discussion about whether we should (1) always use terms like "Borrowed from" and "Inherited from" in etymology sections; (2) never use these terms; or (3) leave it to editors to use their discretion as to whether the terms should be used. Then the {{bor}} and {{inh}} templates can be updated accordingly, and {{bor+}} and {{inh+}} deleted. — SGconlaw (talk) 13:39, 28 September 2021 (UTC)[]

@Sgconlaw: In a wiki, we the editors should cater to the needs of both the layperson and the educated. Thus what is supposed to be done is to provide the full, detailed information that would satisfy someone who studies that subject/topic, and yet in a manner that the uninitiated are able to grasp. Now, in this case, most people in the world do not know about etymologies, so they have a right to see the linguistic terminology that is used to describe word origin. We are, at the same time, showing the actual information someone interested (whether an amateur or an expert) is looking for. This justifies the implementation of the templates {{inh+}} and {{bor+}} in our etymologies. However, the opposers of the templates do not care about all this.
Now, due to the opposition, the best way is to seek a compromise. The templates are favoured by Indo-Aryan editors the most, so the templates should be allowed in Indo-Aryan entries. There are of course editors working in various other families that support the templates as well, but at this stage I am not really sure how the option of ‘leave it to editors to use their discretion [] ’ would really work out, given this malevolent attitude: ‘If you do such replacements, I will revert you (particularly in the languages I work with); I'll also feel free to replace new instances with plain text (particularly in the languages I work with)’. Maybe you would be able to settle the dispute as an uninvolved editor? ·~ dictátor·mundꟾ 16:49, 28 September 2021 (UTC)[]
@Inqilābī: the way I see it, these are arguments that should be raised in a discussion about the underlying issue. So long as the underlying issue is not resolved, I don't see how arguing about why {{bor+}} or {{inh+}} should be used is going to change anything. I also don't see how agreeing to the use of the templates in Indo-Aryan entries and no others is in any way principled. — SGconlaw (talk) 18:26, 28 September 2021 (UTC)[]
Having both {{inh+}} and {{inh}} is an easily remembered way of choosing to have or not to have the preamble 'inherited from'; not all of us can remember the parameters that we may need - it is hard enough to remember what tailoring capabilities are available. As a consumer, 'from' is irritatingly ambiguous - has the author left out a word or not? There's also the problem that the reader may not agree with our ancestry assignments. If an etymology merely says 'From Sanskrit' for a word in a living language, it must surely mean '[ultimately] borrowed from Sanskrit', for in normal parlance no living language descends from Sanskrit. Nynorsk is another problem area. The downside of having two templates is that some editors hope to understand the raw wikicode, and every extra notation is a further burden on the memory. That is why we try to keep parameter names consistent between templates. --RichardW57m (talk) 16:08, 28 September 2021 (UTC)
Dude, all due respect, if you falsely believe "no living language descends from Sanskrit", you really shouldn't be making any statements on the language or its family. --{{victar|talk}} 18:23, 28 September 2021 (UTC)[]
I would refer the learned gentleman to Ancestor of Middle Indo-Aryan. But rest assured, my belief is correct with regard to the normal meaning of the word 'Sanskrit'. It is Wiktionary that is bending the term. --RichardW57 (talk) 20:18, 28 September 2021 (UTC)[]
@RichardW57: We could of course switch over to the name Old Indo-Aryan, and reserve the name ‘Sanskrit’ only for learned loans. But then the way we present our etymology needs to change: ‘From OIA (cf. Sanskrit [Term])’. Would you be happy with such a change? This would especially be helpful seeing as {{inh+}} looks doomed to die. ·~ dictátor·mundꟾ 02:58, 2 October 2021 (UTC)[]
I won't be happy. The current one with sa as ancestor of IA is better, even linguists like Turner and McGregor take Sanskrit as IA languages' ancestor. Anyways that's not the issue to discuss here. {{inh+}} isn't doomed to die. Svartava2 (talk) 03:11, 2 October 2021 (UTC)[]
Yea, a troll is never happy with anything. ·~ dictátor·mundꟾ 03:27, 2 October 2021 (UTC)[]
wow, sweet language. i am one of the most active IA editors, dealing with descendants of Sanskrit every now and then. I wouldn't opt for more complexity, we already have PIA. thanks for the lovely comment Svartava2 (talk) 03:37, 2 October 2021 (UTC)[]
You, Svartava2, are coming close to a lie here. They use it as a substitute for the unattested ancestor, because for most words Sanskrit is close enough. To answer Inqilābī, I would be happy with that. What I am distinctly unhappy with is linking to the hindutvin-infested Wikipedia article for 'Sanskrit'. --RichardW57 (talk) 06:12, 2 October 2021 (UTC)[]
user talk:Bhagadatta/2020#Sanskrit vs "pre-Sanskrit", @Bhagadatta, @AryamanA Svartava2 (talk) 06:41, 2 October 2021 (UTC)[]
@RichardW57: As you're quoting me in that link, obviously I agree with you to some extent. But just as English is descended from various dialects of Old English, so are MIA languages from various dialects of OIA. We could reconstruct Proto-Old English and call that the ancestor of English, but the redundancy makes it a silly notion, and instead we use West Saxon as the default. It's not a perfect analogy, but it gets the point across. --{{victar|talk}} 00:16, 3 October 2021 (UTC)
@Victar: The imperfections in the analogy are where the differences arise. The analogy Old English:West Saxon::OIA:Sanskrit breaks down because we generally trace etymologies back to 'Old English' rather than 'West Saxon', and one can easily find references to explicitly Anglian forms. There's the additional wrinkle that West Saxon does not borrow from Middle English, whereas Sanskrit does borrow from Prakrit. When a reader sees the word 'Sanskrit', he is liable to think that we mean Sanskrit. --RichardW57 (talk) 10:30, 3 October 2021 (UTC)
The key point for this discussion is that when a Middle or New Indo-Aryan word is described as being 'From Sanskrit', in simple cases different readers will have different interpretations - inheritance, borrowing or non-resolution of the issue. That is not good. --RichardW57 (talk) 11:01, 3 October 2021 (UTC)[]
I would also remind people that if we use a glossary to define our terms, our usage of those terms, at least, when linked to the glossary, should conform to the definition in the glossary. --RichardW57m (talk) 16:08, 28 September 2021 (UTC)[]
For the record, I am in support of {{bor+}} if: 1. users don't systematically replace {{bor}} with {{bor+}}, and 2. the overkill glossary link is removed. {{inh+}} is a non-starter for me, because I see absolutely no use for it. This was essentially the compromise put forth a month ago, --{{victar|talk}} 18:23, 28 September 2021 (UTC)[]
@RichardW57m: my preference is for just {{bor}} and {{inh}} but I have no strong feelings about this if editors prefer to have {{bor+}} and {{inh+}}. The main thing is to find consensus on the use of the phrases "Borrowed from" and "Inherited from" in etymologies instead of arguing about the templates. — SGconlaw (talk) 18:29, 28 September 2021 (UTC)[]
True. SodhakSH/Svartava believes all etymologies should explicitly state "Inherited from", including English entries, and is willing to edit war over it. --{{victar|talk}} 18:42, 28 September 2021 (UTC)[]
Yeah, even I was thinking about that. Inqilābī also adds {{inh+}}. I don't believe it is a problem if inheritance is stated. And many English entries like this and many French entries like this had "inherited" before the birth of the templates. Svārtava2 • 03:18, 29 September 2021 (UTC)[]
Victar and Thad seem to think that 'inherited from' is redundant. There is also an opinion around that readers will know what the ancestors of a language are. I believe both of these are unsound assumptions. I therefore favour starting an entry with an explicit 'inherited' or 'borrowed'. Thad raises the issue of chained etymologies, such as Hindi from Old Hindi from Sanskrit. In cases like this, I think we can sensibly rule that readers should read the subsequent modes as being by inheritance, i.e. instruct editors to visibly note intermediate borrowings. (Personally, I dislike chained etymologies because of the risk of multiple repetitions of an incorrect or out-of-fashion etymology.)
An example that would then need to be fixed is:
From {{bor|en|frm|impeccable}}, from {{der|en|la|impeccabilis||not liable to sin}}, from {{m|la|im-||not}} + {{m|la|pecco|peccare|to err, to sin}}.
The Latin word is borrowed by Middle French, not inherited from Middle French. There is no template available to generate the word 'borrowed' for that transmission. --RichardW57m (talk) 15:48, 30 September 2021 (UTC)[]
Incidentally, systematic suppression of 'inherited from' or 'borrowed from' as appropriate could be implemented on the basis of the languages if we used {{inh+}} and {{bor+}}. For example we could program that Latin to French is inheritance by default (if that is right) but that Latin to Romanian is borrowing by default. Personally I think such rules would be extremely confusing to the reader. --RichardW57m (talk) 12:09, 29 September 2021 (UTC).[]
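The language-pair defaults floated above could be sketched roughly as follows. This is only a hypothetical Python illustration of the idea, not how {{inh+}} or {{bor+}} actually work: the table `DEFAULT_MODE`, the function `etymology_prefix`, and the two language-pair entries are all assumptions made up for the example.

```python
# Hypothetical sketch: drop the preamble when the mode of transmission
# matches an assumed default for the (source, target) language pair.
DEFAULT_MODE = {
    ("la", "fr"): "inherited",  # assumed default, taken from the example above
    ("la", "ro"): "borrowed",   # assumed default, taken from the example above
}

PREAMBLE = {"inherited": "Inherited from", "borrowed": "Borrowed from"}


def etymology_prefix(source: str, target: str, mode: str) -> str:
    """Return the wording displayed before the source-language name."""
    if DEFAULT_MODE.get((source, target)) == mode:
        return "From"        # default mode for this pair: preamble suppressed
    return PREAMBLE[mode]    # non-default mode: stated explicitly


print(etymology_prefix("la", "fr", "inherited"))  # bare "From"
print(etymology_prefix("la", "fr", "borrowed"))   # explicit preamble
print(etymology_prefix("la", "ro", "borrowed"))   # bare "From" again
```

Under such a scheme a bare "From Latin" would mean inheritance in a French entry but borrowing in a Romanian one, which is the kind of reader confusion the comment above anticipates.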
This isn't the most important issue, but my name is Thadh with a final dh Thadh (talk) 16:13, 29 September 2021 (UTC)[]
Sorry @Thadh, that wasn't my only mistake in the post. While Romanian may have more words ultimately borrowed from Latin than inherited from Latin, a lot of them were borrowed via another Romance language, so for words whose 'immediate' source is Latin, inheritance will still be commoner. There may be other significant features, as I gather a lot of the loans entered the language as learned or semi-learned borrowings to replace words of Slavonic origin. --RichardW57 (talk) 07:59, 30 September 2021 (UTC)
Certainly we must assume that the average consumer of Wiktionary knows nothing of linguistics; they're merely using a dictionary. Likewise, all this fuss about linking inherited or borrowed seems absurd; when in doubt, link them.--Prosfilaes (talk) 01:07, 30 September 2021 (UTC)[]

Glossary definition of inherited.

I've improved (I hope) the glossary definition of inherited to read 'through regular or sporadic sound change' rather than 'through regular sound change'. --RichardW57m (talk) 12:24, 29 September 2021 (UTC)[]

October 2021

Definitions of Letters

As words of a particular language, many letters have definitions such as "the second letter of the Welsh alphabet". (The Welsh entries themselves are not quite so bad, as they also then spell out the letter and give its predecessor and successor.) Such definitions are intrinsically unstable, for letters may be inserted in an alphabet. For example, the letter 'j' has been added to the Welsh alphabet since I was a child, and as a result of different sources we now have the opening definition "the fourteenth letter of the Welsh alphabet" for both J and L! As a result of the deletion of letters, both Ll and N are defined as 'the 14th letter of the Spanish alphabet'. --RichardW57m (talk) 11:11, 1 October 2021 (UTC)

I therefore feel that it would be appropriate to change definitions of one-character letters from "the nth letter of the WW alphabet" to "the letter of the WW alphabet used as the header word of this entry", and add "It is the nth letter of the WW alphabet" to the "Trivia" section of the entry. History may cause the trivium section to expand. Multi-character letters would be handled by analogy. As boldly making this change might be considered vandalism, what do people feel about this proposed change? Does it need a vote? --RichardW57m (talk) 11:11, 1 October 2021 (UTC)[]
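The instability of "nth letter" definitions can be shown with a minimal sketch. The Python below is only an illustration: the list names and the `ordinal` helper are made up for the example, and the two lists are the Welsh alphabet without and with 'j', counting digraphs as single letters.

```python
# Welsh alphabet before 'j' was added (digraphs count as single letters).
WELSH_WITHOUT_J = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th", "u", "w", "y",
]
# Inserting 'j' after 'i' shifts every later letter by one position.
WELSH_WITH_J = WELSH_WITHOUT_J[:13] + ["j"] + WELSH_WITHOUT_J[13:]


def ordinal(letter: str, alphabet: list[str]) -> int:
    """1-based position of a letter in the given alphabet."""
    return alphabet.index(letter) + 1


print(ordinal("l", WELSH_WITHOUT_J))  # position of 'l' in the older alphabet
print(ordinal("j", WELSH_WITH_J))     # position of 'j' in the newer alphabet
print(ordinal("l", WELSH_WITH_J))     # position of 'l' in the newer alphabet
```

A definition written against the older list and one written against the newer list can therefore both say "fourteenth" for different letters, which is the J and L clash described above.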

Should we be documenting the use of letters in non-additive numbering systems, such as 'Section 5(c)'? The most significant feature of such systems is that some letters are not used in such lists. I can see an argument that such documentation belongs to a grammar, rather than a lexicon.--RichardW57m (talk) 11:11, 1 October 2021 (UTC)[]

I feel like this discussion will be pointless if the vote about letters entries passes. Thadh (talk) 11:23, 1 October 2021 (UTC)[]
@Thadh: How so? Are you assuming that all the letter entries of a language can be squeezed into a single table? --RichardW57m (talk) 12:30, 1 October 2021 (UTC)[]
@RichardW57m: Not necessarily in a table, but they probably won't look the same way they do now, so it doesn't make much sense to discuss the way they look in entries before we know where the vote's heading. Thadh (talk) 13:47, 1 October 2021 (UTC)[]

Rhyming categories for Middle Chinese

I think all the data for Middle Chinese rhymes are already there. Those data were sourced from rhyme dictionaries in the first place. Is there a plan for actually implementing Middle Chinese rhyming categories? This may even be a fairly good case for automation. --Frigoris (talk) 16:53, 1 October 2021 (UTC)

HSK lists of Mandarin words update

Currently, Wiktionary has Appendix:HSK list of Mandarin words accumulating all the vocabulary of the old (pre-2010) HSK test. Recently, the exam was reformed, and the lists of words and characters were published. See this pdf for official specifications. Thus, I propose to update the appendix.

I made drafts of the new HSK word lists:

HSK Beginner (levels 1-3): all three levels

HSK Intermediate (levels 4-6): level 4, level 5, level 6

HSK Advanced (levels 7-9): a-h, j-s, sh-zh

The words are OCRed from the paper, and then converted into traditional characters with some manual corrections. I think some proofreading is still needed.

The following problems arise here:

  1. What should be done with the old appendix?
  2. How should the new appendix be divided? The current version of the HSK has 9 levels grouped in 3 ranks. The high levels (7-9) are not delimited, but they contain roughly as many words as all the preceding levels combined (5636 vs 5456). Note that it's computationally heavy to have a huge amount of words in Template:zh-l on a single page.
  3. There is a category tied to the old word lists, see Category:Mandarin by difficulty level. You may want to reorganize it.
  4. Many words in the HSK can be considered SoPs, and some of them were previously deleted on that ground (see the red links on my drafts).
  5. Many words in the HSK have optional erhua. How should they be listed in the new Appendix?
  6. I think everyone would agree on inclusion of traditional forms of the words, but what should be done about the variant pronunciations (Taiwanese or colloquial Mainland) not listed in the official HSK paper? Should they also be included? --YousuhrNaym (talk) 23:51, 3 October 2021 (UTC)[]
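On point 2, one option is to cut the undivided advanced list into fixed-size subpages. The sketch below is a hypothetical Python illustration, not an existing Wiktionary tool: the helper `split_into_pages`, the placeholder entries, and the 2000-word cap are all assumptions, and only the total of 5636 words comes from the figures above.

```python
def split_into_pages(words: list[str], max_per_page: int) -> list[list[str]]:
    """Cut a word list into consecutive pages of at most max_per_page words."""
    return [words[i:i + max_per_page] for i in range(0, len(words), max_per_page)]


# Placeholder entries standing in for the 5636 words of levels 7-9.
advanced = [f"word{i}" for i in range(5636)]

# The 2000-word cap is an arbitrary assumption, not a measured template limit.
pages = split_into_pages(advanced, 2000)
print([len(page) for page in pages])
```

Splitting by count rather than by initial letter keeps the pages evenly sized, at the cost of page boundaries that carry no meaning for the reader.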

Let's talk about the Desktop Improvements


Have you noticed that some wikis have a different desktop interface? Are you curious about the next steps? Maybe you have questions or ideas regarding the design or technical matters?

Join an online meeting with the team working on the Desktop Improvements! It will take place on October 12th, 16:00 UTC on Zoom. It will last an hour. Click here to join.


  • Update on the recent developments
  • Sticky header - presentation of the demo version
  • Questions and answers, discussion


The meeting will not be recorded or streamed. Notes will be taken in a Google Docs file. The presentation part (first two points in the agenda) will be given in English.

We can answer questions asked in English, French, Polish, and Spanish. If you would like to ask questions in advance, add them on the talk page or send them to sgrabarczuk@wikimedia.org.

Olga Vasileva (the team manager) will be hosting this meeting.

Invitation link

We hope to see you! SGrabarczuk (WMF) 15:09, 4 October 2021 (UTC)[]

Unifying the transliteration of ʾalef and ʿayin in Semitic languages

Dear Wiktionary Semitists, I'd like to bring to your attention the current lack of consistency in how ʾalef and ʿayin are transliterated across Semitic languages. Have a look at the following pages and compare transliterations, for example:

  1. Reconstruction:Proto-Semitic/ʕaśar-#Descendants.
  2. Reconstruction:Proto-Semitic/tišʕ-
  3. Reconstruction:Proto-Semitic/šabʕ-

The inconsistency is both inter- and intra-linguistic. It is quite confusing, and since it's basically just a stylistic question, I'd like to start a discussion on whether we should unify to the more traditional (but not user friendly, since they're small and difficult to tell apart) /ʾ/ and /ʿ/ or the more modern (and much more user friendly) /ʔ/ and /ʕ/. Opinions? Thoughts? Let's discuss! —⁠This unsigned comment was added by Sartma (talkcontribs) at 12:22, 5 October 2021 (UTC).[]
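If the second option were chosen, the conversion itself would be a simple character substitution. The Python below is a minimal sketch under that assumption; the names `HALF_RING_TO_IPA` and `to_ipa_style` are hypothetical, and it says nothing about how the existing transliteration modules are organised.

```python
# Map the traditional half rings to the IPA-style symbols:
# U+02BE (ʾ) -> U+0294 (ʔ), U+02BF (ʿ) -> U+0295 (ʕ).
HALF_RING_TO_IPA = str.maketrans({"\u02be": "\u0294", "\u02bf": "\u0295"})


def to_ipa_style(transliteration: str) -> str:
    """Replace ʾ and ʿ with ʔ and ʕ, leaving all other characters untouched."""
    return transliteration.translate(HALF_RING_TO_IPA)


print(to_ipa_style("\u02bealef"))  # ʾalef
print(to_ipa_style("\u02bfayin"))  # ʿayin
```

The reverse direction would use the inverted table, so either convention could be applied per language without retyping existing transliterations.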

For Amharic, ʾ and ʿ are the ones in use and since these aren't contrastive, I would like to keep following that practice. I don't have a strong opinion on other Semitic languages though, but /ʔ/ and /ʕ/ do seem more user-friendly in languages where that distinction is relevant. Thadh (talk) 19:30, 5 October 2021 (UTC)[]
I'd rather consistency between languages that frequently appear together, like the Ge'ez-script languages or Arabic topolects. I don't see any reason why there should be consistency between all Semitic languages, which only appear next to each other on protolanguage entries. —Μετάknowledgediscuss/deeds 20:32, 5 October 2021 (UTC)[]
In my own handwritten notes I find I'm using the IPA symbols as just clearer. We don't have to use pure IPA in transcriptions, but the traditional little curly apostrophes, barely readable in a printed book, become impossible in a computer typeface. The IPA symbols magnify them and make them readable. If you're going to use š rather than sh in transcriptions, you're half way to pure phonetic symbols. The apostrophes are appropriate for semi-technical formats like maps and history books, but for a more linguistic purpose, use clear, readable, unambiguous symbols. --Hiztegilari (talk) 20:58, 5 October 2021 (UTC)[]
I support the IPA symbols except for the Gəʿəz-script languages, in which field the half rings seem uncontested and where, as mentioned, the distinction is also less contrastive. For Akkadian I don’t know. Fay Freak (talk) 22:57, 5 October 2021 (UTC)
I have seen that in the Routledge volume The Semitic Languages, most authors use ʔ and ʕ even when they use conventional non-IPA symbols otherwise, e.g. ʔǝgziʔ-ä sämay yä-ṣnǝʕ mängǝśt-ǝyä (Butts, chapter "Gǝʕǝz"). I don't know if this is a general trend, but consistently using ʔ and ʕ in place of ʾ and ʿ is nothing unseen. –Austronesier (talk) 10:33, 6 October 2021 (UTC)[]
@Austronesier: True, I remember these fashionable books. They owe it to their character as general overviews, while Wiktionary’s mission is to document the individual languages in detail and as one does when one deals with a narrow selection of languages in detail. I framed the field as Ethiopian studies (Äthiopistik). While this is an Orchideenfach I do not know the people of who nowadays study it, I doubt little that the bulk of the field is gutted if it sees a deviation from that certain transcription system which we currently automatically put and which is of course followed and presented without even any question or any glance on an alternative by the Wikipedia article on Geʽez script—so nobody seeks an article like Romanization of Arabic for Ethiopian Semitic—, and Ethiopists would rather refrain from any change to it. Fay Freak (talk) 16:05, 6 October 2021 (UTC)[]
@Fay Freak: Good point. I can confirm from my very own experience that editors of such overview volumes set standards which contributors wouldn't normally follow in more specialized works: e.g. I was urged to change the name of a language to make it conform with the ISO standard (in that special case a real abomination). Since you say that the Gəʿəz transliteration in that book was adjusted to an in-volume standard that is otherwise uncommon, I agree we shouldn't really follow it. –Austronesier (talk) 16:28, 6 October 2021 (UTC)
I've just picked Al-Jallad as an example: in the Routledge volume, he uses ʔ and ʕ in the Safaitic chapter; but in his Safaitic grammar (Brill), he naturally uses ʾ and ʿ. –Austronesier (talk) 16:46, 6 October 2021 (UTC)[]
@Austronesier, Fay Freak, Thadh, Metaknowledge: Ok, it looks like the majority is ok with using different signs depending on the language. But what about those languages that don't seem to have a standard at the moment? Like Aramaic (in its various varieties), Hebrew, and Arabic and its topolects? To be honest, despite much preferring ʔ and ʕ, I'm more than happy to unify everything to ʾ and ʿ. In the end, there's no real "tradition" that uses ʔ and ʕ, these are just the more "modern" style. To me it's really strange to see Standard Arabic using ʾ/ʿ and other Arabic topolects using ʔ/ʕ, for example. There's no reason why it should be like this. What shall we do? Sartma (talk) 15:16, 7 October 2021 (UTC)
ʔ and ʕ—the easier if standardization is less relevant. I made the exception only for Ethiosemitic—which is separated by a mere, anyway; I think it will vex you not if we have ʾ and ʿ for Ethiosemitic and ʔ and ʕ elsewhere. Fay Freak (talk) 15:31, 7 October 2021 (UTC)[]
Arabic needs input from a great deal more people than will see and interact with this; we'd want a dedicated discussion at Wiktionary talk:About Arabic. As for Aramaic, it will never be completely unified, because some of the modern neo-Aramaic varieties have romanisation traditions that emerged independently from scholarly usage, and should be left as they are. For the long-extinct Aramaic varieties, we can do as we like, and though ʾ and ʿ are the closest we have to a standard for them, I would be happy to switch them over to ʔ and ʕ — although that could be putting the cart before the horse, in that most of the entries don't have romanisation at all and the scheme isn't completely settled anywhere. —Μετάknowledgediscuss/deeds 17:37, 7 October 2021 (UTC)[]
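To make the option floated in this thread concrete (ʔ and ʕ in general, half rings kept for the Ge'ez-script languages), here is a minimal Python sketch. It is only an illustration of the idea and not an existing Wiktionary module: the function name and the set of exempted language codes are assumptions chosen for the example.

```python
# Hypothetical sketch: switch half rings to IPA symbols, except for
# languages where the half rings are the established convention.

# U+02BE (right half ring) -> U+0294 (glottal stop)
# U+02BF (left half ring)  -> U+0295 (pharyngeal fricative)
HALF_RING_TO_IPA = str.maketrans({"\u02be": "\u0294", "\u02bf": "\u0295"})

# Assumed exemption list for the example (Amharic, Ge'ez, Tigrinya, Tigre).
KEEP_HALF_RINGS = {"am", "gez", "ti", "tig"}


def unify_glottals(translit: str, lang_code: str) -> str:
    """Return the transliteration with half rings replaced by IPA symbols,
    unless the language is on the exemption list."""
    if lang_code in KEEP_HALF_RINGS:
        return translit
    return translit.translate(HALF_RING_TO_IPA)
```

Any real change would still depend on the per-language discussions mentioned above, for example at Wiktionary talk:About Arabic.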

Request for new language family and proto-language codes: North Halmahera / Proto-North Halmahera

User:Alexlin01 and I (or better, mostly Alexlin01 who has been active as IP in the past) have started to add lemmas from languages of the North Halmahera family, together with etymologies from reconstructed proto-forms. There is an existing corpus of 180 proto-forms available, and we might carefully add more reconstructions based on regular sound correspondences.

The North Halmahera languages are part of the proposed West Papuan macrofamily which has the code [paa-wpa] in WT. While West Papuan is still tentative and only based on resemblance sets, North Halmahera is universally accepted, since it is as self-evident as e.g. the Slavic languages. Therefore, we request a code for North Halmahera and Proto-North Halmahera. North Halmahera would be under [paa-wpa] (West Papuan), and include the following languages:

  • Galela [gbi]
  • Gamkonora [gak]
  • Ibu [ibu]
  • Kao [kax]
  • Laba [lau]
  • Loloda [loa]
  • Modole [mqo]
  • Pagu [pgu]
  • Sahu [saj]
  • Tabaru [tby]
  • Ternate [tft]
  • Tidore [tvo]
  • Tobelo [tlb]
  • Tugutil [tuj]
  • Waioli [wli]
  • West Makian [mqs]

Currently, they are under [paa-wpa] (West Papuan) or the generic [paa] (Papuan). ‑Austronesier (talk) 07:35, 6 October 2021 (UTC)[]

Hi! Also, from these, Ibu is already extinct. Alexlin01 (talk) 14:34, 6 October 2021 (UTC)
@Austronesier Created paa-nha and paa-nha-pro. DTLHS (talk) 03:10, 8 October 2021 (UTC)
@DTLHS Great, many thanks! –Austronesier (talk) 08:43, 8 October 2021 (UTC)

Inconsistent treatment of Arabic words in Persianate languages

(Notifying AryamanA, Atitarev, Benwing2, Smettems, Kutchkutch, Bhagadatta, Msasag, Svartava2, Getsnoopy): @Allahverdi Verdizade

There is an inconsistency in the treatment of Arabic words in Persianate languages.

  • In South Asian languages, the proximal donor is given as Persian.
  • In Turkic languages (especially Turkish and Azeri), the proximal donor is given as Arabic.

For example, Hindi किताब (kitāb) is given as coming from Classical Persian کتاب(kitāb), while Azerbaijani kitab or Uzbek kitob is given as ("ultimately") coming from Arabic كِتَاب(kitāb) with no mention of Persian.

Could this be resolved one way or another? I suppose it's a bit iffier for Anatolian Turkish given that the Ottomans had direct contact with Arabic-speaking subject populations, but for Azeri Turkish or the Central Asian languages it should be the same situation as with South Asian languages, i.e. these words entered the language through the means of a Persianate literati class who used both Persian and Arabic, but whose primary language of writing was the former.

My understanding is that there is evidence of Persian mediation for both South Asian and Turkic languages, e.g. Hindi फ़ुर्सत (fursat) meaning "spare time" or Turkish macera meaning "adventure".--Tibidibi (talk) 13:32, 6 October 2021 (UTC)[]

Also ping @Vox Sciurorum, @Fay Freak.--Tibidibi (talk) 13:50, 6 October 2021 (UTC)
I mark Ottoman Turkish and Turkish terms as derived from Arabic unless I have evidence that one was borrowed from Persian. If the word has been in Turkic languages from before the 13th century or so I may assume it was borrowed from Persian. Nineteenth century borrowings I assume were directly from Arabic, if not Ottoman coinages based on Arabic grammar. If there are any phonological or temporal guidelines to use, let me know. Vox Sciurorum (talk) 13:53, 6 October 2021 (UTC)[]
@Vox Sciurorum I think there is a stronger justification for having Ottoman terms be derived directly from Arabic because Persian was neither the language of the Ottoman administration nor that of any significant part of the population. For the Turkic languages east of the Ottoman-Safavid border, and for all South Asian languages, the influence of Persian as a prestige language was much more direct.--Tibidibi (talk) 14:10, 6 October 2021 (UTC)[]
What Squirrels Voice said.
Also, I see zero value in clogging up the etymology of Arabic derivatives with an extra piece of information, which is hardly provable anyway if it came in through Persian or directly via bookish contexts. Allahverdi Verdizade (talk) 14:15, 6 October 2021 (UTC)[]
The Seljuk dynasty that invaded Anatolia after their victory in the Battle of Manzikert was a Persianate society. While Ottoman Turkish was not Persian, the language was replete with loanwords from Persian covering cultural and administrative terminology, while Arabic was the donor for many religious terms. Some of the Persian loanwords the Seljuks brought with them to Anatolia came from Arabic. It is IMO truly impossible to decide whether the proximate source of Ottoman Turkish فلسفه‎ was the (identically spelled) Persian term, or, directly, Arabic فلسفة‎. The choice not to mention Persian as a possible donor is then merely a choice for the sake of convenience, not a matter of principle.  --Lambiam 17:25, 10 October 2021 (UTC)[]
The distribution of Persian has a cohesive epicentre while Arabic has been scattered all around the world. Have you heard of Uzbeki Arabic? Now Samarqand clearly was a hotspot of Arabic communication; from there Arabic-speaking tradesmen in low concentration reached Uyghuristan, in the vicinity of which Arabs learned words like خُتُو(ḵutū), on the entry of which I included a quote where Samarqand occurs as a casual station of Arabic rulers; I don’t think one has to imagine the mediation of communication by Persian, contact was generally Arabic language to Turkic language, this regard is most parsimonious. Fitting this picture, Persian words use to reach Mongolian but via Tibetan (!). For Anatolian Turkish it is only most prominent and most obvious, to a Westerner, that contact with Arabic was there, because Arabs were Ottoman subjects (but so they were Kipchak and Turkmen subjects before …). Fay Freak (talk) 15:38, 6 October 2021 (UTC)[]
@Fay Freak: Arab speakers in Khorasan are a small minority because the colonists there assimilated quickly. From The Cambridge History of Iran, Volume 4, page 602:
Alongside both the early dialects and dari, which had spread everywhere with a greater or lesser degree of local variation, Arabic had also taken root in Iran. It was of course the everyday language of the Arab immigrants: certain towns such as Dinavar, Zanjan, Nihavand, Kashan, Qum and Nishapur had a considerable Arab population and Arab tribes had also settled in Khurasan. However, these Arab elements were more or less rapidly assimilated: in the middle of the 2nd/8th century the majority of the Arabs in the army of Abu Muslim spoke dari.
In fact, the Islamic conquest led to the expansion of Persian and its replacement of local Eastern Iranian languages like Sogdian.
Since major urban centers such as Bukhara and Samarqand were clearly predominantly Persophone by the period when the region was becoming increasingly linguistically Turkic, I don't see any justification for claiming that most Arabic loans in e.g. Uzbek are directly from the small community of native Arabic speakers instead of reflecting Arabic's position as a prestige language upheld by a primarily Persophone literati elite.
Chagatai, the direct literary ancestor of Uzbek, was marked by extensive Persian influence (to the point that some texts have virtually no Turkic content words) and became a literary language explicitly on the model of Persian in Timurid and Shaybanid courts, both of which retained Persian as the chief bureaucratic language. I understand that Chagatai has little additional Arabic influence beyond what is already systemically found in Persian. Tibidibi (talk) 16:16, 6 October 2021 (UTC)[]
The point was that there had been a constant latent presence of Arabic, not only as traces in Persian. Be the communities more or less native or be they acquainted with it due to trade or war or education. Arabic was never eradicated and the influx was continuously renewed. While in India this latent presence lacked, Arabic was really remote and for the educated. Oddly of course Persian scholars wrote Arabic – for Samarqand I think of Najib ad-Din Samarqandi – while Indians wrote Persian, does this tell us something for the question of the thread? So in the former borrowings could be more from Arabic due to some familiarity. Fay Freak (talk) 16:30, 6 October 2021 (UTC)[]
If you actually read about Central Asian Arabic, you'll see that they bear signs of having close ties to dialects in Arab countries, which allows us to reconstruct migration events. This is clearly inconsistent with a "constant latent presence" of actual speakers (as opposed to scholars and clerics, who could only influence the language on a literary or religious level). For Indian and Central Asian Turkic languages, there is no reason not to assume a Persian intermediary unless specific evidence is brought to bear for a given word; for Turkish and Azerbaijani, I don't think it's generally knowable. —Μετάknowledgediscuss/deeds 03:30, 8 October 2021 (UTC)[]
@Metaknowledge Why do you think it's unknowable for Azerbaijani? I'm not really sure what the major difference would be between Azerbaijani and Chagatai vis-a-vis their relationship to Arabic/Arabs and Persian/Persians. Tibidibi (talk) 14:09, 10 October 2021 (UTC)[]
Because the West Oghuz tribes have actually been geographically adjacent to Arabs since around 1000 AD. Allahverdi Verdizade (talk) 10:53, 13 October 2021 (UTC)

Romanization pages for Mandarin and Cantonese - possible update task for a bot?

Currently, the various romanization pages for Mandarin Pinyin and Cantonese Jyutping are in a poor state. I presume that, due to the quantity and ancillary nature of such entries, many lack updated content with common characters, and the presentation of the relevant characters is inconsistent. Some examples:

  • For 烹, the pinyin entry pēng shows characters such as 硷 and 軽, which are simplified or variant forms but the linked traditional forms do not show this pronunciation. In the case of 軽, this character is more commonly recognised as Japanese Shinjitai since the regularly observed Chinese forms are 輕 and 轻.
  • paang1 does not show 烹 at all
  • xiǎn shows in list items 5 崄 and 6 嶮 which are the simplified and traditional version of the same character, while lower down item 23 lists 猃, 獫 together.
  • Also in xiǎn, item 17 濁 is shown but the simplified form is not included.

This seems to be a good target for a bot to update the entries if it is able to take all the existing pinyin and Jyutping pronunciations for all characters and to update the entries systematically, while also standardising the presentation of simplified and variant character forms. A good example to reference is shí which has a good number of entries (however I'm not sure if it includes all) and most entries list the traditional and simplified forms together. This entry does however list item 2 as "実, 实, 實, 寔", which is a bizarre ¿alphabetical? order of Shinjitai, simplified, traditional and variant characters. As for item ordering, it might seem like it is ordered by radical and stroke - this might be something that needs consideration for standardisation of the romanisation entries.

Would anybody be able to take on this task?

I can try to build such a bot, but I have not built bots before, and I believe it requires scraping the pronunciations off all the existing entries, which will be an arduous task in itself, even if done with automation.

Zywxn (talk) 17:14, 6 October 2021 (UTC)[]
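As a rough illustration of the grouping step described above, here is a minimal Python sketch. It assumes the pronunciations have already been collected into records of the form (character, reading, traditional form); that record format and the function name are hypothetical, and the scraping itself is not shown. A character is listed under a reading only if its own entry gives that reading, and each traditional form is listed together with its simplified or variant forms.

```python
from collections import defaultdict


def build_romanization_index(records):
    """Group characters by reading, keeping each traditional form together
    with its other forms.

    records: iterable of (char, reading, traditional) tuples, where
    `traditional` is the traditional form of `char` (or `char` itself).
    Returns {reading: [[traditional, other forms...], ...]}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for char, reading, traditional in records:
        variants = groups[reading][traditional]
        if char not in variants:
            variants.append(char)

    index = {}
    for reading, by_traditional in groups.items():
        items = []
        # Placeholder ordering by code point; radical-and-stroke ordering
        # would need extra data.
        for traditional in sorted(by_traditional):
            attested = by_traditional[traditional]
            # The traditional form leads the item only if it is itself
            # attested with this reading.
            head = [traditional] if traditional in attested else []
            others = sorted(c for c in attested if c != traditional)
            items.append(head + others)
        index[reading] = items
    return index
```

With the xiǎn examples from this post, the four records for 崄/嶮 and 猃/獫 would come out as two list items, each pairing the traditional form with its simplified form. The item ordering here is only a stand-in for whatever standard is eventually agreed on.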

User TheNicodene - revert war to hide unresolved abuse

The user is trying to obstruct my efforts to bring attention to and address the abuse they've perpetrated against me, by deleting and archiving the discussion at Talk:formaticus. They're trying to hide the abuse and break the existing links in other discussions. The issue is not resolved and cannot be archived until it is. I request this user be blocked if they continue the edit war. Brutal Russian (talk) 05:31, 7 October 2021 (UTC)

I did not 'hide' the discussion; that is a flat-out lie which can be disproved by clicking the link. I placed the discussion in an archive and added a link at the top of the talk page; doing so with discussions over 75000 bytes, in order to free up space for new discussions, is standard Wiki practice. The discussion has not even been replied to for four months now. Nor did archiving it 'break links', which is another flat-out lie. Talk:formaticus functions exactly as it always did.
See here for a write up of only some of the insults this user has thrown at me over several months, for which he has even been temporarily blocked. I have no idea why he is suddenly acting up again after a merciful three-month hiatus. The Nicodene (talk) 05:53, 7 October 2021 (UTC)[]

Macedonian: standard, non-standard, misspelling

@Chuck Entz, Erutuon, Metaknowledge Since I am now creating entries for non-lemma forms of verbs, I would like to discuss some issues relating to the treatment of non-standard and misspelled words. We scratched the surface with User:Erutuon in August, but there are quite a lot of problems to be addressed:

Currently, my entries are formatted as follows:

Assigned to: verbs, lemmas (I am omitting less relevant categories)
Assigned to: misspellings, non-lemmas
Assigned to: participles, non-lemmas
Assigned to: participles, misspellings, non-lemmas
  • очерупа - nonstandard word, non-lemma: "verb" in the headword line, {{lb|mk|nonstandard}} in the definition
Assigned to: verbs, non-standard terms, lemmas
Assigned to: participles, non-lemmas

The problems are as follows:

  • It is also possible to treat корегиран as a misspelling of коригиран, i.e. to link two non-lemma forms to each other, rather than defining each as an inflected form of a lemma. I have always tended to opt for the second solution, including with categories other than participles.
  • Putting "misspelling" in the headword line of a misspelled verb lemma prevents it from being assigned to "verbs", but putting "misspelling" in the headword line of a misspelled participle (non-lemma form) of a verb does not prevent it from being assigned to "participles", because the parameter "part" inside {{infl of}} seems to populate that category.
  • "misspelling" does not distinguish between misspelled lemmas and misspelled non-lemmas.
  • Non-lemma forms of non-standard words are not labelled in any way to indicate that they are non-standard, because if I write {{lb|mk|nonstandard}}, they will get categorized as non-standard terms, which is wrong (they are not terms but non-lemmas), whereas if I write {{lb|mk|nonstandard forms}}, that will technically be correct, except that this label is used elsewhere for non-standard forms of standard words (comparable to English "goed", a non-standard preterite of the standard "go").

Further complications:

Participles have their own inflection, e.g. "коригираниот", which is the definite form. I do not want this to link back to the verb коригира; it is more appropriate for it to be defined as {{infl of|mk|коригиран||def|m|s}}. It will then be assigned to participle forms, with the help of the headword line {{head|mk|participle forms}}. However, if the inflected participle is misspelled as "корегираниот", it would be defined as {{infl of|mk|корегиран||def|m|s}} and the headword line would be {{head|mk|misspelling}}. Consequently, there would be nothing to assign "корегираниот" to participle forms. This would be a second inconsistency, in addition to the aforementioned one ("misspelling" suppresses the category "verbs" but not the category "participles") Martin123xyz (talk) 11:48, 7 October 2021 (UTC)[]

Ideal solution:

Redefine the category system to have the following:

  • lemmas
  • non-lemma forms
  • misspelled lemmas
  • misspelled non-lemma forms
  • non-lemma forms of misspelled lemmas
  • non-standard lemmas
  • non-standard non-lemma forms
  • non-lemma forms of non-standard lemmas

Each of these would contain subcategories for "noun", "verb", "adjective" instead of "lemma", e.g. "misspelled nouns", "misspelled noun forms", "forms of misspelled nouns", etc. There would be separate headers for each, e.g. {{head|mk|noun}}, {{head|mk|misspelled noun}}, {{head|mk|form of misspelled noun}} (with abbreviations for easier typing).
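To show how the per-part-of-speech names would follow mechanically from the eight categories above, here is a small Python sketch. It is purely illustrative: the function, the dictionary keys and the naive "add -s" pluralisation are assumptions for the example, not existing templates or modules.

```python
def category_names(pos: str) -> dict:
    """Return the eight proposed category names for one part of speech.

    `pos` is a singular part-of-speech name such as "noun" or "verb";
    the plural is formed naively by appending "s".
    """
    plural = pos + "s"
    return {
        "lemma": plural,
        "non-lemma": f"{pos} forms",
        "misspelled lemma": f"misspelled {plural}",
        "misspelled non-lemma": f"misspelled {pos} forms",
        "non-lemma of misspelled lemma": f"forms of misspelled {plural}",
        "non-standard lemma": f"non-standard {plural}",
        "non-standard non-lemma": f"non-standard {pos} forms",
        "non-lemma of non-standard lemma": f"forms of non-standard {plural}",
    }
```

For "noun" this yields "misspelled nouns", "misspelled noun forms" and "forms of misspelled nouns", matching the examples given above.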

For dealing with non-lemma forms of non-lemma forms, like the declined forms of Macedonian participles, we would need the following:

  • participles < verb forms
  • misspelled participles < misspelled verb forms
  • participles of misspelled verbs < non-lemma forms of misspelled verbs
  • non-standard participles < nonstandard verb forms
  • participles of non-standard verbs < non-lemma forms of non-standard verbs
  • participle forms
  • forms of misspelled participles
  • forms of participles of misspelled verbs
  • forms of non-standard participles
  • forms of participles of non-standard verbs

This is in my opinion the maximal categorization that we arrive at when we take into account all the relevant factors that my creating Macedonian entries has brought to the fore so far. Any other system, including the current one, seems to me to be bound to blur at least one of the empirically established distinctions highlighted above.

I am assuming that no one will be happy to implement such a categorization system, but the overview I have provided above should still be helpful for keeping track of what exactly the current system obscures and coming up with improvements addressing individual problems only. Needless to say, the distinctions that I have presented will also apply to many other languages.

Pending improvements, I would like to ask if the way I format the six types of entries listed at the start of this post is appropriate for the time being, or is there something I could do better, or even should, according to Wiktionary policies. Martin123xyz (talk) 11:48, 7 October 2021 (UTC)[]

In my opinion, a misspelt noun or verb is still a noun or verb, and should be categorised as such. Converting the header line of a lemma to the header line of a misspelling is Visigothism, even if committed by @Equinox, and in English loses the mentions of inflections that one could otherwise find by searching. {{misspelling of}} provides the appropriate information and categorisation. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)[]
When adding "misspelling" to the header line in addition to using {{misspelling of}}, I was complying with the instructions provided at Wiktionary:Misspellings. However, your suggestion resolves the two inconsistencies I referred to above. Martin123xyz (talk) 07:03, 8 October 2021 (UTC)[]
My thought on reading that is 'Quo Warranto?'. I don't know whether to amend Wiktionary:Misspellings, tag it as unadopted or simply request its deletion. Can anyone justify not treating misspelt English verbs as verbs? One problem is that a manual maintenance action needed for verbs will not happen simply because misspelt verbs are not listed as verbs. --RichardW57 (talk) 08:03, 8 October 2021 (UTC)[]
Requesting its deletion without providing new instructions would not be helpful. As long as there are some instructions, at least a certain degree of consistency between different users' contributions is ensured. And if you leave it as it is, more users will find it, assume that it is an official policy which enjoys the consensus of the community, and continue to adhere to it. Either way, the instructions for contributors regarding things like "misspellings" need to be significantly expanded - currently they are simplistic, in addition to being biased in favour of English entries. I am considering writing a user guide for Macedonian contributions, except that so many things are unregulated or poorly regulated on the English Wiktionary as a whole that I would need to make my own arbitrary decisions or keep asking here about every point. Martin123xyz (talk) 10:04, 8 October 2021 (UTC)[]
'Term' covers both lemma and non-lemma. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)
Full information about a non-lemma should be given under the lemma; one would not wish to repeat the multiple meanings of a lemma for its inflected forms. Accordingly, it should suffice to record that something is the inflected form of a non-standard term by recording the non-standardhood at the parent term itself. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)[]
Thank you for the input. Martin123xyz (talk) 07:03, 8 October 2021 (UTC)

I have noticed a further problem: not only is "nonstandard form" ambiguous between "inflected form of a nonstandard lemma" and "non-standard form of a standard lemma", it can also be understood as "nonstandard equivalent/variant of a standard lemma" (on the analogy of "alternative form of"). I had used it in this sense at допринесува recently. Regrettably, {{nonstandard form of}} does not address this three-way ambiguity. Martin123xyz (talk) 14:00, 8 October 2021 (UTC)

I just created a page for витруелен (vitruelen), using {{head|mk|misspelling}} and {{misspelling of|mk|виртуелен}}, and the entry appears in Category:Macedonian non-lemma forms and Category:Macedonian misspellings, which is wrong, because the word is a misspelled lemma, not a non-lemma form. Maybe we need to use {{head|mk|misspelled lemma}} instead, and put those entries in Category:Macedonian misspelled lemmas? Gorec (talk) 14:47, 8 October 2021 (UTC)
The argument for using misspelling as a part of speech actually argues for splitting the lemma categories into misspelt and 'correctly' spelt lemmas. I'd rather add a parameter to {{mk-noun}} and {{en-verb}} etc. I'm waiting for an old hand to weigh in. --RichardW57 (talk) 16:48, 8 October 2021 (UTC)[]


As I've suggested before, we should establish an arbitration committee (much like the one Wikipedia has) to settle entrenched disputes among users. The finer details can be discussed later, but in general, is there any considerable support for this proposal? Imetsia (talk) 19:14, 8 October 2021 (UTC)[]

There is from my part! Of course we hope to not have any disputes at all, but as the previous year has shown, they are inevitable in a project of our size. Thadh (talk) 19:36, 8 October 2021 (UTC)[]
Just as seatbelts and airbags have led to more automobile accidents, creating an arbitration committee is guaranteed to lead to more intransigence. Participants in such disputes are all fairly confident that they are in the right and that their PoV will be the prevailing one, with only minor concessions to the other side. Also, there will be less avoidance of potentially controversial edits and other changes because one's point of view will be perceived as more likely to prevail. DCDuring (talk) 20:23, 8 October 2021 (UTC)
I think I should clarify: I don't know how WP's arbitration works, but my idea was similar to what Vox Sciurorum proposes below. I think we ought to have some system where unaffiliated admins can resolve ongoing disputes. Thadh (talk) 10:36, 11 October 2021 (UTC)[]
ArbCom over at Wikipedia has not been a roaring success. It is very important that we recognise that the way their judicial system works is not ideal, it is simply how things happened to play out. Their ArbCom has three distinct purposes: policy, block appeals, and conflict resolution. There is no reason that one body should decide on all three, nor is this necessarily a good thing. As it stands, Wiktionary is much more democratic than Wikipedia, and we handle more policy through votes. I think this should remain the case. So the question is then whether block disputes (not just appeals, which are usually spurious, but where admins are actually in disagreement) and conflict resolution could be handled better than they are now, and at what cost. I think we could do better, so this idea has some merit — but we would also create a venue for the bickering that already distracts from the actual work of editing, and this has been a major effect of Wikipedia's ArbCom. —Μετάknowledgediscuss/deeds 20:39, 8 October 2021 (UTC)[]
  • I volunteer as head arbitrator Roger the Rodger (talk) 23:14, 8 October 2021 (UTC)
    • If I ever have a head that needs to be arbitrated, I'll know who to call... Chuck Entz (talk) 00:02, 9 October 2021 (UTC)
  •   Support because this would prevent long endless disputes like the recent one ({{inh+}} & {{bor+}}). Svartava2 (talk) 06:08, 9 October 2021 (UTC)[]
I agree with Μετα that WP:ArbCom is not as functional as one might wish, and with DCD that the laudable intention of avoiding arbitrariness in arbitration has led to rule codification paving the road to hell endless wikibickering. We should be careful what we wish for. A dispute over a deep disagreement can be held in an amicable way; what made recent disputes unpleasant were the sometimes implied, often straightforward accusations of bad faith cast at the other side. Perhaps an etiquette committee might do some good.  --Lambiam 16:57, 10 October 2021 (UTC)
I don't like the idea. I know I'm a bit of a handful but it's not "I don't want to be officially reprimanded" (I don't care if I'm officially reprimanded, that's fine), it's more, as Meta suggests above, I think that creating a special little judicial system-in-system does more to foster bullshit than it does to fix actual project issues. Equinox 17:22, 10 October 2021 (UTC)[]
It would be useful to have a way to resolve disputes where neither of two contradictory and strongly-held positions has supermajority support. I doubt a formal arbitration committee is the way. Maybe we can find a less formal way to have senior administrators cut the knot in cases like derivation wording without having every vote appealed to them. Vox Sciurorum (talk) 18:44, 10 October 2021 (UTC)[]
Say the proposal is instead to create "Wiktionary:Requests for Arbitration," where users can make their case, and well-established editors can vote in support of one disputant or another. I'd imagine this would be very similar to how we run RFD - no committees, formal procedures, rules of evidence, etc. And by the end of one month, we count the number of votes and act according to what the majority decides. Is this a "less formal way" that you'd support? (Really, this question goes to all users in this discussion who don't like the idea of forming an ArbCom). Imetsia (talk) 23:23, 13 October 2021 (UTC)[]
@Vox Sciurorum, Metaknowledge? —⁠This unsigned comment was added by Imetsia (talkcontribs).
The problem is that this doesn't differ much from a simple vote... I really do think we ought to restrict the solving of such disputes to the (uninvolved) administrators. Thadh (talk) 21:43, 15 October 2021 (UTC)[]
This solution introduces so many new problems that it more than counterbalances the ones it solves. I think that instead of throwing half-baked ideas at the wall and seeing what sticks, it's worth asking what you really want and how to achieve that. If what you want is to know whether you're allowed to use {{bor+}}, then I would say that you're going about it the wrong way — a Supreme Court shouldn't be making policy. —Μετάknowledgediscuss/deeds 22:14, 15 October 2021 (UTC)[]
The + templates situation would have been something an arbitration committee could have helped solve. However, it is a moot case at this point, and I wouldn't use a proposed ArbCom to continue to litigate it. For a more current issue, I'd point to the Brutal Russian versus TheNicodene complaints, even though I have no personal stake in that issue and am very unfamiliar with the fact pattern. Again, a board of well-established users voting in his favor/opposition is one possible avenue to put this issue to rest once and for all. Indeed, I think it is the best way to resolve the two above issues declaratively. Such conflict-resolution is squarely in the province of a judicial branch, whose sole purpose it is to interpret policy and settle disputes among litigants. But ultimately, I also understand the objections (though I still think the benefits outweigh the detriments), and I won't continue to pursue the creation of an arbitration committee in spite of myself. Imetsia (talk) 23:44, 15 October 2021 (UTC)[]
  • I share concerns that establishing a bureaucratic structure here with formal committees probably wouldn't help in the way proponents are hoping. I worry about the risk of "borrowing trouble", as a wiser fellow expressed to me a while back. ‑‑ Eiríkr Útlendi │Tala við mig 21:39, 13 October 2021 (UTC)[]
With the number of people actively in this community, an arbitration committee would feel like a sitcom or Alice in Wonderland trial, where there's an argument and someone puts on a wig and a fine bit of farce is had that satisfies nothing. The English Wikipedia ArbCom works in part because the Committee is not tangled up in all the issues that reach them; I can't see that happening here. Referring our issues to the English Wikipedia ArbCom might work.--Prosfilaes (talk) 23:40, 13 October 2021 (UTC)[]
I am deeply reticent to refer any EN Wiktionary concerns to the EN Wikipedia ArbCom. Our organizational cultures and norms are very different. We've had various issues arise because Wikipedia editors engage here, based on Wikipedia norms, requiring much cleanup and coordination. I can't imagine that issues referred to the WP ArbCom would be handled with any ease. ‑‑ Eiríkr Útlendi │Tala við mig 02:48, 14 October 2021 (UTC)[]
Like Prosfilaes, I don't think we have a big enough active editor base to have an Arbcom. I like the suggestion that if there's an intractable issue where neither position can get supermajority support, or it's unclear what the status quo is (since votes are structured as changes to the status quo) but we have to do something, we should have a majority vote. It isn't without issues, but...it's an idea. I don't know if Wikipedia's Arbcom would be keen to accept cases from us, since they have a workload as it is, and they (or we) also might often feel they lacked the relevant expertise to judge things like disputes over what template wordings are best for a dictionary. For intractable disputes over blocks, we could ask global sysops to weigh in. - -sche (discuss) 01:33, 14 October 2021 (UTC)[]
Global sysops are just as bad as outsourcing to Wikipedia. In my experience, they generally neither know nor care about Wiktionary, and would probably be annoyed at the very suggestion of foisting another local task on them. —Μετάknowledgediscuss/deeds 18:00, 14 October 2021 (UTC)[]


This page survived RFD, but many users pointed out the need for a cleanup. Modernization/expansion from experienced editors is welcome. (Discussion here, to be archived at Wiktionary talk:Etymology.) Ultimateria (talk) 00:02, 10 October 2021 (UTC)[]

Wording of RFD banner

I propose that we change the banner message generated by {{rfd}} as follows:

Current text:

This entry has been nominated for deletion
Please see that page for discussion and justifications. Feel free to edit this entry as normal, though do not remove the {{rfd}} until the debate has finished.

Proposed new text:

This entry has been nominated for deletion
Please see that page for discussion and justifications. While voting is in progress, please do not edit this entry in a way that may alter or make unclear the apparent intention of votes already cast. Do not remove the {{rfd}} template until the debate has finished.

What do you think? Mihia (talk) 21:02, 10 October 2021 (UTC)

I noticed that someone put a noun sense under the verb sense of push and shove, which seemed like a good idea but made the voting less clear. None Shall Revert (talk) 06:56, 11 October 2021 (UTC)
Also, wiki things are not supposed to be "votes". None Shall Revert (talk) 06:58, 11 October 2021 (UTC)
It does happen from time to time. I have observed several cases where fundamental changes have been made to the whole basis of an entry while voting is in progress, and moreover people sometimes do not even bother to mention that they have done this at the RFD discussion. So an entry is listed at RFD, people vote "Delete" let's say, and then the entry is completely changed or rewritten, or redirected maybe, with no notice, leaving the status of the pre-existing votes totally unclear. I definitely do not agree that we should simply say "Feel free to edit this entry as normal" on the RFD banner -- it's just a question of exactly what we do say. Rather than my suggestion above, we could say "please mention any substantial changes at the RFD discussion", but this still leaves the problem of what should be done with pre-existing votes that may no longer be applicable. Mihia (talk) 08:12, 11 October 2021 (UTC)[]

Alternative suggestion (a bit more permissive):

This entry has been nominated for deletion
Please see that page for discussion and justifications. You may continue to edit this entry while the discussion proceeds, but please mention significant edits at the RFD discussion and ensure that the intention of votes already cast is not made unclear. Do not remove the {{rfd}} template until the debate has finished.

Mihia (talk) 08:22, 13 October 2021 (UTC)

I like the last one. Ultimateria (talk) 17:16, 13 October 2021 (UTC)
I like this wording better than the first proposal. - -sche (discuss) 01:35, 14 October 2021 (UTC)
Likewise, I support this last wording. Imetsia (talk) 17:06, 14 October 2021 (UTC)
OK, I have implemented the second suggestion. Mihia (talk) 17:07, 14 October 2021 (UTC)

Proposal for new parameter in linking templates: "alternative script"

I suggest a new parameter for linking templates which will input alternative (non-lemma) script forms within parentheses. This is already partly done for Korean and Vietnamese:

But these language-specific templates are not ideal because they lack most key functions (e.g. part of speech, literal meaning, suppression of transliteration) and cannot be integrated with other templates such as {{alter}}, {{syn}}, {{bor}}, etc.

An "alternative script" parameter would be useful for various languages:

  • In the case of Korean, especially formal or academic language, there is a very large number of Chinese-derived homophones. An example is 연기 (yeongi), whose entry currently features nine not uncommon and completely unrelated words:
연기 (演技, yeongi, “acting”), 연기 (煙氣, yeongi, “smoke”), 연기 (延期, yeongi, “postponement”), 연기 (緣起, yeongi, “dependent origination”), 연기 (年記, yeongi, “date of composition recorded on an artwork”), 연기 (年期, yeongi, “certain number of years”), etc.
A fully integrated "alternative script" parameter would allow far easier disambiguation of these. To a lesser extent, this is also true of Vietnamese.
  • Many languages are written in multiple scripts. On Wiktionary, one script is usually chosen as the lemma script, with the result that forms in the other script are neglected. For instance, the majority of Azerbaijani speakers live in Iran and primarily use the Arabic script, which has also been the script for most of Azerbaijani history. But this fact is neglected because all Azerbaijani lemmas are in the Republic's Turkish-based Latin script. The integration of an "alternative script" parameter would allow for a more equitable coverage of such languages in etymology or descendant sections, in translation charts, etc. Example:
current {{m|az|Azərbaycan}} Azərbaycan > new {{m|az|Azərbaycan|altscr=آذربایجان‎}} Azərbaycan (آذربایجان‎‎)
current {{m|ks|کٲشُر}} کٲشُر(kạ̄śur) > new {{m|ks|کٲشُر|altscr=कॉशुर}} کٲشُر‎ (कॉशुर, kạ̄śur)

Thoughts?--Tibidibi (talk) 07:11, 11 October 2021 (UTC)
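
For illustration only, here is a minimal Python sketch of the display order that the examples above imply: lemma form first, then the alternative script, transliteration, and gloss inside one pair of parentheses. The function name `format_link` and the keyword names `tr` and `gloss` are hypothetical; only `altscr` comes from the proposal. The real linking templates are backed by Lua modules, and this sketch makes no claim about how they would implement the parameter.

```python
from typing import Optional


def format_link(term: str,
                altscr: Optional[str] = None,
                tr: Optional[str] = None,
                gloss: Optional[str] = None) -> str:
    """Render a term with optional annotations in parentheses.

    Order of annotations (assumed from the examples in the proposal):
    alternative script, then transliteration, then gloss.
    """
    annotations = []
    if altscr:
        annotations.append(altscr)
    if tr:
        annotations.append(tr)
    if gloss:
        # Glosses are shown in curly double quotes, as in the Korean examples.
        annotations.append(f"“{gloss}”")
    if not annotations:
        return term
    return f"{term} ({', '.join(annotations)})"


# Disambiguating two Korean homophones by their hanja forms.
print(format_link("연기", altscr="演技", tr="yeongi", gloss="acting"))
print(format_link("연기", altscr="煙氣", tr="yeongi", gloss="smoke"))
```

The point of the sketch is that `altscr` is just one more optional annotation, so it could in principle sit alongside transliteration and gloss in any template that already renders those.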

I've found a similar need in Pali, where there are multiple scripts in use, and I anticipate a similar need for Sanskrit. The solution for Pali is documented by a full set of examples for {{pi-link}}, which generalises {{link}}. One complication there is that some Pali writing systems are ambiguous and that the Roman script is one of the major writing systems, so we end up with transliterations and Roman script equivalent sometimes having to be different. Generally we want to link to the Roman script equivalent, but sometimes it is not easily available, e.g. in inflection tables, which commonly link to the entries in the tables. Sanskrit has a similar but different complication. The Bengali script writing system is ambiguous, and Devanagari is the 'lemma' script. (Don't like the term, as we treat the equivalents in the other scripts as alternative forms, thus also lemmas.) For Pali I've built specialised forms of some linking templates on the standard templates, such as {{pi-alternative form of}} on {{alternative form of}}. I've independently encoded {{pi-nr-inflection of}}, which I ought to convert to build on the standard template using common generalisation logic. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)[]
Note that my scheme treats the form in the alternative script as the primary input. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)
Korean is an unusual case, where the hidden parameter to the conversion is meaning rather than pronunciation. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)
@Tibidibi: It's a yes for me. Maybe with the possibility of adding a description before the alternative script, like they do in Serbo-Croatian entries (for example: dom#Noun_28). Sartma (talk) 08:27, 12 October 2021 (UTC)

Splitting Hebrew roots?

There are a bunch of homonymous Hebrew roots that mean completely different things but just so happen to look the same, and there doesn't seem to be a way to distinguish between them. חילוני, התחיל וחלל don't really share a root, right? --The cool numel (talk) 08:47, 12 October 2021 (UTC)

I don’t see how the root of חילוני(khiloni, secular) can be ח־ל־ל‎, while that of חילון(khilún, secularization) is ח־ל־ן‎‎. I guess this is a typo. If we had pages for these roots, we could document several unrelated meanings like we do for other homonymous terms, such as fluke.  --Lambiam 04:30, 13 October 2021 (UTC)[]
@Lambiam: I'm pretty sure the root of חילון is ח־ל־ן, as it's derived from חילוני, which is in turn just the root ח־ל־ל with the pattern קִטְלוֹנִי (like צבעוני). The thing I'm talking about is splitting categories like Category:Hebrew terms belonging to the root ח־ל־ן by meaning. --The cool numel (talk) 09:57, 13 October 2021 (UTC)
So I take it then the root is the inflectional root, not the etymological root. Doesn’t that make splitting categories by meaning much less interesting? IMO such splitting would best be done by creating subcategories of homonymous roots according to their different core senses, but deciding what these core senses are and recategorizing terms with homonymous roots accordingly will mean a lot of work for a very small bunch of active Hebrew editors.  --Lambiam 11:44, 13 October 2021 (UTC)[]

Adding DRAE links to all Spanish lemmas

There are currently ~18,500 lemmas with links to DRAE. There are an additional ~27,000 Spanish lemmas that do not currently have a DRAE link but do have a corresponding DRAE entry.

I can run a bot to add a "Further reading" section with a link to {{R:DRAE}} to the entries missing DRAE links. Would this be desirable or just annoying clutter? JeffDoozan (talk) 17:02, 13 October 2021 (UTC)
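
As a rough illustration of the kind of edit being proposed, and not the bot's actual code, the following Python sketch adds a {{R:DRAE}} line to the wikitext of a single Spanish section. It assumes the caller has already checked that DRAE has a real entry for the term (the `drae_has_entry` flag), that the section uses a level-3 "===Further reading===" heading, and that the text passed in contains only the Spanish section. The function name and all of these simplifications are assumptions made for the example.

```python
FURTHER_READING = "===Further reading==="  # assumed heading level
DRAE_LINE = "* {{R:DRAE}}"


def add_drae_link(section: str, drae_has_entry: bool) -> str:
    """Return the section wikitext with a DRAE reference added if appropriate.

    The text is returned unchanged when DRAE has no entry for the term
    or when the section already uses the R:DRAE template.
    """
    if not drae_has_entry or "{{R:DRAE" in section:
        return section

    lines = section.rstrip("\n").split("\n")
    if FURTHER_READING in lines:
        # Reuse the existing "Further reading" section.
        lines.insert(lines.index(FURTHER_READING) + 1, DRAE_LINE)
    else:
        # Otherwise append a new section at the end.
        lines += ["", FURTHER_READING, DRAE_LINE]
    return "\n".join(lines) + "\n"


entry = "==Spanish==\n\n===Noun===\n# dog\n"
print(add_drae_link(entry, drae_has_entry=True))
```

Running the function a second time on its own output leaves the text unchanged, which is the property one would want from a bot that may revisit pages.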

If you can match the entries accurately I don't see why it would be a problem. I routinely add them manually. – Jberkel 17:10, 13 October 2021 (UTC)
Huh, I expected more pages to have an entry. I think it's helpful! As I expand Spanish entries I could use it to filter out a set of "core" Spanish words to work on. Ultimateria (talk) 17:14, 13 October 2021 (UTC)
Only if the bot checks that the target of the link is a real definition. Today I saw several French entries where people added {{R:TLFi}} but the web site has no definition. Vox Sciurorum (talk) 18:26, 13 October 2021 (UTC)
Yes, it does. JeffDoozan (talk) 18:39, 13 October 2021 (UTC)
Did the bot run on all forms? I added one earlier manually: Special:Diff/62116035/64262193 – Jberkel 19:47, 17 October 2021 (UTC)
Also, could you adapt it to work with {{R:TLFi}}? – Jberkel 08:24, 18 October 2021 (UTC)
The bot did not run on all forms, only on pages with entries containing a lemma. The page you edited was skipped because it previously contained only a verb form. If anyone is interested, I could generate a list of pages where the DRAE has a lemma but we have only a form.
I'll see what I can do with {{R:TLFi}} but I can't promise anything. JeffDoozan (talk) 15:19, 18 October 2021 (UTC)
Yes, such a list would be useful, thanks! Especially the adjectives often exist only as participles, presumably autogenerated at some point. – Jberkel 20:53, 19 October 2021 (UTC)
Here's a list of the 2,935 pages where we have Spanish forms that have corresponding DRAE lemmata. JeffDoozan (talk) 18:39, 20 October 2021 (UTC)
This is very much appreciated and I whole-heartedly endorse this. I work on Spanish here and I add this to all entries I make. —Justin (koavf)TCM 06:23, 27 October 2021 (UTC)[]

The phrasebook is in dire need of rules.

(Not referring to the CFI, that's another topic.) Coming from languages that are both gendered and have polite forms, the translation boxes in most phrasebook entries are a mess. It's completely random whether:

  • ...only the polite, only the familiar or both versions are present.
  • ...these polite/familiar forms are qualified as such, whether this qualification comes before or after the entry and whether this qualification is called polite/familiar or formal/informal.
  • ...plural phrases are present.
  • ...all these forms are consistently present both in their male as well as their female forms (if applicable) and how those forms are annotated.
  • ...what the order of all these forms is.

My suggestions:

  • Decide whether to call it polite/familiar or formal/informal and then apply this consistently. See the inconsistencies in are you allergic to any medications
  • Split the translation box into two distinct ones in most articles (where applicable), one for familiar, one for polite forms. Languages that don't have this feature could either be automatically completed using a bot that copies over entries between the boxes or alternatively they could be barred from one of the boxes (maybe by introducing a new {{trans-top}} that only accepts languages with politeness distinctions).
    • If the above point doesn't happen, at least define a consistent scheme. Should the qualifier come before or after? Should entries without qualifiers in languages with politeness distinctions be allowed? What should come first?
  • Disallow plural translations.
  • Decide whether gender should be expressed using the gender parameter of {{t}} or using {{qualifier}}, then apply this consistently. See the inconsistencies between e.g. are you religious and are you single.

--Fytcha (talk) 02:36, 14 October 2021 (UTC)

I agree with all of this. But it's worth noting that in many languages, politeness and formality are not the same thing. In Korean, you can be politely informal and non-politely formal. Tibidibi (talk) 04:40, 14 October 2021 (UTC)
In that case, as I don't think it is within the scope of a phrasebook to give impolite phrases (except perhaps for phrases that are explicitly/obviously impolite), I would suggest that we stick with formal and informal and avoid any distinctions between politeness and impoliteness. Andrew Sheedy (talk) 05:53, 14 October 2021 (UTC)
I also agree with all the above, with the caveat that some languages, like Korean, have both a formal/informal and polite/familiar distinction. As you say, we can choose the most relevant one (I would probably keep polite/familiar for Korean too, since formal/informal is a distinction more pertinent to more restricted scenarios, but I guess Korean editors will make the call on that. Just a note: non-polite means "familiar" and doesn't mean impolite.). Sartma (talk) 09:12, 14 October 2021 (UTC)

Major opportunity for us to step in for word of the year

Heads up that OED are slipping. It's our time to strike. —Justin (koavf)TCM 16:36, 14 October 2021 (UTC)[]

"...observing that 'worms are all over the place' and 'everybody loves a good worm.'" Well, I'm sold. Ultimateria (talk) 16:52, 14 October 2021 (UTC)
In a way it would be funnier with the computing sense of worm (something like a virus), since I can imagine somebody really out of touch thinking this was a "new" hi-tech word of the 21st century! Equinox 10:13, 15 October 2021 (UTC)

New SOP policy idea

I propose adding a new SOP test at WT:Idioms that survived RFD. It would have a caption like "Terms whose parts are substitutable, but with which only a few variations greatly predominate. For instance, the word "air" in air resistance can be switched out for "wind," "snow," "water," "fluid," and others; but "air resistance" is the only widely used and attested form." (A better writer could improve some of the wording.) Accordingly, I would name the test WT:AIR RESISTANCE/WT:AIR, although there are probably other entries to which this logic has been applied in RFD discussions. (Talk:idle threat comes to mind.) There are also the ongoing discussions about rumor has it and puré de batata.

As a community, this is a justification that has previously won the day, so it makes sense to codify it. In addition, all of our SOP policies are essentially advisory and open to great interpretation (there are no bright-line rules), and I don't think this test would depart from that tradition. Lastly, this policy would finally bring us one step closer to a more fleshed-out approach to handling set phrases and common collocations. Thoughts? Imetsia (talk) 17:36, 14 October 2021 (UTC)[]

Your idea sounds great, I like it. The reason why I'd advocate for the inclusion of articles such as air resistance isn't because they're so indecipherable (let's be honest, you really can guess what it means based on the parts) but because:
  • It is the canonical collocation to express this idea. There might be other SOPs that convey the same meaning but this one is the one that's actually used.
  • The article serves many other purposes other than just explaining the idea, such as providing translations, coordinate terms, hyponyms etc.
Your proposal shifts the focus of SOP discussions a bit away from the question "Can its meaning be guessed based on the parts?" to "Is it the principal (i.e. most widespread) collocation to express this concept?", which is a change I welcome with open arms. Fytcha (talk) 18:06, 14 October 2021 (UTC)[]
Support. This seems like a good idea. We need some way of including collocations and fixed expressions, anyway. Andrew Sheedy (talk) 19:36, 14 October 2021 (UTC)
I now agree that we should have a firm basis for including entries for strong set phrases -- combinations that are explicable as SoP, but in practice overwhelmingly predominate over other possible ways of saying the same thing by word substitution of synonyms (however we can best define this idea). While we are looking at this policy area, I also believe that we should have a firm basis for including SoP phrases that are particularly hard to understand from the parts if one does not already know which of many possible meanings to combine together -- another argument that is often made at RFD. Mihia (talk) 21:36, 14 October 2021 (UTC)[]
On the second suggestion, I think we'd have to firmly pin down whether there is enough of a multitude of "possible meanings to combine together" for a term to not be SOP. This seems quite hard to establish clearly through policy. Talk:amico per convenienza comes to mind. On the first try, it passed RFD because of just this justification, though the vote was later overturned. (To me, the argument that it was SOP was a slam dunk, and it shocked me that so many users initially disagreed.) So I do not disagree with the idea in principle, but we would have to adjust the dials just right to ensure we are neither over- nor under-inclusive. Is there really an administrable standard we can come up with to achieve just this result? Imetsia (talk) 22:01, 14 October 2021 (UTC)
I think both ideas are equally hard to precisely codify because there will always be an element of subjectivity. I think we just have to accept this, and establish the broad policy and let borderline or argued cases go to RFD. I think that examples of phrases that have passed RFD on the stated grounds, as we have done with other cases, are very helpful. FTR, a recent one that was undeleted on the second ground is track meet. Mihia (talk) 10:08, 15 October 2021 (UTC)[]
I oppose including common collocations because they are common collocations. We can use {{ux}} to illustrate the more common uses. Vox Sciurorum (talk) 23:30, 14 October 2021 (UTC)
I oppose the subjectivity of the idea. Although “idiomaticity” is close friends with commonness.
The real question should be technical utility, with cross-language perspectives (which most who want to have a say on a term don’t have, naturally since our language knowledges are limited by our origins in particular language communities). And it wasn’t even about the utility of the term alone in the case of air resistance, but people apparently wanted it as a model for other types of resistance (so we do not have to create them but look in this entry how to construct them, very remarkable). But you are unable to form a reasonable rule or guideline from this example. A particulari ad universale non valet consequentia. (Case law is bad and a meaningless Anglo-fetish.) Fay Freak (talk) 00:37, 15 October 2021 (UTC)[]
The guideline I've formulated seems quite reasonable to me. Why do you disagree? It's readily administrable and provides a good general principle that can be applied not mechanically, but by using sound judgment and discretion. Just like every other example on WT:Idioms that survived RFD, this is not a hard and fast rule, and it includes an element of subjectivity. Editors constantly disagree about the application of SOP policies; some are more permissive on the issue of term inclusion, and others are more conservative. This is not an exception to that rule. It fits in perfectly with every other advisory rule we've ever put forward about idiomaticity and SOP-ness. Imetsia (talk) 15:55, 15 October 2021 (UTC)
I don't support individual entries for mere common collocations. I think we can find a conceptual division, albeit slightly grey and subjective, between common collocation and strong set phrase. Mihia (talk) 10:12, 15 October 2021 (UTC)[]
The idea may have merit if we can formulate a solid objective criterion, but I cannot resist pointing out that air resistance is a poor example. The term denotes a physical force, expressible in the unit newton. In general, designers try to minimize air resistance. The term wind resistance as commonly used (pace M–W) is an entirely different species, the ability to stand up to wind damage,[8] a highly desirable property (except for the sets of disaster flicks such as Twister).  --Lambiam 10:18, 15 October 2021 (UTC) Addition: In English the first component of such a compound can be the subject or the object of the action. In French you can see the distinction in the preposition used: résistance de l'air versus résistance au vent.  --Lambiam 10:39, 15 October 2021 (UTC)[]
As solid and objective a criterion as possible, yes, but it will never be mechanically objective, such that anyone can apply a rule and will always come up with the same answer. If we had only mechanically objective CFI criteria then we would never need RFD discussions. Mihia (talk) 13:14, 15 October 2021 (UTC)[]
I agree with Mihia's comment right above. In addition, is air resistance really as poor an example as you argue? M-W, as you point out, has a definition much more in line with that of "air resistance." Even if you claim it's not the most used meaning, you must accept that it is a meaning. And what about the other substitutes like "snow," "water," and "fluid"? (I haven't checked these on my own, but maybe you can make a case for your position based on these.) Imetsia (talk) 15:55, 15 October 2021 (UTC)
“Fluid resistance” is a more general term than “air resistance”. It is the resistance experienced by a body in motion, relative to a surrounding fluid. Usually the fluid is air, but when it is something else, the term “air resistance” is not appropriate. “Snow resistance”, “water resistance” and “wind resistance” generally refer to the ability to resist, or protect against, the intrusion or harmful effects of said phenomena or substances; having good wind resistance means the same as being windproof.  --Lambiam 10:44, 16 October 2021 (UTC)
@Lambiam: OK, I agree now that snow and water resistance do not fall under the same family of meanings as "air resistance." But I don't agree when it comes to wind resistance. "Wind resistance" definitely does have a similar meaning which is used quite commonly ([9], [10], [11] just for starters). According to the wiki article you linked, there's also "wave resistance," under the same family of meaning. So what would you think about the proposed policy if we switched "wind, snow, water, and fluid" with simply "wind, fluid, and wave?" Imetsia (talk) 18:29, 16 October 2021 (UTC)[]
Can I just point out that I think that people here are talking about potentially two different things. The first is whether words with different meanings can be substituted to create a parallel phrase, for example "air resistance" changed to "wave resistance", and the second is whether synonyms can be substituted to produce an equally idiomatic way of saying the same thing, e.g. per Fytcha's comment "It is the canonical collocation to express this idea. There might be other SOPs that convey the same meaning but this one is the one that's actually used." (my emphasis). Mihia (talk) 19:34, 16 October 2021 (UTC)
The aim of my comment regarding the example “air resistance” was to point out that it is not a felicitous example to illustrate the proposed test, and equally infelicitous to serve as its name. A better example might be the collocation disaster preparedness; while its synonyms catastrophe preparedness and disaster readiness have been used, it is clearly[12] the winner of the “canonical collocation” (con)test.  --Lambiam 20:24, 16 October 2021 (UTC)[]
I wonder whether we can come up with something a bit punchier than "disaster preparedness". What about "human rights"? Mihia (talk) 21:30, 16 October 2021 (UTC)[]
Would that be an example that fits your second criterion (synonym substitutability) but not the first (parallel phrases)? For synonym substitutability, I'm guessing we have, e.g., "people rights," "mankind rights," and similar. But are there any parallel phrases? The only ones I can think of are either [ADJ]+rights or [possessive]+rights, which don't really fit. Imetsia (talk) 23:40, 16 October 2021 (UTC)[]
"human rights" is supposed to be an example of something that would pass the first test, the clear predominance of one way to say something over other candidates involving word substitutions, such as those you mention. A parallel phrase would be animal rights. Despite our defining this as, essentially, the rights of animals, it again passes the first test because of its overwhelming predominance over e.g. "creature entitlements" or whatever. "human rights" and "animal rights" are examples of what I would call strong set phrases explicable as SoP. Mihia (talk) 08:15, 17 October 2021 (UTC)[]
Could we then just have two tests, one for parallel phrases and the other for synonym substitutability? "Air resistance" seems like a good candidate for the parallel-phrases test, while "human rights" passes the synonym-substitutability test. (The parallel phrase you mention is one for which we have an entry, so I don't think it's a great example -- isn't the idea to list parallel phrases that wouldn't be entryworthy, thus showing that the original term is in fact a set phrase?) Honestly, "air resistance" has the synonym, as argued above, of "wind resistance." So it could also work for the second case. But if we really want a more shining example of synonym substitutability, I suppose we could just include both of the tests. What do you think? Imetsia (talk) 16:18, 17 October 2021 (UTC)[]
My desire would be a rule to explicitly allow strong set phrases / fixed expressions even if explicable as SoP (and also, on a separate point, a rule to explicitly allow SoP combinations that are particularly hard to understand from the parts). Of course, the problem is how best to define "set phrase" or "fixed expression" (or, at least, the sort that we would want to include). The idea that "It is the canonical collocation to express this idea", aka (more or less) non-synonym-substitutable, will apply in some cases, perhaps not all. I am less clear how much additional help the "no parallel phrases" test would be. My suggestion, if we want a basis to include entries that we presently wouldn't, or that would presently be of unclear eligibility, is to compile as big a list of these entries as possible, so that we can check whether the proposed test(s) are adequate (and at the same time verify that the tests do not allow entries that we would not want to include, of course). A good source of these entries would probably be previous RFD discussions. Another possibility would be simply to say that we "allow strong set phrases or fixed expressions even if SoP" and, where disputed, let it be debated case-by-case what these are. Mihia (talk) 17:53, 17 October 2021 (UTC)[]
@Mihia: Could you lay out the full text of your proposed WT:HUMAN RIGHTS test? I'd like to start an informal vote below about both of them (since the discussion seems to have stalled at this point), so I'd like the full text. Imetsia (talk) 20:44, 19 October 2021 (UTC)[]
@Imetsia: Actually, of the two I would probably in the end choose WT:ANIMAL RIGHTS as perhaps slightly less susceptible to objections that it is not wholly explicable as SoP. Unfortunately I don't have a full proposal at the moment except the one that I mentioned, namely "allow strong set phrases or fixed expressions even if explicable as sum-of-parts", which I think may not fly as people may reasonably ask "how do we tell what is a strong set phrase or fixed expression"? That is the difficult part. I am of the opinion, as I alluded to above, that before making a concrete votable proposal the wording should be tested against as many actual examples as possible, which might be obtained from the imagination or from failed (or even passed) RFD candidates, to ensure not only that desired phrases pass but also that undesired ones fail. Compiling such a list is something that I have had on an "eventual to do" list, but haven't got round to yet. Mihia (talk) 16:31, 20 October 2021 (UTC)[]
  • I have a comment about the practical implementation of this idea, if it should go ahead. At WT:CFI it says "An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components [...] See Wiktionary:Idioms that survived RFD for other examples." We cannot therefore just plonk a "set phrase" test at Wiktionary:Idioms that survived RFD, as initially suggested, since quite likely the meaning of a set phrase can be easily derived from the meaning of its separate components. Mihia (talk) 17:27, 15 October 2021 (UTC)[]
In fact, the same could be said about some other tests at Wiktionary:Idioms that survived RFD, such as the "tennis player" test. It seems that this problem is a pre-existing slight muddle of the wording in these sections. Mihia (talk) 17:35, 15 October 2021 (UTC)[]
Support on my end. AG202 (talk) 21:47, 15 October 2021 (UTC)
Some dictionaries include a separate section of common collocations involving some term in their entry for that term. For examples, see the online Cambridge Dictionary and Collins. I think this would be a good alternative for us too.  --Lambiam 11:43, 16 October 2021 (UTC)[]
A separate section would be counterproductive. Using {{ux}} does the job better; remember that there’s the |t= parameter!— thus: {{uxi|en|sick burn|t=a particularly cutting insult}}. ·~ dictátor·mundꟾ 23:08, 16 October 2021 (UTC)[]
It is not clear to me why you expect this to be counterproductive. It seems a better alternative, at least to me, than introducing another vague exception to the non-SOP rule. The |t= parameter is explicitly intended for English translations of usage examples on foreign entries, not for glossing. While one or two {{ux}}es – which per policy should be be grammatically complete sentences – will generally suffice to demonstrate usage of a term, I can easily imagine a handful of associated common collocations.  --Lambiam 10:16, 17 October 2021 (UTC)[]
@Lambiam: The parameter t= is used to translate Early Modern English and dialectal English quotes whose mutual intelligibility with Modern Standard English is low. Youth slang and other suchlike jargons also do depart from Modern Standard English, and hence have I no qualms about any misue of the parameter. Sociolinguistically, any non-Standard variety is ‘foreign’, or else I would not have been blocked for using non-Standard English to write definitions. ·~ dictátor·mundꟾ 15:04, 17 October 2021 (UTC)[]
Out of interest, how would you define the test, or criterion, that would allow us to keep "nature[-]lover" against the arguments that it is SoP? Mihia (talk) 08:40, 17 October 2021 (UTC)[]
nature lover is a collocation, unlike SoPs like wine lover and nature person. ·~ dictátor·mundꟾ 15:04, 17 October 2021 (UTC)[]
I fear that "is a collocation" will be far too permissive for our purposes. Mihia (talk) 17:24, 17 October 2021 (UTC)[]
What if I told you that idiomaticity is exclusively essentiated by comparative grounds? Like there is name names, this is easily parsed as the sum of its parts, but if you look at its German translation Ross und Reiter nennen you are soothed. An ἰδίωμα (idíōma) is there by its being ἴδιος (ídios) in contradistinction to other, more hands down ἰδιώματα (idiṓmata) (there is no peculiarity without a general mass to other from). Because judging by commonness within a community runs into the sorites paradox, too obviously and frequently. “Collocation” is just a rephrasing of the same commonness idea. Fay Freak (talk) 19:27, 17 October 2021 (UTC)[]
  • Support per dictátor·mundꟾ. Collocations should be allowed where the collocation itself is substantially more likely to be used than any substitution, where the collocation is a commonly used rhyming pair of terms or alliterative pair of terms, or where at least one term in the collocation has multiple common meanings, but the usage in the collocation overwhelmingly intends one of those meanings (particularly where it is not the most common meaning of the term). bd2412 T 02:04, 18 October 2021 (UTC)[]
Without any further stipulation, the test "where the collocation itself is substantially more likely to be used than any substitution" would apparently allow "white cat" as substantially more likely than e.g. "ivory feline", or "chair leg" as substantially more likely than e.g. "stool limb", while I personally would not want to include either of those. I'm sure examples such as these abound. This is why we need to carefully check exactly what the letter of the rule would and would not allow. Mihia (talk) 21:00, 20 October 2021 (UTC)[]

Voting to elect members to the Movement Charter drafting committee is now open (October 12 - 24)

Voting to elect members to the Movement Charter drafting committee is now open. In total, 70 Wikimedians are running for 7 seats in these elections.

Voting is open from October 12 to October 24, 2021.

We are piloting a voting advice application for this election. It helps show which candidates hold positions similar to the choices entered.

According to the set up process, the committee will initially consist of 15 members in total. 7 members elected in this process, 6 members selected by Wikimedia affiliates, and 2 members appointed by the Wikimedia Foundation. Up to 3 additional members may be appointed by the committee, and steps may be taken to replace members as needed.

More details and the voting link are on Meta.

Please feel free to let me know if you have any questions about this process.

Xeno (WMF) (talk) 01:47, 15 October 2021 (UTC) (Movement Strategy & Governance Team, Wikimedia Foundation)

(Disclosure: I'm a candidate.) The election closes in 17 hours (at 12:00 UTC). The referenced Charter to be drafted is basically going to be a constitution for Wikimedia, binding on all our supporting organizations, and outlining the formation of new governance structures. --Yair rand (talk) 19:13, 24 October 2021 (UTC)

Wiktionary:Votes/2021-10/Standardising wording for showing cognates

I recently created this vote, for consistency and standardisation. Looking for feedback, concerns, comments, etc. Svartava2 (talk) 16:49, 16 October 2021 (UTC)

The nuisance of a lot of edits in my watch list may exceed the small benefit. Other than that, I understand the proposal to be replacing all instances of the five strings before {{cog}} with a single one of them, and leaving all other uses of {{cog}} alone. Thus typos like "cognate witth" and alternate wording like "from the same origin as ..." would be untouched. I suggest leaving "include", "with", and "compare" alone and replacing "to" and "of" with "with". Which is not on the list of options. Note that include implies additional unlisted cognates. It is not correct for a bot to replace include with anything else. Vox Sciurorum (talk) 17:02, 16 October 2021 (UTC)[]
@Svartava2, a better formulation of the vote might just be to standardize cognates like this: (1) require "Cognate with" for full cognates, (2) allow "Cognates include" when multiple cognates exist, and (3) allow "Compare" for "non-full cognates" (i.e., per Richard, terms that "are semantically similar or etymologically related"). If the vote were so phrased, it would have the typical consensus-building problems of any omnibus vote. Different users would like and dislike different parts of the proposal, and few will embrace it in full, leading then to a mixed opposition that ultimately tanks the vote. To avoid this, I do agree with Vox's solution above: a simple vote to replace "'to' and 'of' with 'with.'" Imetsia (talk) 15:59, 17 October 2021 (UTC)[]
@Imetsia: I don't understand (1) above. It contradicts (2). 'Compare' is appropriate for when the relationship is unclear or a parallel formation can be seen. --RichardW57 (talk) 16:46, 17 October 2021 (UTC)
OK, let me rephrase it: (1) require "Cognate with" when only one full cognate exists, (2) allow "Cognates include" when there are multiple, and (3) allow "Compare" "when the relationship is unclear or a parallel formation can be seen." I also like your wording better, so thanks for the suggestion! Imetsia (talk) 17:12, 17 October 2021 (UTC)
@Imetsia: Would you allow 'cognate with' if there were descendants of the cognate given? --RichardW57 (talk) 19:18, 17 October 2021 (UTC)
@RichardW57: I don't really understand your question. Maybe an example could help? Imetsia (talk) 19:22, 17 October 2021 (UTC)
@Imetsia: Suppose all that we knew of the cognates of Greek θεός were Latin fānum and the latter's English borrowing fane. Would you allow us to describe the Greek word as 'Cognate with Latin fānum', or would we have to write 'Cognates include Latin fānum'?
I would allow "Cognate with." So I guess even my revised suggestion is too imprecise. I mean to say that one must use "Cognate with" rather than of or to if they wish only to include one cognate (even if others may exist). If, however, one wants to include multiple cognates they can use "Cognate with X and Y" if X and Y are the only cognates that exist; or use "Cognates include X and Y" if X and Y are only two of the many cognates that exist. Imetsia (talk) 20:07, 17 October 2021 (UTC)[]
Scratch that. I don't see the reason to make that distinction. I guess we could just deprecate "Cognates include." Imetsia (talk) 20:09, 17 October 2021 (UTC)
@Imetsia: 'Cognates include' does suggest that there is no need to list them all. --RichardW57 (talk) 21:45, 17 October 2021 (UTC)
Sure, but there isn't really an urgent need for that to be suggested. It's already understood that we shouldn't list out every single cognate in every case. And simply using "Cognate with" neither implies that the list is exhaustive nor that it's only a subset of the possible cognates. It implies nothing in this respect. Imetsia (talk) 22:48, 17 October 2021 (UTC)
  • Different phrasings can mean different things: When I say "Cognates include", I imply there are more cognates. When I say "Cognate with", I imply that there aren't, or that those aren't known. When I say "Compare" I usually mean that the words aren't full cognates, but are either semantically similar or otherwise etymologically related. I would like to keep this freedom to choose the most accurate and nuanced wording. Thadh (talk) 22:44, 16 October 2021 (UTC)[]
    @Vox Sciurorum, Thadh: "Compare" is often used just before {{cog}}, for true cognates also sometimes (eg. ਗੁਝਾ). My understanding is that "cognate with" or "cognates include" doesn't really imply that only that many cognates are there unless there is an "and". For example, Cognate with LANG term, LANG2 term2, LANG3, term3 or Cognate with LANG term, LANG2 term2, LANG3, term3 and Cognates include LANG term, LANG2 term2, and LANG3, term3 or Cognates include LANG term, LANG2 term2, and LANG3, term3. Another example - Marathi थुंकणे (thuṅkṇe); it says "cognate with" but doesn't include the Urdu cognate given at 𑀣𑀼𑀓𑁆𑀓𑀇. Svartava2 (talk) 03:45, 17 October 2021 (UTC)[]
    Everybody has their own stylistic choices, and I respect their choice even if I wouldn't necessarily make it myself. And AFAIK {{cog}} may be used for partial cognates, like Saterland Frisian Bäidenstied and German Kinderzeit (only the second part of the compound is etymologically related), but maybe I misunderstand when {{cog}} must be used? Thadh (talk) 09:05, 17 October 2021 (UTC)[]
    'Cognates include' would be odd for a complete list, even if it doesn't preclude it. While I appreciate that there is now push back against 'Wiktionary is not a paper dictionary', as space on mobile phones is limited, 'cognate with' also invites padding with a complete list of cognates, or a complete list of cognates of a particular type. Note that we use 'cognate' in a wider sense than some other dictionaries, by including words related by borrowing. --RichardW57 (talk) 14:08, 17 October 2021 (UTC)[]
    @Thadh I believe {{cog}} is only to be used when a single term in the source language; like Pali sarīra, Prakrit 𑀲𑀭𑀻𑀭 both from Sanskrit शरीर. As for Saterland Frisian Bäidenstied, {{noncog}} would be more appropriate; using {{cog}} there is like {{cog}} for Prakrit 𑀅𑀡𑀼𑀕𑀘𑁆𑀙𑀇 (aṇugacchaï) and Pali avagacchati (where the prefix is different and w/o prefix Prakrit 𑀕𑀘𑁆𑀙𑀇 (gacchaï) is true cognate of Pali gacchati). Svartava2 (talk) 13:28, 17 October 2021 (UTC)[]
    @Svartava2: {{noncog}} is a bit strong for partially cognate words; plain {{mention}} would be better. --RichardW57 (talk) 16:46, 17 October 2021 (UTC)
    {{m}} doesn't load a hyperlinked language name, though. But I do think we may want to be more lax with the usage of {{cog}}, because using {{ncog}} for false cognates is not uncommon. Thadh (talk) 17:08, 17 October 2021 (UTC)
    @Thadh: {{m+}} does. (This unsigned comment was added by RichardW57.)
    @RichardW57: It doesn't link the language name. Thadh (talk) 19:45, 17 October 2021 (UTC)
    @RichardW57 There is always a chance of any cognate list being incomplete; but do we always use "cognates include […]"? Do you really think that the 42,100 (approx.) uses of "Cognate with" are 100% complete with not even a single cognate missing? No, I don't think so. "Cognate with" ≠ "Cognate [only] with". Re: "we use 'cognate' in a wider sense" - some editors do, while some don't. I don't. In case of a borrowing, I prefer showing other (borrowed) words in other languages from the same etymon as "Compare {{ncog|LANG|term}}", for example, diff (initially which said "cognate" added by Kutchkutch). Svartava2 (talk) 15:41, 17 October 2021 (UTC)[]
    'Cognates include' tells the user and other readers that the list is not intended to be complete. (Quoting from a dozen Zhuang dialects does not seem useful, unless we're looking at a recent borrowing.) 'Cognate with' does not reveal the author's intention. --RichardW57 (talk) 16:46, 17 October 2021 (UTC)
  • There appear to be grammatical issues to handle in any automated processing. While it seems safe to replace 'Cognate to' with the classier 'Cognate with', merging of 'cognate of' runs into the problem that 'cognate' here is a noun. 'Cognates include' actually includes a verb, and there may therefore be grammatical issues as well as a loss of connotation. Would the bot know not to change quotations? --RichardW57 (talk) 14:08, 17 October 2021 (UTC)
    well yes you're right, that would also need some attention. I think we could deal with this with the help of some list like user:benwing2/pra-sc. Svartava2 (talk) 16:00, 17 October 2021 (UTC)
  • Someone please delete this shitty, nonsensical vote! It does not help us in any wise. (@Metaknowledge) ·~ dictátor·mundꟾ 15:31, 17 October 2021 (UTC)[]
    It isn't nonsensical; per Imetsia: “A discussion on whether to incorporate some other text in the {{cog}} template by default is a better place to start.” The vote may be hurried a bit, so I removed its starting date (for now); let's do some more discussion regarding this. Svartava2 (talk) 15:53, 17 October 2021 (UTC)[]
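
As an illustration of the narrower replacement suggested in this thread (changing only "Cognate to" into "Cognate with" and leaving "include", "compare" and the noun use "cognate of" alone), here is a minimal Python sketch. The function name and the regular expression are hypothetical, not taken from any existing bot. It assumes the phrase sits directly before {{cog|...}}, and it does not by itself solve the question raised above about text inside quotations.

```python
import re

# Assumption: only "Cognate to" / "cognate to" immediately followed by
# {{cog|...}} is rewritten. "Cognates include", "Compare" and the noun use
# "cognate of" are deliberately left untouched.
COGNATE_TO = re.compile(r"\b([Cc]ognate) to(?=\s+\{\{cog\|)")


def standardise_cognate_wording(wikitext: str) -> str:
    """Replace 'Cognate to' with 'Cognate with' where it introduces {{cog}}."""
    return COGNATE_TO.sub(r"\1 with", wikitext)
```

Requiring {{cog| right after the phrase keeps the change away from running prose such as "cognate to the Latin word", but a real bot run would still need a manually reviewed list of pages, as proposed above.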

Bot to generate Spanish forms

I'm playing around with a bot to generate Spanish forms and I wanted to solicit some feedback concerning the "best" way to declare a form of. To start with, it'll just be generating forms of nouns and adjectives.

Below is a list of the templates/parameters I would propose for the given situations. Given that these will be bot generated, I'm preferring templates that may generate the most helpful categories or other metadata without regard for how unwieldy their parameters may be.

Plural of a masculine/feminine adjective (verde -> verdes)

head: {{head|es|adjective form|g=m-p|g2=f-p}}

gloss: {{adj form of|es|verde||p}} -> plural of verde

Masculine plural of adjective (rojo -> rojos)

head: {{head|es|adjective form|g=m-p}}

gloss: {{adj form of|es|rojo||m|p}} -> masculine plural of rojo

Feminine of adjective (rojo -> roja)

head: {{head|es|adjective form|g=f}}

gloss: {{adj form of|es|rojo||f}} -> feminine of rojo

Feminine plural adjective (rojo -> rojas)

head: {{head|es|adjective form|g=f-p}}

gloss: {{adj form of|es|rojo||f|p}} -> feminine plural of rojo

Plural of a masculine/feminine noun (dentista -> dentistas)

head: {{head|es|noun form|g=m-p|g2=f-p}}

gloss: {{noun form of|es|dentista||p}} -> plural of dentista

Plural of a masculine noun (doctor -> doctores)

head: {{head|es|noun form|g=m-p}}

gloss: {{noun form of|es|doctor||p}} -> plural of doctor

Feminine equivalent of a masculine noun (doctor -> doctora)

head: {{es-noun|f}}

gloss: {{female equivalent of|es|doctor}} -> female equivalent of doctor

Plural of a feminine equivalent of a masculine noun (doctora -> doctoras)

head: {{head|es|noun form|g=f-p}}

gloss: {{noun form of|es|doctora||p}} -> plural of doctora

Plural of a masculine noun (naranjo -> naranjos)

head: {{head|es|noun form|g=m-p}}

gloss: {{noun form of|es|naranjo||p}} -> plural of naranjo

Plural of a feminine noun (manzana -> manzanas)

head: {{head|es|noun form|g=f-p}}

gloss: {{noun form of|es|manzana||p}} -> plural of manzana
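
To make the list above easier to compare at a glance, here is a minimal Python sketch of how the head and gloss lines could be produced from a lookup table. The case keys, the LEMMA placeholder and the function name are hypothetical and only for illustration; the template strings are the ones listed above, and this is not the bot's actual code.

```python
# Hypothetical lookup table: case key -> (headword line, gloss line).
# "LEMMA" is an illustrative placeholder that gets replaced by the lemma.
FORM_TEMPLATES = {
    "adj-mf-plural": ("{{head|es|adjective form|g=m-p|g2=f-p}}", "{{adj form of|es|LEMMA||p}}"),
    "adj-m-plural": ("{{head|es|adjective form|g=m-p}}", "{{adj form of|es|LEMMA||m|p}}"),
    "adj-f": ("{{head|es|adjective form|g=f}}", "{{adj form of|es|LEMMA||f}}"),
    "adj-f-plural": ("{{head|es|adjective form|g=f-p}}", "{{adj form of|es|LEMMA||f|p}}"),
    "noun-mf-plural": ("{{head|es|noun form|g=m-p|g2=f-p}}", "{{noun form of|es|LEMMA||p}}"),
    "noun-m-plural": ("{{head|es|noun form|g=m-p}}", "{{noun form of|es|LEMMA||p}}"),
    "noun-f-plural": ("{{head|es|noun form|g=f-p}}", "{{noun form of|es|LEMMA||p}}"),
    "noun-f-equivalent": ("{{es-noun|f}}", "{{female equivalent of|es|LEMMA}}"),
}


def form_entry_lines(case: str, lemma: str) -> tuple[str, str]:
    """Return the (head, gloss) wikitext lines for one form of `lemma`."""
    head, gloss = FORM_TEMPLATES[case]
    return head, gloss.replace("LEMMA", lemma)
```

For example, form_entry_lines("adj-m-plural", "rojo") gives the head and gloss lines proposed for rojos. Keeping the templates in one table means a change agreed in this discussion (such as dropping the gender from the headword line) would only have to be made in one place.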

Are there other cases I should consider or anything else anyone would like to see in a bot generated form entry (etymology, IPA, etc)?

Note: Some of the default head/gloss lines have been edited to reflect the suggestions below. I decided to keep the gender/plural declarations in the headword definition because most entries already have them and they would be difficult to add later but easy to remove.

JeffDoozan (talk) 22:08, 16 October 2021 (UTC)

Was there a decision on whether {{es-IPA}} is safe enough to add to all entries? If so, the bot should add pronunciation. On the bigger issue, I don't think all Spanish forms need their own pages. The list should be made by a human including common words but not rare words. Vox Sciurorum (talk) 22:47, 16 October 2021 (UTC)[]
Don't worry, I'm not going off on an anti-red-link campaign. I think there are tasks that are better suited to a bot than a human and generating forms seems like one of them. I'm open to input on how to apply this: perhaps only generating forms for lemmas that are DRAE attested that don't contain an obsolete/disused/antiquated qualifier. Additionally, the bot could monitor a page where humans could add lemmas that they deem form-worthy and the bot can save them the labor of creating them manually. JeffDoozan (talk) 18:04, 17 October 2021 (UTC)[]
@Vox Sciurorum: I asked this question recently and unfortunately, es-IPA is not 100% foolproof yet. I can get a citation if you need. —Justin (koavf)TCM 06:53, 27 October 2021 (UTC)[]
See what the convention is for Portuguese. One may not be better than the other, but two similar languages that often have identically spelled cognates should use the same wording. Vox Sciurorum (talk) 13:01, 17 October 2021 (UTC)[]
I've seen all of the variations that I posted above without an obvious consensus, so I thought it would be good to brainstorm to see if there are any nuances I've missed. JeffDoozan (talk) 18:04, 17 October 2021 (UTC)[]
  • No preference on templates for verde et al. I don't think e.g. "masculine plural of hombre" makes sense; I'd say don't mention the gender of nouns in the definition line except for female equivalents. doctora isn't exactly a noun form; we currently treat female equivalents as lemmas (as with alternative forms), and I agree with that format. On verdes et al, I'm weakly against including the gender/number in the headword line.
  • Something I would love to see you do with this bot is find instances of bluelinks with missing parts of speech. E.g. if a page has an adjective and a masculine noun, the plural of the noun will often exist while omitting the adjective form of the same spelling. It happens a lot with verb forms and nouns in -a, -e, -o.
  • As for es-IPA, I think it's ready for anything but modern borrowings and especially long words. I'm not 100% sure when secondary stress is used, but I suspect it's present with multisyllabic prefixes, compounds, and some long words. BTW, thanks for adding the DRAE links! Ultimateria (talk) 17:11, 17 October 2021 (UTC)[]
    Thank you for the feedback, especially regarding doctora, which I've adjusted above.
    Finding missing parts of speech in bluelinks is one of the motivations for writing this, as even frequently used forms can go unnoticed for a long time, like the missing adjective form alegres, which this bot will handle easily. Another part is detecting orphaned forms that reference lemmas that have been removed. JeffDoozan (talk) 18:04, 17 October 2021 (UTC)[]
    The stress on -mente adverbs isn't coded into {{es-IPA}}: the stress on normalmente, for example, goes on the "mal" syllable, just like normal. QuickPhyxa (talk) 22:25, 18 October 2021 (UTC)
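
The check for bluelinks with missing parts of speech discussed in this thread can be sketched as a simple set comparison. This is only an illustration under stated assumptions: the function name and both input mappings are hypothetical, and how the bot would actually obtain the expected and existing sections (dump parsing, API calls) is left open.

```python
def missing_parts_of_speech(
    expected: dict[str, set[str]],
    existing: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Report, for pages that already exist, which expected sections are absent.

    expected: form -> parts of speech implied by the lemma entries
    existing: form -> part-of-speech sections currently on the form's page
    """
    missing = {}
    for form, wanted in expected.items():
        if form not in existing:
            continue  # red link: no page yet, so ordinary form creation applies
        gap = wanted - existing[form]
        if gap:
            missing[form] = gap
    return missing


# Made-up sample data for illustration only.
expected = {"alegres": {"Adjective"}, "rojos": {"Adjective", "Noun"}, "verdes": {"Adjective"}}
existing = {"alegres": {"Verb"}, "rojos": {"Adjective", "Noun"}}
report = missing_parts_of_speech(expected, existing)
```

With this sample data the report contains only alegres, since its page exists but lacks the adjective section, while verdes is skipped because it has no page at all.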


hi please creat a bot to creat plural of English names I creat some Amirh123 (talk) 12:03, 18 October 2021 (UTC)[]

@Amirh123: you can't even write a complete sentence- why are you creating entries? You were blocked for this three years ago. If you keep doing it, the next block may be permanent. Chuck Entz (talk) 12:27, 18 October 2021 (UTC)[]

Effect of Apple’s iCloud Private Relay