Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives
2002 (December)
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019

August 2019

Words that are both borrowings and compounds

In Oto-Manguean languages (among others), many nouns have a generic prefix indicating what type of thing they are. These prefixes are also added to loanwords, which poses a problem for Wiktionary templates because Template:prefix, Template:compound etc. only allow morphemes from within one language.

E.g. for ntamesá, I can't use Template:compound to give its etymology as inta + mesá, because mesá doesn't exist as an independent word. But if I just use Template:bor, it doesn't show the relationship to inta at all.

Is there a way to produce something like "inta + Spanish mesa", so that it categorizes it as both a compound and a borrowing? --Lvovmauro (talk) 12:13, 1 August 2019 (UTC)

I think {{affix|poe|inta|mesa|lang2=es}} will do what you want. —Rua (mew) 16:09, 1 August 2019 (UTC)
Something like {{affix|poe|inta|{{bor|poe|es|mesa}}}} also seems to work. /mof.va.nes/ (talk) 01:13, 7 August 2019 (UTC)
That produces bad HTML: "Spanish mesa" will be italicized and marked as San Juan Atzingo Popoloca (poe) text. — Eru·tuon 01:47, 7 August 2019 (UTC)
Noted. /mof.va.nes/ (talk) 19:55, 7 August 2019 (UTC)
Never a good idea putting templates inside of templates. --{{victar|talk}} 22:52, 7 August 2019 (UTC)
I do not subscribe to this apodictic statement. You say this about the positions of the linking templates reserved for linking in certain languages, and you have said it about |tr= and similar. It depends on what the templates do. Often {{taxlink}} belongs in |t=. {{w}} belongs in quotation templates. Even simpler: {{circa}}, {{...}} and similar replacement templates. But {{affix|poe|inta|{{bor|poe|es|mesa}}}} is bad, of course. Fay Freak (talk)
Arguably, it should be required that templates be nestable or tolerant of nesting. If not they should come with warning labels. DCDuring (talk) 01:18, 8 August 2019 (UTC)
A template that is enclosing another template can only see the wikitext generated by the inner template. In theory, one of the modules that {{affix}} uses could try to strip language-tagging from the input that it gets, but it would be complex and would probably cause more Lua memory errors. It's better to scan the dump to find instances to fix. Here are cases like the one above (basic etymology templates inside the "term" parameters of the basic morphology templates). — Eru·tuon 03:37, 8 August 2019 (UTC)
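To make the point about expansion order concrete, here is a toy Python model. The functions `bor` and `affix` below are hypothetical stand-ins, and the markup they return is a simplified approximation, not what Wiktionary's Lua modules actually emit; the sketch only illustrates that the outer call receives the inner call's finished output as a plain string.

```python
# Toy model of template nesting. `bor`, `affix` and LANG_NAMES are
# hypothetical stand-ins; the HTML is simplified, not Wiktionary's real output.

LANG_NAMES = {"es": "Spanish"}  # hypothetical language-name lookup


def bor(lang: str, source: str, term: str) -> str:
    """Inner template: expanded first, returns finished markup for the borrowing."""
    # `lang` is unused; it only mirrors the {{bor|lang|source|term}} argument order.
    return f'{LANG_NAMES[source]} <i lang="{source}">{term}</i>'


def affix(lang: str, *parts: str) -> str:
    """Outer template: sees each part only as an already-expanded string."""
    # Every part, whatever markup it already contains, is tagged as `lang` text.
    return " + ".join(f'<i lang="{lang}">{part}</i>' for part in parts)


# Models {{affix|poe|inta|{{bor|poe|es|mesa}}}}: the inner call runs first.
nested = affix("poe", "inta", bor("poe", "es", "mesa"))
print(nested)
```

In this model the word "Spanish" ends up inside the `poe`-tagged italic span, mirroring the bad HTML described above. Passing the source language as a separate argument, as with `lang2=es`, presumably avoids this because the outer template can then build the borrowed part itself rather than wrapping someone else's output.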

Linkages to limited entries

Being more of a user than a contributor, I plead guilty to inadequate mastery of the Wiktionary facilities. Often I don't even know where to look. (I have only recently discovered how to enter an example, for example.) Now, one thing has frustrated me considerably: suppose I wish to link to a headword that has many valid definitions, all listed under one headword, but the link I want is to one specific definition in the list. Let us say "lustre", a noun, for which we have four entries, and suppose I want to refer to precisely the third. (Or the first entry under the verb, FTM.) Is there a facility for that? If so, I would be grateful for a link. JonRichfield (talk) 09:54, 2 August 2019 (UTC)

There is no easy way, but I think the template {{anchor}} can be used for this purpose. If you replace the third sense of the noun "lustre" by # {{anchor|glass ornament}} a glass ornament such as ..., then [[lustre#glass ornament|lustre]] will look as before but link directly to the glass-ornament sense. I do not know if there are arguments against this approach – except that it is clumsy.  --Lambiam 18:46, 7 August 2019 (UTC)
@Lambiam, JonRichfield We have {{senseid}} specifically for this purpose. —Rua (mew) 15:14, 8 August 2019 (UTC)
I have created a documentation subpage for the template {{anchor}} and added: “See also: {{senseid}}”.  --Lambiam 18:25, 8 August 2019 (UTC)
@Lambiam, Rua Many thanks to both of you. I shall investigate. JonRichfield (talk) 18:30, 9 August 2019 (UTC)

Gerund at bay

There has been an exchange at and about gerund. The discussion has been too involved to permit me to go into detail here, but I have posted an example to illustrate what bothers me. First of all, there is a lot of argument about whether the concept is still viable in English (See here for more detail), but I am not taking sides in that connection, though I am increasingly uneasy about extending the verbal noun concept to cover what is seen as, for example, verbal adverbial functions. It does, however, apparently exclude participles, in particular present continuous participles, which seems to me to amount to straining at straws that have been passed by camels. ("I smell a rat, I see it in the...") To add aggravation to lesion, the English article gerund currently has a Russian example that has only an uncomfortable equivalent in English, one that isn't a verbal noun, and I am not comfortable with the idea of seeing it as a gerund at all. To illustrate the concept, I have added a natural Afrikaans construction that seems to me nearly exactly equivalent, given that my mastery of Russian stops at about da, nyet, and tovarich. Now, I don't know where this is heading, but could some members please have a look at gerund and the examples, and offer comments or pronouncements? JonRichfield (talk) 10:13, 2 August 2019 (UTC)

Roughly speaking, a gerund is a verb form that some linguists choose to call a “gerund”. I do not think any other short description will cover the meaning across the spectrum of languages. And it is hard to find two modern linguists who agree on which forms to call a gerund. The concept is a remnant of the outmoded conception that Latin is the perfect language with the perfect grammar, and that more barbaric languages such as English should be described in the terms developed for Latin grammar.  --Lambiam 18:33, 7 August 2019 (UTC)
At a glance, it's easy to get the impression that the English translations, One shouldn’t cross a street while reading a newspaper and That fellow is crossing the street while reading!, are supposed to be examples of gerunds in English, which they are not. Mihia (talk) 13:28, 8 August 2019 (UTC)
We are lapsing a bit into prescriptivism here. If published works use the word gerund "carelessly", we should include the "careless" definition as well as at least one generally-accepted-as-correct-by-grammarians definition.
Can an English deverbal term ending in -ing take a plural while also being a pure gerund? To me the -ing forms that I can think of that have a plural form seem to be full-fledged nouns, eg, tracings, rubbings, drawings, parsings. Gerunds seem to behave generally like uncountable nouns. If so, these might be useful to exemplify in the entry or mention in usage notes. I haven't subjected these notions to testing or to the authority of CGEL (2002), but perhaps we have an expert reading this. DCDuring (talk) 18:09, 20 August 2019 (UTC)

صدر

I don’t know if this is the right place to discuss this, but an anon has repeatedly undone my edits at صدر without giving any explanation of why my revision should not be kept. Is there a way to solve this? I’m pretty sure a temporary block would be useless, as the user would immediately revert my edits as soon as it expires. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:46, 4 August 2019 (UTC)

I really don't know who to believe here: this is probably a problem IP editor in Saudi Arabia that @Fay Freak has been battling- they have a very narrow, prescriptive view of their native language, and revert all kinds of reasonable edits. You, on the other hand, have a long, long history of making bad edits in languages you don't know- I can't trust you to actually know what you're doing. We'll have to wait for someone who knows the language to sort this out. Chuck Entz (talk) 15:49, 4 August 2019 (UTC)
But here it is only about the formatting. Indeed the IP does not explain (because its English is bad), but I don’t see a point in IvanScrooge98’s formatting either. The numbered pronunciation sections are not uncontroversial. We have long wanted to get rid of them, and recently someone even argued for getting rid of numbered etymology sections (I won’t dig out all the discussions now). In any case, the IP has correctly seen that IPA pronunciations are a poor basis around which to format the whole page. As I recently said, the categorization of terms in a language as derived from a proto-language X is little reason to duplicate etymologies everywhere, putting things in places where you otherwise would not put them but would leave them elsewhere; likewise, pronunciations, whether in IPA or in audio, do not justify splitting all the main content into “Pronunciation” sections, or into “Etymology” sections only because of different pronunciations, as has not seldom been done even when the given etymology is the same (“from the root Y”), often aggravating the obnoxiousness of the formatting by adding the same reference templates under every one of them 😵. Pronunciation sections are not equally important for every language: in English and Chinese one needs them, while in most other languages, Arabic included, the opposite is the case. In Arabic a problem arises frequently because different pronunciations can have the same spelling, so we would have to put a pronunciation in every POS section even though this is not needed to get the pronunciation from the dictionary (the vocalization or transcription is what is needed), and here the reader would ask “why would you do that?” The Saudi IP has appreciated this after seeing me clean up the thus messed-up formatting of many Arabic pages.
What I opt for is to get rid of the pronunciation sections in Arabic entries altogether and add a switch to inflection tables that readers can toggle to switch the transcriptions in the tables to IPA, or similar, and to add parameters for including audio files in inflection table entries. This fits the language much better too, since otherwise we only give a pronunciation for the lemma form, which is one form of a hundred for verbs and one of at least 15 for common nouns, so it looks quite arbitrary, as opposed to English or Chinese, where one does not have all these forms. Heavily inflecting or agglutinating languages that are written unambiguously (where applicable with diacritics or transcriptions, and then all the more so) suggest putting pronunciations in a less prominent place, while pronunciation sections are more for languages where the lemma pronunciation covers most of what is needed and giving pronunciations is also required because the spelling leaves doubts about the pronunciation. WT:EL was not written with all that in mind, sure, but is rather based on English-like requirements. So it is already beneficial to recognize the different information requirements in the treatment of various languages, so as not to make the entry layout a tool against legibility. Fay Freak (talk) 16:32, 4 August 2019 (UTC)
Sorry, forgot to reply. As you may have noticed, I’ve not been very active in the last few months and only edited when I was sure about what I was doing. Also, I am studying Arabic and I figured that order would be the most sensible and complete, even though I agree we could take the pronunciations away if the division becomes cumbersome, especially since Arabic pronunciation is mostly predictable from the romanization. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 07:55, 6 August 2019 (UTC)

requesting AWB permissions

hey friends, i'd like to request AWB access to add noun class information to plural forms of Swahili words, and possibly a few other small tasks. i've created over 1,300 swahili entries here without issue, and i'm familiar with AWB having done a few tasks with it on enwiki before. i'll personally review every edit, and tag them appropriately, and before every mass task i'll write up a little description of it on my talk page as i did on enwiki. also, i plan on using JSWikiBrowser instead of AutoWikiBrowser, but it still depends on me being added to WT:AutoWikiBrowser/CheckPage so it should be no difference from the admin point of view. thanks, --Habst (talk) 19:21, 6 August 2019 (UTC)

@Habst: All set. - TheDaveRoss 12:21, 8 August 2019 (UTC)

Tamil

I am new to Wiktionary, so I don't know where to post, but I'm wondering about the use of colloquial Tamil on Wiktionary. Just as Egyptian Arabic and other colloquial Arabic dialects, which are used frequently in speech but not in writing, have entries (while both Modern Standard Arabic and written Tamil are used in writing but not in speech), would it be possible for me, a native Tamil speaker, to add colloquial Tamil entries? This may be as simple as including the IPA under the written Tamil entry as a separate dialect, similar to how both Received Pronunciation and American English pronunciations are given for English entries, or I could create separate entries and link the written and spoken Tamil forms in the same way that Persian, Dari, and Tajik entries are linked on Wiktionary. (As an example of the differences between the two registers, even the numerals 1, 2, 3, 4, and 5 are pronounced quite differently in the two: oṉṟu vs. oṇṇu, iraṇṭu vs. reṇṭu, mūṉṟu vs. mūṇu, nāṉku vs. nālu, and aintu vs. añcu.) Also, I've noticed a couple of inaccuracies in the IPA pronunciations on Tamil entries, stemming from what I assume is an inaccuracy in the code for Template:ta-IPA. Is there a way for me to edit this template? (By inaccuracies, I mean mostly the use of [dʑ] where [s] should be and [ss] where [tʃː] should be. Other than this, I haven't noticed anything.) —This unsigned comment was added by Wokj (talk • contribs) at 03:01, 7 August 2019.

hi Wokj, i see you posted this twice so i responded to you at the information desk: Wiktionary:Information desk/2019/August#Tamil. it's no problem this time, but better to post things in one place in the future so we don't have duplicate answers. --Habst (talk) 03:39, 7 August 2019 (UTC)
Ok, thanks! I'll be sure to post things in one place in the future. Wokj

Renaming Bella Coola to Nuxalk

I am in no way connected to this people or language, nor do I have any contact with or knowledge of it whatsoever, but a) I have more commonly seen it called Nuxalk, so I would imagine more people would recognize it by that name, and b) I read on its Wikipedia page that the Nuxalk government prefers this name. I don’t know any further arguments for or against it; maybe someone else does.
I feel like this paragraph is worded and structured a bit awkwardly; English isn’t my native tongue and I mainly use it to write things, so I’m not sure what sounds right. Please ask if anything is unclear. |Anatol Rath (talk) 11:58, 8 August 2019 (UTC)|

What does ISO call it? We mostly use their standards for language codes etc. Equinox 15:33, 8 August 2019 (UTC)
Just looked it up: it’s blc, so Bella Coola |Anatol Rath (talk) 16:42, 8 August 2019 (UTC)|
Well, the code doesn't necessarily imply anything about the name, but https://iso639-3.sil.org/code/blc does list the language name as Bella Coola. --Prosfilaes (talk) 07:58, 20 August 2019 (UTC)

Vote on "coalmine"

Is there any appetite for a(nother) vote on the "coalmine" policy, whereby multi-word SoP entries are kept if the corresponding solid word can be attested? I don't know when the last major discussion of this happened. Mihia (talk) 12:57, 8 August 2019 (UTC)

I voted for COALMINE, but I don't like how it has often been used to keep a common spaced SoP by finding a tiny handful of obscure/nonstandard citations for the non-spaced form. I don't know what the solution is. Equinox 12:59, 8 August 2019 (UTC)
@Mihia: The vote happened ten years ago. I'd vote against the proposal. Canonicalization (talk) 13:05, 8 August 2019 (UTC)
We need to have some ideas if we want to revisit the vote. Otherwise we just have to be picky about the spelled-solid citations. DCDuring (talk) 13:07, 8 August 2019 (UTC)
At the time we were looking for some way of wasting less time on RfD discussions, which often led to inclusion of phrases by vote just because they were common collocations, no matter how transparent. DCDuring (talk) 13:09, 8 August 2019 (UTC)
To achieve what? Why care? Wiktionary:NOTPAPER. Also, nobody has yet explained what the magic of a space is, or why lacking spaces should indicate inclusionworthiness. There are a lot of things that shouldn’t be included even though they are written together, and people fail to understand that compounds can also be mere examples of the nomina simplicia: according to some “logics” one should include the term “Muselmanenmäusken” if it is used by three different authors, and somebody removed the quote from Muselman because “the quote does not have the term”. But Muselmanenmäusken isn’t a term and won’t ever be one, even with additional uses. Yet if somebody creates it with quotes, why remove the page Muselmanenmäusken? I can’t tell you why. Nor could I give any further reason if German had spaces in compounds and that word were quoted three times and created. What’s the difference? Spaces tell nothing at all about whether something should be kept or removed in a language, just as, as we have recently learned, not even the presence in a text in a certain language indicates which language the “term” is in: if in doubt, one string in a quote might attest a German simplex, a compound and a Latin term too, if one cannot distinguish where languages end and where terms end. Romans had no spaces, so what? Include every sentence or every text as an entry? The policy changes will continue to be Anglo-centric; one is unable to articulate or conceptualize what should be included because of unavailable language knowledge. Better not to add any policies; it usually moves people one step further away from common sense. That being said, there was no reason to add WT:COALMINE, and there is no reason to remove it either, as long as one does not see reasons to remove words of the coalmine type. I think this aporia is analogous to the distinction between language and dialect.
Just as one cannot wholly separate languages and dialects, in lexicography with unlimited resources one cannot pin down the language of every string that can be quoted, and one does not see where compounds are so unnecessary that they should be deleted. Fay Freak (talk) 16:39, 8 August 2019 (UTC)
 
...why not?
NOTPAPER doesn't mean we should add literally anything arbitrarily, else why not add kitten pictures? Everyone likes those. We should hold ourselves to meaningful rules. Equinox 16:41, 8 August 2019 (UTC)
@Equinox You are right, I wouldn’t add just anything. But if people put in the effort to document common compounds, then even though the recommendation is not to add you-know-what-we-are-talking-about, one cannot reverse this argument and say it should all be deleted. From “do not add” does not follow “do delete”. Or can anyone prove this statement? And spaces have been a poor reason for distinguishing between you-know-what-we-are-talking-about and what is inclusionworthy. Fay Freak 16:53, 8 August 2019 (UTC)
Why care? Because it licenses the creation of multi-word SoP entries. Why does this matter? Because it is counter to the standard principles of any dictionary and also potentially confusing to users who happen upon the phenomenon. Users should, and presumably largely do, understand that to find the meaning of "X Y" they have to combine the meaning of X with the meaning of Y in cases where "X Y" does not have any special meaning in combination. A separate entry for "X Y" gives the impression that there is a special meaning not understandable from X + Y. Then there is the issue, as Equinox mentioned, of small numbers of obscure/nonstandard citations having an impact beyond what they merit. How much sense does it make to have an entry for "cluster size", for example, purely on the basis that a few people who couldn't tell a variable name from proper English wrote it as "clustersize"? None whatsoever, in my view. Information that it is usually (or should be) written "cluster size" can be provided at "clustersize" for the benefit of anyone who lands there. Mihia (talk) 19:14, 8 August 2019 (UTC)
I would vote against COALMINE as written if there were a similar vote today. We don't need a coal mine entry in order to state at coalmine "much more commonly spelled coal mine". There are some examples where COALMINE has justified keeping a term which I felt should be kept without having other good CFI rationale, but hopefully we will be able to identify those even without this policy, or be able to figure out a narrower criterion which would eliminate pumpkin-seed-type questions. - TheDaveRoss 18:05, 8 August 2019 (UTC)

I would be interested in having a discussion about entries in other languages alongside one about English entries. See the discussion at WT:Requests_for_deletion/Non-English#energia_eolica that touches on the space for more specific policy. Ultimateria (talk) 16:43, 10 August 2019 (UTC)

Stunned silence/disbelief

What meaning of "stunned" applies to the following sentences where stunned modifies abstract nouns forming an adverbial, instead of the animate being who is stunned?

What is the linguistic term for such behavior? What other adjectives act in a similar way?

I sat in stunned silence. I reacted to the news with stunned disbelief. --Backinstadiums (talk) 14:40, 8 August 2019 (UTC)

The rhetorical figure is called metalepsis. I don't think it warrants a separate definition, just as with the component words (eg, face) of: "Was this the face that launched a thousand ships and burnt the topless towers of Ilium?". DCDuring (talk) 16:38, 8 August 2019 (UTC)
Maybe it's hypallage or anthimeria. DCDuring (talk) 18:02, 8 August 2019 (UTC)
The license for the phrases to be adverbial (They could also be adjectival.) is that they are prepositional phrases. Also, silence doesn't seem "abstract" to me. DCDuring (talk) 18:07, 8 August 2019 (UTC)
Silence is tangible.  --Lambiam 19:17, 8 August 2019 (UTC)
... as well as being golden. Mihia (talk) 19:53, 8 August 2019 (UTC)

Should Illyrian be a language?

At the moment, Category:Illyrian language indicates that it is a reconstructed language. We currently have one reconstruction for it, which is at WT:RFD. Given that reconstructed entries require descendants, derived terms or other evidence that the reconstruction can be based on, I'm not sure if this is at all possible. No certain descendants of Illyrian are known, and therefore not much is known in the way of sound laws that would support reconstructions. Wikipedia isn't even sure if there was a single Illyrian language, and titles the article in plural: w:Illyrian languages. All this makes me think that we shouldn't have this language on Wiktionary at all. Thoughts? —Rua (mew) 09:16, 9 August 2019 (UTC)

Etymology-only language? It seems that personal names in Latin and possibly in Balkan languages other than Albanian can be assumed to be of Illyrian origin; what about hydronyms and names of settlements? It would not be surprising that there is no corpus of the language either, as the Slavs too did not deign to write. But then again I wouldn’t know where Illyrian would start and end, if one starts to talk about “real Illyrians” and “less real Illyrians”, when one can be content with deriving from a substrate. “Illyrian” is probably a meme. Also, I have always held that nomina propria should be categorized separately for their etymologies, so that we do not litter the categories “terms derived from X” or the request categories: look at the requests for etymologies in Latin entries; it’s 3,527 pages, but the vast majority are names, which is ridiculous. Fay Freak (talk) 11:38, 10 August 2019 (UTC)

monies

What is the plural part in monies? The entry of -ies does not seem to include it --Backinstadiums (talk) 09:22, 10 August 2019 (UTC)

"monies" is an irregular plural. Probably the entry should state that, but I don't know how to work it into the template. Mihia (talk) 10:25, 10 August 2019 (UTC)
I would fix it, but our modularizers have obfuscated what should be a simple template to the point that it isn't worth the effort to try and untangle how it all works to make a straightforward change. Time to start a vote to disable Lua on this project, it's more trouble than it is worth. - TheDaveRoss 17:11, 12 August 2019 (UTC)
I don't think -ies should include it, because it isn't a suffix added to some stem *mon. Equinox 10:33, 10 August 2019 (UTC)
According to Garner's fourth edition, page 604, the current ratio of moneyed to monied is 3:1. Incidentally, what is the situation with monied? --Backinstadiums (talk) 11:47, 10 August 2019 (UTC)
monies is a regular plural of mony, which is obsolete, according to us.
FWIW, I always thought it synonymous with funds ("financial resources") or somehow similar to funds. Though I've worked in finance (US), it never came up in any actual usage in my hearing. It seemed archaic and/or UK when I ran across it in reading. DCDuring (talk) 20:24, 12 August 2019 (UTC)
It shows up in some formal-register writing, such as certain financial and legal contexts, as the plural of money, used as a countable in ways similar to the contrast between fish (plural of the single animal) and fishes (plural of the group noun), where the latter implies multiple kinds of the main noun. In legal and financial contexts, monies implies specifically that these are funds coming from multiple sources / accounts / etc. ‑‑ Eiríkr Útlendi │Tala við mig 23:19, 12 August 2019 (UTC)
@Eirikr How do you know that it is the plural of money rather than 1. a plural of mony or 2. a plural-only noun? DCDuring (talk) 00:43, 13 August 2019 (UTC)
I note that money is usually fungible, whereas monies are not. Monies is often used in discussion of government and not-for-profit finances, in which money appropriated, donated, held in trust, etc. often can be used only for a specific purpose or object of expenditure. Something similar occurs in investment management, banking, etc. These realms are the stronghold of fund accounting (See w:Fund accounting.). Sociological literature uses monies in discussing how households often restrict given sources of income for specific purposes. Eg, ill-gotten gains fund charitable donations, children's 'fun' expenditures come from their earnings and holiday gifts. Cash currency, bank deposits, and cryptocurrency can be considered separate forms of monies, but none of them is called a mony. In fact it is almost impossible to find a use of mony as a singular in the last 100 years or more, except in works of history. DCDuring (talk) 01:12, 13 August 2019 (UTC)
Re: the plural of money, see also Merriam-Webster's entry for monies, stating simply, "plural of MONEY", and then their entry for money, stating "plural moneys or monies". That jibes with how I learned both terms (singular and plural). Now, monies may be derived as the regular plural of now-obsolete mony, but then mony evolved into modern money with the extra e, while the plural form stayed as it was, and then the new plural form moneys also sprang into existence. In certain contexts, at least, the monies form persists, and in modern usage, there's nothing else for it to be the plural of, other than money. No? ‑‑ Eiríkr Útlendi │Tala við mig 04:05, 13 August 2019 (UTC)
It could have evolved into a plural-only noun. Is moneys used in exactly the same way as monies? DCDuring (talk) 10:57, 13 August 2019 (UTC)

plural (number)

In the appendix, does plural (number) mean plural verbal agreement? --Backinstadiums (talk) 11:33, 10 August 2019 (UTC)

No, it means grammatical number. Verb agreement need not exist at all: in Arabic VSO sentences, for example, the verb is in the singular even if a plural subject follows. Fay Freak (talk) 11:42, 10 August 2019 (UTC)
In English entries, verb, pronoun, and determiner agreement are important. The presence or absence of a terminal 's' is self-evident, arguably of no value to our normal users, and therefore not worth noting in our entries. Some of our "plural only" entries, for example, have made a hash of this. But English speakers often alternate between singular and plural agreement for "pair" nouns. (Eg, these/those scissors is about three times as common as this/that scissors.) Treatment of plurals also gets confounded with (un)countability.
What we say about a term in Appendix:Glossary should always be applicable to English. We can have warnings about changes in a term's meaning as applied to other languages, but coverage may need to be in other Glossaries or the "about" page for such a language. DCDuring (talk) 16:40, 10 August 2019 (UTC)

@DCDuring: In youth, what does the following information mean? (uncountable, used in plural form) --Backinstadiums (talk) 09:16, 23 August 2019 (UTC)

The label conveys no useful information to me. I assume the intent was to convey some information about number agreement of youth with verbs, pronouns, and possibly determiners.
I don't think that youth is uncountable in that definition. I think it is a plural. A usage example would be "The youth of the cities don't understand agricultural metaphors. They are culturally impoverished." If I am right about this, then the inflection line is not accurate either. DCDuring (talk) 09:58, 23 August 2019 (UTC)
MWOnline has an entry for the youth with this meaning, but the is not essential. Other definite determiners can substitute: "New York's youth", "These cities' youth", "Those youth", the latter two possibly not for all speakers. DCDuring (talk) 10:09, 23 August 2019 (UTC)

Words with multiple inflection patterns: one inflection table or several?

When a word can follow multiple patterns of inflection, the alternative forms are often shown next to each other within a single inflection table. But there are also entries where multiple inflection tables are shown. I'm wondering which of these approaches works better in practice. Showing multiple forms together in one table takes up less space, but it is no longer easy for the reader to separate out the different inflection patterns.

There are extreme cases like Slovene strgati, which can inflect according to no less than four different patterns: two different present tenses, and two different accentuation patterns. Many of the forms are shared between the four, in particular the infinitive, so they could theoretically all be stuffed into one table, but it becomes very hard to make out. Two tables are also possible, but which aspect of the inflection should be combined, the present tense formation or the accent pattern? A case where all forms are in one table can be seen at kopati (to bathe), where there are no less than four distinct imperative forms. Then there are cases like gaziti where the majority of the inflection has one form and accent pattern (AP a in this case), but the l-participle can follow multiple accent patterns (both a and b). Having multiple inflection tables in this case seems like overkill. Where to draw the line, though? When is it clearer to put everything in one table, and when is it better to split them? —Rua (mew) 19:37, 10 August 2019 (UTC)

I think this should be handled on a case-by-case basis. If the different patterns are truly different paradigms, it is probably better to use multiple tables. But if we have a few variations within one paradigm, showing the alternatives side by side probably works better. Here is a sketch of an algorithm for computing a score on which to base the decision; I have no idea how it will work out in practice.
  1. Combine all forms in one table
  2. For every cell that contains N alternatives, where N ≥ 2, add N to the score
  3. For every cell that contains just one form, subtract 1 from the score
  4. If the score is positive, use multiple tables; otherwise, use a single table.
 --Lambiam 23:32, 10 August 2019 (UTC)
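A minimal Python sketch of the four scoring steps above, assuming each cell of the combined table is represented as a list of its alternative forms. The function names and the sample tables are hypothetical, and empty cells are simply ignored because the steps do not mention them.

```python
from typing import List

# Each cell of the combined table is modelled as the list of its alternative forms.
Table = List[List[str]]


def table_score(cells: Table) -> int:
    """Score a combined inflection table following steps 2 and 3."""
    score = 0
    for forms in cells:
        n = len(forms)
        if n >= 2:
            score += n  # step 2: a cell with N >= 2 alternatives adds N
        elif n == 1:
            score -= 1  # step 3: a cell with a single form subtracts 1
        # empty cells (n == 0) are not covered by the steps, so they are ignored
    return score


def use_multiple_tables(cells: Table) -> bool:
    """Step 4: a positive score favours splitting into several tables."""
    return table_score(cells) > 0


# Hypothetical tables; the strings stand in for real inflected forms.
mostly_uniform = [["f%d" % i] for i in range(8)] + [["x", "y"]]
heavily_mixed = [["a", "b"], ["c", "d", "e"], ["g"]]

print(table_score(mostly_uniform), use_multiple_tables(mostly_uniform))  # -6 False
print(table_score(heavily_mixed), use_multiple_tables(heavily_mixed))    # 4 True
```

Because every single-form cell counts against splitting, a large table with only one or two doubled cells (roughly the gaziti situation) should come out negative, while a table in which most cells carry alternatives comes out positive.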
I know exactly zero about this language or its inflections, but, looking at kopati, where there are four patterns, is it not ambiguous which form corresponds to which pattern(s) in cases when there are two (or three, though this does not occur) entries in a cell? How do you tell which is which? If you can't tell, this layout seems flawed to me. However, if there are language-related clues such that it is always obvious to anyone with enough knowledge to use the table at all, maybe it is OK. Mihia (talk) 20:57, 11 August 2019 (UTC)
There are technically only two patterns, but each pattern allows for two possible imperative forms. —Rua (mew) 21:21, 11 August 2019 (UTC)
I see, thanks. So I guess it is obvious to anyone with any knowledge of the language. Fine then. Mihia (talk) 22:53, 11 August 2019 (UTC)
A slideshow that takes the user from table 1/4 via 2/4 and 3/4 to 4/4 and then back to 1/4 as they press arrows (sliding horizontally for horizontal writing systems and vertically for vertical writing systems like the Mongolian script). So you have one table that contains everything, but not all at once. Fay Freak (talk) 23:03, 11 August 2019 (UTC)
@Fay Freak That's not a bad idea, but what would it do when JS is disabled? —Rua (mew) 08:02, 13 August 2019 (UTC)
HTML/CSS is so advanced nowadays that I did not presume it would need JS. However, it can be JS: the content can either be expanded at first and then formatted by JS, so that in a text-based browser one sees all the tables one after another, or otherwise: there is no reason people who surf without JS can't make an exception for Wiktionary, and everyone at least sees one table he can apply. Fay Freak (talk) 12:20, 13 August 2019 (UTC)

unmarked idioms in the examples/quotations of senses

The third meaning of the verb cramp adds the example "You're cramping my style", yet without any reference to the idiomaticity of cramp someone's style, a tendency I've frequently come across. --Backinstadiums (talk) 09:24, 11 August 2019 (UTC)

I hope that when you come across instances, you’ll fix them. In this specific case, the sense and usex were added more than a year before the creation of the entry for the idiomatic expression; the creator was very likely unaware of the usex. So inasmuch as there appears to be such a tendency, it may be the result of editors not knowing everything rather than a lack of care.  --Lambiam 13:24, 11 August 2019 (UTC)
the solution being adding an external link, I suppose --Backinstadiums (talk) 13:44, 11 August 2019 (UTC)
I think a more complete solution would also include making sure that cramp someone's style is in the derived terms and that there is a better usage example, at least in addition to the one that includes cramp (someone's) style. DCDuring (talk) 20:44, 11 August 2019 (UTC)

passive transitive verb without agent vs adjective

The third entry of embarrass,

"(transitive) To involve in difficulties concerning money matters; to encumber with debt; to beset with urgent claims or demands. A man or his business is embarrassed when he cannot meet his pecuniary engagements"

adds an example in the passive without the passive agent, yet the entry of the adjective embarrassed does not show the meaning derived from the sense above, which "Microsoft® Encarta® 2009" defines as "adjective, short of money: in financial difficulties because of a lack of money" (However, "Microsoft® Encarta® 2009" does not offer the monetary meaning of the verb embarrass as Wiktionary does).

Is there any reason for this? --Backinstadiums (talk) 15:52, 11 August 2019 (UTC)

Microsoft and Wiktionary did not consult each other. We cannot have two-way consultation now because they are out of the dictionary business. DCDuring (talk) 20:46, 11 August 2019 (UTC)
M-W, Oxford, Collins, and AHD all have this sense, most of them list it as archaic (which I would agree with). Also these questions are probably not Beer Parlour fodder, questions about word usage should go in the WT:Tea Room. The Beer Parlour is for discussing the project itself. - TheDaveRoss 15:56, 12 August 2019 (UTC)

Flowers wavered in the breeze

what is the rhetorical device which changes the subject of an active sentence, The breeze wavered the flowers, into an adverb of the passive counterpart, Flowers wavered in the breeze? --Backinstadiums (talk) 11:29, 12 August 2019 (UTC)

I don't think it is rhetoric, just two ways to express the same idea. You could say one element (breeze or flowers) is being foregrounded, or made the subject. Equinox 14:34, 12 August 2019 (UTC)
I don't think of waver as a transitive verb, though it may be or have been one for some speakers.
While attempting to address earlier questions of yours I reviewed classical (Greek/Latin) rhetorical devices at Silva Rhetorica. I don't recollect seeing anything that specifically covers that. There might be something in transformational grammar. DCDuring (talk) 14:37, 12 August 2019 (UTC)
Verbs that can be used transitively with an object, but also intransitively in which the original object becomes the subject (without resorting to the passive voice), are called ergative verbs. The standard example is the verb break: “he broke the glass” → “the glass broke”. This is not a rhetorical device but a grammatical concept. To me, the sentence “the breeze wavered the flowers” is ungrammatical, but its author apparently sees the verb waver as ergative.  --Lambiam 22:31, 12 August 2019 (UTC)
Rua (then CodeCat) and I had a long thread about that a while back. In short, I think the term ergative is overused in relation to English verbs. In "the glass broke", we just have an intransitive, where there is no semantic patient / grammatical object, and just a semantic agent / grammatical subject. "Breaking" doesn't need to be a transitive action, semantically speaking. With intrinsically transitive actions, however, such as eat or cook, using the semantic patient / grammatical object as the grammatical subject and leaving the semantic agent unstated gives us something that can be usefully described as "ergative": "moose meat eats well", "these eggs cook up nicely", etc.
Regarding the verb waver, however, I agree with DCDuring and Lambiam -- the verb, as I understand it, is consistently intransitive, so using it transitively doesn't make any sense, and thus there cannot be a passive. You'd have to use it causatively instead, or causatively-passively: "the breeze made the flowers waver", "the flowers were made to waver by the breeze". ‑‑ Eiríkr Útlendi │Tala við mig 23:31, 12 August 2019 (UTC)

Isn't it transitive in semantic/logical terms? --Backinstadiums (talk) 08:42, 13 August 2019 (UTC)

If you take a philosophical stance that nothing has an internal cause, then mere grammar is not important. But transitivity is a syntactic, not philosophical, term as commonly used. DCDuring (talk) 11:02, 13 August 2019 (UTC)
  • @Backinstadiums: No, the English verb waver is not transitive in any sense that I've ever encountered. Our entry at waver only lists intransitive senses, as does the Merriam-Webster entry, among many others.
  • @DCDuring: Depends on the language and how you define your terms. For Japanese, teaching materials describe a "transitive" verb as generally the same thing as a 他動詞 (tadōshi, literally other-moving word), where the transitivity is a semantic property, dependent on the underlying meaning of an agent applying the action to a patient: the verb doesn't change class due to the presence or absence of any explicitly stated object. For both 私は食べる (watashi wa taberu, I eat) and リンゴを食べる (I eat an apple), the verb taberu ("eat") is a tadōshi regardless of the presence or absence of the object. Meanwhile, there are "intransitive" verbs described as 自動詞 (jidōshi, literally self-moving word) where the action is purely a matter of the agent doing something on their own, without directly altering or affecting any patient, but these verbs can still take objects marked with を (o) in certain constructions, and that also doesn't change the class of the verb. For both 歩く (I walk) and 山道を歩く (I walk the mountain road), the verb aruku ("walk") is a jidōshi regardless of the presence or absence of the object.
When talking about "ergativity" as it applies to English verbs, in order to use the label in a meaningful and useful way, we have to look at the semantics of the verb: is the action something done by an agent to a patient, or something that the agent does on its own without affecting any patient? For verbs like break or melt or turn, these could be semantically transitive where an agent does something to a patient, but they could also be semantically intransitive, where the agent just does the action of the verb without requiring a patient. Describing these verbs as "ergative" is not very useful, and I think it's more likely to confuse users who instead learned these verbs as ambitransitive: either transitive or intransitive depending on context. For other verbs like eat or read or say, these can only be semantically transitive, as the actions inherently describe an agent doing something to a patient, even though they might be syntactically intransitive if a given context leaves the object unstated. Using the object of such a semantically transitive verb as the subject is a strange construction in English, almost a kind of passive. Semantically intransitive verbs have no sensible passive, while semantically transitive ones do. Similarly, semantically intransitive verbs have no sensible ergative. ‑‑ Eiríkr Útlendi │Tala við mig 19:22, 13 August 2019 (UTC)
I'd be in favor of banning ergative, ambitransitive, and ditransitive from use in definition-line labels (generally, but at least in English) on the grounds that the terms are not readily understood by normal users. My evidence is that if one compares usage patterns of "transitive verb"/"intransitive verb" with those of "ambitransitive verb", "ditransitive verb", and "ergative verb" the latter three occur almost exclusively in scholarly books and articles, whereas one can find the former in basic texts and even in newspapers and magazines. I can provide anecdotal evidence that college graduates are not familiar with "ergative" and its ilk, but at least dimly recollect "transitive" and "intransitive". DCDuring (talk) 19:49, 13 August 2019 (UTC)
I support that idea, FWIW. I note that, if the user enters tr=both in our headword templates, the resulting display similarly avoids overly technical terminology like ambitransitive or ditransitive in favor of the wordier-but-more-straightforward option of just stating transitive and intransitive. ‑‑ Eiríkr Útlendi │Tala við mig 22:45, 14 August 2019 (UTC)

pronunciation of foreign lemmas used in English

Rarely is the pronunciation of the plural of a foreign lemma used in English shown; for example, for fait accompli, the pronunciation of its plural faits accomplis doesn't vary at all --Backinstadiums (talk) 16:14, 12 August 2019 (UTC)

In the plural I have heard a very un-French final /z/ while the first s, that of faits, remained silent. I don’t know how general this is.  --Lambiam 22:17, 12 August 2019 (UTC)
@Lambiam: According to the Longman Pronunciation Dictionary: BrE faits accomplis ˌfeɪz ə ˈkɒmp liː ˌfeɪts-, ˌfeɪt-, ˌfez-, -ˈkʌmp-, -liːz ǁ AmE ˌfeɪz ə kɑːm ˈpliː; French [fɛ za kɔ̃ pli] --Backinstadiums (talk) 22:24, 12 August 2019 (UTC)
I'd expect fait accomplis to be a readily found plural. (I haven't looked yet.) DCDuring (talk) 00:00, 13 August 2019 (UTC)
And here are a few of them:
a succession of fait accomplis, legitimised afterwards by the always favourable balance of military power
The inauguration of the world banking electronic network SWIFT in 1977 and slightly later that of the world airline network SITA were presented to the ITU as fait accomplis
The war was a test of how far overwhelming military power can impose fait accomplis that reshape international norms.
The military continued to accumulate fait accomplis that were disagreeable to the politicians
Down through the years, we've had other fait accomplis
I further note that there are numerous available uses of fait accomplis as singular.
Our entry for fait accompli#English does not do justice to the rich set of alternations that authors seem to permit. DCDuring (talk) 00:13, 13 August 2019 (UTC)
There are also plenty of plural uses of faits accompli, which we can label as a misspelling. Can we write in the usage notes of the prospective entry fait accomplis that the singular use is incorrect? I think we should also not assign pronunciations like /ˌfezəˈkɒmpliː/ or /ˌfeɪtsəˈkʌmpliː/ to fait accompli(s) but only to faits accomplis.  --Lambiam 18:17, 14 August 2019 (UTC)

Plural lemmas

How are plural lemmas with meaning of their own to be indicated in their corresponding "headword"? For example, nowhere in white is the user warned about the specific meanings of the noun whites. --Backinstadiums (talk) 09:24, 13 August 2019 (UTC)

Maybe a subsection “See also   whites”? I notice there is no category for plural lemmas.  --Lambiam 10:05, 13 August 2019 (UTC)
Many of the singular entries have a definition for the term marked as being used only in the plural. It is duplicative, but convenient for the user. DCDuring (talk) 11:07, 13 August 2019 (UTC)
Can you give a good example of that?  --Lambiam 18:29, 14 August 2019 (UTC)
I found a few using regex search: colour, bowel, good, remain, depth. DCDuring (talk) 20:14, 14 August 2019 (UTC)
Problems like this are why I never mix lemmas and non-lemmas. All definitions go on the lemma, with context labels indicating if the meaning only applies to specific grammatical combinations, such as plural. —Rua (mew) 18:24, 22 August 2019 (UTC)

Passive voice using the preposition "with" instead of "by"

Verb smite: 6. (figuratively, now only in passive) To strike with love or infatuation. Bob was smitten with Laura from the first time he saw her.

I wonder whether this smitten is rather an adjective, at least syntactically. --Backinstadiums (talk) 10:36, 13 August 2019 (UTC)

"John was beaten with a cudgel by Mary." DCDuring (talk) 11:10, 13 August 2019 (UTC)
@DCDuring: what's your point? --Backinstadiums (talk) 16:23, 13 August 2019 (UTC)
An instrument (object of with) is not the same as agent of passive verb (object of by). The sentence in no way illustrates passive using with. DCDuring (talk) 17:22, 13 August 2019 (UTC)
And John got so angry that next thing you know Mary was smitten with Bob by John.  --Lambiam 16:25, 13 August 2019 (UTC)
It does pass the test of being gradable: Bob was way too smitten with Laura. John was also more smitten with Laura than was good for him. But Bob was the most smitten I have ever seen a man with a woman as smart as Laura.  --Lambiam 16:25, 13 August 2019 (UTC) — BTW, the Beer parlour is not the right room for discussing such questions.  --Lambiam 16:28, 13 August 2019 (UTC)
Also, it can be used attributively: “a smitten man”.  --Lambiam 08:47, 15 August 2019 (UTC)

intransitive verbs which become transitive if their "goal/purpose" is accomplished

I've realized many verbs follow an interesting transitive pattern, which I illustrate with an example:

Webster's defines wrangle as either intransitive ("dispute, argue") or transitive ("to obtain by persistent arguing"). Is there a linguistic term for this behavior? --Backinstadiums (talk) 15:33, 13 August 2019 (UTC)

The term resultative comes to mind, even though in a sentence like “I managed to wrangle a refund” the construction does not conform to any of the four classes described in the Wikipedia article.  --Lambiam 16:09, 13 August 2019 (UTC)
The pattern is not confined to English. For example, French accoucher in the intransitive sense means “to go into labour”. But in the transitive sense it signifies the result of the labour: “to deliver (a baby)”.  --Lambiam 16:14, 13 August 2019 (UTC)

Is the "transitive" label enough?

How can we indicate the need for a preposition, at, for verbs such as He smiled *(at) me, and still label them as transitive for examples such as He smiled a big smile? --Backinstadiums (talk) 11:48, 16 August 2019 (UTC)

There's nothing in the grammar or semantics that requires at after smiled a big/happy/lovely/coy/weak smile.
Please don't focus on "semantic" transitivity. In English entries we are concerned with syntactic transitivity. DCDuring (talk) 12:59, 16 August 2019 (UTC)
  • Yes, as DCDuring notes, English descriptive materials about the English language generally don't discuss semantic transitivity, unless it's a highly technical text. English-language dictionaries of the English language, to my knowledge, never touch on semantic transitivity, only focusing on syntax.
In your examples, "He smiled at me" is intransitive, albeit with an indirect object, whereas "He smiled a big smile" is transitive, and without an indirect object. Moreover, the presence or absence of an indirect object has no effect on the transitivity of the verb: only the presence or absence of a direct object. ‑‑ Eiríkr Útlendi │Tala við mig 16:31, 16 August 2019 (UTC)
I don't see an indirect object in that first case. I rather see a prepositional phrase. —Rua (mew) 17:10, 21 August 2019 (UTC)
Yes, you're correct. I used the wrong terminology there: the me is the object of the preposition at, not of the verb smile. ‑‑ Eiríkr Útlendi │Tala við mig 18:15, 22 August 2019 (UTC)

Homographs of verbal forms

I propose some kind of mention (maybe a "see also") be (automatically?) added in the section of verbs with forms homographic with other parts of speech, as the verbal entries are the ones looked up first. For example, devastate shows no hint at the adjective devastated "Extremely upset and shocked: a devastated widow". --Backinstadiums (talk) 16:05, 17 August 2019 (UTC)

I think devastate itself is missing a sense ‘to severely upset’, e.g. in ‘The news devastated her’, ‘His father’s death devastated him’, and so on. Although maybe that’s what our second sense is trying to get at; hard to tell without quotes. — Vorziblix (talk · contribs) 00:44, 18 August 2019 (UTC)
It is yet another deficiency in Wiktionary that we might all agree needs to be fixed. Almost every such homograph is deverbal and could be shown as a derived term on the verb lemma page. I wonder how many of the nearly 4,000 entries that have both {{en-past of}} and {{en-adj}} don't have a derived term for the adjective on the verb lemma entry. DCDuring (talk) 03:43, 18 August 2019 (UTC)
I suppose one would also like to make sure that all the supposed adjective homographs of past forms of verbs were actually true adjectives. DCDuring (talk) 03:49, 18 August 2019 (UTC)
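The count wondered about above could be approximated with a few set operations once the relevant data has been extracted from a dump. This Python sketch assumes three hypothetical inputs: a mapping from each page using {{en-past of}} to the verb lemma it points to, the set of pages using {{en-adj}}, and a mapping from each lemma to the terms listed under its Derived terms header. Extracting those from wikitext is not shown, and the sketch does not check whether a supposed adjective is a true adjective, which would still need human review.

```python
from typing import Dict, List, Set


def missing_derived_terms(
    past_of: Dict[str, str],
    adjective_pages: Set[str],
    derived_terms: Dict[str, Set[str]],
) -> List[str]:
    """Return pages that are both a past form and an adjective but are not
    listed among the derived terms of their verb lemma."""
    missing = []
    for page, lemma in past_of.items():
        if page not in adjective_pages:
            continue  # only pages that also have an adjective section count
        if page not in derived_terms.get(lemma, set()):
            missing.append(page)
    return sorted(missing)


# Hypothetical sample data, not taken from a real dump.
past_of = {"devastated": "devastate", "ground": "grind", "walked": "walk"}
adjective_pages = {"devastated", "ground"}
derived_terms = {"grind": {"ground"}}

print(missing_derived_terms(past_of, adjective_pages, derived_terms))  # ['devastated']
```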

CGEL: conversion of gerund and past participle forms of verbs is extremely productive: It seems very promising; He looked devastated --Backinstadiums (talk) 08:21, 18 August 2019 (UTC)

The -ing form above is not a gerund (which behaves syntactically like a noun). I am not sure what point you (and CGEL) are making here. Did anyone here belittle the degree of productivity of conversion? Which CGEL is this?  --Lambiam 12:22, 18 August 2019 (UTC)
It's almost certainly the Cambridge. BTW. I've just RfVed the supposed adjective ground (from grind), which MWOnline, for example, doesn't consider an adjective. DCDuring (talk) 16:47, 18 August 2019 (UTC)
ground#Adjective survived RfV. DCDuring (talk) 23:37, 19 August 2019 (UTC)

I wouldn't mind a little something

Assume that a reader unfamiliar with the expressions is looking at the entries something or other and some something or other. Nothing shows that in the first one the word “something” is a fixed and immutable part of the expression, while in the second one it is a placeholder, to be replaced by some something or other. The ambiguity has bothered me for some time. Can we do something about it? Yes we can, by giving such placeholders a distinctive appearance. After considering several possibilities, I think that using smallcaps is probably the best. The head of some something or other would then become:

some something or other (uncountable)

We could let the placeholder link to some page like Wiktionary:Placeholders which could explain (among several things) that our use of “something” may include animate beings, and that “one” and “one's” refer to the subject of a verbal phrase (as in lose one's cool (not followed consistently at present; e.g., the opinion holder of in one's opinion need not be the subject)). (There are some technical issues; for example, {{mention}} does not like {{smallcaps}} in the first parameter, but I am confident that our grease pitters can work such things out.)

The Free Dictionary uses parentheses for this purpose, at least in entries from Farlex Dictionary of Idioms, as seen in their entry “throw (something) in (one's) face”. We also do this occasionally, as in give (something) a go. A reason not to follow this is that we also occasionally use parentheses for indicating an optional part of a lemma (e.g. “(the) word is”). So does, in fact, The Free Dictionary in entries from McGraw-Hill Dictionary of American Idioms and Phrasal Verbs. I think it is not a good idea to cast out one source of ambiguity by introducing another one.

Before I put this up for a vote, I’d like to offer an opportunity for comments and suggestions. So go ahead; the floor is all yours.  --Lambiam 22:31, 19 August 2019 (UTC)

Parentheses or brackets make more sense visually than smallcaps to me. Why did you have a preference for smallcaps, @Lambiam? —Justin (koavf)TCM 22:56, 19 August 2019 (UTC)
Because (as I explained above) parentheses are also used for optional words or morphemes in the headword, so letting them serve two different purposes introduces ambiguity. Can you tell whether യിരി in (യിരി)ക്കുന്നുഎങ്ങനെക് is a placeholder or an optional part of the phrase? [The example is made-up to make a point; the phrase is meaningless.] If no ambiguity could result, parentheses would also be my first choice. But they are already commonly used to indicate optionality, so that seems the better use of the notation.  --Lambiam 12:39, 20 August 2019 (UTC)
We already often use parentheses for the purpose. I don't know of any actual instances in which the "optional item" interpretation of parentheses could be confused with the "placeholder" interpretation. Seeing some of those might help me see the problem. DCDuring (talk) 23:35, 19 August 2019 (UTC)
For a reader who is familiar with the language in question, it will almost certainly be immediately clear which interpretation is to be chosen, even if they are not familiar with the idiom in question. If they are not familiar with the language, I’m not so sure. Can you spot the placeholder in işin ucu birine dokunmak?  --Lambiam 12:39, 20 August 2019 (UTC)
Of course not: there are no parentheses! If I knew enough Turkish to use tr.wikt maybe I could, if the component terms were all in en.wikt (işin ucu birine dokunmak). I would look up each word and perhaps eventually tease it out. DCDuring (talk) 13:29, 20 August 2019 (UTC)
Bad question, and by the way it was phrased I have given away that the word in question is not optional but a placeholder. Here is another example: is कुछ in the idiom दाल में (कुछ) काला होना optional or a placeholder?  --Lambiam 14:13, 20 August 2019 (UTC)
I'm going to guess it's optional, but I take your point. At some level of ignorance of a language, one couldn't decipher the meaning of parentheses. DCDuring (talk) 16:51, 20 August 2019 (UTC)
It is indeed optional, but this is far from obvious, since Hindi कुछ (kuch) means “something”, a typical placeholder word.  --Lambiam 21:55, 21 August 2019 (UTC)
Before we make the decision, can we run through the actual affected entries and generate a list of how they would look in small caps (or bold, or whatever other distinguisher)? This should highlight any absurdities we may not spot in the abstract. Equinox 01:34, 20 August 2019 (UTC)
I cannot think of a systematic way of doing that. In sich bei jemandem lieb Kind machen, the word jemandem is a placeholder (e.g. “hat sich bei der Zarin dadurch lieb Kind gemacht”); likewise quelqu'un in de quel bois quelqu'un se chauffe (as in “Et de quel bois se chauffaient leurs femelles”). Where parentheses are used in the headword line, they can be implicit (from the PAGENAME) or given in an explicit head= or similar parameter in any of the numerous headword-line templates.  --Lambiam 12:39, 20 August 2019 (UTC)
I could generate lists of entries in each language that contain that language's placeholder words. I have full lists of entry titles in each language (based on headers), so would just need lists of placeholder words to search for. For instance, there are 53 German entries for phrases containing jemand. — Eru·tuon 16:37, 20 August 2019 (UTC)
That will be very useful (if the proposal is accepted). For each language we will need some editors who are sufficiently conversant with it to distinguish confidently and accurately between fixed and variable uses of placeholder words; we do not want to see a misclassification like [something] or other in any language.  --Lambiam 21:55, 21 August 2019 (UTC)
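The list generation offered above might look roughly like this Python sketch. Erutuon's actual tooling is not described in the thread, so the function name, the title list and the placeholder forms are all hypothetical. Matching is whole-word and case-sensitive, so every inflected form of a placeholder (jemand, jemandem, jemanden, ...) has to be listed separately, and a hit only flags a candidate: deciding whether the word is fixed or a placeholder in that entry is still up to editors who know the language.

```python
import re
from typing import Dict, Iterable, List


def find_placeholder_entries(
    titles: Iterable[str], placeholders: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each placeholder word to the entry titles containing it as a whole word."""
    title_list = list(titles)  # materialise so the titles can be scanned once per word
    hits: Dict[str, List[str]] = {}
    for word in placeholders:
        # \b on both sides: "jemand" will not match inside "jemandem"
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        hits[word] = [t for t in title_list if pattern.search(t)]
    return hits


# Hypothetical German title list and placeholder forms.
german_titles = [
    "sich bei jemandem lieb Kind machen",
    "jemandem einen Bären aufbinden",
    "Hals über Kopf",
]
print(find_placeholder_entries(german_titles, ["jemand", "jemandem"]))
# {'jemand': [], 'jemandem': ['sich bei jemandem lieb Kind machen', 'jemandem einen Bären aufbinden']}
```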
I see now that smallcaps is not a good way, since many scripts do not have capital letters at all, and therefore no smallcaps. What about square brackets?
some [something] or other: placeholder
(the) word is: optional part
 --Lambiam 14:13, 20 August 2019 (UTC)
Support. This issue has bothered me for a while, although I had resigned myself to the fact that there probably wasn't a perfect solution. This might work though. Andrew Sheedy (talk) 16:34, 20 August 2019 (UTC)
It would seem that we could use piped links for this. We could link to Appendix:Glossary for English placeholders and to language-specific Appendix-space pages for other languages. Colors could work for extra warning or indication for those visually capable. DCDuring (talk) 16:51, 20 August 2019 (UTC)
I wouldn't mind this solution either. Andrew Sheedy (talk) 17:00, 20 August 2019 (UTC)
I also support the (optional) and [placeholder] markup. - TheDaveRoss 16:53, 20 August 2019 (UTC)
For us and for repeat users in general, even normal ones, the difference between parentheses and square brackets seems adequate. Some questions remain, especially for new or infrequent normal users:
  1. Are parentheses better for optional text or for placeholders? I'm guessing there will be more headwords with placeholders than with optional text.
  2. How can we make it easy for a new or infrequent user to learn? One way is by piped links and the associated hover content for the placeholder terms. Is that enough? Would it be better to, say, highlight a relevant line under Usage notes using color, a box, etc.?
  3. Are there other uses of parentheses or square brackets, especially in the inflection line, in any script or language, that conflict with what we are contemplating? For example, are there characters or punctuation or combinations of these that can be easily confused with either of them? Ideally, the brackets and parentheses would work for all scripts and languages.
  4. Do we need rules of some kind to prevent overuse/abuse of either placeholders or optional words? Will contributors be tempted to use multiple optional terms or word-class names (determiner) or abbreviations (DET)?
I don't think we have to address all conceivable issues in advance, but we should try to avoid anything major and have ways of adapting to identifiable potential issues. DCDuring (talk) 17:43, 20 August 2019 (UTC)

@DCDuring – I agree, we do not have to address all conceivable issues in advance, but it is good to know of at least one acceptable approach to each issue we can identify now. Going through the above four points in order:

  1. Even children are probably familiar with the use of parentheses to indicate an optional suffix: “horse(s) / cow(s) / goat(s)”, “apple(s)”. Traditional textual use of square brackets is mainly confined to editorial stuff: editorial comments (such as “[sic]” or “[pen-corrected to Thraldome]”), editorial omissions (“[...]”), editorial alterations (mostly change of case, as in “[T]he next day”) and editorial insertions (seen mostly in transcriptions and translations: “ass[istant]”, “holy [place]”). This is unlikely to interfere with the intended use here.
  2. Most readers are probably not aware of any convention for marking placeholdership, so, whichever way we choose, readers will have to get accustomed to something new. But I expect this to be largely self-explanatory, just as the current practice is already largely self-explanatory. With this proposal, there will be an additional visual clue that a term like “[something]” is not to be taken literally, “as is”, which will be mainly useful for readers not familiar with the idiom. We may define a template {{placeholder}}, abbreviated {{ph}}, taking a language code and one of the placeholders for that language. Any decisions to link to a placeholder glossary or show hover-over content can then be delegated to a single place – the code for the template – and be revisited as desired. Editors should be encouraged to choose usage examples that illustrate the variability; in almost all cases that will obviate a need for specific usage notes.
  3. I am not aware of current uses of square brackets that may conflict with this plan. They occur, of course, in the IPA of pronunciation sections, but that should not present an issue. Round parentheses are used in the headword lines for grammatical info such as gender and inflectional suffix, but always in italics contrasting with the roman font of the terms. Round parentheses are also used for many purposes outside of headword lines, but as far as I know either for optionality (also in IPA) or for what may be considered parenthetical clarifications, such as accents in the pronunciation sections, labels and glosses in the definitions, and senses in synonym sections, again with a body in italics offsetting them from the roman of the terms.
    For other scripts, square brackets other than the conventional ASCII ones may be needed. For example, Chinese punctuation uses the full-width square brackets U+FF3B and U+FF3D, so we would have “使［某人］火冒三丈” rather than “使[某人]火冒三丈” as the Mandarin translation of make [someone's] blood boil. (That translation is a sum-of-parts, so it should not have its own entry, but it is still useful to identify the placeholder in it.)
  4. I do not foresee the need for rules to prevent overuse/abuse and am generally not in favour of inventing rules before the need becomes apparent. What can be done is to have a data module with the standard placeholders (per language), so that a warning can be issued when an editor uses a non-standard one (e.g. “somebody” instead of “someone”), similar to the data module for acceptable ancestors in etymology. But this may be overkill.
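The warning idea in point 4 could work along the lines of this Python sketch. On Wiktionary such a data module would be written in Lua; this only illustrates the lookup logic, and the table name, function name and list contents are hypothetical (the real per-language lists would be for editors to decide). It returns a warning message rather than raising an error, in keeping with the light-touch approach proposed above.

```python
from typing import Dict, Optional, Set

# Hypothetical contents; the actual list per language would be decided by editors.
STANDARD_PLACEHOLDERS: Dict[str, Set[str]] = {
    "en": {"someone", "someone's", "something", "one", "one's", "oneself"},
    "de": {"jemand", "jemandem", "jemanden", "jemandes", "etwas"},
}


def check_placeholder(lang: str, word: str) -> Optional[str]:
    """Return a warning message for a non-standard placeholder, or None if it is standard."""
    allowed = STANDARD_PLACEHOLDERS.get(lang)
    if allowed is None:
        return f"no placeholder list for language code {lang!r}"
    if word not in allowed:
        return f"non-standard placeholder {word!r} for language {lang!r}"
    return None


print(check_placeholder("en", "someone"))   # None
print(check_placeholder("en", "somebody"))  # non-standard placeholder 'somebody' for language 'en'
```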

The best option, in my opinion, would be to include the square brackets in the page title, but the MediaWiki software does not allow that. The official reason is that they are needed for link syntax, although I do not see how that would be a problem, provided we do not use double brackets or URIs in page titles. I feel the code checking for illegal titles ought to be a bit smarter than it is. {{DISPLAYTITLE}} also does not allow one to add brackets. So, unfortunately, we won't see

give [someone] a piece of [one's] mind

(A distasteful but effective hack might be to use the Chinese full-width brackets instead, also for Latin-script titles.)

Here is a tentative definition for a template {{placeholder}}, abbreviated {{ph}}, taking a language code and one of the placeholders, meant for languages using an alphabet:

<span style="font-style:normal;padding-left:0.1em;color:#000;">[{{{2}}}]</span>

For the headline, code like

{{head|en|verb|head=[[give]] {{ph|en|someone}} [[a]] [[piece]] [[of]] {{ph|en|one's}} [[mind]]}}

will then come out as

give [someone] a piece of [one's] mind.

When using templates such as {{mention}}, code like

{{m|en|give someone a piece of one's mind|give {{ph|en|someone}} a piece of {{ph|en|one's}} mind}}

should come out as

give [someone] a piece of [one's] mind.
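To illustrate how the proposed {{ph}} rendering and the optional check against standard placeholders (point 4) could fit together, here is a minimal sketch in Python, not Lua module code. The placeholder list, the function name render_placeholders and the regular expressions are all hypothetical; links are flattened to their display text only to approximate what a reader would see.

```python
import re

# Hypothetical stand-in for a per-language data module of standard
# placeholders (point 4 above); the entries are assumptions.
STANDARD_PLACEHOLDERS = {
    "en": {"someone", "something", "one's", "someone's"},
}

# {{ph|LANG|TERM}}, matching the two-parameter form proposed above.
PH_PATTERN = re.compile(r"\{\{ph\|([^|}]+)\|([^|}]+)\}\}")
# [[target]] or [[target|display]] wikilinks.
LINK_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def render_placeholders(wikitext):
    """Return (plain_text, warnings) for a headword string.

    Each placeholder becomes [TERM]; a warning is recorded for any TERM
    not listed as standard for its language.
    """
    warnings = []

    def substitute(match):
        lang, term = match.group(1), match.group(2)
        if term not in STANDARD_PLACEHOLDERS.get(lang, set()):
            warnings.append(f"non-standard placeholder {term!r} for {lang}")
        return f"[{term}]"

    text = PH_PATTERN.sub(substitute, wikitext)
    # Flatten links to their display text to approximate the rendered line.
    text = LINK_PATTERN.sub(lambda m: m.group(2) or m.group(1), text)
    return text, warnings
```

Fed the {{head}} argument above, this should yield "give [someone] a piece of [one's] mind" with no warnings, while {{ph|en|somebody}} would render as "[somebody]" and record one warning. Whether such a warning is worth the maintenance cost is the open question raised in point 4.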

 --Lambiam 17:21, 22 August 2019 (UTC)

See-also in different scriptsEdit

star says (at the very top) "See also: Star, står, Stär and стар". I read it as "crap" and was going to remove it as vandalism, but no, it's the same sounds in Cyrillic. Is see-also supposed to do this? I've never seen Arabic up there for instance, or Hebrew. Equinox 01:32, 20 August 2019 (UTC)

This seems like a bad idea: what is the purpose of this supposed to be, User:Hergilei? Are you putting a translation at the top of the page? Is this supposed to be a transliteration into Cyrillic characters? —Justin (koavf)TCM 04:48, 20 August 2019 (UTC)
It's a transliteration. You can see that by simply going to стар (underneath Adjective, a transliteration is provided for every language. стар is transliterated as star for four languages). I noticed a while back that transliterations are sometimes included in See alsos and "Appendix:Variations of". I didn't realize it would be controversial if I included them myself. Hergilei (talk) 16:31, 26 August 2019 (UTC)
@Hergilei: Can't blame you for doing things that you see at other entries: can you point out any others? —Justin (koavf)TCM 17:03, 26 August 2019 (UTC)
@Koavf: Appendix:Variations of "da", Appendix:Variations of "ma", Appendix:Variations of "ta", Appendix:Variations of "na", Appendix:Variations of "ba"
Hergilei (talk) 17:36, 26 August 2019 (UTC)
@Hergilei: But do you see these outside of the Appendix namespace? Things are a lot more forgiving there; it would be a different story if we had many transliterations in our proper dictionary itself. Do you have examples in the main namespace? —Justin (koavf)TCM 18:53, 26 August 2019 (UTC)
Couldn't we do something like that if we had IPA entries? DCDuring (talk) 13:38, 20 August 2019 (UTC)
What about lookalikes, like Cyrillic со- versus Latin-script co-?  --Lambiam 13:56, 20 August 2019 (UTC)
I've seen this a number of times before. I've also seen it with Japanese, which is even odder. Andrew Sheedy (talk) 16:31, 20 August 2019 (UTC)
Until we have a voted-on policy and/or a set of computer-readable rules for what the scope of {{also}} is, this is an entirely pointless discussion. DTLHS (talk) 18:01, 26 August 2019 (UTC)
I thought that User:OrphicBot might have added "see also" entries like this, but apparently it mainly added words that look rather than sound similar, cases like Cyrillic со- and Latin co- as mentioned above by Lambiam. (See the list of equivalences. However, some equivalences are more sound-based: w and ƿ, th and þ.) If "sound-alikes" are going to be included in our current entry layout, {{also}} or the variation appendices are the right places to put them. — Eru·tuon 18:24, 26 August 2019 (UTC)

PoS surviving in compoundsEdit

Microsoft® Encarta® 2009 adds the adverb swift: "very quickly", but the example offered is a swift-flowing river. Wiktionary labels it as "(obsolete, poetic)", yet swift-flowing is used in everyday language (for Collins it's "BrE"). How should situations like this one be dealt with? --Backinstadiums (talk) 09:14, 20 August 2019 (UTC)

I think something else is going on here. When used as the first part of a compound, English words tend to revert to their root form in some cases, as seen in a walk of four miles, which is a four-mile walk (dropping the -s). Likewise, a narcissist who talks incessantly is an incessant-talking narcissist.  --Lambiam 14:28, 20 August 2019 (UTC)
@Lambiam: Searching for "swiftly-flowing" in Google Books, we find Don Lee Fred Nilsen's The swiftly flowing river, and Benilde Graña López's swiftly-flowing (river), smart(ly)-dressing (man), rapid(ly)-rising (river), fast-moving (train), often amusing (entertainer), strangely winning (smile), equally daring (suggestion), curiously sobering (thought) --Backinstadiums (talk) 14:37, 20 August 2019 (UTC)
They should ideally be dealt with in accordance with attestation. If you don't have time for attestation, consult lemmings, eg at swift at OneLook Dictionary Search. If you don't have time for that, do nothing. DCDuring (talk) 15:48, 20 August 2019 (UTC)

OED's 4.0 edition entry is divided into A. for adj and B. for adverb, then B. points out ¶ Hyphened to pres. pple. and occas. to a finite part of a verb, on the analogy of combs. in C.3. Section C3. reads "Combs. of the adv. with pples., as swift-flowing" --Backinstadiums (talk) 17:53, 20 August 2019 (UTC)

Promoting Middle Scots (sco-smi) to a full-fledged languageEdit

In the Grease Pit there's a discussion about promoting Middle Scots to a full language so that it can have its own entries separate from Scots. A Middle Scots entry has been created (threschald), but it's using the language code and templates for Scots at the moment. There might be other entries (search query: insource:"Middle Scots") that could be migrated to this language code. I'm posting this here because the Grease Pit isn't the right place for this type of discussion.

What are people's opinions about this? — Eru·tuon 17:07, 20 August 2019 (UTC)

See also Wiktionary:Tea_room/2018/February#Middle_Scots. DTLHS (talk) 17:09, 20 August 2019 (UTC)
Should just be a Modern English label, i.e. {{lb|en|Middle Scots}} and the only thing that could change that is a vote. --{{victar|talk}} 17:51, 21 August 2019 (UTC)
Let the administrators here begin a vote for this. I am keen on seeing Middle Scots being recognised here, so that further entries may be created.—Lbdñk (talk) 17:16, 23 August 2019 (UTC)
You should create a vote yourself; being an administrator doesn't give one a knowledge of Middle Scots. DTLHS (talk) 17:21, 23 August 2019 (UTC)

New tools and IP maskingEdit

14:19, 21 August 2019 (UTC)

@Johan (WMF): Hate it. I think this change will inadvertently promote vandalism, thanks to the extra privacy it affords. I am in favor of grouping IP accounts using cookies though -- that would be fantastic! --{{victar|talk}} 18:02, 21 August 2019 (UTC)
victar: We're certainly aware of the possibility. That's why we need to build better tools for fighting vandalism first – to, in the end, make sure that we give vandal fighters at least the same or better chances of fighting vandalism. /Johan (WMF) (talk) 10:08, 9 September 2019 (UTC)
@Johan (WMF): If you're not taking people's opinions, you shouldn't have asked for them. --{{victar|talk}} 10:15, 9 September 2019 (UTC)
victar: We're taking note, and we do appreciate that people tell us their concerns and how it will affect their workflows. I just wanted to tell you that we are indeed aware of the general problem this will cause for anti-vandalism/spam/harassment measures – some people missed the "build tools first" part of the project. (: /Johan (WMF) (talk) 12:49, 9 September 2019 (UTC)

Norwegian Bokmål: classification and etymology issuesEdit

I originally started the discussion at Module_talk:languages/data2#Edit_request:_ancestors_of_Norwegian_Bokmål, but was asked to post here instead.

(Background info: The Bokmål standard of written Norwegian came to be by branching off the common Danish language used in Norway and Denmark after the written Middle Norwegian language died off. During the 19th century a Norwegian pronunciation of this language was developed (much like how standard German is “High German with Low German pronunciation”). A spelling reform in 1907 brought the Danish language written in Norway closer to this Norwegian pronunciation (e.g. -ede > -et) and also adopted some features of the native dialects of the capital area. After that, the language was definitely considered different from the Danish written in Denmark. In the mid-19th century the Nynorsk standard was created based on a “reconstructed base dialect” of all the native Norwegian dialects. The standard was slightly adjusted in 1901 and 1910 to reflect actual use. At this point the standards were clearly different languages. Then, in 1917, 1938 and 1959, radical spelling reforms aimed to merge the two. That project is now generally considered a failure and some of the changes have been reduced. Despite this, neither standard has been reverted to its pre-1917 counterpart, and Bokmål has adopted considerable amounts of Norwegian vocabulary, syntax and inflections. Because of this, Bokmål is no longer considered a dialect of Danish (many of its users consider it offensive to say that it still is), and this is where the problems start.)

Wiktionary’s Module:languages is programmed to consider Bokmål a descendant of Middle Norwegian. While it can be argued that Middle Norwegian is one of its ancestors (given the large amount of Norwegian elements it has adopted, and also the fact that there exist multiple texts written in Norwegian with the common Danish orthography, as well as the fact that Norwegian authors have incorporated elements from their own native Norwegian dialects through the whole period of Danish orthography in Norway), it still contains a lot of words inherited from Danish too (with or without changed spelling). One example is hellig, which has remained unchanged from the common Danish spelling. It was incorrectly listed as a descendant of Old Norse heilagr (I suppose that the source of this mistake is Bokmålsordboka), so I wanted to correct that, i.e. write that it was inherited from Danish hellig, which in turn is inherited from Old Danish hælægh. However, since Bokmål is not registered as a descendant of Danish, {{inh|nb|da|hellig}} produces an error message. Using {{bor}} (borrowed) instead is simply wrong; the word was not borrowed, it has been there all along. The current solution seems to be the {{der}} template, even though its documentation clearly states that {{inh}} and {{bor}} are preferred for words that are confirmed to be inherited/borrowed.

Because Bokmål has roots in both Danish and New/Middle Norwegian, I would like to request that the system reflect that, i.e. that Danish is added as an ancestor of Bokmål. While it is controversial to say that Bokmål is not a Norwegian language today, it is a historical fact that it has roots in Danish and the spoken Dano-Norwegian koiné (see w:Dano-Norwegian and also the language family tree in the infobox at w:Bokmål). Listing both Middle Norwegian and Danish as ancestors will allow correct etymologies to be entered and, as far as I can see, will cause no new problems. Hått (talk) 01:21, 22 August 2019 (UTC)
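Roughly speaking, the error from {{inh|nb|da|hellig}} arises because an inheritance template only accepts a source language that appears somewhere in the target language's ancestor chain. The sketch below models that check in Python; it is not the actual Module:languages code, the table is heavily simplified, and the codes other than nb, da and non (gmq-mno for Middle Norwegian, gmq-oda for Old Danish) are assumptions for illustration.

```python
# Hypothetical, simplified ancestor table: language code -> direct ancestors.
ANCESTORS = {
    "nb": ["gmq-mno"],       # Bokmål -> Middle Norwegian (current setup)
    "gmq-mno": ["non"],      # Middle Norwegian -> Old Norse
    "da": ["gmq-oda"],       # Danish -> Old Danish
    "gmq-oda": ["non"],      # Old Danish -> Old Norse
}


def is_ancestor(candidate, lang, table):
    """Search upward from `lang` for `candidate`, allowing several ancestors."""
    stack = list(table.get(lang, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(table.get(current, []))
    return False


def check_inherited(lang, source, table):
    """Mimic the error an inheritance template would raise."""
    if not is_ancestor(source, lang, table):
        raise ValueError(f"{source} is not an ancestor of {lang}")


# The proposal: list both Middle Norwegian and Danish as ancestors of Bokmål.
PROPOSED = dict(ANCESTORS, nb=["gmq-mno", "da"])
```

With the current table, check_inherited("nb", "da", ANCESTORS) raises, while with PROPOSED it passes and Old Norse remains reachable through both branches. This is why listing a second ancestor should not break existing {{inh}} uses that go through Middle Norwegian.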

I don't see why it's wrong to list it as borrowed. It wasn't there all along; it didn't exist in Norwegian until the Danes came along. —Rua (mew) 13:49, 22 August 2019 (UTC)
It is correct that it was borrowed into the spoken language, but not in the written Bokmål standard. Unlike Nynorsk, Bokmål was not created from scratch, but instead developed from the common written Danish. The transition from Danish to early Bokmål happened gradually and the last newspaper to adopt the 1907 spelling reform (Aftenposten, a major one) did not fully do so until 1923. Borrowing indicates that a foreign element enters the language. The form hellig was used during the entire transition period (and is still used today), so I do not see how it can be listed as borrowed (if it was, when did the borrowing happen?). The word bil, on the other hand, was coined by the Danes in the early 1900s and quickly borrowed into Swedish and Norwegian (both Bokmål and Nynorsk). Hått (talk) 01:59, 25 August 2019 (UTC)

@wiktionaryWOTD on TwitterEdit

Does anyone know anything about this @wiktionaryWOTD account on Twitter? It seems to have tweeted the WOTD for a few months from 2010-2011, and then stopped. Is this something we own? It would be neat to revive it. There also seems to be a plain @Wiktionary account that did the same thing until 2012, and a @WiktionaryUsers account that did some (mixed with other comments) until 2013. I would think that this could be automated. bd2412 T 17:18, 22 August 2019 (UTC)

Wonderfool briefly ran a Wikt Twitter account just to annoy me because I had been commenting about how much I hated Twitter. Forgotten which one it was. He got bored with it. Equinox 18:22, 22 August 2019 (UTC)
Even so, I think this would be a good tool to draw more attention to Wiktionary. bd2412 T 20:35, 22 August 2019 (UTC)
However, do we want to draw more attention to Wiktionary? Specifically from the Twitter crowd? I confess I don't like the idea. ‑‑ Eiríkr Útlendi │Tala við mig 21:12, 22 August 2019 (UTC)
What is the point of Wiktionary if we aren't drawing attention to it? As for the "Twitter crowd", there are apparently over 300 million active users (defined as individuals who have tweeted within the past month), so that's a substantial portion of the Internet-connected Western world. bd2412 T 21:49, 22 August 2019 (UTC)
Twitter lies about their userbase incessantly, you can probably divide that number by 100. If someone wants to set up a "Word of the Day" thing, sure why not. I feel strongly that it should not be sanctioned as an official representative of the site. DTLHS (talk) 21:58, 22 August 2019 (UTC)
I agree it shouldn't be treated as official, if we have one, due to the "proprietariness": these walled gardens that can plaster ads around, collect your data, and kick users off at any time with no recourse are the opposite of the free open spirit that wikis are supposed to have. Equinox 23:55, 22 August 2019 (UTC)
  • I thought the point of Wiktionary is to, well, be a dictionary, not to clamour for attention? The use case for Twitter is so orthogonal to the use case for Wiktionary that I honestly worry that an influx of Twitter users suddenly editing here may overwhelm the established editor base with problematic edits. We've seen strange upticks in problematic anonymous edits in the past that were apparently traceable to this or that social media platform linking through to Wiktionary, so I don't think my concern is entirely unfounded. ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 23 August 2019 (UTC)
Hi! French Wiktionary has had a Twitter account for almost ten years, managed by Lyokoï. It is mainly used to like and retweet when someone quotes Wiktionary, and to give some answers about the way it works (descriptive, neutral, fancy, etc.). I think this is quite effective in giving a better picture of Wiktionaries and in getting some feedback from readers. For example, it was nice to learn that some random people smiled at the picture on l’herbe est toujours plus verte ailleurs (the grass is always greener on the other side)   Noé 14:14, 23 August 2019 (UTC)
I am reasonably sure that the WMF doesn't like this sort of thing. - TheDaveRoss 14:23, 23 August 2019 (UTC)
WMF doesn't like individuals with Twitter accounts representing themselves as Wikimedia projects. Even Wikimedia has an "official" Twitter, as does Wikipedia. bd2412 T 02:19, 28 August 2019 (UTC)
Can confirm, re the Twitter handle @wikiquote. —Justin (koavf)TCM 02:30, 28 August 2019 (UTC)
I mean, that is what all of these are, individuals with Twitter accounts using the WMF marks. Even though we contribute we are still individuals who are not the WMF or Wiktionary.- TheDaveRoss 12:30, 28 August 2019 (UTC)
Oh yes, Wonderfool ran a Wiktionary Twitter account called WiktionaryUsers. He put the WOTD and FWOTD on there too, with occasional news probably like "we have 200000 entries" or "Equinox had another lame argument with SB, this time about the type of dashes to use in entries". He'd be able to restart it too. --Mélange a trois (talk) 23:58, 1 September 2019 (UTC)
    • Ironically, he was using that Twitter account longer than he'd ever managed to keep a WT account. --Mélange a trois (talk) 00:03, 2 September 2019 (UTC)
Good to hear from someone who knows WF. Please tell him to get another hobby (he said ironically). I now recall the associated user name was User:Wikt Twitterer. Equinox 00:06, 2 September 2019 (UTC)
You got it, Eq. And WF does have a new hobby - he has finally learned to play the harmonica. --Mélange a trois (talk) 00:19, 2 September 2019 (UTC)
Don't tell me... he learned it from that heiress he married. Equinox 00:29, 2 September 2019 (UTC)
Nope, SB taught him when he was down in his area a few months ago. You can find his "One for The Bristol City" version online somewhere. --Mélange a trois (talk) 07:49, 2 September 2019 (UTC)

Intransitive verbs with a specified prepositionEdit

Using the verb file as an example, the fourth meaning reads

(intransitive, with for, chiefly law) To submit a formal request to some office. 

However, the verb search shows

(intransitive, followed by "for") To look thoroughly 

I think a formal label should be added to the Appendix:Glossary explaining the current wordings with for, followed by "for", etc. --Backinstadiums (talk) 18:09, 22 August 2019 (UTC)

I think it's an abuse of the label anyway. Context labels are meant to show meanings that occur in a specific context. But in this case it's not a meaning of the word in question, but rather the meaning of the word combined with another word. It's not that file means "to submit a formal request" whenever it's used with for but rather the combination file for that has this meaning. Secondly, we have the {{+preo}} template which is intended for this purpose. —Rua (mew) 18:17, 22 August 2019 (UTC)

Numerous etymologies in table of contents (ToC)Edit

For some entries with multiple etymologies (e.g. a-), the table of contents is only useful at one level: the languages. That's because it contains a list of mostly duplicate "Etymology 1"..."Etymology N" hierarchies with no way of distinguishing them. An alternative would be to replace N with a distinguishing summary, e.g.,

  • Etymology 1 - Middle English a- (“up, out, away”)
  • Etymology 2 - Old English an (“on”)
  • Etymology 3 - Middle English a- (“with”)

Or even briefer - just the senses. After all, we don't want the ToC to get too wide!

However, I'm not a linguist and am unsure about solutions. Perhaps simply reducing the ToC limit is suitable in some cases. The general problem remains: I have found that the ToC is often less than useful.

(If the Etymologies headers are augmented, an {{anchor}} with their old name may be required to prevent breaking links.)
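As a rough illustration of the augmented headers and the backward-compatible anchor, here is a small Python sketch that generates the wikitext. The function name, the default header level of 3, and placing {{anchor}} on the line after the header are assumptions, not established entry-layout rules.

```python
def etymology_headers(summaries, level=3):
    """Build numbered etymology headers with a distinguishing summary.

    The old name ("Etymology N") is kept in an {{anchor}} so that
    existing links to "#Etymology N" still have a target.
    """
    marks = "=" * level
    headers = []
    for number, summary in enumerate(summaries, start=1):
        old_name = f"Etymology {number}"
        headers.append(
            f"{marks}{old_name} - {summary}{marks}\n{{{{anchor|{old_name}}}}}"
        )
    return headers


for header in etymology_headers([
    "Middle English a- (“up, out, away”)",
    "Old English an (“on”)",
]):
    print(header)
```

A bot applying something like this would still need a human to write each summary; the sketch only handles the header and anchor boilerplate.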

Dpleibovitz (talk) 04:43, 23 August 2019 (UTC)

Adding reference to the coalmine rescinding vote to CFIEdit

Would someone please add Wiktionary:Votes/2019-08/Rescinding the "Coalmine" policy to WT:CFI#Idiomaticity as an additional reference to the COALMINE policy? Admittedly, it has not been a vote approving a change, so far. Nonetheless, it would enable CFI readers to see the most recent state of consensus on the matter; the original vote traced to from CFI is from 2009. There was another vote, Wiktionary:Votes/pl-2012-03/Overturning COALMINE, but that does not need to be linked from CFI, IMHO; it is linked from the 2019 vote anyway. --Dan Polansky (talk) 07:01, 23 August 2019 (UTC)

No one has objected, but no one has added a link to the vote at WT:CFI#Idiomaticity either, next to the link currently numbered "[10]". Would @bd2412 or @Mx. Granger be interested in adding the link to WT:CFI? (I picked bd2412 since we successfully worked together before, and Mx. Granger for diff; more people came to mind, but I don't want to bother more people.) --Dan Polansky (talk) 17:09, 5 September 2019 (UTC)
Let's wait until the vote has been closed before adding it as a reference to CFI. —Granger (talk · contribs) 00:27, 6 September 2019 (UTC)
@Mx. Granger: The vote has been closed. Would you please add it as a reference? --Dan Polansky (talk) 07:44, 22 September 2019 (UTC)
  DoneGranger (talk · contribs) 13:41, 22 September 2019 (UTC)

Japanese: move kana and rōmaji to the pronunciation sectionEdit

In a Japanese entry, the kana and rōmaji forms are currently placed in the headword templates. However, a look into Module:headword reveals that these places are originally for inflections instead of alternative scripts:

使う (transitive, godan conjugation, ren'yōkei 使い, past 使った)

Transliterations are supported but handled separately (and apparently Latin-only):

使う (tsukau) (transitive, godan conjugation, ren'yōkei 使い(tsukai), past 使った(tsukatta))

In light of this, I would like to propose moving the kana and rōmaji to the pronunciation section, for the following reasons:

  • The standard entry layout requires the pronunciation section before the headword lines. This means that for kanji entries like 申す, the reading first appears in phonetic transcriptions (ーす[móꜜòsù] and [mo̞ːsɨᵝ]) and then in kana and rōmaji which more people regard as the "reading" (もうす and mōsu). This reverses the usual logic of kanji entries, because kanji spellings must first be "read" to get words and words then have phonological information. If we move the reading to the pronunciation section and display it prominently, then the structure of kanji entries can become more clear:
Pronunciation
Hiragana もうす
Hepburn romanization mōsu
Kunrei-shiki romanization môsu
Historical hiragana まうす
  • For words having multiple parts of speech, the alternative scripts must be repeated in every headword template, which makes maintenance harder and more error-prone. If we change the format of headword templates to the first one above, we can eliminate all repetition. If we change it to the second one above, we can still eliminate repetition of the historical kana and kyūjitai.
  • Kyūjitai can be moved to {{ja-kanjitab}} and displayed in a larger font size. Historical kana will receive better support (support of "historical hiragana and katakana" for entries like 耶蘇教, multiple historical hiragana for entries like 向こう, etc.)
  • Headword lines can be more learner-friendly (as shown above).

On the other hand, there are some disadvantages with such an approach:

  • It means a lot of work to do (possibly by bots). For example, entries lacking a pronunciation section would need to be given one, and entries with {{ja-pron}} which lack orthographical information (現代仮名遣い quirks, capitalizations, spaces and hyphens, etc.) would need to be fixed. Such a change would also make the pronunciation section mandatory for new entries.
  • If we go for the first format of headword templates above, then kana and rōmaji would be completely decoupled from POS or sense. This means that entries like お玉杓子 would require something like |orthn_note= (similar to |accn_note=) to indicate which spellings apply to which senses. It would also make auto generation of categories like Category:Japanese type 1 verbs that end in -iru or -eru impossible.

What do you think of such an approach?

(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 07:24, 24 August 2019 (UTC)

I haven't had time to fully think through the ramifications, but broadly speaking, I'm in support of this proposal. Some notes:
  • "...then kana and rōmaji would be completely decoupled from POS or sense." This should already be the case. If a specific sense has a different reading, then that sense should be split out to a separate etymology section. As such, all entries under a given etym should have identical kana and romaji. If a specific sense has usage quirks, such as using katakana in certain contexts, those should be explained in a usage section. Alternatively, we could follow the practice of various other monolingual Japanese dictionaries, and indicate specific spellings at the start of the line.
  • "...entries lacking a pronunciation section should be supplied one..." Under current practice, we (the JA editor community) already strive so that every entry, and indeed every etym section for every Japanese entry, should have its own pronunciation section, so there is no opposition from me on that score.
  • "It would also make auto generation of categories like Category:Japanese type 1 verbs that end in -iru or -eru impossible." I don't follow. We would have to continue adding kana to headword templates anyway for proper generation and sorting of the basic POS categories.
‑‑ Eiríkr Útlendi │Tala við mig 04:42, 26 August 2019 (UTC)
I support this approach (a box to display pronunciation). I like how the romanizations are automatically generated similar to {{ko-ipa}}, and that historical hiragana is listed below for a less cluttered appearance. However, I'm not too sure regarding what needs to be removed or eliminated. Can you elaborate more on this? KevinUp (talk) 21:16, 26 August 2019 (UTC)
@Eirikr: Thanks for your reply, but I think we have reached a consensus to eliminate sortkeys, haven't we? A look at Category:Japanese nouns reveals that the current sorting scheme works poorly, but even if it works well, the mass acceleration of kana forms of kanji entries will flood the category like this: あいいろ, 藍色, あいいん, 愛飲, あいうち, 相撃ち, あいえき, 愛液, etc. If we sort entries by their usual order, i.e. kana by kana and kanji by kanji, then there will be no harm to the kana part of the entry, and you get a new way to look up entries—by kanji.
@KevinUp: For example, 必然 has two parts of speech, so the reading ひつぜん is repeated in the two headword templates. If we move it to the pronunciation section, we (1) not only make the structure of the entry more logical (e.g. kana and rōmaji before accent and IPA), (2) but also make entries easier to maintain. If we retain it in the headword templates (while adding a pronunciation section like the one above), we get only the benefit of (1) but not (2). --Dine2016 (talk) 03:58, 27 August 2019 (UTC)
I support moving hiragana (also historical) and romaji to pronunciation sections. --Anatoli T. (обсудить/вклад) 11:34, 27 August 2019 (UTC)

Order of meanings according to their labelsEdit

Currently meanings follow no label order, so one or several obsolete or archaic meanings may come before ones still used in everyday speech, thereby hindering the user's lookup. --Backinstadiums (talk) 10:07, 24 August 2019 (UTC)

For some purposes. Some users like historical order. Some like frequency, of whom some like current frequency in speech, others frequency in writings, including older writings. Some like specialized senses last. Some like literal/physical senses first. Some like grouping by grammatical properties, eg, (in)transitive, (un)countable. Unfortunately, there are many entries where these preferences conflict. We should consider ourselves fortunate that no one seems to prefer alphabetical order of definitions or ordering by the length of the definitions.
I am unaware of any work done to support any one particular approach, though many opinions have been expressed. DCDuring (talk) 16:15, 24 August 2019 (UTC)
I don't much like the in/transitive grouping: I think it's more sensible to group semantically related meanings; syntax/grammar is really secondary to what the word actually means. I have a habit of swapping ety sections and moving senses down if something obsolete and totally unused is at the top (which is curiously often the case). We won't satisfy everyone until we have a sort (by date, frequency...) feature, but we lack the data to support it. The OED marks words with a star rating to show how common they are in modern texts, which is cool, but obviously must be generated by huge amounts of data we don't see. Equinox 16:41, 24 August 2019 (UTC)
Webster's 1913 Dictionary presents the oldest, original meaning first, also when it has become quite uncommon. For example, the sense of victim as “a living being sacrificed to some deity” is the first one given. A strong argument can be made for “most common meaning first”, but a strong argument can also be made for “original meaning first”.  --Lambiam 09:58, 25 August 2019 (UTC)
@Lambiam: as long as it follows an order, the temporal descending/ascending addition doesn't matter. victim intersperses its original sense in the second position --Backinstadiums (talk) 13:10, 25 August 2019 (UTC)

Template:infixEdit

I asked user Rua, who did a lot of the coding, and he suggested I ask here.

Could we change the display of e.g. an infix ka from -ka- to ⟨ka⟩? That's the standard format (e.g. Leipzig glossing rules). Otherwise it looks like a suffix followed by another suffix. And could we make param 1 optional, as it is with template:suffix? It's useful to be able to say 'may contain an infix ⟨ka⟩' and link to the category without providing the stem, especially if the stem is obscure.

Thanks.
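The two display conventions being requested can be compared with a small Python sketch. The function names are hypothetical, and the "stem + infix" line only approximates what {{infix}} prints; it is not the template's actual output.

```python
def format_infix(infix, style="hyphens"):
    """Display an infix as -ka- (current style) or ⟨ka⟩ (Leipzig-style)."""
    if style == "angle":
        return f"⟨{infix}⟩"
    if style == "hyphens":
        return f"-{infix}-"
    raise ValueError(f"unknown style: {style}")


def describe_infix(infix, stem=None, style="hyphens"):
    """Approximate a 'stem + infix' line; the stem may be omitted."""
    shown = format_infix(infix, style)
    return shown if stem is None else f"{stem} + {shown}"
```

With the stem omitted, describe_infix("ka") gives just "-ka-", which is the "may contain an infix" use case; describe_infix("bloody", "again") gives "again + -bloody-", keeping the + sign mentioned in the reply below.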

We are not in the business of interlineal glossing, so I don’t see how the Leipzig glossing rules would apply. I expect that most (> 99.9%) of our casual users would not understand the significance of such angle brackets and be more likely to be confused by them than enlightened. See how the British infix -bloody- is presented here in print: “a-bloody-gain”. As to the second request, I assume param 2 is meant (the base into which the infix is inserted). There would still be a + sign.  --Lambiam 10:28, 26 August 2019 (UTC)

Need a uniform policy on handling of Latin specific epithetsEdit

(Notifying Fay Freak, Brutal Russian, JohnC5, Lambiam): User:Marontyan has been manually editing various Latin adjectives and nouns that are used as specific epithets and converting them into Translingual entries. I'm not opposed in principle to this, but

  1. We need a specific, uniform policy for handling these before editing them piecemeal.
  2. We need to agree on a format that preserves as much info as possible. Currently I include a full declensional table, which may be overkill but does clearly show the masculine, feminine and neuter nominative singular forms. For these terms I've created a template {{la-epithet}} meant to go into the Usage Notes section that indicates as far as I can tell whether the term is exclusively, mostly, or additionally used as a specific epithet, and what grammatical type it is (adjective, noun in the lemma form, noun in the genitive singular, or noun in the genitive plural). It also adds the term to Category:Specific epithets.

The tricky thing here is that a lot of specific epithets also are used at least occasionally in other Latin. This goes especially for -ēnsis terms (e.g. diplomas from Princeton are or were in Latin and used Princetoniēnsis, and the seal of the University of Arizona says "SIGILLUM UNIVERSITATIS ARIZONENSIS" on it), but may apply to other terms as well, so we need to be careful before we conclude that a given Latin term is exclusively used as a specific epithet and hence should maybe be converted to Translingual. For this reason my instinct is to leave these as Latin terms (they -- mostly at least -- follow Latin grammatical rules, after all), but others may disagree. Benwing2 (talk) 06:46, 26 August 2019 (UTC)

Per the codes of the international bodies regulating biological nomenclature, taxon names have to be grammatically correct Latin. So why classify them as “translingual”? Is that a language with a grammar? How do we know that some epithet is a translingual adjective, and not some other translingual part of speech? What about the epithets that change under gender agreement; are these Latin while the gender-invariant ones are not? Also pinging User:DCDuring.  --Lambiam 09:50, 26 August 2019 (UTC)
That’s why we shouldn’t imagine a strict distinction between Latin and Translingual. This distinction is merely a product of the minds of Wiktionary editors. Language users just use the words; they do not first think about which language their words are in, but about whether they are understood or whether they conform to any rules. Or the Latin words they use may only be “thought”, and code-switched into without ever having been used in Latin text. As I have mentioned elsewhere, the inflection tables (and, while we are at it, the head templates too) should just have a parameter that makes the tables Translingual, linking the forms not as Latin but as Translingual, or not at all (the forms are still effective for SEO if they are present but unlinked). Then it is six and two threes whether an epithet we create is Latin or Translingual, because the content will be the same. This is probably also easy to implement for @Benwing2. Like I did on situliformis: no spectacular layout, just one parameter missing in two places. Fay Freak (talk) 12:11, 26 August 2019 (UTC)
Taxonomic Latin originated as actual Latin, and (for plants, algae and fungi at least), is occasionally still used as such for the formal description of a taxon. Mostly, though, you have nouns in the nominative and genitive as well as adjectives in the nominative and sometimes in the genitive (for parasitic species named after the host). When names aren't used in Latin sentences, accusative, dative, locative and vocative are completely absent. Generic names are nouns in the nominative singular. Names of higher taxa are generally formed by replacing the ending of the genitive of a generic name with a standard ending denoting the rank. Epithets for species and below are mostly adjectives agreeing in gender with the generic name, but also nouns in the nominative singular or plural, and nouns in the genitive singular or plural agreeing in gender and number with the referent. So taxonomic names use a subset of Latin morphology, but that subset follows Latin rules. Chuck Entz (talk) 13:59, 26 August 2019 (UTC)
After edit conflict. Some duplication of Chuck's contribution.
I can live with almost any resolution of this, including no resolution. I use {{epinew}} to allow links from inflection lines to the lemma form of specific epithets in whatever language I choose for the main entry for the epithet. Usually the epithet has an existing Latin or Translingual entry. But I sometimes declare a term that is identical in spelling to a word in, say, Old Tupi to be Old Tupi instead of creating a vacuous Latin or Translingual L2 section for the term.
Specific epithets that are adjectives need only nominative singular inflected forms. Some specific epithets are nouns and can either be nominative singular or genitive, sometimes singular, as in eponyms, sometimes plural, as in nouns referring to a habitat or host of an organism. Full declension tables are not necessary to provide information for the proper use of specific epithets.
That said, as noted above, some terms used as specific epithets have had other use. That use includes use by the Roman Catholic church in some internal communications; use in running Latin text for scientific purposes, including use outside of Linnaean taxonomy; and use in various mottos, inscriptions, and formal documents. It is usually time-consuming (and often impossible) to try to attest to such use in contexts other than taxonomic names, just as it is time-consuming (and often impossible) to attest to some rarer inflected forms of classical Latin words, especially verbs.
I would appreciate it if any resolution did not lead to time-consuming attestation efforts or revisiting each taxonomic entry. DCDuring (talk) 14:14, 26 August 2019 (UTC)
What about the following. If a case form is unattested, but no doubt can exist that this is what a native Latin speaker would have used, we just list it without qualms. So the dative plural of brooklynensis is simply brooklynēnsibus. If we need to guess but have one or a few plausible forms, we list them with an asterisk and add “(?)”, separating alternative conjectures with slashes. If we have no idea, we just put a question mark.  --Lambiam 15:06, 26 August 2019 (UTC)
The pattern only needs to be known. The correct inflection of a word is not a reconstruction, any more than it is a good idea to hunt down all Latin forms that have not yet been used because the word is only attested a few times; starring those would be random and of little value. “So diplomarius is attested but diplomariorum is not? What a great discovery!”
What is “needed” is a weak cause for restriction here. Fay Freak (talk) 20:42, 26 August 2019 (UTC)
WT:CFI, Talk:albifrons for example, and WT:Translingual are quite clear about this:
  1. If not used in Latin, it isn't Latin (see WT:CFI, Talk:albifrons).
  2. Even if attested translingually, there can still be a Latin entry (see WT:Translingual#Other languages).
    This should also go vice versa: If there is a Latin entry, there can also be an entry for a translingually used term. (Somewhat) similar to how there is football in many languages.
Additionally:
  • Template:la-epithet is (often) wrong in Latin entries: "and thus not inflected except in ..." isn't correct for Latin. It applies to English, French etc., which aren't Latin. And even if a term isn't attested in Latin, the note can be wrong, as shown in Translingual ruderalis and by the plural of Translingual Homo sapiens.
  • riobamba is wrong: It's translingually used riobambae - in Latin it would be *Riobamba as it is a proper noun. It's more obvious when comparing for example Translingual fleischmanni and Latin Fleischmanni, form of Fleischmannus.
    (Of course, the translingually used form could also occur in Latin, but then it's science-speak, something more like a chemical formula such as H₂O, and not usual Latin.)
  • "separating alternative conjectures with slashes": Well, bad or too dubious conjectures shouldn't be listed at all. And even the inflection of Latin amethystizon looks dubious in many ways: most forms lack a star, and the gen. sg. could also be *amethystizontos, the acc. sg. m.&f. *amethystizonta, the nom. & gen. & voc. pl. n. *amethystizonta; compare the Greekish acc. pl. amethystizontas instead of a Latinate *amethystizontes. And even if amethystizon had Latin forms, why shouldn't it still be amethystizonta instead of amethystizontia, similar to vetus with vetera? Anyhow, the entry lacks Greekish conjectures and might be better with no conjectures.
  • "taxon names have to be grammatically correct Latin": Which doesn't mean it is grammatically correct Latin or Latin at all. German Handy, French foot and other pseudo-anglicisms aren't English either, even if native German or French speakers might reject classifying such terms as German, French etc.
  • "So why classify them as “translingual”?":
    • Many taxonomic terms are used in multiple languages, that is, they are used translingually.
    • "Translingual" covers international science-speak and codes (see Wiktionary:Translingual#Accepted).
    • Many taxonomic terms aren't used in Latin, and thus aren't Latin, just like pseudo-anglicisms aren't English.
  • "What about the epithets that change under gender agreement; are these Latin while the gender-invariant ones are not?": Gender and gender-agreement in case of taxonomic terms also occur translingually, even in English.
    What might be more interesting: Does the taxonomically implied gender always agree with the language's gender? Is it, for example, always "der Homo sapiens" m, or does an incorrect "die Homo sapiens" f or "das Homo sapiens" n also occur? In the case of Translingual Nix Olympica, the gender of Latin nix and that of the translingually used term do not always match. It would be funny if Translingual Homo sapiens were male by origin and taxonomism, but female or neuter in some language, e.g. if foreign terms (usually) become neuter in that language.
--Marontyan (talk) 20:59, 26 August 2019 (UTC)
By the mechanistic logic expressed we should redesignate as Translingual all the Latin terms ever used in a taxonomic name that has been ever used in running text in multiple languages. Limiting ourselves to adjectives, we would start with albus, niger, ruber, cyaneus and continue through the specific epithets used in the entries in Category:Species name using Latin specific epithet. We could then start on the ones in Category:Species entry using missing Latin specific epithet and then proceed to the specific epithets for the millions of other species names that we haven't entered. That should lighten the burden of adjectives that our Latinists have to maintain. DCDuring (talk) 04:17, 27 August 2019 (UTC)
@DCDuring: See WT:About Translingual#Other languages: "The classification of a term as Translingual does not prevent the article having sections for other languages". Besides Latin albus there can be Translingual albus and vice versa. --Marontyan (talk) 07:36, 14 September 2019 (UTC)
Reducing, not increasing, duplication of semantic content is the direction we generally go. Nobody voted on the content of "About Translingual". DCDuring (talk) 12:28, 14 September 2019 (UTC)

Transliteration for Mandarin, Japanese, Korean - No italics or italics depending on situation?

I've been using {{zh-l}}, {{ja-r}}, {{ko-l}} much too often, which led me to think that romanizations in Chinese/Japanese/Korean are always written in italics.

My recent request at the Grease Pit made me realize that italics are for mentions, whereas romanizations in lists are usually unitalicized, as explained by User:Rua.

However, Chinese, Japanese and Korean entries using {{zh-l}}, {{ja-r}}, {{ko-l}} have romanizations in italics by default, regardless of whether it appears as a mention or in a list.

Should we apply the usual convention in Wiktionary, i.e. italics for mentions, and no italics for lists? I think making them all appear as italics by default (for Mandarin, Japanese, Korean romanizations) would be more manageable. KevinUp (talk) 17:31, 26 August 2019 (UTC)

Yes. —Suzukaze-c 19:10, 26 August 2019 (UTC)
  • Personally, I dislike "modal" display, as I find the inconsistency jarring. I would prefer that we be consistent. My ideal state for the various {{ja-r}}, {{m}}, and {{l}} templates would be for Japanese transliterations to always be italicized, and for translations to always be non-italicized.
That said, if we are to adopt different formatting for Japanese transliterations in different templates, then I would accept italics in {{ja-r}} and {{m}}, as these are commonly used in running text, and non-italics in {{l}}, as this is commonly used in lists.
‑‑ Eiríkr Útlendi │Tala við mig 16:49, 27 August 2019 (UTC)
I've noticed that most languages use {{l}} or {{der3}} for derived terms, which gives transliterations without italics. However Japanese entries use {{ja-r}} instead for derived terms, which gives transliterations in italics. This is because {{ja-r}} can also be used in running texts.
The same situation also occurs in Chinese and Korean entries, where transliterations of derived terms appear in italics even though they are lists, not mentions. KevinUp (talk) 20:49, 27 August 2019 (UTC)
I will say that the situation with Japanese is different as Romaji is actually fairly common there. No one really composes anything in Japanese with Latin characters but the Latin alphabet is definitely one of the writing systems used by the Japanese for their language, whereas that is not true with Chinese languages or Korean. Hence, italicizing to show "this isn't really what they do over there--it's foreign" is not really applicable to Japanese. —Justin (koavf)TCM 19:19, 26 August 2019 (UTC)
@Koavf: ...the Latin alphabet is definitely one of the writing systems used by the Japanese for their language - it's a completely wrong statement caused by the fact that we allow romaji entries for disambiguations. The Latin script is not a writing system for Japanese! I don't know why you keep pushing this agenda. --Anatoli T. (обсудить/вклад) 01:31, 27 August 2019 (UTC)
@Atitarev: "Keep"? "Agenda"? It is definitely true that Japanese use Latin characters regularly and not only for brand names. All schoolchildren use Romaji and it's very common for Japanese to use it. Not sure what you're going on about here. —Justin (koavf)TCM 03:11, 27 August 2019 (UTC)
Latin as a writing system for Japanese existed as far back as the 16th century as well. --Dine2016 (talk) 03:20, 27 August 2019 (UTC)
Transliterations and a helping tool for students are not a writing system. It's a great fallacy. --Anatoli T. (обсудить/вклад) 03:29, 27 August 2019 (UTC)
Tell it to the Japanese. —Justin (koavf)TCM 04:01, 27 August 2019 (UTC)
I'm telling you, who made this statement. The Japanese write using the Japanese writing system and don't make such statements. --Anatoli T. (обсудить/вклад) 04:15, 27 August 2019 (UTC)
Somehow, you are under the impression that I control the Japanese education system or that I have been responsible for keyboard inputs in the 21st century. I am not. Why adopting Latin characters is "fallacious" whereas adopting Chinese ones wasn't is a mystery to me... ¯\_(ツ)_/¯Justin (koavf)TCM 04:22, 27 August 2019 (UTC)
You're only responsible for your own words. Your incorrect statements don't change what the Japanese or the Chinese use as their writing system. Both the Japanese and the Chinese (can) use Latin letters to type in the native scripts in various input systems. The Japanese adopted a mixture of Chinese and native/modified writing systems, but never the actual Latin script, and that's not a fallacy but just the way it is. The fallacy is asserting that they use Latin letters as one of their writing systems, rather than a native script. --Anatoli T. (обсудить/вклад) 05:15, 27 August 2019 (UTC)
Okay. I'm not going to argue about whether or not Romaji is a fallacy. All Japanese use that system, it's very common, and therefore that is unlike Chinese and Korean populations. —Justin (koavf)TCM 05:55, 27 August 2019 (UTC)
The fallacy is your statement, not the romaji - a romanisation system for Japanese. Yes, it's one of the romanisation systems (a transliteration, an IME input, a learning medium for foreigners) but it's not a Japanese writing system. Yes, the Japanese type "t-o-u-kyo-u" on their keyboards to enter kana とうきょう and then convert to 東京 but they don't write or type "Tōkyō" or "Tokyo", it's not Japanese. "Tōkyō" or "Tokyo" are used for the foreigners. You made another mistake saying "unlike Chinese ...". The Chinese also use many romanisation systems and can use Latin letters as one of many input methods - e.g. type "beijing" and convert the Latin letters to "北京". --Anatoli T. (обсудить/вклад) 11:05, 27 August 2019 (UTC)
Diglossia and digraphia --Backinstadiums (talk) 16:14, 27 August 2019 (UTC)
@Koavf: We italicize non-English Latin script mentions too so I don't really see what point you're trying to make.
Also, I definitely think non-Mandarin transliterations should get their own templates too, it will be useful for etymologies and descendants lists particularly when the word doesn't even appear in Mandarin. —AryamanA (मुझसे बात करेंयोगदान) 11:24, 27 August 2019 (UTC)
For me, I would prefer to see the transliteration in italics if it appears right after the main form in Chinese/Japanese/Korean. So romaji listed as a headword or romanizations in {{zh-pron}} shall remain unitalicized. KevinUp (talk) 19:34, 26 August 2019 (UTC)

I found the previous discussion at Wiktionary:Beer parlour/2018/March#Unifying the display of romanisations in links and headwords: italicise romanisations by default and Wiktionary:Votes/2018-03/Showing romanizations in italics by default. KevinUp (talk) 20:51, 26 August 2019 (UTC)

I would oppose that vote as well, because it's too general, and I prefer the output of {{l}} and {{m}} to be distinct. I think that romanizations in italics ought to be considered on a case-by-case basis, depending on the language. Also, italicizing romanizations of headwords makes them appear less legible. KevinUp (talk) 20:51, 26 August 2019 (UTC)
@Dan Polansky Would it be possible to start a fresh vote, confined only to Mandarin, Japanese and Korean, something such as "Italic forms for romanized forms of Mandarin, Japanese, Korean which are listed after the main forms in the respective languages". KevinUp (talk) 20:51, 26 August 2019 (UTC)
Main form for Mandarin: (1) hanzi
Main form for Japanese: (1) kanji or (2) hiragana or (3) katakana or (4) combination of kanji and hiragana.
Main form for Korean: (1) hangeul or (2) hangeul followed by hanja
I'm only considering these three languages because this is the convention used by {{zh-l}}, {{ja-r}}, {{ko-l}} by default. KevinUp (talk) 20:51, 26 August 2019 (UTC)
Is there some reason I'm missing here why you have separated Mandarin from other Chinese languages (e.g. Cantonese)? Is there a reason in principle why they would be treated differently? —Justin (koavf)TCM 21:40, 26 August 2019 (UTC)
The reason is that many Chinese languages share the same headword and the default romanization of {{zh-l}} (zh for Chinese language) is currently in Mandarin. An editor was interested in having Cantonese, Min Nan, Hakka romanizations for {{zh-l}} but that discussion did not go well - see Talk:吃飽.
Currently, we don't have templates such as {{yue-l}} or {{nan-l}} or {{hak-l}} to display transliteration of these languages. My opinion is that the choice of italicization ought to be done on a case-by-case basis. KevinUp (talk) 22:53, 26 August 2019 (UTC)
@KevinUp: And would there be any outstanding reason why things would be different for Hakka, Min-Nan, Yue, etc.? —Justin (koavf)TCM 23:02, 26 August 2019 (UTC)
I suppose we could do the same for Cantonese, Min Nan, Hakka, etc. It's just that currently we don't list transliterations for words in these languages after their hanzi form. See the synonyms section of 吃飽/吃饱 (chībǎo) for example. 食飽/食饱 is written there without any transliteration because the lemma could be either Cantonese, Min Nan, Min Dong or Hakka and editors are unsure which should be listed and in what order. KevinUp (talk) 23:33, 26 August 2019 (UTC)
What order to list things in is a pretty easy thing to resolve, especially in a dictionary: alphabetical order (Hakka, Mandarin, Yue...). Thanks. If I were more knowledgeable about Chinese languages, I'd have something more valuable to add but in principle, I think we should definitely have all the standard romanization/transliteration schemes and for the spoken languages that are encoded as written Chinese, these will all be dramatically different. —Justin (koavf)TCM 00:22, 27 August 2019 (UTC)
Alphabetical order can be very subjective. Yue could also be Cantonese. Min Nan could also be Hokkien. --Dine2016 (talk) 03:12, 27 August 2019 (UTC)
@Dine2016: Then you just choose one as the name and alphabetize that. Alphabetical order itself is not subjective. —Justin (koavf)TCM 03:14, 27 August 2019 (UTC)

I've noticed that most languages use {{l}} or {{der3}} for derived terms, which gives transliterations without italics. However Japanese entries use {{ja-r}} instead for derived terms, which gives transliterations in italics. This is because {{ja-r}} can also be used in running texts. The same situation also occurs in Chinese and Korean entries, where transliterations of derived terms appear in italics even though they are lists, not mentions. KevinUp (talk) 12:14, 28 August 2019 (UTC)

Relax SOP a bit

Rather than rescinding WT:COALMINE (which has been proposed, and for which the proposal has drawn overwhelming opposition), we should move more in the direction of relaxing SOP a bit, and including more common collocations that don't fit neatly into current SOP exceptions. Obviously we shouldn't have an entry like yellow floppy disk to define "a floppy disk that is yellow", but if a phrase occurs for which some reasonable argument can be made that, for example, it relies on unintuitive senses of the words involved, or has component words using one of a number of possible senses, we should include it. Basically, we should spend less time discussing these at RfD. I'm sure someone is going to suggest that this would impose an increased burden of maintaining those entries, but with over six million entries, the cost of maintaining any additional harmless entry is less than 1/6,000,000th the cost of maintaining the dictionary as a whole. Cheers! bd2412 T 02:14, 28 August 2019 (UTC)

I agree that we ought to consider allowing an SoP entry once we agree on the revised criteria for including it. DCDuring (talk) 02:24, 28 August 2019 (UTC)
@BD2412: Agreed in principle but do you have any specific examples? That will help others understand if they want to agree with what you're proposing. —Justin (koavf)TCM 02:31, 28 August 2019 (UTC)
Just from the current list at WT:RFD, I think prison gang, race traitor, bamboo suit, lower lip, evil spirit, and snap election would be spared the trouble of being litigated. bd2412 T 02:38, 28 August 2019 (UTC)
Based on those examples I think the current policy is just fine. The ones which I think ought to be kept are likely to be kept, and the ones I think ought to be deleted are likely to be deleted. I would suggest that our current policy is probably too lax, or really that it has a bunch of loopholes that allow for entries which provide little value (or even detract). Just like in the COALMINE discussion/vote I would be in favor of amending the policy, but without an alternative being proposed I couldn't say how I might vote. I am not sure that any (reasonable) change to the policy would actually reduce the amount of RFD discussion, it would just shift the border and we would have all of the discussions about a different class of entries. - TheDaveRoss 12:46, 28 August 2019 (UTC)
Actually, in this case, we have a zero-sum game. If there are a thousand entries that could reasonably be proposed for deletion under the current rule, and relaxation of that rule reduces the number of entries possibly subject to deletion to, say, 750, then we would not merely be debating the deletion of a different thousand entries. Presuming that potential problems are found and considered for deletion at the same rate as they are now, those actually proposed for deletion would drop by a quarter, as would all the work that goes into discussing, closing, and archiving those additional 250 proposals. bd2412 T 02:02, 30 August 2019 (UTC)
The problem being that, if we relaxed the rule, people would create entries that they would not create today, and those would be added to the 750 to get us back to 1000. Obviously there may be more or less than right now, but just because some portion of current debates wouldn't happen that doesn't mean that new debates wouldn't arise. - TheDaveRoss 13:28, 30 August 2019 (UTC)
I don't get the sense that people look at the rules at all before creating entries. I think it's a non-factor. bd2412 T 00:04, 31 August 2019 (UTC)
But we look at the rules when we do RfD until we internalize them. The RfD process and its references to CFI transmit it to newer contributors and probably serve to discourage those who don't have the required attitude and values for this kind of activity. DCDuring (talk) 02:36, 31 August 2019 (UTC)
That is possible, but some of the terms brought up for discussion were created years ago (prison gang, for example, was created in 2005), so there have to be other factors involved besides the inside baseball of our deletion processes. This should not discourage us from explicitly allowing more common collocations, rather than arguing over them and making exceptions to allow them. bd2412 T 03:16, 31 August 2019 (UTC)
How does the fact that many entries have not had contributors' eyes set on them for a decade or more have any bearing on this? At most that says that it is useful to take advantage of Recent Changes to catch bad entries when they are first made. DCDuring (talk) 14:02, 31 August 2019 (UTC)
The entries we argue about have often been here for years. These are not being newly minted in response to changes in the rules. bd2412 T 18:00, 31 August 2019 (UTC)
@BD2412: So what? DCDuring (talk) 03:46, 1 September 2019 (UTC)
It's hard to tell whether entries that haven't been "set eye on by contributors for a decade" are useful to passive users who never voice a comment. I sincerely hope any entries I make, SoP or not, are useful to that potentially large group. DonnanZ (talk) 18:29, 31 August 2019 (UTC)
@Donnanz: I wasn't speaking of users, I was speaking of contributors, ie, editors. I don't know about whether SoP entries are useful to any users at any stage of the language learning process, though I think they are less essential than non-SoP entries. DCDuring (talk) 03:46, 1 September 2019 (UTC)
"I don't get the sense that people look at the rules at all before creating entries. I think it's a non-factor." I can't believe a real-world lawyer said this. Ignorance of the law is no excuse! -- etc. I think it's very important that people can bitch about how they dislike the rules, but the rules are the gold standard. If your only point at RFV/RFD is "well this rule sucks" then really you ought to arrange rediscussion of the flawed rule, not deliberately and knowingly vote contrary to it. (Okay I've probably done that once or twice myself, since votes are so time-consuming, but I didn't feel great. StackOverflow ate me alive when I posted a comment as an "answer" because their nutso system didn't let me post comments yet.) Equinox 23:25, 1 September 2019 (UTC)
I oppose this in advance. Canonicalization (talk) 12:34, 1 September 2019 (UTC)
I also oppose, but I would like to see some way of handling common collocations better. I think we should include them, but not in the mainspace. Andrew Sheedy (talk) 18:55, 7 September 2019 (UTC)

Categories: what kinds of entries should they go in?

I've been chewing on this question in the back of my mind for a while.

Japanese entries almost always have multiple forms, due to the complexities of the writing system. As one example, take 重箱読み (jūbakoyomi, a specific kind of reading of a Japanese kanji compound). The lemma form is 重箱読み, the historical (pre-spelling-reform) kanji form is 重箱讀み, the hiragana form is じゅうばこよみ, the historical hiragana form is ぢゆうばこよみ, the romaji form is jūbakoyomi (or possibly jūbako-yomi, depending on convention), the katakana form is ジュウバコヨミ, and the historical katakana form is ヂユウバコヨミ.

The thing itself is labeled as {{lb|ja|linguistics}}. But which of the entries for these various forms should be included in Category:ja:Linguistics?

  • Only the lemma form?
In general, all entry details should be consolidated into the entry for the lemma form, and entries for alternative (non-lemma) forms should be bare-bones soft redirects, as I've understood things so far.
  • All forms?
In the interests of usability and discoverability, the argument can be made that all forms should be categorized, to aid users who might look up a given form in a category index, without knowing which form is the lemma.
  • Some other selection of forms?
I can't think of a use case myself, but others might have ideas about including only some forms in categories, but not all of the forms.

Any insights appreciated. ‑‑ Eiríkr Útlendi │Tala við mig 22:23, 28 August 2019 (UTC)

I would like all of the forms (minus the rōmaji forms) to be included for discoverability and also because the determination of what is the lemma form (main entry) and alternative form in Japanese can be rather subjective especially when it comes to rare or technical terms. KevinUp (talk) 22:56, 28 August 2019 (UTC)
@Eirikr: If we include in Category:ja:Linguistics only the lemma forms, then it makes sense to sort the entries by reading. If we include all forms, then it makes sense to eliminate sortkeys and sort kana by kana and kanji by kanji. I was (and still am) in favor of the latter approach so I made {{ja-see}} copy all categories. If we use the former approach it will probably make things more difficult (for example, we need more complex rules to determine which categories to copy and which not). --Dine2016 (talk) 03:47, 6 September 2019 (UTC)
@Dine2016: One concern I have about indexing the categories by kanji is collation on the one hand, and lookup on the other. How are kanji entries ordered? And is it possible to add some kind of lookup for kanji, alongside one for kana, alongside one for the Latin alphabet? ‑‑ Eiríkr Útlendi │Tala við mig 19:25, 6 September 2019 (UTC)

Should definitions be formatted like sentences?

Hello! Do you like worms? And do you like convenience foods that can be kept on the shelf for years until you need them, without the hassle of refrigeration? Then you'll love CAN OF WORMS. Let's open it.

Should definitions be formatted like sentences? This doesn't mean they must grammatically be sentences ("A fruit that grows on apple trees" is not a sentence) but it means they would start with a capital letter and end with a full stop/period. I have been pondering this because I keep seeing people taking an edit as an excuse to impose their favourite style. For example, for months, my evil twin User:SemperBlotto and I have been removing and adding the full stop (I add, he removes): it's become a sort of joke to me now, although the subject has never been aired between us: I've seen hundreds of edits of this kind. Likewise, User:Embryomystic seems to take any opportunity to change "A fruit." to "a fruit", and has sometimes revised basic words to change dozens of sense lines to that style.

Thoughts:

  1. Does it matter at all? Well yeah, I think consistency matters, in the same way that a newspaper has a "house style", or a software development company. Partly it gives us more of a brand and more consistency, rather than looking like some vague shit that was slapped together by randoms (ahem), and partly agreeing on this stuff means that we can stop wasting our time on it and work on the actual important things.
  2. What about definitions longer than one sentence? These are rare and can often be rephrased, but they do occur: something like a techy maths entry that says "A member of a subgroup X such that Y=Z. X may be either A or B." Writing these without caps would be a very weird violation of English style, regardless of our own conventions. So if we are going to support multi-sentence definitions, then for consistency we should probably make all definitions sentence-like.
  3. What about non-English stuff and translations? This is a biggie, since usually our English definitions are complex phrases ("the purple fringe of a king's cloak") whereas foreign defs are often one-word translations like just "apple". There is some argument to separate them, though, since if you look at (say) fr.wikt, it does the same thing for its own language: words in the default lang (French) get long detailed definitions while others mainly just get quick translations. Basically these do not serve the same purpose: the primary language is trying to explain X in language X, in detail, whereas the secondary language is trying to give you the right word for conversation, phrasebook-style.

Who wants a vote on this? Actually that's premature. Opinions and thoughts would be welcomed.

Equinox 10:40, 30 August 2019 (UTC)

  • I think it is a matter of style. The OED does capitals and full stops, but it also puts hyphens in where they are not needed. Nobody's perfect. I don't go out of my way to change things but I remove full stops (if I remember) if I am adjusting the entry for any other reason. SemperBlotto (talk) 10:50, 30 August 2019 (UTC)
  • It is a matter of style but I'm proposing that we should have a house style rather than keep reverting each other. With computer technology we could, I suppose, present entries capitalised however viewers want them, but that would involve more markup and how many users would actually bother changing settings from the default? Equinox 11:21, 30 August 2019 (UTC)
  • IMHO as an extremely experienced user, it doesn't matter at all. For the second point, I'd like to mention that Simple English Wiktionary has complete sentences in all their entries - and that is a nice style. --Mélange a trois (talk) 11:11, 30 August 2019 (UTC)
  • I assume that we are talking only about English definitions. FL "definitions" typically have neither initial capitals, nor terminal periods. But they also are not definitions, but rather suggested single-term translations, often without a disambiguating reference to a particular definition of a polysemic English word.
I vastly prefer English definitions to be formatted as sentences.
I also find it is never impossible (though sometimes impolitic) to reduce a multi-sentence definition to a single-clause definition. I take multiple sentences as an indication of probable encyclopedic content. DCDuring (talk) 15:22, 30 August 2019 (UTC)
Definitions without periods are an extremely convenient way to find SemperBlotto's entries that have never been touched by anyone else. DTLHS (talk) 16:25, 30 August 2019 (UTC)
  • As a (possible?) counterpoint to DCDuring's description of foreign-language definitions, sometimes a one-word gloss is all that's needed -- say, for a simple concrete term like Japanese (inu, dog). However, for less-concrete terms, the concept of the term in one language may not have any direct analogue in English. I ran into that just yesterday with Japanese (nata), which is a general term for a kind of one-handed thick-bladed tool of any of a wide array of shapes and sizes. This could be called by various things in English depending on the specific type of nata, including such varied terms as hatchet, billhook, machete, froe, and probably a few others that didn't occur to me.
IFF we adopt the style of "full sentence format for all sense lines", we run into some issues:
  • Do we have to rewrite single-word glosses? For Japanese (inu, dog), how do we make a sentence out of the "dog" sense without making a mess of it?
  • Do we keep single-word and other very-short glosses as non-sentences, and only use sentence format for long senses, as at Japanese (nata)?
That might make some sense, but then we have inconsistent formatting even within a single language's entries.
IFF we don't adopt this style for non-English terms, and maintain the status quo, then we have inconsistent formatting between English and non-English entries. This is what we've had for years, so I don't anticipate any serious issues arising, other than the usual new-editor confusion. ‑‑ Eiríkr Útlendi │Tala við mig 17:30, 30 August 2019 (UTC)
We would write the "dog" sense as "Dog." But I oppose sentence style for FLs. Andrew Sheedy (talk) 17:09, 2 September 2019 (UTC)

These dots were invented to separate sentences from each other. Since the glosses are already kept separate by their formatting, the full stops are excess information that diverts attention from the actual content. Think of the glosses as lists – should list entries end with dots? No. They are a childhood disease of Wiktionary, arising at a time when people had less experience of how content on the internet is presented; thus English entries, which were begun earlier, have them, while foreign-language entries do not. – Another purpose of the dot is to signify that the sentence ends here, or, depending on the language, that it is not an exclamation or a question; this is why messages, like tweets, commonly end with it. But dictionary glossing is not writing letters or texting. Just think: What do I want to signify with that mark? I don’t see anything: to signify “done”, I save. Any entry should regularly be in a state of doneness; no dot is needed to show that off.
And starting with a capital letter actually causes information loss, since English sometimes makes distinctions by capitalization, and the message gets lost not only because of the diversion but also because the capitalized form of a word not lexicalized with a capital letter is less iconic. For the dictionary content to be apprehended as quickly as possible, we should not capitalize or add dots mechanically, and likewise for the sake of new editors, whom we want to spare arcane distinctions. Those debates about whether something “is a sentence” are intrinsically vain – the presentation should be utilitarian. Again the question: What do you want to signify with that, you who defend the dot and the capital letter? Fay Freak (talk) 23:23, 30 August 2019 (UTC)
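The information-loss point, together with Equinox's earlier idea that capitalization could be left to a display setting, can be checked with a tiny Python sketch. The two helpers are hypothetical (no such Wiktionary display option exists): one adds sentence styling when a gloss is shown, the other naively strips it. Because a common noun and a proper noun render identically in sentence style, the rendered form cannot be turned back into the stored one.

```python
def to_sentence_style(gloss: str) -> str:
    """Render a stored gloss with an initial capital and a terminal full stop."""
    gloss = gloss.strip()
    if not gloss:
        return gloss
    styled = gloss[0].upper() + gloss[1:]
    if styled[-1] not in ".!?":
        styled += "."
    return styled


def to_fragment_style(gloss: str) -> str:
    """Naively undo sentence style: drop a final stop, lowercase the first letter."""
    gloss = gloss.strip()
    if gloss.endswith("."):
        gloss = gloss[:-1]
    if gloss:
        gloss = gloss[0].lower() + gloss[1:]
    return gloss


# Two distinct stored glosses: a common noun and a proper noun.
common = "turkey"
proper = "Turkey"

# Both render identically, so the capital no longer carries any information.
print(to_sentence_style(common) == to_sentence_style(proper))  # True

# Reversing the rendering loses the proper noun's capital.
print(to_fragment_style(to_sentence_style(proper)))  # turkey
```

Under these assumptions, any per-reader display option would have to start from glosses stored in their lexically correct case, since the rendered form cannot be reliably reversed.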

Your preferences could be reconciled with standard practice in English entries by having a copy editor bring your work into conformity with the standard, such as it is. DCDuring (talk) 20:09, 2 September 2019 (UTC)
This isn't purely a question about "dots". And you might know when you finished typing and saved, but we don't, so the dot shows us that you really did mean to finish, and it wasn't an accidental save or a corrupted entry. Equinox 00:35, 2 September 2019 (UTC)
It might be so in print, but on the computer one writes a thing, then goes back to the middle and adds something, and edits do not only contain glosses. The glosses are not like programming statements that end with a semicolon; and even in programming there are languages whose statements end merely with line breaks. Also, it does not show anything, (1) when one is constantly confused because in foreign languages one does not add dots, and (2) because one only adds the dot because it is done so elsewhere, so it does not signify anything. Don’t tell me what it shows when I put the dot: it shows decidedly nothing. I do not understand it to have any meaning, hence it has none. And it cannot have any meaning. It can never acquire any meaning. I will never use it to signify anything. Fay Freak (talk) 19:46, 2 September 2019 (UTC)
I strongly support having some sort of consistency, especially sentence style for English (and maybe Translingual) and lowercase/no period for other languages. One of the things that drew me to begin editing Wiktionary to begin with was a desire to increase consistency in style, since inconsistency drives me nuts (I'm very detail-oriented and borderline OCD). I was disappointed to learn that I couldn't enforce any one style. Andrew Sheedy (talk) 17:09, 2 September 2019 (UTC)
I on the other hand oppose treating English differently from other languages. There is no reason to have one style for English and another style for all other languages. If the same definition were written in both an English and a German entry, it should be formatted the same way too. —Rua (mew) 19:54, 2 September 2019 (UTC)
But evidently most contributors to FL entries intend only to provide translation terms, not extensive definitions. Do you think we should replace the offending non-definition glosses with {{rfdef}}? DCDuring (talk) 20:04, 2 September 2019 (UTC)
Those translation-only intentions are cancerous. Apart from being shallow and, being underdefined, inaccessible to verification, they are constantly duplicated and triplicated, and they spread definitions over five lines for what is only one meaning, based on some Anglo-centric version of reality according to which there are different meanings if the translations vary.
I cannot exaggerate how oblivious people are in their contributions, and can only affirm that what many people do is just wrong. For a word that just means “zero, naught”, people “gloss” in five lines:
# zero
# cipher
# dot
# nought
# naught
I sometimes add {{rfdef}} or {{gloss-stub}} and, with luck, ping editors, because there is nothing but a polysemic “translation” that does not bewray even the rough context. For instance, breastplate has four distinct meanings, but our Georgian editor gave only “breastplate” for გულსაფარი (gulsapari), and it turned out to have the one of the four meanings I least expected.
There shouldn’t be different conceptual approaches in working-language and foreign-language entries: in both, the definition lines are to give an idea or ideas of what the term means. If you have hitherto seen these kinds of lines as “translations”, you have thought wrongly: they often only look like translations because English equivalents (“translations”) often convey unambiguously enough the meaning the dictionary editor is ordained to give an idea about; but this is not to be elevated to a principle or to an essential opposition to the ways in which English entries are to be glossed. Fay Freak (talk) 23:18, 2 September 2019 (UTC)
The way I usually handle FL entries is "translation (explanation of translation, with language-specific information included (i.e. not just taken straight from the English entry))". Most FL entries are not formatted like sentences (the vast majority aren't, in fact), so I oppose drastically changing the status quo. I don't think there's any intrinsic issue with it; it helps signal the fact that most FL entries primarily give translations, whereas English entries primarily supply definitions. Andrew Sheedy (talk) 01:35, 3 September 2019 (UTC)
  • Capitalize and punctuate just like any other complete sentence. bd2412 T 02:50, 3 September 2019 (UTC)
    Very few English definitions are complete sentences. They are typically phrases, sometimes with subordinate clauses. DCDuring (talk) 21:14, 6 September 2019 (UTC)

September 2019

Unified approach for Korean hanja entriesEdit

Using the entry at (ju) and (su) as an example:

  1. Should we have separate etymology sections for every hanja in hangeul entries? Many of these are only used as affixes rather than unbound morphemes, and some entries such as (i) can be assigned to as many as 250 hanja.
  2. Would it be better to set criteria, e.g. only create individual etymology sections at hangeul entries for basic hanja or for those that have entries in major Korean dictionaries?
  3. Where would Sino-Korean compounds be listed to prevent duplication of content? At the hangeul entries or the hanja entries? KevinUp (talk) 00:28, 1 September 2019 (UTC)

Merge Middle Korean hanja and modern Korean hanjaEdit

Modern Korean dictionaries do not distinguish between Middle Korean hanja and modern Korean hanja. Using the entry at 顋#Korean as an example:

  1. Shall we merge Middle Korean hanja and modern Korean hanja under a unified Korean header using the format of ?
  2. Is the {{hanja form of}} template suitable for the definition line of such entries?

Note that hanja is used more frequently in Middle Korean literature compared to modern Korean literature, but readings are only available in modern Korean because they are not explicitly stated in Middle Korean literature.

Please state here if you oppose a unified approach for Korean hanja entries. KevinUp (talk) 00:28, 1 September 2019 (UTC)

Article layout revisitedEdit

Previous discussion: Wiktionary:Beer parlour/2018/November#confusing article layout, Wiktionary:Beer parlour/2016/November#Rethinking the approach to the presentation of senses

As of 2019, what are the community's thoughts on an approach similar to User:Wyang/zh-def?

I like the distinct background color which makes definitions easier to find. Some languages (not all) may benefit from a single "definitions" header.

Currently, Chinese Han character entries, which use a single "definitions" header, do not indicate whether a particular definition is a "noun", "verb", "particle", etc., and would benefit from proper categorization.

Comments are welcome. KevinUp (talk) 03:20, 1 September 2019 (UTC)

I support the layout 100%. I don't support putting everything on a page into 1 template. DTLHS (talk) 03:24, 1 September 2019 (UTC)
Putting everything on a page into one template - This would affect only the definitions. Other templates can still be used within this "definitions" template. KevinUp (talk) 07:10, 1 September 2019 (UTC)
I generally like the layout or at least an approach that is more beautiful and I also like having data structured in templates. I do not like expanding the width 100% (e.g. what happens with pictures or other media?) and having things collapsed--this is not accessible to users. —Justin (koavf)TCM 04:12, 1 September 2019 (UTC)
Yes, the collapsible approach is perhaps not that practical. Some of us might be looking for something specific and collapsing everything would cause some information to be hidden when CTRL+F is used. KevinUp (talk) 07:10, 1 September 2019 (UTC)
Not handy for search and basic display but also not useful for users who have scripts disabled or who use screen readers/text browsers or who have certain sensory motor issues that make tapping on a million links to display content on a page a real chore. —Justin (koavf)TCM 07:15, 1 September 2019 (UTC)
Well, we could apply visibility options such as "Show derived terms", "Show quotations" similar to what we currently have on the desktop site. KevinUp (talk) 08:02, 1 September 2019 (UTC)
Sure, but I am opposed to all of the collapsing content that we have now for the same reason. To be sure, entries like set or a are going to be long: that's the nature of those sorts of entries. Making things inaccessible with collapsing content (even for Finnish declensions that I am never going to look at, let alone understand) is just bad practice. JavaScript is great but it shouldn't be mandatory for interacting with basic text like this. —Justin (koavf)TCM 08:56, 1 September 2019 (UTC)
@Koavf: Is the collapsible content inaccessible without JavaScript? My impression is that it only disappears when the JavaScript code runs. — Eru·tuon 16:19, 1 September 2019 (UTC)
@Erutuon: Turn off scripts and everything is expanded by default (which is good). Non-script users will have no problem seeing this content. —Justin (koavf)TCM 17:27, 1 September 2019 (UTC)
@Koavf: Hmm, I thought you were saying that users without JavaScript wouldn't be able to see collapsible content; maybe I misread you. I think collapsible content is collapsed by default for new visitors. What if it were expanded? Then users who have difficulty with the buttons wouldn't have to click anything to see content, but would if they wanted to be able to scroll more quickly. Perhaps it would be optimal if various categories of content were shown or hidden based on which state would lead to less clicking, but I don't know how to get that information. — Eru·tuon 17:56, 1 September 2019 (UTC)
You did not: I was just sloppy. Basic functionality shouldn't be based on scripts unless it's really something dynamic. The site we have doesn't include interactive elements like a game or anything that really needs to change state in front of someone's eyes or based on his inputs: it's a reference work made up of text with some accompanying media. Scripts just to collapse things that are a mild nuisance to scroll past are just a bad idea. It's generally easier to hit "Page Down" or smash the space bar a couple of times (these don't require very fine motor skills) to go past something you don't care about than it is to tab over to the little arrow that will expand the box or click on it. Finding data would be difficult and informative but I would still be in favor of not hiding anything that is the actual content of the dictionary (but I'm fine with the option of allowing it to be collapsed based on user interaction or preferences--unfortunately, our "expand all declension tables" preferences don't stick around at the moment.) —Justin (koavf)TCM 18:21, 1 September 2019 (UTC)
I would also prefer for lists (not tables) to be expanded by default with an option to hide it if one wishes to do so. KevinUp (talk) 18:28, 1 September 2019 (UTC)
@Koavf: When you click the "show x" or "hide x" buttons in the "Visibility" menu in the sidebar, the resulting state is saved in your browser. It's not saved on a per-user basis though; do you mean that the state changes when you switch between browsers? — Eru·tuon 20:02, 1 September 2019 (UTC)
@Erutuon: No, using the same browser, it eventually goes away as a preference. It would be better if it were an actual user preference. —Justin (koavf)TCM 20:32, 1 September 2019 (UTC)
@Koavf: That sounds like a bug. The setting is saved in localStorage (source code in MediaWiki:Gadget-VisibilityToggles.js), so it shouldn't go away. I am not sure how to add it in Special:Preferences if that's what you mean. One difficulty with having a checkbox for each category of visibility toggle is that there isn't a set number of categories (synonyms, translations, inflection, derived terms, etc.); they are generated based on section headers or the contents of HTML tags in the parser output. (In MediaWiki:Gadget-defaultVisibilityToggles.js, the category is the first argument to window.VisibilityToggles.register.) — Eru·tuon 20:49, 1 September 2019 (UTC)
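The behaviour Erutuon describes can be sketched roughly as follows. This is a Python stand-in, not the actual MediaWiki:Gadget-VisibilityToggles.js code: a plain dict plays the role of the browser's localStorage, and the helper names and the storage key format are hypothetical. It illustrates why a fixed list of checkboxes in Special:Preferences is awkward (the set of categories is only known after scanning the page) and how an "expanded by default" rule, as suggested above, would apply to any category the reader has never touched.

```python
from typing import Dict, Iterable, List

# Stand-in for the browser's localStorage (string keys, string values).
storage: Dict[str, str] = {}
KEY_PREFIX = "visibility-toggle:"  # hypothetical key format


def categories_from_headers(headers: Iterable[str]) -> List[str]:
    """Derive toggle categories from the section headers on a page, in order."""
    categories: List[str] = []
    for header in headers:
        category = header.strip().lower()
        if category and category not in categories:
            categories.append(category)
    return categories


def is_shown(category: str, default_shown: bool = True) -> bool:
    """Saved state for a category, or the default if the reader never chose."""
    saved = storage.get(KEY_PREFIX + category)
    if saved is None:
        return default_shown
    return saved == "shown"


def set_shown(category: str, shown: bool) -> None:
    """Persist a choice, as clicking "show x" or "hide x" would."""
    storage[KEY_PREFIX + category] = "shown" if shown else "hidden"


headers = ["Translations", "Derived terms", "Synonyms", "Translations"]
for category in categories_from_headers(headers):
    print(category, is_shown(category))  # True for each: expanded by default

set_shown("translations", False)
print(is_shown("translations"), is_shown("synonyms"))  # False True
```

Because the saved state lives per browser rather than per account, a server-side preference would need some way to enumerate or pattern-match these dynamically generated categories.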
Disgusting. Hard no. --{{victar|talk}} 18:06, 1 September 2019 (UTC)
Could potentially go for something like this. It's hard to judge from a Chinese entry since I don't understand that language. We would also need to be careful about what we hide/collapse by default and what we don't (and possibly tie that into individual user settings). Oh yes, and I agree with whoever made a fuss about JavaScript-less users. It should remain readable in Lynx etc. (it doesn't have to be beautiful, as long as we show all the content to those clients rather than some unusable JS placeholder). Equinox 10:00, 5 September 2019 (UTC)
Additional point I just remembered: quite a large number of people are colour-blind (in one way or another) and it's hard to find sets of colours that will suit all those different colour-blindnesses. With graphs and charts, you can ameliorate this by using texture (red spots, blue stripes, green crosses), but with text you can't do a lot. So we shouldn't rely on colour alone to indicate anything: it should only be a bonus hint, and also needs to have strong contrast with the background. Equinox 11:55, 5 September 2019 (UTC)
I do like the look of this, I would like to see an expanded version which would demonstrate how other key parts of an entry would be handled (e.g. etymology, translations). While I don't like using a single uber-template for this sort of thing, the benefits of having the data in the entry organized in a machine-readable manner may outweigh the costs of such a method. - TheDaveRoss 11:58, 5 September 2019 (UTC)

Code for comparisonEdit

New code
{{zh-def
|n|[[sugar]]
|syn: 食糖
|ant: 鹽
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|x2: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
|-
|n|[[candy]]; [[sweets]]
|mw: m:塊-“piece”,c:嚿-“piece”
|syn: 糖果
|x1: {{zh-x|棒棒糖|lollipop|C}}
|x2: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
|-
|n|{{zh-alt-form|醣|[[saccharide]]}}
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}
}}
Current code
# [[sugar]]
#: {{zh-x|糖尿病|[[diabetes]]}}
#: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
# [[candy]]; [[sweets]] {{zh-mw|m:塊|c:嚿}}
#: {{zh-x|棒棒糖{tong4-2}|lollipop|C}}
#: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
# {{lb|zh|organic chemistry}} {{zh-alt-form|醣|[[saccharide]]}}
#: {{zh-x|多糖|polysaccharide}}

====Synonyms====
* {{sense|sugar}} {{zh-l|食糖}}
* {{sense|candy}} {{zh-l|糖果}}

====Antonyms====
* {{sense|sugar}} {{zh-l|鹽}}


I copied the code from User:Wyang/zh-def#Code so that other users can comment on the approach rather than the appearance.

Some languages (not all) may benefit from such a structure. KevinUp (talk) 18:28, 1 September 2019 (UTC)
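To make the comparison concrete, here is a rough Python sketch that turns the proposed {{zh-def}} body into the current "#" layout with separate Synonyms and Antonyms sections. It assumes one parameter per line, as in the sample above, and that the first linked word of a gloss makes a good {{sense}} label; it only illustrates the mapping between the two formats and is not Wyang's module.

```python
import re
from typing import Dict, List

SAMPLE = """{{zh-def
|n|[[sugar]]
|syn: 食糖
|ant: 鹽
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|x2: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
|-
|n|[[candy]]; [[sweets]]
|mw: m:塊-“piece”,c:嚿-“piece”
|syn: 糖果
|x1: {{zh-x|棒棒糖|lollipop|C}}
|x2: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
|-
|n|{{zh-alt-form|醣|[[saccharide]]}}
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}
}}"""


def sense_label(gloss: str) -> str:
    """Use the first linked word of a gloss as its {{sense}} label (an assumption)."""
    match = re.search(r"\[\[([^\]|]+)", gloss)
    return match.group(1) if match else gloss


def zh_def_to_wikitext(body: str) -> str:
    """Convert a one-parameter-per-line {{zh-def}} body to the current layout."""
    senses: List[Dict] = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line.startswith("|") or line == "|-":
            continue  # skip "{{zh-def", "}}" and sense separators
        param = line[1:]
        key, sep, value = param.partition(": ")
        if sep and re.fullmatch(r"[a-z]+\d*", key):
            # "key: value" parameter belonging to the most recent sense
            if not senses:
                raise ValueError("parameter before any sense: " + line)
            if key.startswith("x"):
                senses[-1]["examples"].append(value)
            else:
                senses[-1][key] = value
        else:
            # "|pos|gloss" starts a new sense
            pos, _, gloss = param.partition("|")
            senses.append({"pos": pos, "gloss": gloss, "examples": []})

    lines: List[str] = []
    synonyms: List[str] = []
    antonyms: List[str] = []
    for sense in senses:
        head = "# "
        if "lb" in sense:
            head += "{{lb|zh|" + sense["lb"] + "}} "
        head += sense["gloss"]
        if "mw" in sense:
            # keep "m:塊" from "m:塊-“piece”"; the gloss after "-" is dropped
            words = [item.split("-")[0] for item in sense["mw"].split(",")]
            head += " {{zh-mw|" + "|".join(words) + "}}"
        lines.append(head)
        lines.extend("#: " + example for example in sense["examples"])
        label = sense_label(sense["gloss"])
        if "syn" in sense:
            synonyms.append("* {{sense|%s}} {{zh-l|%s}}" % (label, sense["syn"]))
        if "ant" in sense:
            antonyms.append("* {{sense|%s}} {{zh-l|%s}}" % (label, sense["ant"]))

    for title, items in (("Synonyms", synonyms), ("Antonyms", antonyms)):
        if items:
            lines += ["", "====" + title + "====", *items]
    return "\n".join(lines)


print(zh_def_to_wikitext(SAMPLE))
```

Run on the sample, this reproduces the "Current code" listing line for line, except that the lollipop example keeps 棒棒糖 without the {tong4-2} reading, exactly as the new-code sample has it. The parsing is deliberately line-based because the nested {{zh-x|...}} calls make naive splitting on "|" unreliable, which is one concrete form of the editor-usability concern raised in this thread.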

I have a feeling this styling can be done with CSS and JS, rather than having to put so much load on Lua modules. —AryamanA (मुझसे बात करेंयोगदान) 01:35, 11 September 2019 (UTC)
I love it! Even if without hide/show, if everything is shown, it is great. I like the little buttons: Synonyms, Example... sarri.greek (talk) 09:36, 12 September 2019 (UTC)
Please don't do this. It will be a maintenance and (editor) usability nightmare. Individual templates are easier to understand, composeable and potentially cacheable. The proposed solution nests templates and has parameters inside parameters, with its own syntax. Also, I don't understand the point of "this would affect only the definitions" – definitions make up the bulk of the dictionary. Instead of moving the data into templates we should be looking at moving data to a Wikibase instance (in the long term). Jberkel 17:52, 17 September 2019 (UTC)

Moving forwardEdit

If this is going to go anywhere at all, I feel that we need to put some work into creating several hundred examples (with complex entries) of the proposed format: pages with multiple etymologies, pages with multiple pronunciations, pages with a single sense, inflected entries. Otherwise it's impossible to see the edge cases and the potential amount of effort it will take. DTLHS (talk) 17:20, 17 September 2019 (UTC)

Is there any way to do this without using a module to do the heavy lifting? If not we should test very large entries as well since we run into Lua errors frequently and this will potentially exacerbate that issue. - TheDaveRoss 17:25, 17 September 2019 (UTC)
We don't know until we have actual examples to work with. DTLHS (talk) 17:57, 17 September 2019 (UTC)
I think that the Lua memory issue has gotten out of control. I'll point this out to meta:Community Tech when the 2020 version of meta:Community Wishlist Survey 2019 is available. KevinUp (talk) 18:24, 17 September 2019 (UTC)
Overall, the comments regarding this proposed format are positive. The colors will need to be tweaked, and collapsible content will need to be expanded by default. Anyway, the closest example we have for the appearance of entries using this format can be found at entries such as かん (kan) and とうきょう (Tōkyō). This is just an example of how entries might look in future if we decide to implement such an approach. KevinUp (talk) 18:24, 17 September 2019 (UTC)
There's definitely a long way to go before this actually gets implemented. We could perhaps test this out with Chinese Han character entries, which have already replaced the part-of-speech headers with a single definitions header. (I would like to see more precise categorization of Category:Mandarin nouns, Category:Cantonese nouns, etc.) KevinUp (talk) 18:24, 17 September 2019 (UTC)
I am not talking about changing anything in the mainspace. You should create examples in your own user space. And especially you need to create examples with more than just Japanese and Chinese entries. DTLHS (talk) 18:27, 17 September 2019 (UTC)

Requesting language code for Middle JapaneseEdit

Previous discussion at Wiktionary:Beer parlour/2018/February#Middle Japanese, https://en.wiktionary.org/wiki/User_talk:Poketalker#Template_%7B%7Bbor%7Cja%7Cltc%7D%7D

Category:Japanese language currently lacks an ancestor, Middle Japanese, which can be further broken down into:

  1. Early Middle Japanese (800 to 1200AD)
  2. Late Middle Japanese (1200 to 1600AD)

This is because there are no ISO language codes for Middle Japanese. Therefore I would like to propose three new language codes for:

  1. Middle Japanese - ja-mid
  2. Early Middle Japanese - ja-mid-ear
  3. Late Middle Japanese - ja-mid-lat

By having these language codes we are able to create categories such as:

  1. Category:Middle Japanese terms with quotations
  2. Category:Middle Japanese reference templates
  3. Category:Early Middle Japanese terms borrowed from Middle Chinese
  4. Category:Late Middle Japanese terms borrowed from Early Mandarin

Technical considerationsEdit

These three languages can be designated as etymology-only languages because Middle Japanese is already merged with modern Japanese based on current practices. KevinUp (talk) 03:20, 1 September 2019 (UTC)

The etymology language codes can be created, but unfortunately categories starting with an etymology language name aren't supported. That is, templates can't categorize into them and there aren't category boilerplate templates for them. For instance, {{der}} only accepts an etymology language code as its second parameter (the language from which a term is derived), not first. Changing this would at least allow for more specificity in etymologies.
Middle Japanese can't be treated as an ancestor of Japanese if it is an etymology language with Japanese as its parent. It doesn't make sense for a language to descend from a subvariety of itself. (That sort of relationship makes Module:family tree crash with a stack overflow, and it breaks the link to further-back ancestors. I tested this by making grc-koi, Koine Greek, an ancestor of grc, Ancient Greek, and previewing some pages. In ἐπί, Ancient Greek was not seen as a descendant of Proto-Indo-European anymore. I suppose this would be fixed by giving the etymology language an ancestor, though.) It would make sense for Modern Japanese (an etymology language) to descend from Middle Japanese, though without a category for Modern Japanese terms inherited from Middle Japanese, this relationship would only be used in family trees, if Module:family tree would display it. — Eru·tuon 08:03, 1 September 2019 (UTC)
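The loop described above can be reproduced with a small Python sketch. It is not Module:family tree: the data layout is a simplified assumption (full languages list `ancestors`; an etymology-only language has a `parent` and, when it lists no ancestors of its own, falls back to its parent's), and codes other than ja and ja-mid are illustrative. A recursion guard reports the cycle instead of overflowing. With ja-mid as an etymology-only child of ja and also ja's ancestor, the lookup loops; giving ja-mid its own ancestor, as suggested above, breaks the loop.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical language data (not the real Module:languages data).
BROKEN: Dict[str, dict] = {
    "jpx-pro": {"ancestors": []},
    "ojp": {"ancestors": ["jpx-pro"]},
    "ja": {"ancestors": ["ja-mid"]},
    "ja-mid": {"parent": "ja"},  # etymology-only, no ancestors of its own
}
# The suggested fix: give the etymology-only language its own ancestor.
FIXED: Dict[str, dict] = {**BROKEN, "ja-mid": {"parent": "ja", "ancestors": ["ojp"]}}


def ancestors_of(code: str, langs: Dict[str, dict],
                 path: Tuple[str, ...] = ()) -> List[str]:
    """Walk the ancestor chain, reporting a cycle instead of recursing forever."""
    if code in path:
        raise ValueError("cycle: " + " -> ".join(path + (code,)))
    data = langs[code]
    ancestors = data.get("ancestors")
    if ancestors is None and "parent" in data:
        # fall back to the parent's ancestors: this is what closes the loop
        return ancestors_of(data["parent"], langs, path + (code,))
    chain: List[str] = []
    for ancestor in ancestors or []:
        chain.append(ancestor)
        chain.extend(ancestors_of(ancestor, langs, path + (code,)))
    return chain


try:
    ancestors_of("ja", BROKEN)
except ValueError as err:
    print(err)                    # cycle: ja -> ja-mid -> ja
print(ancestors_of("ja", FIXED))  # ['ja-mid', 'ojp', 'jpx-pro']
```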
Thank you for looking into this. If we can't create categories for etymology-only languages, I think (1) Middle Japanese will have to be designated as a full language code with Old Japanese as its ancestor and Japanese as its descendant. As for (2) Early Middle Japanese and (3) Late Middle Japanese, these two languages can be set as etymology-only languages with Middle Japanese as their ancestor.
Meanwhile, Category #3 and #4 above can be replaced by Category:Japanese terms derived from Middle Chinese and Category:Japanese terms derived from Mandarin (derived instead of borrowed and not that specific). KevinUp (talk) 18:28, 1 September 2019 (UTC)
At the moment, Middle Japanese can only have scripts if it is given a full language code. In any case, if it were an etymology language, its scripts couldn't be used anywhere but in etymology templates.
With Middle Japanese as an etymology language, Module:etymology would currently allow Category:Japanese terms derived from Middle Japanese (not to express an inheritance relationship, but the situation where one language borrowed from a second language, which borrowed from a subvariety of the first language), but would not allow Category:Japanese terms inherited from Middle Japanese, because it resolves an etymology language to its parent before checking that the first language can inherit from the second. So "Japanese inherited from Middle Japanese" is resolved to "Japanese inherited from Japanese", which the module objects to. If Middle Japanese is not made a full language, two ideas: allowing a term in one language to be inherited from a subvariety of the language, or allowing etymology languages in both positions of the derivation relationship (Category:Modern Japanese terms inherited from Middle Japanese instead, which makes more sense than Category:Japanese terms inherited from Middle Japanese). — Eru·tuon 21:23, 1 September 2019 (UTC)
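The resolution order described above can be modelled in a few lines. This Python sketch is a simplified assumption, not Module:etymology: each language has a `parent` (set only for etymology-only languages) and a list of `ancestors`; the first language must be a full language; a "derived" category just uses the source's own name, while an "inherited" category is allowed only if the source, after being resolved to its full-language parent, is an ancestor of the target. With Middle Japanese as an etymology-only child of Japanese the inherited category is rejected as "Japanese cannot inherit from Japanese", while with Middle Japanese as a full language it goes through.

```python
from typing import Dict

# Hypothetical, simplified data; "parent" is set only for etymology-only languages.
ETYM_ONLY: Dict[str, dict] = {
    "ja": {"name": "Japanese", "parent": None, "ancestors": ["ojp"]},
    "ja-mid": {"name": "Middle Japanese", "parent": "ja", "ancestors": []},
    "ojp": {"name": "Old Japanese", "parent": None, "ancestors": []},
}
FULL: Dict[str, dict] = {
    "ja": {"name": "Japanese", "parent": None, "ancestors": ["ja-mid"]},
    "ja-mid": {"name": "Middle Japanese", "parent": None, "ancestors": ["ojp"]},
    "ojp": {"name": "Old Japanese", "parent": None, "ancestors": []},
}


def resolve(code: str, langs: Dict[str, dict]) -> str:
    """Follow parent links until a full language is reached."""
    while langs[code]["parent"] is not None:
        code = langs[code]["parent"]
    return code


def is_ancestor(candidate: str, code: str, langs: Dict[str, dict]) -> bool:
    """Depth-first search through the ancestors of `code`."""
    stack = list(langs[code]["ancestors"])
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        stack.extend(langs[current]["ancestors"])
    return False


def category(lang: str, source: str, relation: str, langs: Dict[str, dict]) -> str:
    if langs[lang]["parent"] is not None:
        raise ValueError("the first language must be a full language")
    if relation == "inherited":
        # resolve first, then check: this is the step that trips up ja-mid
        resolved = resolve(source, langs)
        if not is_ancestor(resolved, lang, langs):
            raise ValueError(
                f"{langs[lang]['name']} cannot inherit from {langs[resolved]['name']}")
    return f"{langs[lang]['name']} terms {relation} from {langs[source]['name']}"


print(category("ja", "ja-mid", "derived", ETYM_ONLY))
try:
    category("ja", "ja-mid", "inherited", ETYM_ONLY)
except ValueError as err:
    print(err)  # Japanese cannot inherit from Japanese
print(category("ja", "ja-mid", "inherited", FULL))
```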
I moved your comment below up here in case you haven't read my reply above. I think Middle Japanese will have to be made a full language, like how Middle Chinese is made a full language to avoid the complications you mentioned above. KevinUp (talk) 21:36, 1 September 2019 (UTC)
mid is the code for Mandaic, so mid-anything is not appropriate for a code for anything but a variety of Mandaic.--Prosfilaes (talk) 19:25, 1 September 2019 (UTC)
Thanks for pointing this out. I've changed the proposed language code to ja-mid instead. KevinUp (talk) 21:36, 1 September 2019 (UTC)
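The collision Prosfilaes points out can be caught mechanically. A minimal Python sketch, assuming the convention his remark implies (the first subtag of a non-ISO code should be the code of the language the variety belongs to, not an unrelated existing code); the lookup table is a tiny hand-picked excerpt, and "mid-ja" is only a hypothetical example of a mid- code.

```python
from typing import Dict

# A tiny, hand-picked excerpt of existing codes, for illustration only.
EXISTING: Dict[str, str] = {
    "ja": "Japanese",
    "ojp": "Old Japanese",
    "mid": "Mandaic",
}


def check_code(proposed: str, belongs_to: str) -> str:
    """Flag a proposed code whose first subtag already names another language."""
    prefix = proposed.split("-")[0]
    if prefix in EXISTING and prefix != belongs_to:
        return f"rejected: {prefix!r} is already the code for {EXISTING[prefix]}"
    return "ok"


for code in ("mid-ja", "ja-mid", "ja-mid-ear", "ja-mid-lat"):
    print(code, check_code(code, belongs_to="ja"))
```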
  • Various thoughts.
  1. Will we also create a code for Early Modern Japanese? Broadly speaking, "modern Japanese" can be dated from around the mid-to-late-1800s with the fall of the Edo Shogunate and the rise of the Meiji, the opening of the country and the influx of foreign words and concepts, the repurposing of existing words for new meanings, and the deliberate forging of new vocabulary in an attempt to modernize and standardize the language.
  2. Do we really need to make these new codes into full-fledged, separate and distinct languages, with their own entries and template infrastructure and the like? This seems like the wrong way to work around what seems to be a minor technical issue with the etym inheritance implementation.
I'll hazard a guess to say that most of the entries that we might put in the proposed new "language" headings for Early and Late Middle Japanese would be duplicating content from our modern Japanese entries. The main differences come down to things like sense development (such as ありがとう (arigatō) shifting from "in a manner difficult to exist" to "in a manner difficult to bear" to "welcome" and then the modern "thanks" sense), phonetic realization (such as /je/ and /we/ merging ultimately into /e/) and conjugation patterns (like the 下二段 (shimo nidan) lower bigrade conjugation pattern flattening out into the 下一段 (shimo ichidan) or modern lower monograde pattern). I feel much more comfortable trying to explain all of this in the context of "Japanese", rather than duplicating entry data across multiple different language headings, especially as the older senses and sometimes even conjugations are still used. I'd also like to point out that monolingual sources treat Middle Japanese as a matter of footnotes and formatting within entries for the modern language, rather than as a distinct entity.
‑‑ Eiríkr Útlendi │Tala við mig 23:03, 3 September 2019 (UTC)
  1. @Eirikr: Yes, I think it would be a good idea to create a code for Early Modern Japanese called ja-ear set as an etym-only language with Category:Japanese language as its ancestor. By doing so we can have categories such as Category:Chinese terms borrowed from Early Modern Japanese.
  2. The early and late varieties (Early Middle Japanese, Late Middle Japanese, Early Modern Japanese) will not be having their own entries and template infrastructure because these languages will only be used in the etymology section to display statements such as "From Early Modern Japanese X, from Late Middle Japanese Y", etc. to reflect sound or spelling changes.
As for Middle Japanese, it will be made a full language so that we can use the language code in templates and quotations within the Japanese section. I agree that some of the older senses and conjugations are still used in written Japanese so it is not necessary to create a separate entry for Middle Japanese. Middle Japanese can be merged into Japanese like how monolingual dictionaries treat the language. {{ja-see}} can be used to redirect entries with archaic spelling to the modern spelling. KevinUp (talk) 22:27, 4 September 2019 (UTC)
Okay, so the proposal is to have Middle Japanese as a full language but with no entries of its own? At the moment that means that Middle Japanese links would go to the Middle Japanese section, not the Japanese section as intended. Perhaps Module:links could be made to direct Middle Japanese links to the Japanese section. It would complicate linking in other modules because they couldn't rely on the section name being the canonical name anymore. — Eru·tuon 02:16, 5 September 2019 (UTC)
Yes, the plan is to have Middle Japanese as a full language with no entries of its own, similar to how Middle Chinese is unified with Chinese. The linking problem is an issue for languages that use such an approach. For example, all the hanzi entries in Category:Cantonese nouns link to TERM#Cantonese rather than the correct form TERM#Chinese. One way to overcome the linking issue for Middle Japanese is to periodically search for the following:
  1. {{l|ja-mid|TERM}} → convert to {{ja-l|TERM}} {{q|Middle Japanese}}
  2. {{m|ja-mid|TERM}} → convert to {{ja-mid-inline|TERM}} (new template similar to {{okm-inline}})
  3. {{cog|ja-mid|TERM}} → convert to {{cog|ja-mid|-}} {{ja-l|TERM}}
  4. {{desc|ja-mid|TERM}} → convert to {{desc|ja-mid|-}} {{ja-l|TERM}}
This is, of course, an inefficient way to deal with this issue, but it is not uncommon to have links that lead nowhere. For example, I often click on Middle French links that only have a French section. I wonder if there's a way to identify links that already have a page but lack an entry in the target language so that false positives can be identified. KevinUp (talk) 03:15, 5 September 2019 (UTC)
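The four conversions in the list above map naturally onto regular-expression substitutions. Here is a minimal Python sketch, assuming each template carries just the one TERM parameter (uses with glosses or other arguments are deliberately left unmatched for manual review); {{ja-mid-inline}} is the proposed template from the list and does not exist yet. The pattern refuses to treat a bare "-" as a term, so a periodic re-run does not convert its own {{cog|ja-mid|-}} or {{desc|ja-mid|-}} output a second time.

```python
import re

# TERM: one positional parameter without "|", "{" or "}", and not a bare "-".
# The "-" exclusion keeps the conversion idempotent across periodic runs.
TERM = r"(?!-\}\})([^|{}]+)"

RULES = [
    (re.compile(r"\{\{l\|ja-mid\|" + TERM + r"\}\}"),
     r"{{ja-l|\1}} {{q|Middle Japanese}}"),
    (re.compile(r"\{\{m\|ja-mid\|" + TERM + r"\}\}"),
     r"{{ja-mid-inline|\1}}"),  # proposed template, does not exist yet
    (re.compile(r"\{\{cog\|ja-mid\|" + TERM + r"\}\}"),
     r"{{cog|ja-mid|-}} {{ja-l|\1}}"),
    (re.compile(r"\{\{desc\|ja-mid\|" + TERM + r"\}\}"),
     r"{{desc|ja-mid|-}} {{ja-l|\1}}"),
]


def convert(wikitext: str) -> str:
    """Apply each rewrite rule to the whole page text in turn."""
    for pattern, replacement in RULES:
        wikitext = pattern.sub(replacement, wikitext)
    return wikitext


sample = "From {{m|ja-mid|かへる}}; see {{l|ja-mid|かめ}}.\n* {{desc|ja-mid|かふ}}"
print(convert(sample))
```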
Yeah, actually Jberkel's "wanted" lists check for that. For instance, quite a few of the links in the Serbo-Croatian list go to pages that already exist. So that's good, it won't be too hard to clean up the links. — Eru·tuon 03:30, 5 September 2019 (UTC)

Practical considerations

Pinging also @Dine2016, Eirikr, Poketalker, Suzukaze-c, TAKASUGI Shinji to inform them about this proposal.
  1. Currently, we have quotes from Nippo Jisho (日葡辞書) which are written in Latin script. Shall we add Latin as one of the scripts for Middle Japanese along with the Japanese script?
  2. Any thoughts on adding entries into Category:Japanese terms inherited from Middle Japanese after the language code is available? KevinUp (talk) 18:28, 1 September 2019 (UTC)
    This would include pretty much everything that is not a modern coinage or borrowing. I'm not sure about the utility / usefulness / use case for this category. See my comment above about keeping this within the context of "Japanese". ‑‑ Eiríkr Útlendi │Tala við mig 23:03, 3 September 2019 (UTC)
    Yes, this category would include all terms that existed in pre-modern literature. Perhaps some other category such as Category:Middle Japanese terms borrowed from Middle Chinese would be more useful. Lemmas can be put into this category if quotations of the Sino-Japanese term can be found in Middle Japanese. KevinUp (talk) 22:27, 4 September 2019 (UTC)
  3. What shall we do with the following entries?
    1. かはす#Middle Japanese
    2. かはる#Middle Japanese
    3. かふ#Middle Japanese
    4. かへす#Middle Japanese
    5. かへる#Middle Japanese
    6. かめ#Middle Japanese
Shall these entries be merged into Japanese? KevinUp (talk) 21:54, 3 September 2019 (UTC)
  • @Poketalker When you have the time, please take a look at these entries and merge them with the modern forms. KevinUp (talk) 22:27, 4 September 2019 (UTC)

Category:User_la-5

This category was created by a single user who grossly overestimates their skill in the Latin language - they haven't managed to even correctly write the description, although they refer to themselves in it in the singular. There is no legitimate need for this category any more than there is a need for Category:User_en-5. I propose that it be deleted. Brutal Russian (talk) 13:48, 3 September 2019 (UTC)

I missed that; that’s really gross. The custom one on the author’s, Aearthrise’s, user page is likewise horrifying. He does not even inflect … Fay Freak (talk) 00:31, 4 September 2019 (UTC)
LOL, though. Mélange a trois (talk) 21:55, 4 September 2019 (UTC)
Was going to suggest the same thing, also the Category:User la-N should be deleted. 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 04:22, 5 September 2019 (UTC)

User_la-x category templates

(Notifying Fay Freak, JohnC5, Benwing2, Lambiam): @Urszag

These are all written in broken Latin.

  • The verb "contribuere" is not used in the intended sense - nor any other single verb to my knowledge;
  • "usor" in all category descriptions should read "usuarius" as in the template;
  • la-0: Hic usuarius aut nullam aut paucam linguam intellegere potest - should read "Hic usuarius aut nihil aut pauca latine intellegere potest";
  • la-2: "media latinitas" means "medieval Latin" - rephrase as "medius gradus", "satis bene...potest";
  • la-3: "callidissima latinitas" means "most ingenious" or "extremely cunning Latin" - rephrase as "probe ac latine";
  • la-4: the whole phrasing smacks of translationese, should probably say "latine loquuntur pariter~similiter ac/tamquam sermone patrio".

Could someone kindly direct me to the templates that should be edited? In addition, as far as I understand it's important that the phrasing reflect one's active knowledge of a language, and not just passive understanding - is this correct? In that case I'm planning to change the phrasing to "latine scit et scribere potest". My thinking is that due to the general lack of active Latin users people might be rather judging their reading skill - there are more la-3 tagged users than it-3 and ru-3, which I find difficult to believe (it says "speaks fluently" in the Russian template). If anyone has further translation suggestions, they'd be very welcome. Brutal Russian (talk) 14:57, 3 September 2019 (UTC)

Template:User la-0 DTLHS (talk) 14:59, 3 September 2019 (UTC)
And Template:User la-1, Template:User la-2, Template:User la-3, Template:User la-4, and Template:User la. (: Maybe they should be renamed la-I through la-IV. :) The first sense of contribuo at Gaffiot is “to bring in one’s share”, which is similar in meaning to “to contribute”.  --Lambiam 20:13, 3 September 2019 (UTC)
Thank you both. You can get a sense of using a word similar in meaning if you substitute contribute in the English description for any of its synonyms, e.g. "this user can bestow in simple Latin". Brutal Russian (talk) 21:29, 3 September 2019 (UTC)
  • contribuere Hm, on the internet? The English contribute, the German beitragen etc. arguably hadn’t this sense either before Stallman invented it; though I see that this stretches the Latin meaning more (simply put, Latin contribuere does not mean “to contribute”; in German it is more zuweisen, zuschlagen). What do you suggest? We should ponder more how to do GNU propaganda in Latin. But maybe as a Discord user you aren’t much into it.
  • Yes, usor isn’t even a word, except as a scanno for uxor or ūsūrīs etc.; see also Talk:proprietor. Ridiculous.
  • Indeed, media latinitas is Middle Latin aka Medieval Latin.
  • Yeah, the superlative callidissima is off, and evidently translates Romance.
  • Yeah, they tried to translate modern linguistic categories (“native speaker”, “natively” – how would a Roman say?)

When you have invented true Latin formulations, you should not forget to get Meta Wiki and the rest to fix their bad Latin. They have similarly bad phrasings, though not the same ones. I do not understand where the data is saved on Meta Wiki; possibly it is in the software (can somebody find the texts?), but you can find the texts displayed via meta:Category:User la. Fay Freak (talk) 22:35, 3 September 2019 (UTC)

  • It's not on Meta Wiki. If I remember correctly, they set up an independent organization that oversees Babel. Chuck Entz (talk) 03:01, 5 September 2019 (UTC)
Vicipædia defends usor as Neo-Latin.  --Lambiam 10:45, 4 September 2019 (UTC)
  • @Fay Freak I don't know whether there is an established word for it (even if a medieval one), I just know that contribuere is not that word - neither could I properly define what exactly it means, I just know it doesn't mean the same as its English or Romance look-alikes. Whatever the GNU propagandists come up with should at the very least be usable intransitively, for instance conferre. Btw, could you elaborate on why Discord users aren't supposed to be into it? =P
  • I've never come across a proficient speaker saying "native speaker" or discussing how to say it - I myself wouldn't outright censure nātīvus locūtor/ōrātor, but I wouldn't endorse it either - I'm not even sure which of the several options S&H give for "native" is the best one. For the time being, I don't think there's a need to use an equivalent to the English expression - the phrasing I suggest in the initial edit looks fine to me.
  • @Lambiam As a fellow Discord Latinist who writes excellent Latin says - and we don't agree in everything, but here we definitely do - if that's what it says on Vicipaedia, it's definitely wrong. They've managed to get the name of the whole language wrong, for Jupiter's sake (it's not Latina any more than it's Italiana, Española, Française etc). I think it remains so broken because people who see that it is aren't the same people who know how to fix it, and the former despair before even trying (that's true for me at least). I wouldn't be surprised if that page is usor's first attestation xD Brutal Russian (talk) 15:55, 4 September 2019 (UTC)

A category for images?

I think it would be useful to have a category which shows which entries (in each language) include images. However I'm not sure if there is a way that can be found to automatically categorise them rather than adding a category manually. Anyway, let's see what the reaction to this proposal is like first. DonnanZ (talk) 13:36, 5 September 2019 (UTC)

Maybe. Why is it useful? Who will use it, for what? Equinox 13:48, 5 September 2019 (UTC)
That's what I want to find out. I for one would make use of it, even to find entries that don't have images and would be better with them. And we seem to have categories for virtually everything else. DonnanZ (talk) 13:57, 5 September 2019 (UTC)
But how would you use a category of images to find entries that don't have images? I think there is no technology to say "list all entries NOT in a category". Equinox 14:11, 5 September 2019 (UTC)
Technology, no. One could notice an entry missing from the category, investigate and possibly rectify the omission if a suitable image can be found on Commons. And if one can't be found creating one from your own work is possible if you know where to find and photograph something suitable, I have been doing that lately. DonnanZ (talk) 14:20, 5 September 2019 (UTC)
Actually, there is a method to search for entries that lack a certain template or entries that are not in a certain category. Try this: KevinUp (talk) 15:36, 5 September 2019 (UTC)
I am not aware of any way to do this automatically (it could easily be done asynchronously by analyzing database dumps) unless we started putting all images in templates. DTLHS (talk) 17:42, 5 September 2019 (UTC)
A template attached to the image entry is what I am thinking of, as long as it would allow access to the image itself. DonnanZ (talk) 18:39, 5 September 2019 (UTC)
To create a category for images such as Category:English terms with images, a template for images such as {{image|en|File:Carrots with stems.jpg}} will be needed. KevinUp (talk) 22:25, 5 September 2019 (UTC)
Hmm, not what I intended. I would like that to be superseded by the pagename if possible, e.g. {{image|PAGENAME|File:Carrots with stems.jpg}}. DonnanZ (talk) 23:15, 5 September 2019 (UTC)
Then we need to use templates in place of MediaWiki code, and the templates can never keep up with MediaWiki's capabilities for displaying images. If you think that only one way of embedding images is used here, you are mistaken. Given that words for plants denote both the plant and the fruit, it often makes sense to have a gallery showing the fruit, perhaps at different stages of processing, the plant close up in different seasons (bearing fruit, or while only in blossom), and the plant from afar. Example: بَلاذُر (balāḏur), which meant the marking-nut before the New World explorations and now means cashew, so it has two, slightly different from German Neugewürz and its descendants. That’s what I expect from a dictionary when it tells me what a plant name means.
We also use {{multiple images}}, or mostly only Sgconlaw and I use that, for example on moccasin, for different purposes: I use it to have horizontal grouping if there is space to the left but not enough content at the bottom to show one image under another. Turkish ispinoz / اسپنوز‎: Male and female finch.
This {{multiple images}} is problematic though because it doesn’t let me change relative image sizes so I have to pick roughly fitting ones :/.
It is probably better to only have the category system (if it has any use) but let editors categorize manually with {{cln}}. Fay Freak (talk) 00:14, 6 September 2019 (UTC)
The purpose of a category for entries that had images would be what? To review the appropriateness and adequacy of images? One could use search to find them in groups by using searchbox searches like 'incategory:"English nouns" incategory:"English lemmas" insource:/\[\[([Ff]ile|[Ii]mage)\:/'. One could add searches for galleries and other display generators.
I would have thought that the main problem is finding entries that might need images and also fit with one's topical interest or skills.
One could always use, say, Category:Requests for images in English entries to find a few entries that need images. Intersecting that category with a topical category would narrow the search. One could look for items in such a category that also had {{comcatlite}} or {{commons}} to find some that would be easy to fill. One could also exclude pages that already had images using '-insource:/\[\[([Ff]ile|[Ii]mage)\:/'.
A problem with a category like 'English entries without images' is that the category is so large as to not be usable in doing intersection searches in the search box. Another is that it would miss large numbers of definitions that were missing images because the English L2 section already had one definition with an image.
I would think that using {{rfi|en}} and adding "topic=" tags (or equivalent) would enable targeted searches (with or without categorization) for definitions that needed images. DCDuring (talk) 00:41, 6 September 2019 (UTC)
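Regular-expression searches with insource: are case-sensitive, so a character class such as [Ff] is what lets the first letter match in either case; a parenthesised group such as (fF) would instead require the literal two-letter sequence "fF". The short Python sketch below illustrates the difference. Python's re syntax agrees with CirrusSearch's regex flavour for these simple constructs, but the two engines are not identical, and `has_image_link` is an illustrative name.

```python
import re

# A group like (fF) matches the literal text "fF", so this misses "[[File:".
GROUPED = re.compile(r"\[\[((fF)ile|(iI)mage):")
# A character class like [Ff] matches one character: "F" or "f".
CLASSED = re.compile(r"\[\[([Ff]ile|[Ii]mage):")


def has_image_link(wikitext: str) -> bool:
    """True if the wikitext contains a [[File:...]] or [[Image:...]] link."""
    return CLASSED.search(wikitext) is not None


if __name__ == "__main__":
    sample = "[[File:Carrots with stems.jpg|thumb|Carrots]]"
    print(GROUPED.search(sample) is not None)  # False
    print(has_image_link(sample))              # True
```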
  • I must admit I forgot about {{rfi}}, which I have been able to comply with once or twice, but this isn't added in every case where images are desirable. DonnanZ (talk) 09:12, 6 September 2019 (UTC)
I was thinking of Category:English terms with images as mentioned by KevinUp (above), I feel Category:English terms without images would end up being far too large, and therefore a definite non-starter. DonnanZ (talk) 12:46, 6 September 2019 (UTC)
You can find pages with images using the following searches: insource:/(File|Image):.*(jpg|jpeg|png)/. This can be combined with the incategory keyword. (A duplication of DCDuring's information above, oh well.) --Dan Polansky (talk) 10:39, 6 September 2019 (UTC)
There is so much that an active contributor can do with CirrusSearch. I'm reading up on regular expressions to do fancier things. But the "basics" are quite powerful. See CirrusSearch Help. DCDuring (talk) 13:52, 6 September 2019 (UTC)
All this is great to know - we should include it in a Help page like Help:Advanced Wiktionary skills. --Mélange a trois (talk) 21:33, 6 September 2019 (UTC)

Isekiri or Itsekiri?

Discussion moved from WT:Tea room/2019/September#Isekiri or Itsekiri?.

Module:languages/data3/i gives “Isekiri” as the main name for language code its. But “Itsekiri” is much more common.  --Lambiam 03:20, 7 September 2019 (UTC)

This is about the site as a whole, so I moved it from the Tea room, which is for discussing specific entries.
As for the topic itself, I notice that Wikipedia has w:Isekiri language as a redirect to w:Itsekiri language. Chuck Entz (talk) 03:56, 7 September 2019 (UTC)
I might add that its' spelling can lead to some head-scratching when used without quotes in sentences... ;p Chuck Entz (talk) 04:01, 7 September 2019 (UTC)
I support the change. @-sche, in case you want to weigh in. —Μετάknowledgediscuss/deeds 01:16, 8 September 2019 (UTC)
A look at Glottolog makes it seem like the 't'-less form has become more common in more recent works that are specifically about the language; however, in broader literature both exist, and in an Ngram Itsekiri is much more common, as Lambiam says. So, support. - -sche (discuss) 02:28, 15 September 2019 (UTC)

Replacing de-sysop votes with confirmation votes

Please read and comment on this proposal here: Wiktionary:Votes/2019-09/Replacing de-sysop votes with confirmation votes. —Μετάknowledgediscuss/deeds 03:56, 8 September 2019 (UTC)

Wikipedia Moss Project

Over at Wikipedia, as some of you may be aware, there is a project called Moss which seeks to eliminate a pet hate of mine, tyops. A useful byproduct of this project is that it finds potentially missing Wiktionary words. Below are around 100 such entries. The number at the beginning represents the number of occurrences in en.wikipedia. Enjoy and happy editing! --Mélange a trois (talk) 17:27, 8 September 2019 (UTC)

If only WP were valid attestation. This doesn't belong here. We could put it in WT:REE or a subpage thereof, an appendix, or a userpage [sic]. DCDuring (talk) 23:40, 8 September 2019 (UTC)
Technical, moth anatomy
Technical, AUS and NZ, ecology
Science fiction. Primarily Babylon 5 universe, but some other usage.
It would be cute/fun/useful? to have some automated bot-o-matic thingy that would connect that project with this one, along the lines of Wiktionary:Wanted entries. But don't go mad because a lot of them will be rubbish words. Equinox 23:59, 8 September 2019 (UTC)
Unsurprisingly, it's much easier to create long lists of "things that could be words" than to actually create entries. DTLHS (talk) 00:26, 9 September 2019 (UTC)
Very true, but to me that suggests that we just need a way to (permanently? or for some period of years) strike non-words from the record. I've noticed BTW that sometimes the very same word appears twice in WT:REE in successive years, perhaps because somebody forgot they'd added it before. Equinox 00:27, 9 September 2019 (UTC)
How about a list of failed requests, with explanation of why? Chuck Entz (talk) 01:50, 9 September 2019 (UTC)
Yes yes but once again we're suffering from not having any structure to our entries, just big lists and bullets and indents. If we have a list of "words to avoid" who's to say anyone will look at it? If we had a form where you filled in WORD + DEFINITION + SOURCES then we could validate it right off. Equinox 02:05, 9 September 2019 (UTC)

Inconsistent use of qualifiers in translations

I noticed, e.g. here in the Swedish translations, that qualifiers sometimes come before the terms. TranslationAdder.js always inserts them after the term. I found no guidance in EL.

I would like this to be more consistent so that my input filler script can pick up the qualifiers consistently. I suggest we agree here and then instruct a bot to move them to the right place. WDYT?--So9q (talk) 08:39, 11 September 2019 (UTC)
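If agreement (or a vote) settles on putting qualifiers after the term, the bot pass proposed here might look like the minimal Python sketch below. It assumes qualifiers are written with {{qualifier}}, {{qual}}, {{q}} or {{i}} immediately before a {{t}} or {{t+}} link, ignores nested templates, and deliberately leaves {{sense}} untouched, since Lambiam argues below that it belongs before the term. The template alias list and the example line are illustrative.

```python
import re

# Assumed qualifier template names; {{sense}} is intentionally excluded.
QUALIFIER = r"\{\{(?:qualifier|qual|q|i)\|[^{}]*\}\}"
# A {{t}} or {{t+}} translation link (nested templates are not handled).
TRANSLATION = r"\{\{t\+?\|[^{}]*\}\}"

QUAL_BEFORE_TERM = re.compile(r"(" + QUALIFIER + r")\s*(" + TRANSLATION + r")")


def move_qualifiers_after_terms(line: str) -> str:
    """Rewrite '{{q|X}} {{t|...}}' as '{{t|...}} {{q|X}}' on one translation line."""
    return QUAL_BEFORE_TERM.sub(r"\2 \1", line)


if __name__ == "__main__":
    line = "* Swedish: {{qualifier|informal}} {{t|sv|hoj|c}}, {{t+|sv|cykel|c}}"
    print(move_qualifiers_after_terms(line))
    # * Swedish: {{t|sv|hoj|c}} {{qualifier|informal}}, {{t+|sv|cykel|c}}
```

Lines that already have the qualifier after the term are left unchanged, so the pass can be rerun safely; separators such as semicolons are a separate clean-up not attempted here.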

I think that in this specific instance {{sense}} is more appropriate – which IMO should always come before the term. The use of {{sense}} in translations should of course be rare, because there ought to be a single sense per list, but occasionally a target language will have distinctions that are not present in English. I see though that other examples that I can think of also use {{qualifier}}; e.g. at sister we have “Turkish: abla (tr) (elder), ...”, while I feel “Turkish: (elder sister): abla (tr), ...” would be better.  --Lambiam 10:32, 11 September 2019 (UTC)
Another example where the qualifier comes first (see the German translations). There are also semicolons as separators instead of commas. What a mess.--So9q (talk) 13:45, 11 September 2019 (UTC)
You will not find any consistency here. Translation sections are a free for all. Trying to clean up translations has driven several users off of the project in frustration. DTLHS (talk) 15:27, 11 September 2019 (UTC)
Thanks for the warning. I just found Wiktionary:Translations and it contains nothing about qualifiers or senses to my surprise.
Two questions come to mind:
  • Will we need a vote or discussion about where to put the qualifier or can we agree that always putting it after the term is correct?
  • Can we agree to always use commas between terms and not e.g. semicolons, colons or full stops?--So9q (talk) 13:44, 13 September 2019 (UTC)
You would need a vote since most of the people who add translations probably aren't reading this. DTLHS (talk) 15:26, 13 September 2019 (UTC)
Shouldn’t it technically be distinguishable by whether it comes before or after commata?
It can also be more complicated. For stubble “short stalks left in a field after harvest” I could, for some Slavic languages, discern a distinction between the stubble itself and a field of stubble; the terms for the latter might likewise be sought and are more commonly used in the respective languages, and I added them inside the qualifiers. Guess one needs AI for the translation tables.
I support more markup though, so at least those who know can do better. @DTLHS I am myself shocked at how many of my wrong language codes, or uses of {{t}} outside translation tables, you had to correct. I believe it is facilitated by the different source code formatting, namely the fact that in the translation tables the language names are in plain text. If the name were fetched as with {{m+}}, the former error would be impossible and the latter less likely, because it arises from copying terms to reuse them and deleting the plain-text language name but forgetting to replace the template name. The language name is fetched somewhere anyway for the section link, unlike with {{t-simple}}.
(On the other hand one adds more language names than there are codes. Sometimes Christian Palestinian Aramaic, Jewish Palestinian Aramaic, Jewish Babylonian Aramaic under “Aramaic”, example in bed) Fay Freak (talk) 16:12, 11 September 2019 (UTC)
I'm no tech maven, but further automation of the translation tables is likely to have a dramatic negative effect on entry load times for some of our larger entries, especially those with highly polysemous English terms. [[a]] and [[water]] come to mind, but there are others. Maybe someone with good tech foo can come up with Summer-of-Code projects that would help with these. DCDuring (talk) 19:05, 11 September 2019 (UTC)
Please keep the discussion on topic about the use of qualifiers in the translation section.--So9q (talk) 13:44, 13 September 2019 (UTC)
My understanding is that it has always been preferred that qualifiers be after the translations, but the Translation Adder used to insert them before the translation (I think because that was "easier" to code). I recall that there was a short thread somewhere which led to that behavior of the Translation Adder being fixed (it may have been Ruakh who fixed it). - -sche (discuss) 02:38, 15 September 2019 (UTC)
And I recently saw a user (forgotten the name) using either qual or gloss to give a literal English rendering of the linked foreign term! Equinox 02:42, 15 September 2019 (UTC)
Oh, I've seen (and done!) that in a few places. I think there are a few places where it's useful, like devil's beating his wife. - -sche (discuss) 02:47, 15 September 2019 (UTC)

Should we include non-native audio pronunciations?

I came across a French user's audio file for the English word bicycle and removed it on sight as being unhelpful for English dictionary users, as well as nominating multiple files by the same user for deletion here. Two users including the uploader are arguing to keep the files, and I wanted some Wiktionary editors to weigh in. Maybe I'm wrong in wanting them deleted from Commons since I don't know their policy, but can we agree that there's no place for these in Wiktionary? Ultimateria (talk) 15:11, 11 September 2019 (UTC)

I also want to know if @Derbeth and @0x010C will choose to stop importing such recordings. Ultimateria (talk) 15:24, 11 September 2019 (UTC)
They might be kept, provided that they are sufficiently marked that they are not added automatically by bots. Who knows what one can use them for, maybe to illustrate articles about learning languages. They should not be included in the dictionary. There are enough native accents, it’s only noise. Fay Freak (talk) 16:18, 11 September 2019 (UTC)
User:Fay Freak brings up a possible tangential use of them but that would really be more appropriate at b: or v: (likely at b:fr: and v:fr, specifically) for interactive media teaching someone to speak English. I mean, there are already so many accents and lects of English let alone all of the hundreds of millions (billions?) of non-natives who have such wildly varying accents. I really don't see the value of this here since the goal is to show how a word is used by the community that uses that word. If a word gets adopted into another language, then use that pronunciation for that entry. —Justin (koavf)TCM 17:13, 11 September 2019 (UTC)
I am having a similar problem in trying to delete an incorrect Armenian pronunciation here. Maybe we should block the bots for importing incorrect pronunciations, until their owners learn to maintain blacklists. --Vahag (talk) 18:16, 11 September 2019 (UTC)
Yeah that’s what I mean: They can be kept on Commons as for example usable to illustrate language learning, or erroneous pronunciation, as on Wikiversity, but the files have to be marked in one way or the other so the bot can distinguish. Fay Freak (talk) 18:26, 11 September 2019 (UTC)
Some imports (including the bicycle one) are from Lingua Libre via Commons. Lingua Libre collects all sorts of audio, including non-native pronunciation. The recording metadata has a reference to the speaker (example), which includes language levels, so a bot could only import recordings made by native speakers. – Jberkel 18:35, 11 September 2019 (UTC)
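The filter suggested here, importing only recordings whose speaker is a native speaker of the recorded language, can be sketched in a few lines of Python. The sketch is hypothetical throughout: the `Recording` class, its field names, the `"native"` label and the file names are placeholders, not Lingua Libre's actual metadata schema, which a real bot would have to read from the speaker records.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    # Hypothetical fields; Lingua Libre's real metadata layout may differ.
    file_name: str
    language: str  # language of the recorded word, e.g. "en"
    speaker_levels: dict = field(default_factory=dict)  # language -> level


def native_only(recordings, native_label="native"):
    """Keep recordings whose speaker is marked native in the word's language."""
    return [r for r in recordings
            if r.speaker_levels.get(r.language) == native_label]


if __name__ == "__main__":
    batch = [
        Recording("LL-en-bicycle-a.wav", "en", {"fr": "native", "en": "advanced"}),
        Recording("LL-en-bicycle-b.wav", "en", {"en": "native"}),
    ]
    print([r.file_name for r in native_only(batch)])  # ['LL-en-bicycle-b.wav']
```

Filtering at import time leaves the files on Commons untouched, which fits the suggestion above that non-native recordings may be kept there while staying out of entries.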
I think not, since the majority of users are likely to want to know the standard (Inner Circle) versions. At least we should ensure we have those before we try for anything more exotic. Equinox 18:21, 11 September 2019 (UTC)
I think we should stop trying to treat Commons names as meaning anything. Just because they aren't useful for us doesn't mean anything for Commons.--Prosfilaes (talk) 00:58, 12 September 2019 (UTC)
I use LinguaLibre to record pronunciations, and admittedly I got a couple of them wrong. For one I even recorded a fart sound, just to see if anyone actually listens to them (they did, and quite quickly removed it from the site). As for non-native pronunciation, it seems obvious that it is not as good as, for example, my beautiful voice. Also, I'd love to hear each word spoken by 10 different accents - Liverpudlian, Scottish, South African, Deep South etc. Hmm, I wonder which word has the most audio pronunciations? I'm sure one of the more geeky users can find out. --Mélange a trois (talk) 09:46, 13 September 2019 (UTC)

On the decline of Urban Dictionary

https://www.wired.com/story/urban-dictionary-20-years/

Not sure how much applies to this online crowd-sourced dictionary effort but it's worth thinking thru some of the problems with UD's methods and ensuring that we don't suffer the same fate. —Justin (koavf)TCM 20:42, 11 September 2019 (UTC)

"Where Oxford and Merriam-Webster erected walls around language, essentially controlling what words and expressions society deemed acceptable," really? I find very little value in this article and I don't think the author knows much about lexicography. I guess it points out (indirectly) that there is more to a dictionary than a 2D line between "descriptivism" and "prescriptivism": there are also "dictionaries" that simply invent vocabulary out of the ether ("inventionism"). DTLHS (talk) 20:53, 11 September 2019 (UTC)
Yeeaaahhhh (long drawn-out expression of dubiousness) ... The article author doesn't understand lexicography, and clearly doesn't distinguish between a list of memes, and a dictionary. UD is great if you want some idea of the current zeitgeist for a particular term, but it's useless as, well, a dictionary.
Ah, well. ‑‑ Eiríkr Útlendi │Tala við mig 21:11, 11 September 2019 (UTC)
I would guess the main reason for decline in sites with "user-generated content" is that now there are a lot more users and many of them are young children. A bit like what Eternal September did to Usenet. Equinox 21:19, 11 September 2019 (UTC)
It’s not like we couldn’t need a lot more users. Which decline? All runs fine until one tries to suppress information for politics, or for special rights. Fay Freak (talk) 21:56, 11 September 2019 (UTC)
"Which decline": well, politics aside, you don't think that Pewdiepie screaming and swearing his way through Minecraft, and the typical crop of YouTube comments, are a bit more inane than the quiet intelligent bloggers of the early 2000s? Equinox 22:04, 11 September 2019 (UTC)
No, I think that this is what is left after the legal trammels have grown ever heftier. Regulation everywhere, only some big players can calculate with it, and children who can’t care. If you are a blogger you will possibly drown in cease-and-desist letters because your privacy notice misses some trifle. It’s how at some time there were many producers of CPUs, now there are two – the laws have provided loopholes to eliminate competition, and the desire to become competitive. And in compliance with modern identity politics everyone is triggered by everything and tolerant towards addictions and degenerations thus tame, on top of coming out of schools more damaged than educated, which has always left majorities inane. Education always depended on incalculable details, and the cult of equality has stifled it. Everyone goes to school but learns nothing; everyone converses in cramped networks but may not tread on anyone’s toes; everyone may work but idlers stand at the door to have a share of it. That’s how you nurture the ugly. Fay Freak (talk) 23:41, 11 September 2019 (UTC)
  • FWIW my wife is a teacher and a damn good one, and I disagree with your characterization of "education" with such a broad brush. ‑‑ Eiríkr Útlendi │Tala við mig 21:15, 16 September 2019 (UTC)
Are you saying we need more users? I can't parse that. DTLHS (talk) 22:09, 11 September 2019 (UTC)
To answer a question that was not directed at me (pardon, Equinox): we need quality and quantity. A lot of one does not make the project that we want to make. —Justin (koavf)TCM 22:35, 11 September 2019 (UTC)
So they complain about what? That Urban Dictionary depicts harsh reality and does not censor enough or that it has too much joke content and does not censor enough?
Nothing to applaud there. The internet should have dumps, and there should be lawless zones. Urban Dictionary still often has the definition you need that could not pass elsewhere. Fay Freak (talk) 21:56, 11 September 2019 (UTC)
I'm pleased about the fact that the author didn't think Wiktionary significant enough to be worth a mention in his article. --Mélange a trois (talk) 09:40, 13 September 2019 (UTC)
I doubt the author knows what it is lol —AryamanA (मुझसे बात करेंयोगदान) 21:09, 15 September 2019 (UTC)

Clean up of templates for derived and related terms

Hi, I read the previous discussion at Wiktionary:Beer_parlour/2018/November#Titles_of_morphological_relations_templates more or less in its entirety. I suggest we do a thorough clean up of these templates and only keep the one(s) we all agreed on keeping and using.

To be more precise we currently have a whole host of templates in use in our main space:

  • {{col3}} and related ones
  • {{der-top}} and related ones
  • {{rel-top}} and related ones
  • {{Template:User:Donnanz/der3-u}} found here.

I would very much prefer to only have one template left when we are done if possible. All terms that should appear alike should be inserted using the same template with a few different parameters for e.g. title, number of columns, etc.

As an aside I got interested in this topic when browsing on my mobile with the default skin Vector on entries with a lot of derived terms (like rock) and where they were not collapsed by default (see picture of the rendering using the default mobile frontend skin minerva). Terrible to scroll this and apparently no way to collapse. WDYT?--So9q (talk) 09:05, 12 September 2019 (UTC)

Is that really Vector? It looks like the mobile site, which uses Minerva. You can check with mw.config.get("skin") in JavaScript. (Special:Preferences only controls the skin used in the desktop site.) The mobile site just doesn't run many of the collapsibility scripts, only NavFrame. I wouldn't feel confident working on it myself because I never use it. — Eru·tuon 04:37, 13 September 2019 (UTC)
Oh, you are probably right. I would really like a menu on mobile for easily changing skin. Do you know how I could inject JS or HTML to do that?--So9q (talk) 06:19, 13 September 2019 (UTC)
You can change the skin by adding the query string ?useskin=whatever to the URL (for instance, https://en.m.wiktionary.org/wiki/aer?useskin=vector), but other skins aren't designed for mobile so the page just looks a lot like the desktop site. — Eru·tuon 08:37, 13 September 2019 (UTC)
FYI I went ahead and submitted a request for deletion: .--So9q (talk) 22:16, 14 September 2019 (UTC)
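The ?useskin= approach mentioned above can be scripted, for instance as the basis of a skin-switch link on mobile. This is a minimal sketch, not an existing Wiktionary gadget: the helper name withSkin is hypothetical, it relies only on the standard URL API, and the mw.config.get("skin") check only works when run on a MediaWiki page (e.g. pasted into the browser console).

```javascript
// Minimal sketch: return a copy of a page URL that carries ?useskin=<skin>.
// Relies only on the standard URL API (browsers and Node.js).
function withSkin(href, skin) {
  const url = new URL(href);
  // set() overwrites an existing useskin value rather than adding a second one
  url.searchParams.set("useskin", skin);
  return url.toString();
}

console.log(withSkin("https://en.m.wiktionary.org/wiki/aer", "vector"));
// https://en.m.wiktionary.org/wiki/aer?useskin=vector

// Only on a MediaWiki page: report the skin actually in use.
// `mw` is undefined anywhere else, so this block is skipped there.
if (typeof mw !== "undefined") {
  console.log("current skin:", mw.config.get("skin"));
}
```

As noted above, other skins are not designed for mobile, so switching away from Minerva mostly yields the desktop layout on a small screen.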

Why do we mark race commonalities of English-language surnames?

Pretty much exactly that. It seems a bit strange, given the abstractness and variance of race. Starbeam2 (talk) 01:36, 13 September 2019 (UTC)

I assume you are referring to things like "Aggarwal is most common among Asian/Pacific Islander (94.32%) individuals."? That information was added by a particular user and I have seen other users support removing it, but no action has been taken. DTLHS (talk) 01:37, 13 September 2019 (UTC)
I don't know about surnames, but I have wondered about names like Shaniqua, which are not seen outside black American communities. Well, that one has a usage note. Given names are usually chosen by a parent, who belongs to such-and-such a culture. Surnames are a bit different... Equinox 02:13, 13 September 2019 (UTC)
Sure but surnames are also very culture-bound usually. —Justin (koavf)TCM 02:22, 13 September 2019 (UTC)
So what can possibly be more "culture-bound" than given names that only black Americans use, like Shaniqua? I would wager money that no black Briton has that name. It's absolutely part of the culture. Equinox 02:27, 13 September 2019 (UTC)
No one is arguing otherwise. Certainly, the name Shaniqua is an African-American one. Not sure what your point is. Both "Moishe" and "Cantor" are Jewish names. Similarly, "Rodrigo" and "Hernandez" are both Hispanic. —Justin (koavf)TCM 02:29, 13 September 2019 (UTC)
My point is that your immediate parents usually choose your given name, but your surname is usually left alone, and persists for a long time. Saying "Shaniqua is a black name" is an observation about how black Americans tend to name their kids; saying "Goldstein is a Jewish name" is quite another matter: maybe my great-great-grandfather was the last Jew in the family. Equinox 02:32, 13 September 2019 (UTC)
For every "Goldstein" who has a very tenuous connection to the Jewish people, there are 10,000 "Jacob"s who have no relationship to the Jewish people. Personal names are much more likely to not be culture-bound/associated than surnames. —Justin (koavf)TCM 18:59, 14 September 2019 (UTC)
Because it tells you about how English-language surnames are distributed, at least in the US. For all the problems with race, it's still a good proxy for ethnic groups in the US.--Prosfilaes (talk) 02:33, 13 September 2019 (UTC)
There's more than one ethnic group per race. Starbeam2 (talk) 19:41, 17 September 2019 (UTC)
I think it's useful having this information. It provides at least vague information about where the name came from, and can be useful to fiction writers who might be trying to find a name that suits a particular demographic. Andrew Sheedy (talk) 02:55, 13 September 2019 (UTC)
I understand the need to show association, but it does rub me the wrong way how obsessively it shows up. I admit surnames should mention their connotations on the page, but only if the connotation is A) especially prominent and B) not a repetition of a general rule. Rule A covers names like Poindexter, which is associated with nerdy people, and Rule B means not marking Steinberg as a stereotypically Jewish name, since it merely demonstrates the -berg suffix used in stereotypical formations of "[Ashkenazic] Jewish names". Also, the US Census doesn't perfectly reflect how race is seen in the US: Middle Eastern and North African (MENA) people, many Hispanics, many Portuguese-descended people, many Latinos, Sephardic Jews, Romanis, Ashkenazic Jews, Armenians, and Kartvelians are considered "White" on legal papers even though socially this is not the case for many of them, especially the first six groups. Nonetheless, I don't plan on touching those parts of the pages at the moment. Starbeam2 (talk) 18:47, 14 September 2019 (UTC)
You do realize that the race/ethnicity questions in the census are mostly self-reported? It's quite possible to object to the default categories, but I believe "other" is an option. See the infobox for the question at w:Race and ethnicity in the United States Census#21st century. DCDuring (talk) 22:14, 15 September 2019 (UTC)
I'm aware, but "other" doesn't always elucidate things, and race is basically decided by society at large, not by the individual. Starbeam2 (talk) 19:41, 17 September 2019 (UTC)
I find it useful for researching the etymology. --Vahag (talk) 04:40, 13 September 2019 (UTC)
We do add the etymology often, or at least we try to. Starbeam2 (talk) 19:41, 17 September 2019 (UTC)
I don't think these statistics really belong in a dictionary. —AryamanA (मुझसे बात करेंयोगदान) 21:04, 15 September 2019 (UTC)
I decided to include that information when I added the surnames. I decided to do so for two reasons: I had the information and it seemed lexically relevant. I concur that there are problems with the stats (as has been mentioned above) and that the relevance to a dictionary is not inarguable. I would not strenuously object to their removal if people felt that they don't belong; I would even be willing to remove them myself if that is the verdict. - TheDaveRoss 22:47, 15 September 2019 (UTC)

Requesting AWB/JWB rights

Hi, I would like to semi-automate tedious editing tasks with JWB. I need an administrator to add my name to the list of approved users.

I promise to be careful and responsible as always in my use of this tool. Thanks in advance.--So9q (talk) 06:03, 14 September 2019 (UTC)

dialectal

Where does the information for the label "dialectal" come from? When it means "of or relating to a dialect", the dialect should be specified, along with the source, for example in unlight --Backinstadiums (talk) 15:52, 15 September 2019 (UTC)

Our glossary, to which the label (dialectal) links, gives two meanings:
  1. Of or relating to a dialect.
  2. Not linguistically standard.
The latter sense need not be tied to any specific identifiable dialect; it could also be slang or a colloquialism. It may be unfortunate that the label combines these two senses, especially as we also have the label (nonstandard). Many of the Turkish terms labelled “dialectal” can more properly be called “regionalisms”, but the regions in which the terms are current usually do not correspond to a well-defined and named geographic subdivision. Compare the distributions of faucet vs. spigot in the US, where the latter (in the sense of “faucet”) also does not keep to a well-defined border but spills over from Philly into the Midland dialect region.[1]  --Lambiam 16:38, 15 September 2019 (UTC)
"Such a dialect should be added as well as the source" is a counsel of perfection (sense 3). We rarely have very specific information. But it is useful to know that a given definition may not be generally understood in all places where a language is spoken. DCDuring (talk) 18:21, 15 September 2019 (UTC)
@DCDuring: It all depends on the definition of "dialect" to begin with, but if an editor knows a term is dialectal, they must at least know some dialect/region or, for further investigation, add where the info comes from --Backinstadiums (talk) 18:59, 15 September 2019 (UTC)
False. The editor might know that it is regional but be unsure about the region. Also, if he writes one region, it looks as if the term is only from that region. Fay Freak (talk) 19:34, 15 September 2019 (UTC)
What Fay Freak said. DCDuring (talk) 22:11, 15 September 2019 (UTC)
For entries to specify dialects rather than using "dialectal" is a good ideal/goal, but it will take a long time before all the entries currently labelled only as "dialectal" can be labelled more specifically. - -sche (discuss) 19:21, 15 September 2019 (UTC)
This is not possible as a general principle. If a term is used only in certain villages and an author uses such a term, you do not know which village it points to or whether it has been picked up elsewhere. This can only be solved with dialectological atlases, which are based on surveys and are thus topically restricted. Fay Freak (talk) 19:34, 15 September 2019 (UTC)

I want to add Church Slavonic terms

  Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

I want to add Church Slavonic terms (not Old Church Slavonic), may I? Which code should I use? —This unsigned comment was added by ПростаРечь (talkcontribs) at 06:51, 16 September 2019 (UTC).

w:Church Slavonic and the ISO 639-2 standard say that it uses the same code as Old Church Slavonic, cu.--Prosfilaes (talk) 06:54, 16 September 2019 (UTC)
As I said on my talk page, our convention is to use "Old Church Slavonic" L2 header, which usually just corresponds to "церковнославянский" in Russian sources. It would be wrong to use the language code "cu" and have a header anything but "Old Church Slavonic". The "cu" will add to "Old Church Slavonic" categories, not "Church Slavonic". --Anatoli T. (обсудить/вклад) 10:19, 16 September 2019 (UTC)
@ПростаРечь: I am not well-versed in the varieties of Church Slavonic, but we may want to have a split. An example ПростаРечь has used is блꙋдити (bluditi) (Church Slavonic language) vs блѫдити (blǫditi) (Old Church Slavonic). ПростаРечь is eager to contribute in "New Church Slavonic" (or simply Church Slavonic), for which we don't have a code or infrastructure. @CodeCat, -sche What do you think? Is the split merited? --Anatoli T. (обсудить/вклад) 11:42, 16 September 2019 (UTC)
On Google you may now find блꙋдити only with the diacritic: блꙋди́ти ПростаРечь (talk) 11:56, 16 September 2019 (UTC) I only want to add translations for words from the Ostrog Bible.
@ПростаРечь: We haven't been adding stress marks to Old East Slavic or Old Church Slavonic terms, as these are hard or impossible to confirm with certainty and completeness, but these terms may only be verifiable in this form, with stress marks. I've mentioned a possible split in Wiktionary:Requests_for_moves,_mergers_and_splits#Church_Slavonic_from_Old_Church_Slavonic; I'm not sure if it's merited and/or will happen. You can probably do a better analysis of the differences. --Anatoli T. (обсудить/вклад) 12:04, 16 September 2019 (UTC)
@Atitarev: I don't want to add stress marks, I only give an example of блꙋдити existence.
@ПростаРечь: Thanks. Острожская Библия (Ostrog Bible) is one of sources for anyone wanting to have a look for an assessment. --Anatoli T. (обсудить/вклад) 12:16, 16 September 2019 (UTC)
@Atitarev: Anyone may also see it in the original ПростаРечь (talk) 12:23, 16 September 2019 (UTC)
(edit conflict) @ПростаРечь: Question: are you sure it's not "Old East Slavic" (древнерусский) or old New Russian, rather than "Church Slavonic"? By this time (1581), the Russian language had fully formed, and this seems like a mixture of Russian with Old Church Slavonic or just a very ecclesiastic form of older Russian (ru)? When I get accustomed to the fonts, I can actually read and understand it as a Russian text, perhaps with a bit more ease than modern English speakers can read Shakespeare. Sorry, I just can't dedicate too much time at the moment, but we need to assess what language it is. (Notifying Benwing2, Cinemantique, Useigor, Wikitiki89, Stephen G. Brown, Guldrelokk, Fay Freak, Tetromino, Canonicalization): Does anyone want to assess the language of the Ostrog Bible? Is it cu, ru or something completely new? --Anatoli T. (обсудить/вклад) 12:35, 16 September 2019 (UTC)
@Atitarev: The Ostrog Bible doesn't have polnoglasie ("full vocalisation"), an Old East Slavic feature. ПростаРечь (talk) 13:21, 16 September 2019 (UTC)
It might be worth considering changing our convention so that the canonical name of cu is Church Slavonic, which we can then divide as needed into dialects such as Old Church Slavonic (= Old Bulgarian?), Serbian Church Slavonic, Russian Church Slavonic, Middle Bulgarian, Bosnian Church Slavonic, Croatian Church Slavonic, and whatever other varieties editors deem desirable. —Mahāgaja · talk 12:26, 16 September 2019 (UTC)
After (edit conflict). Yes, this particular one would be Russian Church Slavonic, especially for the 16th century. It's just too Russian grammatically, even though there are differences. --Anatoli T. (обсудить/вклад) 12:35, 16 September 2019 (UTC)
But the syntax is not what is visible in the dictionary. And this can be compared to Medieval Latin, where the Latin was also too German, too Spanish, too French grammatically, but yet never was Spanish or French. If the endings are like in the Old Church Slavonic original and Old Church Slavonic is still the model intended by writers then this speaks for unity. Also, if we don’t know where to draw the line this also speaks for a more flexible approach with labels. блꙋдити (bluditi) can be added with {{spelling of}} or {{form of}}. Fay Freak (talk) 12:46, 16 September 2019 (UTC)
The main problem with that approach is that pretty much all of our etymologies use cu to mean Old Church Slavonic; are we going to have to add qualifiers to all of them? Chuck Entz (talk) 12:45, 16 September 2019 (UTC)

May I use "from the Ostrog Bible" label for a while? ПростаРечь (talk) 14:11, 16 September 2019 (UTC)

@ПростаРечь:. Yes, please do for forms not used in other forms of (currently) "Old Church Slavonic" (i.e. words or forms that are specific to this variety and you know it). Please don't use any other language header for now, just "cu", as language codes go with language names and categories. We need to create a new language at Wiktionary to avoid a mess. I think we're dealing with "Church Slavonic" here with the Russian specifics. Technically, it's not a very big deal, I think, just need an agreement. Are you OK to continue using "cu" and "Old Church Slavonic" and a label for a while? We need the community to wake up from slumber!
Please also keep all the discussion here. I don't want to make the decision myself and I'm not so great at creating a new language structure.
I agree with Mahāgaja that we need separate varieties. "cu-r" ("Russian Church Slavonic") seems like a good candidate. Some linguists may cringe, but people should realise that what they call "Church Slavonic" has very distinct flavours on many levels.
Do you agree with the creation of a new language code cu-r with a new L2 header "Russian Church Slavonic"? If yes, shall I start a mini-vote below? --Anatoli T. (обсудить/вклад) 11:27, 17 September 2019 (UTC)
@Atitarev: In my opinion, Russian Church Slavonic (or the Russian Synodal recension) is the language of books from the second half of the 17th century onwards. The Ostrog Bible was published in Ostroh (Grand Duchy of Lithuania) in 1581 (the 16th century). I would prefer a more politically neutral name, e.g. Old East Church Slavonic (or simply Church Slavonic / Middle Church Slavonic for a while) or something like that. ПростаРечь (talk) 11:49, 17 September 2019 (UTC)
@ПростаРечь: Old East Church Slavonic sounds good and it's accurate. I disagree that it should be generic; it's distinct from South or West Slavic. We can go with the code cru. Starting the vote now. --Anatoli T. (обсудить/вклад) 12:06, 17 September 2019 (UTC)
Just to clarify, I am not suggesting creating a new L2 language code. I am suggesting treating all the varieties mentioned as dialects of cu, which would be renamed "Church Slavonic". Then L2 would read ==Church Slavonic==, and definition lines would include labels like {{lb|cu|Russian Church Slavonic}} (or whatever name we decide on) which would then categorize into a CAT:Russian Church Slavonic (not CAT:Russian Church Slavonic language), which would itself be a subcat of CAT:Church Slavonic language. —Mahāgaja · talk 12:12, 17 September 2019 (UTC)
@Mahagaja: Rather than opposing, can you rewrite the vote, check with User:ПростаРечь and get the ball rolling + revote? I am basically OK with this too. We only have a few people talking. We should be able to agree on something. --Anatoli T. (обсудить/вклад) 12:21, 17 September 2019 (UTC)
@Vorziblix Please don't divide terms from the Ostrog Bible into Ruthenian, Old Russian and so on; otherwise we risk an edit war, because there are many reliable sources that contradict each other. We have a similar situation with some Old Dutch texts (e.g. the Wachtendonck Psalms: it is hard to determine whether a text was actually written in Old Dutch or in other western Low German dialects). I really want to use a collective name. You may suggest such a term. Note: Ivan Fyodorov was the first known Russian printer in the Grand Duchy of Moscow and the Polish-Lithuanian Commonwealth. ПростаРечь (talk) 08:42, 21 September 2019 (UTC)
@ПростаРечь: Apologies. While I agree that calling these recensions Ruthenian/Russian/etc. is not ideal, I do think it’s preferable to use some name that’s seen actual use in academic work. As far as recensions of Middle Church Slavonic go, most papers I’ve looked through seem to agree that there are three or four recensions, viz. a ‘Bulgarian’ recension, a ‘Serbian’ recension, and the East Slavic recensions, which some papers divide into ‘Ruthenian’ and ‘Muscovite’ and some label with a single blanket term, usually ‘Russian’ or ‘Ruthenian’. (See for example Robert Mathiesen (1984), The Church Slavonic Question: An Overview (IX-XX Centuries).) Other terms that occasionally show up include ‘Rusian Church Slavonic’ (with one ‘s’) and ‘East Church Slavonic’. My problem with ‘Old East Church Slavonic’ is mostly the word ‘Old’, which makes it very confusing given that it’s a variant of Middle Church Slavonic and not either OCS or OES, and the fact that it doesn’t seem like anyone else has ever used this term. But perhaps there’s no good solution here. Feel free to change my labels as long as they’re consistent. — Vorziblix (talk · contribs) 12:06, 21 September 2019 (UTC)
@Vorziblix: I would simply use "East Church Slavonic". If there is no objection. ПростаРечь (talk) 12:44, 21 September 2019 (UTC)

Create Old East Church Slavonic with language code cru - a mini-vote

Support
  1.   Support We have a lot of material in this language and it's distinct. --Anatoli T. (обсудить/вклад) 12:06, 17 September 2019 (UTC)
Oppose
  1.   Oppose as mentioned above. I would treat Russian Church Slavonic as a dialect of Church Slavonic, not as a separate language. (And our ad-hoc language codes always begin with an official ISO code, so if we do create a new code for RCS, it should be something like sla-cru or sla-rcs, not just cru, since that is already a deprecated code for a variety of the w:Karu language.) —Mahāgaja · talk 12:12, 17 September 2019 (UTC)
  2.   Oppose Apart from being invented it is needlessly clumsy. Fay Freak (talk) 13:37, 17 September 2019 (UTC)
  3.   Oppose Not ISO compliant. There is already a language with this code as well. —Rua (mew) 17:15, 19 September 2019 (UTC)
Abstain
  1.   Abstain Bad code, as pointed out by Mahagaja, and unfortunately ad-hoc name. Apart from that I wouldn’t object to keeping the Church Slavonic recensions distinct. — Vorziblix (talk · contribs) 21:48, 17 September 2019 (UTC)

Change canonical name of cu from "Old Church Slavonic" to "Church Slavonic" and create dialect tags and categories for the various recensions of Church Slavonic - a mini-vote

Support
  1.   Support Mahāgaja · talk 12:32, 17 September 2019 (UTC)
  2.   Support This will work as well. --Anatoli T. (обсудить/вклад) 12:39, 17 September 2019 (UTC)
  3.   Support I support, but typographical conventions from this page in such a case should be revised. ПростаРечь (talk) 13:04, 17 September 2019 (UTC)
  4.   Support. Very practical. I assume the recensions and Old Church Slavonic will have their codes like grc-aeo for Aeolic Greek, and we have extra module data for {{lb}}. We have also recently removed “Old Latin”, and I suppose that “Classical Syriac” should also be “Syriac” sooner or later if we respect that people used it in the 19th century or even use it, in so far as Latin is now “used”, as a literary language (we had codes for “Syriac” and “Classical Syriac” but people just got confused and added stuff for the latter under the former). Fay Freak (talk) 13:37, 17 September 2019 (UTC)
  5.   Support, with the proviso that main lemmas should be at the OCS spellings wherever possible, with post-OCS forms entered as alt-form entries. Also note that we already have some later Church Slavonic entries such as телѧ (telę) that should be updated if this succeeds; more can be found by running a search for "later Church Slavonic". — Vorziblix (talk · contribs) 21:48, 17 September 2019 (UTC)
  6.   Support, but it's unclear to me how it affects Etymology/Descendants sections, because there are 3 cases of Church Slavonic: OCS and NCS0/NCS1 (NCS without/with a specified recension).
    • Before: (OCS) "{{cog/desc|cu|WORD}}", (NCS0) "Church Slavonic {{m/l|cu|WORD}}", (NCS1) "Russian Church Slavonic {{m/l|cu|WORD}}"
    • After1: (OCS) "Old {{cog/desc|cu|WORD}}", (NCS0) "[New] {{cog/desc|cu|WORD}}", (NCS1) "Russian {{cog/desc|cu|WORD}}"
    • After2: (OCS) "[Old] {{cog/desc|cu|WORD}}", (NCS0) "New {{cog/desc|cu|WORD}}", (NCS1) "Russian {{cog/desc|cu|WORD}}"
    • After3: (OCS) "Old {{cog/desc|cu|WORD}}", (NCS0) "New {{cog/desc|cu|WORD}}", (NCS1) "Russian {{cog/desc|cu|WORD}}"
    • ("Old" is required to avoid confusion if "New" is default, and vice versa). —Игорь Тълкачь (talk) 02:06, 21 September 2019 (UTC)
Oppose
  1.   Oppose Too great a risk of confusion. We can't count on every editor to label terms appropriately, which reduces the value of Wiktionary for those wanting to distinguish genuine old forms from later inventions. OCS was, at least in its original Bulgarian-Macedonian form, a very close reflection of the local language, which makes it valuable for historical linguistics. The later recensions, especially modern Russian forms, are not particularly useful for that purpose. If we do decide to merge the two, the model of Latin should be followed, with genuinely old forms unmarked and later inventions marked. —Rua (mew) 17:25, 19 September 2019 (UTC)
    We already could not count on editors not adding non-old terms as Old Church Slavonic. Now we want to make it more current for the first time. Fay Freak (talk) 10:48, 20 September 2019 (UTC)
    @Rua Can you offer any solution rather than opposing both options? Should this language (as e.g. in the Ostrog Bible) be ignored and not allowed to have entries? --Anatoli T. (обсудить/вклад) 09:05, 21 September 2019 (UTC)
    I offered the solution of following the Latin model, where "old" is considered the default and later forms/recensions get a context label. —Rua (mew) 10:47, 21 September 2019 (UTC)
    @Rua Basically, as it is now, right? Do you have any concerns at how entries are added now by User:ПростаРечь from the newer Old Russian recensions? --Anatoli T. (обсудить/вклад) 11:06, 21 September 2019 (UTC)
    More or less, but I think "Russian recension" is a better label and will probably be more widely understood, since it's the term used in general studies of CS. —Rua (mew) 15:44, 21 September 2019 (UTC)
    @Rua But we don’t give later recensions of Latin names like “Medieval Old Latin”. If we keep the L2 named “Old Church Slavonic”, the recensions become “Russian Old Church Slavonic” and so on, and the labels contradict the header. Also, unlike between antiquity and the Middle Ages, where history (the Dark Ages) creates a patent gap, it is not so clear where Old Church Slavonic ends; it slowly degrades. The difference between the vernacular and the literary language was never that great. Even in 18th-century Russia, Church Slavonic was thought to be some kind of “High Russian”. And one can never be sure whether something is “only late” or whether later authors have preserved something old, since we don’t have complete dictionaries of the ancient lect as we do with Latin. Fay Freak (talk) 14:05, 21 September 2019 (UTC)
    That's a naming issue more than anything. The problem I have is with forms missing yers, or worse, non-native outcomes of certain phonemes that clearly give a local colour to certain words. OCS is by default assumed to be an early form of Bulgarian-Macedonian, and therefore people will use that in historical assessments. If we have words with clearly Russian developments like ъ > o, ǫ > u and ę > ja then it does a disservice to the people using Wiktionary for historical linguistics, because they cannot tell that it's not Bulgarian-Macedonian in origin. We could require that everything be labelled and nothing left to chance, but that's hugely messy when OCS is at its core Bulgarian-Macedonian and the other recensions are basically mixed languages combining true OCS forms with the local language. —Rua (mew) 15:44, 21 September 2019 (UTC)
Abstain
  1.   Abstain unless we figure out how to map existing uses of cu to cu-old or whatever code is chosen for it. For example, generally when I have entered references to Old Church Slavonic, I use {{cog|cu|...}} or {{bor|cu|...}}/{{der|cu|...}} whereas if I need to enter a reference to e.g. Russian Church Slavonic, I say "Russian Church Slavonic {{m|cu|...}}". If this convention is generally adhered to, we can (maybe) replace all uses of cu in cog/bor/der with cu-old. My concern is that editors have been entering OCS terms using the cu code since that's what its canonical name is, and this info will be lost if we switch the name to just Church Slavonic. Benwing2 (talk) 04:16, 18 September 2019 (UTC)
    @Benwing2 Switching the header by bot and then adding the label “Old Church Slavonic” via {{tlb}}? There aren’t many other conceivable options. Some entries might already be not Old Church Slavonic but another Church Slavonic, but that operation does not make the entries any more wrong. Fay Freak (talk) 10:48, 20 September 2019 (UTC)
    @Fay Freak You're still thinking in terms of entries. The main problem is in etymologies. Right now, {{cog|cu}} displays "Old Church Slavonic", and people have been adding various templates to the etymologies with the "cu" code for many years with the expectation that the display would be exactly that. The second the change is made to the module, all of those etymologies are going to say "Church Slavonic", including those that don't link to any entry. If we don't change all those etymologies to say that it's Old Church Slavonic they're referring to, an etymologically-important distinction will be lost.
    A few of those etymologies may be already referring to later Church Slavonic, but I would imagine that to be very rare. Switching the name would mean that the rate of deceptive naming would switch from almost all okay with a few rare exceptions to almost all deceptive with a few rare exceptions. OCS is a very important language in etymologies, so we need to come up with a solution before going through with this change. Chuck Entz (talk) 21:10, 20 September 2019 (UTC)
    I had not mentioned what I imagine then: under the assumptions about existing usage that you have made and which I share, one would change these usages to an etymology-language code, most likely cu-ocs or OCS. Then, analogously to what I have said about entries: some stated Old Church Slavonic words might already be not Old Church Slavonic but another Church Slavonic, but that operation does not make the statements any more wrong. It would not switch to “deceptive” in any case. I imagine that for entries the “Old” part goes to {{tlb}}, and in links in etymology and descendant sections the code “cu” is replaced with the new code for Old Church Slavonic. So one would first create the etymology-only code, then switch cu in etymology and descendant sections to it, then rename L2 sections by removing “Old ”.
    By the way, there are probably comparatively few such cases in etymology sections (though from a Slavic viewpoint I can only stress the importance of Old Church Slavonic). For example, I get only 110 hits for the search terms "Cognate to" "Old Church", i.e. any mainspace page that contains the wording “Cognate to” and "Old Church", which comprises not only pages that mention Old Church Slavonic words in the sections in question but also Old Church Slavonic pages. In Proto-Slavic entries, on the other hand, Old Church Slavonic is virtually always what is meant; editors have been wary enough to use formatting like Russian Church Slavonic {{m|cu|...}}, and other editors usually have not added any Church Slavonic term at all.
    I don’t see what I could have missed: if 1. we change cu to return “Church Slavonic” instead of the former “Old Church Slavonic”, but beforehand change the occurrences of cu in etymology and descendant sections that return a name to cu-ocs, and 2. in entry pages we change the L2 headers to “Church Slavonic” instead of the former “Old Church Slavonic” but in the same bot edit put “Old Church Slavonic” into {{tlb}} unless there are contrary labels (such as those ПростаРечь has now deployed), then we do not change any statements. Fay Freak (talk) 01:04, 21 September 2019 (UTC)
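The two-step bot edit outlined above can be sketched as two wikitext rewrites. This is a hypothetical illustration, not an existing bot: it assumes cu-ocs is the chosen etymology-only code (one of the candidates named in the thread), that {{cog}}, {{desc}}, {{der}} and {{bor}} are the name-displaying templates to retag, and it leaves out the {{tlb}} labelling step.

```javascript
// Hypothetical sketch of the bot edit discussed above; not an existing bot.
// Assumption: "cu-ocs" is the new etymology-only code for Old Church Slavonic.
const OCS_CODE = "cu-ocs";

// Step 1: retag links that display a language name.
// {{cog|cu|...}} and {{desc|cu|...}} take the source language as parameter 1;
// {{der|xx|cu|...}} and {{bor|xx|cu|...}} take it as parameter 2.
// {{m|cu|...}} and {{l|cu|...}} show no language name and are left alone,
// so "Russian Church Slavonic {{m|cu|...}}" keeps working as before.
function retagEtymologyLinks(wikitext) {
  return wikitext
    .replace(/\{\{(cog|desc)\|cu\|/g, `{{$1|${OCS_CODE}|`)
    .replace(/\{\{(der|bor)\|([^|{}]+)\|cu\|/g, `{{$1|$2|${OCS_CODE}|`);
}

// Step 2: rename the L2 header by dropping "Old ".
// Adding "Old Church Slavonic" via {{tlb}} is not shown: where the label goes
// depends on each entry's layout and on any labels already present.
function renameL2Header(wikitext) {
  return wikitext.replace(/^==Old Church Slavonic==$/gm, "==Church Slavonic==");
}

const before = "Cognate with {{cog|cu|блѫдити}}; cf. Russian Church Slavonic {{m|cu|блꙋдити}}.";
console.log(retagEtymologyLinks(before));
// Cognate with {{cog|cu-ocs|блѫдити}}; cf. Russian Church Slavonic {{m|cu|блꙋдити}}.
```

Because the patterns require cu to be followed directly by |, a second run leaves already converted cu-ocs links untouched.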

Old Church Slavonic

(from the Ostrog Bible)

Declension of Swedish uncountable nouns

Do we have a template for that? I tried looking in Category:Swedish_noun_inflection-table_templates but found none that fit. I need it on tull. Thanks in advance.--So9q (talk) 16:37, 16 September 2019 (UTC)

Is tull not countable in the sense of “custom house”? The Swedish Wiktionary lists a plural, not only for the sense tullstation but also for the sense avgift som betalas när vara förs över gräns. Can’t you say tullarna är höga (for which “customs duties” may be a better translation than “toll”, which is more like vägtull )?  --Lambiam 20:35, 16 September 2019 (UTC)
There are two declension templates for uncountable nouns:
 --Lambiam 20:43, 16 September 2019 (UTC)
Yeah, you are right. Thanks for the help. As an aside, I asked Gamren to create a new ACCEL template for Swedish like he did for Danish.--So9q (talk) 20:54, 16 September 2019 (UTC)

Unhide request entries

  • I am of the opinion that categories for requests for various things like translations, etymologies, definitions et cetera should not be “hidden categories”. Currently only editors who have opted in in their settings see from the mainspace that there are categories like Requests for etymologies in Russian entries. If they were displayed, then users who don’t know about them but are inclined to resolve the requests could be led into much-needed participation.
  • A related issue is that the etymology request categories are cluttered. Category:Requests for etymologies in Latin entries counts 3,550 pages, but the bulk are names, and one does not see the bigger fish to fry. Since one is likely interested in solving requests for appellative nouns but not proper nouns, and on the other hand people who are interested in proper nouns likely want to handle personal names, demonyms, settlement names, hydronyms and the like separately, and these are special fields with special sources and dynamics, I propose that we add a parameter to {{rfe}} / {{rfelite}} to split the requests into subcategories, at least along these lines. Fay Freak (talk) 12:55, 17 September 2019 (UTC)
I'm not quite sure what you're driving at. For example click on "edit" for trading post, it has 10 hidden categories, at the bottom "This page is a member of ten hidden categories" - click on that and they are all revealed. DonnanZ (talk) 19:58, 17 September 2019 (UTC)
@Donnanz: The categories as visible under the page, not the editing window. I have a line “Hidden categories” where I find request categories, if a page is in such a category, under every page because I have set it up in my preferences but if people don’t set it they don’t see these categories. I have argued to unhide the requests to optimize user attraction. Fay Freak (talk) 20:18, 17 September 2019 (UTC)
@Fay Freak: No, I don't think that is necessary. Looking in the translations section for trading post one can see a few red-linked entries, so it's obvious there is no entry, as well as languages marked "please add this translation if you can". It's worth mentioning that in some cases blue links can be false, appearing in one language but no entry exists in another language spelt the same. DonnanZ (talk) 20:39, 17 September 2019 (UTC)
@Donnanz: I mean that people could follow the category links to find more pages where translations are needed. Fay Freak (talk) 20:40, 17 September 2019 (UTC)
You can always add {{t-needed|+ code}} for any missing language, which will generate a request. DonnanZ (talk) 21:06, 17 September 2019 (UTC)
@Donnanz: That’s not what I am getting at. The point is that people should be able to find the category in order to find the requests, and right now it’s hidden. Fay Freak (talk) 03:17, 18 September 2019 (UTC)
I would like to see a category for images (Wiktionary:Beer_parlour/2019/September#A_category_for_images?), but I don't think it's going to happen. DonnanZ (talk) 12:45, 18 September 2019 (UTC)

First attestations in the etymology section

I'm interested to know what other editors think of the following format:

  1. Special:Diff/45606179/54157995
  2. Special:Diff/52498442/52718625
  3. Special:Diff/52513026/53352340

Imagine the entry for England#English with the following under the etymology section:

 

Attested in The Canterbury Tales, 14th century, as Middle English Engelond.
Attested in A Looking Glass for London, 1594, as Early Modern English England.
Also attested in The History of England, 1754-1761, as Modern English England.

 

According to Wiktionary:Etymology, etymologies should be brief and include a simple list of previous forms.

I would prefer to see "From Middle lang term1, from Old lang term2, from Proto-lang term3." rather than "First attested in work W, 15?? as term1. Also attested in work X, 16?? as term2. Also attested in work Y, 16?? as term3 ..."

Shouldn't those statements be added as quotations instead? My impression of "first attestation" is that it implies that the word is a newly coined word that first appeared in that particular written work. For example, we have English words first attested in Chaucer and Category:English terms first attested in Shakespeare.

So what shall we do about these etymologies? Move them to the Citations namespace? Continue to add multiple first attestations? KevinUp (talk) 16:27, 17 September 2019 (UTC)

Yes probably they should be added as quotations. But I wouldn't just move them to the citations namespace (unless you actually have quotations) where they will be forgotten about. DTLHS (talk) 16:31, 17 September 2019 (UTC)
The contributor would seem to be following and extending our common practice of burying definitions in favor of alternative forms, pronunciations, and lists of cognates, just to mention what can appear in each L3 section above the definitions. DCDuring (talk) 17:12, 17 September 2019 (UTC)
I don't fault B2V22BHARAT for this formatting, since there was precedent in Korean entries. Ideally, the Middle Korean entries should actually be created, and it should be moved there, IMO. —Suzukaze-c 19:28, 17 September 2019 (UTC)
Yeah, as I recall, this was the previous format and both of us converted it into this. I think the "also attested" parameter was misused because it was originally meant for terms that had slightly different spellings. I think it would be better to add quotations at the Middle Korean entry and indicate only "from Middle Korean X" in the etymology section for a less cluttered appearance. KevinUp (talk) 19:53, 17 September 2019 (UTC)
@Atitarev, Metaknowledge, TAKASUGI Shinji Any comments? I would suggest these two options:
  1. Create proper entries for the attested form in Middle Korean.
  2. Move these statements to the citations page. Suzukaze-c has created some such as Citations:잡다. More can be found here. KevinUp (talk) 02:24, 18 September 2019 (UTC)
    I support making Middle Korean entries — it's always better to document extinct languages rather than write etymologies as if they don't deserve entries. —Μετάknowledgediscuss/deeds 03:50, 18 September 2019 (UTC)

The hidden category Category:Korean etymologies with first attestations that need to be moved to Middle Korean entries has been created for cleanup purposes. I propose we use the following format for the etymology of native Korean words from now on:

  • Generic format:
> {{ko-etym-native}} From {{inh|ko|okm|-}} {{okm-inline|TERM|Yale-Romanization}}
Output Of native Korean origin. From Middle Korean TERM (Yale: Yale-Romanization).
  • Examples:
Using 잡다 (japda) as an example:
> {{ko-etym-native}} From {{inh|ko|okm|-}} {{okm-inline|잡다|capta}}
Output Of native Korean origin. From Middle Korean 잡다 (Yale: capta).
Using 짧다 (jjalda) as an example:
> {{ko-etym-native}} From [[Modern Korean]] {{m|ko|져르다}}, from {{inh|ko|okm|-}} {{okm-inline|뎌르다|tyeluta}}, {{okm-inline|댜르다|tyaluta}}.
Output Of native Korean origin. From Modern Korean 져르다 (jyeoreuda), from Middle Korean 뎌르다 (Yale: tyeluta), 댜르다 (Yale: tyaluta).
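The proposed format is mechanical enough to generate from a list of attested forms. The following is a minimal Python sketch, not an existing Wiktionary tool: the function name korean_native_etym is hypothetical, and it only assembles the wikitext shown in the examples above from (Hangul, Yale romanization) pairs; it does not check that the romanization is correct.

```python
def korean_native_etym(okm_forms, modern_intermediate=None):
    """Assemble etymology wikitext in the format proposed above.

    okm_forms: list of (hangul, yale) pairs for attested Middle Korean forms.
    modern_intermediate: optional Hangul spelling of an intermediate Modern
    Korean form (rendered with {{m|ko|...}}).
    """
    if not okm_forms:
        raise ValueError("at least one Middle Korean form is required")
    # One {{okm-inline|HANGUL|YALE}} per attested form, comma-separated.
    okm = ", ".join(
        "{{okm-inline|%s|%s}}" % (hangul, yale) for hangul, yale in okm_forms
    )
    parts = ["{{ko-etym-native}} From"]
    if modern_intermediate:
        parts.append("[[Modern Korean]] {{m|ko|%s}}, from" % modern_intermediate)
    parts.append("{{inh|ko|okm|-}} " + okm)
    return " ".join(parts)


print(korean_native_etym([("잡다", "capta")]))
```

Called with [("뎌르다", "tyeluta"), ("댜르다", "tyaluta")] and modern_intermediate="져르다", it reproduces the 짧다 example without the final period.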

{{okm-inline}} is used because Middle Korean uses Yale romanization, which is different from the Revised Romanization used by South Korea for modern Korean. For example, Middle Korean 뎌르다 is tyeluta, not dyeoreuda; Middle Korean 잡다 is capta, not japda.

And of course, terms such as Modern Korean 져르다 (jyeoreuda), Middle Korean 뎌르다 (Yale: tyeluta), 댜르다 (Yale: tyaluta) deserve their own entries with quotations, not mere mentions in the etymology section.

If anyone is opposed to the usage of this format please state here. KevinUp (talk) 11:20, 19 September 2019 (UTC)

I understand what you're trying to do. However, I don't understand how reconstructed words (or consonants), namely the Proto-Indo-European words, which have no record and are based only on ideas, can be given preference over Latin and Greek cognates, which have actual records.
To be specific, I fully agree with the Latin (cor, cordis) → Modern English heart shift, because there is an actual record of cor, cordis in Latin, but I'm not convinced of the kerd → heart part, since kerd is merely a reconstructed word with no record that your ancestors ever used it.
More examples: quod (Latin) → what, centum (Latin) → hundred: OK.
Kwod (Proto-Indo-European) → what, Kemtom (Proto-Indo-European) → hundred: I'm not convinced.
Sincerely, B2V22BHARAT (talk) 13:49, 19 September 2019 (UTC)
If you can show people actual usage of Kwod and Kemtom (or at least K*-) in other languages, such as German, Portuguese, Spanish, French, etc., then I think people, including myself, will be more easily convinced. B2V22BHARAT (talk) 14:07, 19 September 2019 (UTC)
For example, like this: *kerd-
Proto-Indo-European root meaning "heart."
It forms all or part of: accord; cardiac; cardio-; concord; core; cordial; courage; credence; credible; credit; credo; credulous; creed; discord; grant; heart; incroyable; megalocardia; miscreant; myocardium; pericarditis; pericardium; quarry (n.1) "what is hunted;" record; recreant; tachycardia.
It is the hypothetical source of/evidence for its existence is provided by: Greek kardia, Latin cor, Armenian sirt, Old Irish cride, Welsh craidd, Hittite kir, Lithuanian širdis, Russian serdce, Old English heorte, German Herz, Gothic hairto, "heart;" Breton kreiz "middle;" Old Church Slavonic sreda "middle."
I don't know why Greek, Hittite and Breton were chosen as representatives of the Proto-Indo-European language, but at least with this presentation I can somewhat understand the sense of *kerd-. Sincerely, B2V22BHARAT (talk) 02:20, 20 September 2019 (UTC)

Images in non-English terms

Hi, I searched the archives in here and WT:EL and found no information about how to handle images for non-English terms.

My question is whether it is a good idea to include images on a page for every language of a term. E.g. the article bolt has 3 images on the English tab. Would it be a good idea to copy those so that they are also shown on the Danish, Old English and Norwegian tabs?--So9q (talk) 19:22, 17 September 2019 (UTC)

I don't think it's a good idea to use the same image in every language entry for the same meaning. That is a bit boring. Other suitable images are often available on Wikimedia Commons. DonnanZ (talk) 20:55, 17 September 2019 (UTC)
What makes sense for users who use tabbed languages does not make sense for those not using that gadget. Sometimes you can't both have your cake and eat it. DCDuring (talk) 23:15, 17 September 2019 (UTC)
Yes at least for things that don't usually have a term in other languages (like Finnish kalakukko, a loaf of bread with a fish baked into it — though this example is spoiled by the fact that we seem to count it as an English word too). Equinox 14:52, 21 September 2019 (UTC)
It is a good thing to add varying images, but boring and a waste of bandwidth to have the same one. There are so many unused images, and many things can benefit from multiple images; if they do not all fit on the English page, we can at least spread them across languages. Look at оман / oman, elecampane, in all Slavic languages as an example. It would be very silly to repeat the same image, innit. Effectively it’s one dictionary entry for one word, having inflections and pronunciations for multiple languages. Fay Freak (talk) 15:13, 21 September 2019 (UTC)
I don't think wasting bandwidth is a real consideration: including the same image (it's only a link!) in multiple entries is only a few dozen bytes for the markup. We aren't actually making copies of image files. Equinox 15:39, 21 September 2019 (UTC)
Ultimately this is yet another thing that could, in theory, be solved by separating "meanings" from "renderings" (a bit like HTML vs. CSS, heh): if there is a general concept "an apple" and 3,000 languages have words for it, then the image really belongs to the concept and not to the words. I know it ain't that simple but this issue will keep coming up, and those OmegaWiki people seem to have realised it. Equinox 15:41, 21 September 2019 (UTC)
I was thinking of the case where one visits a foreign-language entry defined as “X” with an image, then clicks on the definition only to find the same image in the English entry. This would load the image twice, assuming it is not cached. Fay Freak (talk) 16:20, 21 September 2019 (UTC)
But it would be cached since you just came from a page on the same domain containing the same image. Equinox 18:17, 21 September 2019 (UTC)

Three Questions on Hebrew Entries

Just a few questions: 1) Why do we mark words with begadkefat letters? Aren't those words 100% predictable? Maybe we should mark irregular pronunciations/stress instead? 2) Why do latinizations even of monosyllabic words have accents? 3) Do we capitalize proper noun latinizations? This one applies to more than just Hebrew. Starbeam2 (talk) 19:46, 17 September 2019 (UTC)

1) Probably because for beginners it is not yet so predictable, or it might be relevant if a Hebrew word is mentioned in the etymology section of another language and the reader cannot be expected to know about it. The transcriptions then include even this detail so they can just be copied. 2) They shouldn’t. Or perhaps they weren’t monosyllables because of lost schwas. 3) Opinions vary. Fay Freak (talk) 20:05, 17 September 2019 (UTC)
1) I mean, all of them have the same six letters each time. Even if I could not read said letters, I could probably recognize the individual shapes. 2) Aye aye, guess I'll fix it. 3) I honestly think it should be the case, as latinization is the only place where capitalization comes into play at all. Furthermore, I have one more question: 4) If I don't know the stress but I know the pronunciation of a Hebrew word, can I make a latinization without stress and mark it as such? Starbeam2 (talk) 19:40, 18 September 2019 (UTC)

Policy for Tungusic Entries

Hey all - I've been editing the Tungusic section of Wiktionary for a little while now, and I'm finding it extremely frustrating to add entries correctly or consistently because experts are not consistent in how they write things in their papers. Even more frustrating is the fact that I currently have to convert these Latin-script texts into Cyrillic, when many of the languages have no clear, defined orthography or protocol for converting between Latin and Cyrillic. What does one do about this? It seems as if each expert has their own system for representing sounds: some use IPA-based transcription, and some use one that represents the underlying vowel harmony rather than the actual phonetic realisation. Both have their merits, in my opinion. Due to the patchy documentation available online, it's extremely difficult, and in some cases impossible, to determine how several different systems represent the same word. Then there's the transcription into Cyrillic, which represents the sounds poorly and is not standardised; I feel this leaves a huge possibility of inaccuracy in entries, something I feel very uncomfortable about. I want to be confident that my entries represent exactly what is presented in linguistic journals, which I do not feel is currently possible.

To remedy this, I believe that we should use the Latin-script orthographies used in the journals, even though there are several in use; then decide as a community on a Cyrillic standard for all of the Tungusic languages that accurately represents the information contained in the Latin orthographies, before transliterating the Latin entries into Cyrillic. The vast majority of the Tungusic languages do not have a standard form, or any widely used form at all, the exceptions being the likes of Evenki and Manchu. The Evenki orthography, however, is plagued with all the difficulties of the other languages. Manchu, in my opinion, is highly regular and standardised across all circles of experts and their literature, and thus does not need adjustment. I'd personally rather make use of several accurate orthographies than use one that has no standard, that I have to convert from Latin myself, and that is full of inconsistencies and inaccuracies.

Then there is also the question of dialects - Tungusic is made up of many dialect continuums, and I feel that these dialects should be represented accurately, distinctly, and clearly, which is currently not the case. We as a community need to decide on the dialect categories for each of the languages and do our best to label each entry with them. This, in my opinion, is a major part of accurately representing the languages as they are spoken/were spoken.

Please do give me your thoughts on this - I'd love to see this resolved so we can increase the quality of our coverage of these absolutely fascinating languages. TheSilverWolf98 (talk) 00:42, 18 September 2019 (UTC)

Since I've not had any replies, I've created a page that lists Oroch words extracted from numerous academic journals, just to illustrate the variation in the ways linguists represent this language, and how little overlap there is between papers in terms of content: Oroch Wordlist. The situation shown there is similar for all Tungusic languages except Manchu. TheSilverWolf98 (talk) 01:54, 20 September 2019 (UTC)
The issue of using Latin vs. some more "native" script is a fraught one. I personally favor using Latin transcription for languages without a standardized native script. This includes cases like Moroccan Arabic and maybe Egyptian Arabic, where (especially in the former case) the Arabic script cannot accurately represent the sounds of the language without extra diacritics and such that (in practice) are never used. Benwing2 (talk) 16:26, 20 September 2019 (UTC)
BTW if we do use Latin transcription, I'd much prefer that we pick one of the academic systems in use (probably whichever one is most common or well-documented) and convert all other representations into that one. Otherwise it will be total chaos for users trying to actually read the entries. Benwing2 (talk) 16:28, 20 September 2019 (UTC)
The transcription method I see most often (probably because many of his articles are available online) is the one used by J A Alonso de la Fuente, though Peter Piispanen, Alexander Vovin, Sergei Starostin, and others use their own transcriptions. Due to the lack of overlap between the papers (they all present different items of vocabulary), it is difficult to see how to convert one system to another. I personally am a fan of Alonso de la Fuente's transcription system, as it displays vowel harmony very clearly and makes use of some simple diacritic sets. If you visit my Oroch Vocabulary page, which I linked above, you can see many examples of his transcriptions. Of course, I'd still like others to offer up their ideas on this. TheSilverWolf98 (talk) 01:06, 21 September 2019 (UTC)

Automatically replacing "Foolang {{m|bar|...}}" with "{{cog|bar|...}}"

Hi. I have written and run a script to automatically replace "{{etyl|FOO|...}} {{m|BAR|...}}" and "Foolang {{m|BAR|...}}" (and similar variants) with {{cog|FOO|...}}. Basically, it looks for expressions of this sort preceded by "Cognate with/of/to" or "Cognates include" or "Compare with/to". It is smart enough to handle chains of terms of the sort "Compare Low German {{m|nds|dick}}, Dutch {{m|nl|dik}}, English {{m|en|thick}}, and Danish {{m|da|tyk}}". It is also smart enough to handle etymology languages. When running over the 20190901 dump, it finds 30,692 replaceable cases on 16,441 pages. However, it also finds 1,733 cases where it can't do the replacement due to an unrecognized language name, a language name not agreeing with the code, etc. Some of these cases have to be handled manually, but some can be automated. For example:

  • There are 506 cases of the form "Danish and Norwegian {{m|da|...}}" or "Spanish and Catalan {{m|es|...}}" or similar. How should we handle these?
    1. replace with e.g. "{{cog|es|TERM}}, {{cog|ca|TERM}}" (which duplicates the term, although that isn't necessarily bad, but includes links to the term in both languages);
    2. replace with e.g. "{{cog|es|-}} and {{cog|ca|TERM}}" (which preserves the same appearance except with properly linked language names, but doesn't allow for a link to the term in the first language);
    3. replace with e.g. "{{cog|es|TERM|lang2=ca}}" (which requires changes to the implementation of {{cog}} that I could make; we'd have to decide how to display this, e.g. maybe the first language could display as a language name but link to the term);
    4. leave as is.
  • There are 60 cases of language name "Hindustani" followed by the Hindi and Urdu forms, e.g. on پل we have "Hindustani {{m|hi|फूल}} / {{m|ur|پھول|tr=phūl}}". How should we handle this? Maybe replace with e.g. "{{cog|hi|फूल}} / {{cog|ur|پھول|tr=phūl}}"?
  • There are 35 cases of language name "Mooring North Frisian" along with language code frr (North Frisian). There's no etymology language "Mooring North Frisian"; maybe we should create it?

Benwing2 (talk) 04:05, 19 September 2019 (UTC)
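To make the matching step concrete, here is a minimal Python sketch of the kind of replacement described above. It is not Benwing2's script: the name table is a hypothetical four-entry subset (a real run would load the full language data), and the sketch only looks at the first clause after a trigger keyword, ending at the next period. Names that don't match their code are returned for manual review rather than converted, mirroring the "unrecognized language name / name not agreeing with the code" cases mentioned above.

```python
import re

# Hypothetical subset of code -> canonical name; a real bot would load the
# full table from Wiktionary's language data modules instead.
LANG_NAMES = {"nds": "Low German", "nl": "Dutch", "en": "English", "da": "Danish"}

# Keywords after which a mention is treated as a cognate.
TRIGGER = re.compile(r"(?:Cognate (?:with|of|to)|Cognates include|Compare(?: with| to)?) ")
# A run of capitalized words (the language name) followed by "{{m|CODE|".
MENTION = re.compile(r"((?:[A-Z][\w-]*\s)+)\{\{m\|([\w-]+)\|")


def convert_cognates(text):
    """Rewrite 'Name {{m|code|...}}' as '{{cog|code|...}}' in the first
    cognate clause of `text` (from the trigger keyword up to the next period).

    Returns (new_text, names_that_did_not_match_their_code)."""
    trigger = TRIGGER.search(text)
    if trigger is None:
        return text, []
    start = trigger.end()
    end = text.find(".", start)
    if end == -1:
        end = len(text)
    failures = []

    def replace(m):
        name, code = m.group(1).strip(), m.group(2)
        if LANG_NAMES.get(code) == name:
            return "{{cog|%s|" % code
        failures.append(name)  # unknown name or name/code mismatch: leave as is
        return m.group(0)

    clause = MENTION.sub(replace, text[start:end])
    return text[:start] + clause + text[end:], failures
```

For the chain "Compare Low German {{m|nds|dick}}, Dutch {{m|nl|dik}}, English {{m|en|thick}}, and Danish {{m|da|tyk}}." this yields four {{cog}} calls and no failures; "Cognate with Mooring North Frisian {{m|frr|foo}}." is left untouched and reported.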

Generally, I support replacing "Foolang {{m|bar|...}}" with "{{cog|bar|...}}" when this occurs after the keywords "Compare ...", "Cognate of ...", "Cognates include ...". There will also be some cases of "Compare unrelated ..." that will need {{noncog}} instead. KevinUp (talk) 08:24, 19 September 2019 (UTC)
A few more things:
  • Some users are incorrectly using {{etyl}} instead of {{cog|-}}. I found some (55 entries) by searching for:
The ones that have "Cognate to {{etyl|lang|-}} [[term]]" can be automatically replaced by {{cog|lang|term}} while the rest that have "Cognate to {{etyl|lang1|lang2}}" will need to be hand-checked.
  • This one is totally unrelated, but there are a lot of entries using "[[term]]" instead of {{l|lang|term}} under "Synonyms", "Derived terms", "Related terms", etc. I suppose this can be automatically done by bot? KevinUp (talk) 08:24, 19 September 2019 (UTC)
@KevinUp I have a script to do this. I ran it before on certain languages, mostly languages with non-Latin scripts. It's safer to do that because you can check the script of the link to make sure it's correct, which helps weed out raw links to English terms. Currently it has the properties of certain languages hardcoded in it (the full name and language code, ranges of script characters, and how to strip accents from the link to see whether a two-part raw link can be converted to a one-part templated link), but I'm pretty sure I can get this info from the languages modules. Let me see if I can resurrect the script and get it working on all languages. Benwing2 (talk) 02:52, 20 September 2019 (UTC)
BTW any languages you know of that are particular offenders? Benwing2 (talk) 02:54, 20 September 2019 (UTC)
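A minimal Python sketch of the script-check idea from Benwing2's comment above: only convert a raw link when the linked text is entirely in the language's expected script, so raw English links are left alone. The two script ranges (Cyrillic for ru, Armenian for hy) are illustrative assumptions, not the bot's actual configuration, and unlike the script described above this sketch skips two-part links such as [[target|display]] entirely instead of stripping accents to decide whether they can be collapsed.

```python
import re

# Hypothetical per-language script checks (code -> pattern for a whole term).
# A real run would derive these from the script data in the language modules.
SCRIPT_PATTERNS = {
    "ru": re.compile(r"[\u0400-\u04FF\u0301\s-]+"),  # Cyrillic, plus combining acute for stress
    "hy": re.compile(r"[\u0530-\u058F\s-]+"),        # Armenian block
}

# One-part raw links only: [[term]] with no pipe, anchor or namespace prefix.
RAW_LINK = re.compile(r"\[\[([^\[\]|#:]+)\]\]")


def template_raw_links(line, code):
    """Replace [[term]] with {{l|code|term}} when the term is entirely in the
    expected script; anything else (e.g. raw English links) is left alone."""
    pattern = SCRIPT_PATTERNS[code]

    def replace(m):
        term = m.group(1)
        if pattern.fullmatch(term):
            return "{{l|%s|%s}}" % (code, term)
        return m.group(0)

    return RAW_LINK.sub(replace, line)
```

For example, template_raw_links("* [[собака]], [[dog]]", "ru") converts only the Cyrillic link.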
For non-Latin script languages, raw links are mostly present in recently created entries (2018-2019). As for Latin-script languages, I've come across raw links for Spanish and Italian in older entries. KevinUp (talk) 08:01, 20 September 2019 (UTC)
I just realized that the main offenders are actually English entries. For example, I recently fixed historical method#English which had raw links since 2008 when {{l|en}} was not yet used for semantic terms.
Also, I just came across this Finnish entry which had a lot of raw compound links. I've converted it to {{der4|fi}}, so you might want to run the bot on Finnish. KevinUp (talk) 16:58, 20 September 2019 (UTC)
@Benwing2: Mooring North Frisian is a dialect of North Frisian. Personally, I'd just replace it with "Mooring {{cog|frr|...}}", just as we might write "Australian {{cog|en|...}}". —Mahāgaja · talk 09:26, 20 September 2019 (UTC)
There is also the following layout: Compare [[w:Hebrew language|Hebrew]] {{m|he|רשם|tr=rasham}}. Fay Freak (talk) 10:41, 20 September 2019 (UTC)
@KevinUp I resurrected and cleaned up my script. I ran it for about 20 languages and it replaced 82,574 raw links on 50,400 pages. I then expanded it to 88 languages and reran it, and it replaced another 39,732 raw links on 15,704 pages. I then did a postprocessing run that should have gotten all or nearly all of the false positives (it found about 800 potential false positives, which I checked by hand and fixed as necessary). With a bit more work I could probably get it working on all 7,000+ languages but I think I'm reaching the point of diminishing returns. Note that I purposely didn't do English (because I'm not sure whether it's universally agreed to replace raw English links with templated links), also Chinese, Japanese, and Korean (because I'm not sure whether {{l}} is appropriate for them or if there are language-specific variants that should be used instead). If you have the answer to my uncertainty for any of these four languages, please let me know. Benwing2 (talk) 06:12, 22 September 2019 (UTC)
BTW most of the false positives were due to badly formatted entries in Finnish, Esperanto or Icelandic; not sure why these languages in particular were offenders. Benwing2 (talk) 06:17, 22 September 2019 (UTC)

Implementing the ISO 639-6 code for the Hachijō language, hhjm, into Wiktionary

How would we go about implementing the ISO 639-6 code for the Hachijō language, hhjm, in Wiktionary?

We don't use four letter codes, it would be something like und-hjm. DTLHS (talk) 02:17, 20 September 2019 (UTC)
According to the Wikipedia article, it's generally considered a dialect of Japanese. If we need something besides dialect marking, it'd be ja-hjm or something. I'd prefer that we use something consistent with w:IETF language tags, that is, ja-hachijo (a 5-8 character variant subtag, which I'd be happy to try to register officially). --Prosfilaes (talk) 12:42, 20 September 2019 (UTC)
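For reference, in BCP 47 (RFC 5646) a variant subtag is 5 to 8 alphanumeric characters, or 4 characters starting with a digit, and comes after any script and region subtags. The following Python sketch checks a tag against a simplified version of that grammar; it covers only language, extlang, script, region and variant subtags and says nothing about whether a subtag is actually registered.

```python
import re

# Simplified BCP 47 (RFC 5646) grammar: language (2-3 letters) with up to
# three 3-letter extlangs, optional script, optional region, any variants.
# Extensions, private use, grandfathered tags and 4-8 letter primary language
# subtags are not handled. This checks syntax only, not registration.
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?P<extlang>(?:-[A-Za-z]{3}){0,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def variant_subtags(tag):
    """Return the list of variant subtags, or None if the tag does not fit
    the simplified grammar above."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    return [v for v in m.group("variants").split("-") if v]
```

Under this grammar ja-hachijo is well-formed with the variant hachijo, whereas ja-hhjm parses with hhjm in the script position, so a four-letter ISO 639-6 code cannot simply be appended as a subtag.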

Creating new entries with no definition

We have the rfdef template, which allows us to add a sense line with no actual definition so that Kiwima, er, another user who knows the meaning can fill it in. (This is perennially abused by Wonderfool for phrases he has encountered in sports journalism.)

I have a pretty large list of "good" words that are definitely CFI-attestable but whose meaning I can't work out, usually because it's very specialised (particle physics etc.) though some might be regionalisms or what not. I'm tempted to start creating entries for these, since some info can be given (part of speech, pronunciation, 3 citations, etc.) even without a definition — and perhaps users are more likely to work on them than they would be with a big list of red links.

Would people approve of this or object? Equinox 14:46, 21 September 2019 (UTC)

  Support Of course you can do as much as you know, and you cannot be blamed for being lazy if defining the term requires special knowledge or a clear mind you do not have. If you leave requests, it is inviting (especially if the request categories won’t be hidden). And English coverage is probably at the point where definitions become harder for that reason. Fay Freak (talk) 15:00, 21 September 2019 (UTC)
  Support I reserve the right to change my mind if this is somehow abused. I have done this a few times, but usually because I wanted to look something up, because the citations I found didn't support the first definition that I had added, or because I was interrupted before finishing. DCDuring (talk) 19:04, 21 September 2019 (UTC)

Getting rid of dialects from "other names" in Module:languages/data2, etc.

I would like to either delete dialects from the "other names" section of languages (e.g. other name "Italian Walser" for language "Alemannic German") or move them to a separate "dialects" section. IMO, "other names" should only contain synonyms for the language (e.g. Farsi = Persian, High German = German, Slovenian = Slovene, Serbo-Croat = Serbo-Croatian, Daco-Romanian = Romanian, etc.). Otherwise, certain bot jobs get much harder. Any objections? Benwing2 (talk) 08:49, 22 September 2019 (UTC)

A dialects field seems a better approach than just deleting dialect names. There may be hairy issues, such as deciding which language a given dialect in a dialect continuum belongs to. But there is no hard rule that the dialects of different languages form disjoint sets. (The current coding also allows a language to have several ancestors, as for Saramaccan.) For flexibility, I can envisage a scheme in which dialects are treated on the same footing as languages, with their own codes; this would then require a status field, with values like "language" and "dialect", but also allowing a later extension to "language family". And for recording synchronic relationships there should then be fields like parents and members, where the latter could replace dialects.  --Lambiam 10:49, 22 September 2019 (UTC)
Yes, those should go into a new field in the language data. The reason why they shouldn’t be removed is that sometimes it is hard to find how a language is named on Wiktionary because the spelling of the language name varies. Having synonymous or in this case often used pars pro toto names there helps to find the language category. Fay Freak (talk) 13:05, 22 September 2019 (UTC)
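The scheme Lambiam describes could look roughly like the following Python sketch. Wiktionary's actual language data lives in Lua modules, and the field names here (status, other_names, parents) and the made-up dialect code gsw-wal are assumptions for illustration only. Deriving members from parents, rather than storing both, is one design choice that keeps the two from drifting apart; parents is a list, so a dialect can belong to more than one language, and names stay searchable through lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Values for the proposed status field; "family" is the possible later extension.
STATUSES = {"language", "dialect", "family"}


@dataclass
class Variety:
    code: str
    canonical_name: str
    status: str
    other_names: List[str] = field(default_factory=list)  # true synonyms only
    parents: List[str] = field(default_factory=list)      # codes; may list several

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def build_members(varieties: Dict[str, Variety]) -> Dict[str, List[str]]:
    """Derive each variety's members as the inverse of the parents field."""
    members: Dict[str, List[str]] = {code: [] for code in varieties}
    for v in varieties.values():
        for parent in v.parents:
            members.setdefault(parent, []).append(v.code)
    return members


def lookup(varieties: Dict[str, Variety], name: str) -> List[str]:
    """Codes whose canonical name or synonym matches `name`, case-insensitively."""
    key = name.casefold()
    return [
        v.code
        for v in varieties.values()
        if key == v.canonical_name.casefold()
        or key in (n.casefold() for n in v.other_names)
    ]


VARIETIES = {
    "fa": Variety("fa", "Persian", "language", other_names=["Farsi"]),
    "gsw": Variety("gsw", "Alemannic German", "language"),
    # "gsw-wal" is a made-up code for illustration only.
    "gsw-wal": Variety("gsw-wal", "Italian Walser", "dialect", parents=["gsw"]),
}
```

With this data, lookup(VARIETIES, "Farsi") finds Persian through a synonym, while Italian Walser is found as a dialect of its own that lists gsw as a parent.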

Etymologies: categorization vs redundancy

In my opinion, etymologies should be as succinct as possible while still giving enough information to be useful. For instance, this etymology of French bedeau has so much information that it could be confusing to a reader. This one ignores the term's Germanic origins, which are interesting and useful, to me at least. And this one conveys the right amount of information in my opinion; however, its categorization is not exhaustive.

A compromise could be to have etymology categorization without it being stated explicitly in the written etymology. That still doesn't completely solve the "how much etymology" dilemma (do we still want to categorize Cebuano sin as being from Old High German? or Icelandic skunkur from Proto-Algonquian?), but it would make it easier to write concise etymologies without sacrificing categories.

I have a (very) rudimentary outline of what a categorization template might look like here: User:Julia/etycat. Let me know if this is something that people would be on board with, or if anyone has other/better ideas about how we can improve our etymologies. Julia 15:07, 22 September 2019 (UTC)
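To illustrate the idea of categorizing without spelling everything out in the prose, here is a hypothetical Python sketch (not the User:Julia/etycat template) that turns a chain of source languages into category names of the form "LANG terms derived from SOURCE". The optional max_depth cut-off is one possible answer to the "how much etymology" question; the example chain used below is illustrative and not the etymology of any particular word.

```python
def etymology_categories(lang_name, sources, max_depth=None):
    """Return "X terms derived from Y" category names for each source language,
    in order, skipping duplicates and the entry's own language. If max_depth
    is given, only the first max_depth sources in the chain are considered."""
    categories = []
    seen = set()
    for source in sources[:max_depth]:  # a slice end of None keeps everything
        if source == lang_name or source in seen:
            continue
        seen.add(source)
        categories.append(f"Category:{lang_name} terms derived from {source}")
    return categories


print(etymology_categories("French", ["Old French", "Latin"]))
```

A real template would also need to distinguish inherited from borrowed terms, which this sketch does not attempt.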