Discussion rooms: Tea room · Etym. scr. · Info desk · Beer parlour · Grease pit · ← July 2019 · August 2019 · September 2019 →


Words that are both borrowings and compounds

In Oto-Manguean languages (among others), many nouns have a generic prefix indicating what type of thing they are. These prefixes are also added to loanwords, which poses a problem for Wiktionary templates because Template:prefix, Template:compound etc. only allow morphemes from within one language.

E.g. for ntamesá, I can't use Template:compound to give its etymology as inta +‎ mesá, because mesá doesn't exist as an independent word. But if I just use Template:bor, it doesn't show the relationship to inta at all.

Is there a way to produce something like "inta + Spanish mesa", so that it categorizes it as both a compound and a borrowing? --Lvovmauro (talk) 12:13, 1 August 2019 (UTC)

I think {{affix|poe|inta|mesa|lang2=es}} will do what you want. —Rua (mew) 16:09, 1 August 2019 (UTC)
Something like {{affix|poe|inta|{{bor|poe|es|mesa}}}} also seems to work / (talk) 01:13, 7 August 2019 (UTC)
That produces bad HTML: "Spanish mesa" will be italicized and marked as San Juan Atzingo Popoloca (poe) text. — Eru·tuon 01:47, 7 August 2019 (UTC)
Noted. / (talk) 19:55, 7 August 2019 (UTC)
Never a good idea putting templates inside of templates. --{{victar|talk}} 22:52, 7 August 2019 (UTC)
I do not subscribe to this apodictic statement. You say this about the parameter positions of linking templates that are reserved for linking in a certain language, and you have said this about |tr= and similar. It depends on what the templates do. Often {{taxlink}} belongs in |t=. {{w}} belongs in quotation templates. Even simpler: {{circa}}, {{...}} and similar replacement templates. But {{affix|poe|inta|{{bor|poe|es|mesa}}}} is bad, of course. Fay Freak (talk)
Arguably, it should be required that templates be nestable or tolerant of nesting. If not they should come with warning labels. DCDuring (talk) 01:18, 8 August 2019 (UTC)
A template that is enclosing another template can only see the wikitext generated by the inner template. In theory, one of the modules that {{affix}} uses could try to strip language-tagging from the input that it gets, but it would be complex and would probably cause more Lua memory errors. It's better to scan the dump to find instances to fix. Here are cases like the one above (basic etymology templates inside the "term" parameters of the basic morphology templates). — Eru·tuon 03:37, 8 August 2019 (UTC)
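For anyone who wants to reproduce the kind of dump scan suggested above, here is a minimal Python sketch. It assumes the wikitext of each page is already available as a string; the two template-name sets are illustrative assumptions (not the exact list used in the actual scan), and the brace matcher ignores triple-brace parameter syntax ({{{...}}}), which should not normally occur in entry wikitext.

```python
import re

# Illustrative template names (an assumption; extend with the real
# morphology/etymology templates and their aliases as needed).
MORPHOLOGY = {"affix", "af", "prefix", "pre", "suffix", "suf", "compound", "com"}
ETYMOLOGY = {"bor", "borrowed", "der", "derived", "inh", "inherited"}


def top_level_templates(text):
    """Yield (start, end) spans of outermost {{...}} templates in text.

    Braces are matched by counting nesting depth in pairs, so a nested
    template such as {{bor|...}} stays inside its enclosing span.
    """
    depth = 0
    start = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                yield start, i
        else:
            i += 1


def template_name(source):
    """Return the lower-cased name of a template given its '{{...}}' source."""
    inner = source[2:-2]
    return re.split(r"\||\{\{", inner, maxsplit=1)[0].strip().lower()


def nested_etymology_in_morphology(text):
    """Return morphology templates whose parameters contain an etymology template."""
    hits = []
    for start, end in top_level_templates(text):
        outer = text[start:end]
        if template_name(outer) not in MORPHOLOGY:
            continue
        body = outer[2:-2]
        if any(template_name(body[s:e]) in ETYMOLOGY
               for s, e in top_level_templates(body)):
            hits.append(outer)
    return hits


if __name__ == "__main__":
    sample = "From {{affix|poe|inta|{{bor|poe|es|mesa}}}}."
    print(nested_etymology_in_morphology(sample))
```

A real scan would feed it the page texts from the XML dump (for example via the standard library's `xml.etree.ElementTree.iterparse`); that part is left out of the sketch.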

Linkages to limited entries

Being more of a user than a contributor, I plead guilty to inadequate mastery of the Wiktionary facilities. Often I don't even know where to look. (I have only recently discovered how to enter an example, for example.) Now, one thing that has frustrated me considerably: suppose I wish to link to a headword that has many valid definitions, all listed under that one headword, but the link I want is to one specific definition in the list. Let us say "lustre", a noun, for which we have four definitions, and suppose I want to refer to precisely the third (or, FTM, the first definition under the verb). Is there a facility for that? If so, I would be grateful for a link. JonRichfield (talk) 09:54, 2 August 2019 (UTC)

There is no easy way, but I think the template {{anchor}} can be used for this purpose. If you replace the third sense of the noun “lustre” by # {{anchor|glass ornament}} a glass ornament such as ..., then [[lustre#glass ornament|lustre]] will look as before but link directly to the glass-ornament sense. I do not know if there are arguments against this approach – except that it is clumsy. --Lambiam 18:46, 7 August 2019 (UTC)
@Lambiam, JonRichfield We have {{senseid}} specifically for this purpose. —Rua (mew) 15:14, 8 August 2019 (UTC)
I have created a documentation subpage for the template {{anchor}} and added: “See also: {{senseid}}”.  --Lambiam 18:25, 8 August 2019 (UTC)
@Lambiam, Rua Many thanks to both of you. I shall investigate. JonRichfield (talk) 18:30, 9 August 2019 (UTC)

Gerund at bay

There has been an exchange at and about gerund. The discussion has been too involved for me to go into detail here, but I have posted an example to illustrate what bothers me. First of all, there is a lot of argument about whether the concept is still viable in English (See here for more detail), but I am not taking sides on that, though I am increasingly uneasy about extending the verbal-noun concept to cover what are seen as, for example, verbal adverbial functions. It does, however, apparently exclude participles, in particular present continuous participles, which seems to me to amount to straining at straws that have been passed by camels. ("I smell a rat, I see it in the...") To add aggravation to lesion, the English article gerund currently has a Russian example that has only an uncomfortable equivalent in English, one that isn't a verbal noun, and I am not comfortable with the idea of seeing it as a gerund at all. To illustrate the concept, I have added a natural Afrikaans construction that seems to me nearly exactly equivalent, given that my mastery of Russian stops at about da, nyet, and tovarich. Now, I don't know where this is heading, but could some members please have a look at gerund and the examples, and offer comments or pronouncements? JonRichfield (talk) 10:13, 2 August 2019 (UTC)

Roughly speaking, a gerund is a verb form that some linguists choose to call a “gerund”. I do not think any other short description will cover the meaning across the spectrum of languages. And it is hard to find two modern linguists who agree on which forms to call a gerund. The concept is a remnant of the outmoded conception that Latin is the perfect language with the perfect grammar, and that more barbaric languages such as English should be described in the terms developed for Latin grammar.  --Lambiam 18:33, 7 August 2019 (UTC)
At a glance, it's easy to get the impression that the English translations, One shouldn’t cross a street while reading a newspaper and That fellow is crossing the street while reading!, are supposed to be examples of gerunds in English, which they are not. Mihia (talk) 13:28, 8 August 2019 (UTC)
We are lapsing a bit into prescriptivism here. If published works use the word gerund "carelessly", we should include the "careless" definition as well as at least one generally-accepted-as-correct-by-grammarians definition.
Can an English deverbal term ending in ing take a plural while also being a pure gerund? To me the ing-forms that I can think of that have a plural form seem to be full-fledged nouns, eg, tracings, rubbings, drawings, parsings. Gerunds seem to behave generally like uncountable nouns. If so these might be useful to exemplify in the entry or mention in usage notes. I haven't subjected these notions to testing or to the authority of CGEL (2002), but perhaps we have an expert reading this. DCDuring (talk) 18:09, 20 August 2019 (UTC)


I don’t know if this is the right place to discuss this, but an anon has repeatedly undone my edits at صدر without giving any explanation of why my revision should not be kept. Is there a way to solve this? I’m pretty sure a temporary block would be useless, as the user would revert my edits again as soon as it expires. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:46, 4 August 2019 (UTC)

I really don't know who to believe here: this is probably a problem IP editor in Saudi Arabia that @Fay Freak has been battling; they have a very narrow, prescriptive view of their native language, and revert all kinds of reasonable edits. You, on the other hand, have a long, long history of making bad edits in languages you don't know; I can't trust you to actually know what you're doing. We'll have to wait for someone who knows the language to sort this out. Chuck Entz (talk) 15:49, 4 August 2019 (UTC)
But here it is only about the formatting. Indeed the IP does not explain (because its English is bad), but I don’t see a point in IvanScrooge98’s formatting either. The numbered pronunciation sections are not uncontroversial. We have long wanted to get rid of them, and recently someone even argued for getting rid of numbered etymology sections (I won’t dig out all the discussions now). In any case, the IP has correctly seen that IPA pronunciations are a poor basis for formatting the whole page around them. Just as I have recently said that categorizing terms in a language as derived from a proto-language X is little reason to duplicate etymologies everywhere, putting things in places where you otherwise would not put them instead of leaving them elsewhere, so pronunciations, whether in IPA or in audio, do not justify splitting all the main content into “Pronunciation” sections, or into “Etymology” sections only because of different pronunciations. The latter has not seldom been done even when the given etymology is the same (“from the root Y”), often aggravating the obnoxiousness of the formatting by adding the same reference templates under every one of them 😵. The pronunciation sections are not of equal importance for every language: in English and Chinese one needs them, but in most other languages, Arabic included, the opposite is the case. In Arabic a problem becomes frequent because different pronunciations can have the same spelling, so we would have to put a pronunciation in every POS section even though this is not necessary for knowing the pronunciation from the dictionary alone (the vocalization or transcription is), and here the reader would ask “why would you do that?” The Saudi IP has appreciated this after seeing me clean up many Arabic pages whose formatting had been messed up in this way.
What I opt for is getting rid of the pronunciation sections in Arabic entries altogether, adding a switch to inflection tables that readers can toggle to switch the transcriptions in the tables to IPA, or similar, and adding parameters to include audio files in inflection-table entries. This also fits the language much better, since otherwise we only give a pronunciation for the lemma form, which is one form out of a hundred for verbs and one of at least 15 for common nouns; that looks quite arbitrary, as opposed to English or Chinese, where one does not have all these forms. Heavily inflecting or agglutinating languages that are written unambiguously (where applicable, with diacritics or transcriptions, but then even more so) call for putting pronunciations in a less showy place, while pronunciation sections are more for languages where the lemma pronunciation covers most of the forms and where giving pronunciations is also required because the spelling leaves doubts about the pronunciation. WT:EL has not been written with all that in mind, sure, but is rather based on English-like requirements. For now it is already beneficial to recognize the different information requirements in the treatment of various languages, rather than making the entry layout a tool against legibility. Fay Freak (talk) 16:32, 4 August 2019 (UTC)
Sorry, forgot to reply. As you may have noticed, I’ve been less active in the last few months and have only edited when I was sure about what I was doing. Also, I am studying Arabic, and I figured that order would be the most sensible and complete, even though I agree we could take the pronunciations away if the division becomes cumbersome, especially since Arabic pronunciation is mostly predictable from the romanization. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 07:55, 6 August 2019 (UTC)

requesting AWB permissions

hey friends, i'd like to request AWB access to add noun class information to plural forms of Swahili words, and possibly a few other small tasks. i've created over 1,300 swahili entries here without issue, and i'm familiar with AWB having done a few tasks with it on enwiki before. i'll personally review every edit, and tag them appropriately, and before every mass task i'll write up a little description of it on my talk page as i did on enwiki. also, i plan on using JSWikiBrowser instead of AutoWikiBrowser, but it still depends on me being added to WT:AutoWikiBrowser/CheckPage so it should be no difference from the admin point of view. thanks, --Habst (talk) 19:21, 6 August 2019 (UTC)

@Habst: All set. - TheDaveRoss 12:21, 8 August 2019 (UTC)


I am new to Wiktionary, so I don't know where to post this, but I'm wondering about the use of colloquial Tamil on Wiktionary. Egyptian Arabic and other colloquial Arabic dialects, which are used frequently in speech but not in writing, have entries (whereas both Modern Standard Arabic and written Tamil are used in writing but not in speech). Would it likewise be possible for me, a native Tamil speaker, to add colloquial Tamil entries? This may be as simple as including the IPA under the written Tamil entry as a separate dialect, similar to how both Received Pronunciation and American English pronunciations are given for English entries, or I could create separate entries and link the written and spoken Tamil forms in the same way that Persian, Dari, and Tajik entries are linked on Wiktionary. (As an example of the differences between the two registers, even the numerals 1, 2, 3, 4, and 5 are pronounced quite differently in the two: oṉṟu vs. oṇṇu, iraṇṭu vs. reṇṭu, mūṉṟu vs. mūṇu, nāṉku vs. nālu, and aintu vs. añcu.) Also, I've noticed a couple of inaccuracies in IPA pronunciations on Tamil entries, stemming from what I assume is an inaccuracy in the code for Template:ta-IPA. Is there a way for me to edit this template? (By inaccuracies, I mean mostly the use of [dʑ] where [s] should be and [ss] where [tʃː] should be. Other than this, I haven't noticed anything.) —This unsigned comment was added by Wokj (talk • contribs) at 03:01, 7 August 2019.

hi Wokj, i see you posted this twice so i responded to you at the information desk: Wiktionary:Information desk/2019/August#Tamil. it's no problem this time, but better to post things in one place in the future so we don't have duplicate answers. --Habst (talk) 03:39, 7 August 2019 (UTC)
Ok, thanks! I'll be sure to post things in one place in the future. Wokj

Renaming Bella Coola to Nuxalk

I am by no means related to this people or language, nor do I have any contact with or knowledge of it whatsoever, but I a) have read it more commonly called Nuxalk, so I would imagine more people would recognize it by that name, and b) read on its Wikipedia page that the Nuxalk government prefers this name. I don’t know any more arguments for or against it; maybe someone else does.
I feel like this paragraph is worded and syntaxed a bit awkwardly; English isn’t my native tongue and I mainly use it to write things, so I’m not sure what sounds right. Please ask if anything is unclear. |Anatol Rath (talk) 11:58, 8 August 2019 (UTC)|

What does ISO call it? We mostly use their standards for language codes etc. Equinox 15:33, 8 August 2019 (UTC)
Just looked it up: it’s blc, so Bella Coola |Anatol Rath (talk) 16:42, 8 August 2019 (UTC)|
Well, the code doesn't necessarily imply anything about the name, but ISO does list the language name as Bella Coola. --Prosfilaes (talk) 07:58, 20 August 2019 (UTC)

Vote on "coalmine"

Is there any appetite for a(nother) vote on the "coalmine" policy, whereby multi-word SoP entries are kept if the corresponding solid word can be attested? I don't know when the last major discussion of this happened. Mihia (talk) 12:57, 8 August 2019 (UTC)

I voted for COALMINE, but I don't like how it has often been used to keep a common spaced SoP by finding a tiny handful of obscure/nonstandard citations for the non-spaced form. I don't know what the solution is. Equinox 12:59, 8 August 2019 (UTC)
@Mihia: The vote happened ten years ago. I'd vote against the proposal. Canonicalization (talk) 13:05, 8 August 2019 (UTC)
We need to have some ideas if we want to revisit the vote. Otherwise we just have to be picky about the spelled-solid citations. DCDuring (talk) 13:07, 8 August 2019 (UTC)
At the time we were looking for some way of wasting less time on RfD discussions, which often led to inclusion of phrases by vote just because they were common collocations, no matter how transparent. DCDuring (talk) 13:09, 8 August 2019 (UTC)
To achieve what? Why care? Wiktionary:NOTPAPER. Also nobody has explained yet what the magic of a space would be why lacking spaces indicate inclusionworthiness. There are a lot of things that shouldn’t be included while being written together, and people fail to understand the compounds can also be just examples for the nomina simplicia: according to some “logics” one should include the term “Muselmanenmäusken” if used by three different authors and somebody removed the quote from Muselman because “the quote does not have the term”. But Muselmanenmäusken isn’t a term and won’t ever be with additional uses. But if somebody creates it with quotes, why remove the page Muselmanenmäusken? I can’t tell you why. And I can’t tell any reason more if German had spaces in compositions and that word were to be quoted thrice and created. What’s the difference? Spaces tell nothing at all on whether something should be kept or removed in a language, as also, as we have recently learned, not even the presence in a text in a certain language indicates in which language the “term” is – if in doubt, one string in a quote might attest a German simplex, a composed term and a Latin term too, if one cannot distinguish where languages end and where terms end. Romans had no spaces, so what? Include every sentence or every text as an entry? The policy-changes will continue to be Anglo-centric, one is unable to articulate or conceptualize what should be included because of unavailable language knowledge. Better not to add any policies, it usually removes people one step further from common sense. That being said, there was no reason to add WT:COALMINE and there is no reason to remove either as long as one does not see reasons to remove words of the coalmine type. I think this aporia is analogous to the distinction between language and dialect. 
Just as one cannot wholly separate languages from dialects, in lexicography with unlimited resources one cannot pin down the language of every string that can be quoted, and one does not see where compositions are so unnecessary that they should be deleted. Fay Freak (talk) 16:39, 8 August 2019 (UTC)
...why not?
NOTPAPER doesn't mean we should add literally anything arbitrarily, else why not add kitten pictures? Everyone likes those. We should hold ourselves to meaningful rules. Equinox 16:41, 8 August 2019 (UTC)
@Equinox You are right, I wouldn’t add just anything. But if people put in the effort to document common compounds, even though the recommendation is not to add you-know-what-we-are-talking-about, one cannot reverse this argument and say it should all be deleted. “Do not add” does not imply “do delete”. Or can anyone prove that it does? And spaces have been a poor criterion for distinguishing you-know-what-we-are-talking-about from what is inclusionworthy. Fay Freak 16:53, 8 August 2019 (UTC)
Why care? Because it licenses the creation of multi-word SoP entries. Why does this matter? Because it is counter to the standard principles of any dictionary and also potentially confusing to users who happen upon the phenomenon. Users should, and presumably largely do, understand that to find the meaning of "X Y" they have to combine the meaning of X with the meaning of Y in cases where "X Y" does not have any special meaning in combination. A separate entry for "X Y" gives the impression that there is a special meaning not understandable from X + Y. Then there is the issue, as Equinox mentioned, of small numbers of obscure/nonstandard citations having an impact beyond what they merit. How much sense does it make to have an entry for "cluster size", for example, purely on the basis that a few people who couldn't tell a variable name from proper English wrote it as "clustersize"? None whatsoever, in my view. Information that it is usually (or should be) written "cluster size" can be provided at "clustersize" for the benefit of anyone who lands there. Mihia (talk) 19:14, 8 August 2019 (UTC)
I would vote against COALMINE as written if there were a similar vote today. We don't need a coal mine entry in order to state at coalmine "much more commonly spelled coal mine". There are some examples where COALMINE has justified keeping a term which I felt should be kept without having other good CFI rationale, but hopefully we will be able to identify those even without this policy, or be able to figure out a narrower criterion which would eliminate pumpkin seed type questions. - TheDaveRoss 18:05, 8 August 2019 (UTC)

I would be interested in having a discussion about entries in other languages alongside one about English entries. See the discussion at WT:Requests_for_deletion/Non-English#energia_eolica that touches on the space for more specific policy. Ultimateria (talk) 16:43, 10 August 2019 (UTC)

Stunned silence/disbelief

What meaning of "stunned" applies to the following sentences where stunned modifies abstract nouns forming an adverbial, instead of the animate being who is stunned?

What is the linguistic term for such behavior? What other adjectives act in a similar way?

I sat in stunned silence, I reacted to the news with stunned disbelief --Backinstadiums (talk) 14:40, 8 August 2019 (UTC)

The rhetorical figure is called metalepsis. I don't think it warrants a separate definition, just as with the component words (eg, face) of: "Was this the face that launched a thousand ships and burnt the topless towers of Ilium?". DCDuring (talk) 16:38, 8 August 2019 (UTC)
Maybe it's hypallage or anthimeria. DCDuring (talk) 18:02, 8 August 2019 (UTC)
The license for the phrases to be adverbial (They could also be adjectival.) is that they are prepositional phrases. Also, silence doesn't seem "abstract" to me. DCDuring (talk) 18:07, 8 August 2019 (UTC)
Silence is tangible.  --Lambiam 19:17, 8 August 2019 (UTC)
... as well as being golden. Mihia (talk) 19:53, 8 August 2019 (UTC)

Should Illyrian be a language?

At the moment, Category:Illyrian language indicates that it is a reconstructed language. We currently have one reconstruction for it, which is at WT:RFD. Given that reconstructed entries require descendants, derived terms or other evidence that the reconstruction can be based on, I'm not sure if this is at all possible. No certain descendants of Illyrian are known, and therefore not much is known in the way of sound laws that would support reconstructions. Wikipedia isn't even sure if there was a single Illyrian language, and titles the article in plural: w:Illyrian languages. All this makes me think that we shouldn't have this language on Wiktionary at all. Thoughts? —Rua (mew) 09:16, 9 August 2019 (UTC)

Etymology-only language? Seems like personal names in Latin and possibly Balkan languages other than Albanian can be assumed to be of Illyrian origin – and hydronyms, names of settlements? It would be unsurprising that there isn’t a corpus of the language either, as the Slavs too did not deign to write. But then again I wouldn’t know where Illyrian would start and end, if one starts to talk about “real Illyrians” and “less real Illyrians”, and one can be content with deriving from a substrate. “Illyrian” is probably a meme. Besides, I have always opined that proper nouns (nomina propria) should be categorized separately for their etymologies so we do not litter the categories “terms derived from X”, nor the request categories – look at the requests for etymologies in Latin entries: it’s 3,527 pages, but the absolute majority is names, it’s ridiculous. Fay Freak (talk) 11:38, 10 August 2019 (UTC)


What is the plural part in monies? The entry of -ies does not seem to include it --Backinstadiums (talk) 09:22, 10 August 2019 (UTC)

"monies" is an irregular plural. Probably the entry should state that, but I don't know how to work it into the template. Mihia (talk) 10:25, 10 August 2019 (UTC)
I would fix it, but our modularizers have obfuscated what should be a simple template to the point that it isn't worth the effort to try and untangle how it all works to make a straightforward change. Time to start a vote to disable Lua on this project; it's more trouble than it is worth. - TheDaveRoss 17:11, 12 August 2019 (UTC)
I don't think -ies should include it, because it isn't a suffix added to some stem *mon. Equinox 10:33, 10 August 2019 (UTC)
According to Garner's fourth edition, page 604, the current ratio of moneyed to monied is 3:1. Incidentally, what is the situation with monied? --Backinstadiums (talk) 11:47, 10 August 2019 (UTC)
monies is a regular plural of mony, which is obsolete, according to us.
FWIW, I always thought it synonymous with funds ("financial resources") or somehow similar to funds. Though I've worked in finance (US), it never came up in any actual usage in my hearing. It seemed archaic and/or UK when I ran across it in reading. DCDuring (talk) 20:24, 12 August 2019 (UTC)
It shows up in some formal-register writing, such as certain financial and legal contexts, as the plural of money, used as a countable in ways similar to the contrast between fish (plural of the single animal) and fishes (plural of the group noun), where the latter implies multiple kinds of the main noun. In legal and financial contexts, monies implies specifically that these are funds coming from multiple sources / accounts / etc. ‑‑ Eiríkr Útlendi │Tala við mig 23:19, 12 August 2019 (UTC)
@Eirikr How do you know that it is the plural of money rather than. 1., a plural of mony or, 2., a plural-only noun? DCDuring (talk) 00:43, 13 August 2019 (UTC)
I note that money is usually fungible whereas monies are not. Monies is often used in discussion of government and not-for-profit finances in which money appropriated, donated, held in trust, etc, often can be used only for a specific purpose or object of expenditure. Something similar occurs in investment management, banking, etc. These realms are the stronghold of fund accounting (See w:Fund accounting.). Sociological literature uses monies in discussing how households often restrict given sources of income to specific purposes. Eg, ill-gotten gains fund charitable donations, children's 'fun' expenditures come from their earnings and holiday gifts. Cash currency, bank deposits, and cryptocurrency can be considered separate forms of monies, but none of them is called a mony. In fact it is almost impossible to find any use of mony as a singular in the last 100 years or more, except in works of history. DCDuring (talk) 01:12, 13 August 2019 (UTC)
  • Re: the plural of money, see also Merriam-Webster's entry for monies, stating simply, "plural of MONEY", and then their entry for money, stating "plural moneys or monies". That jibes with how I learned both terms (singular and plural). Now, monies may be derived as the regular plural of now-obsolete mony, but then mony evolved into modern money with the extra e, while the plural form stayed as it was, and then we also see the new plural form moneys sprouting into existence. In certain contexts, at least, the monies form persists, and in modern usage, there's nothing else for it to be the plural of, other than money. No? ‑‑ Eiríkr Útlendi │Tala við mig 04:05, 13 August 2019 (UTC)
It could have evolved into a plural-only noun. Is moneys used in exactly the same way as monies? DCDuring (talk) 10:57, 13 August 2019 (UTC)

plural (number)

In the appendix, does plural (number) means plural verbal agreement? --Backinstadiums (talk) 11:33, 10 August 2019 (UTC)

No, it means a grammatical number. Verb agreement may not generally exist: in Arabic, for example, VSO sentences have the verb in the singular even if a plural subject follows. Fay Freak (talk) 11:42, 10 August 2019 (UTC)
In English entries, verb, pronoun, and determiner agreement are important. The presence or absence of a terminal 's' is self-evident, arguably of no value to our normal users, and therefore not worth noting in our entries. Some of our "plural only" entries, for example, have made a hash of this. But English speakers often alternate between singular and plural agreement for "pair" nouns. (Eg, these/those scissors is about three times as common as this/that scissors.) Treatment of plurals also gets confounded with (un)countability.
What we say about a term in Appendix:Glossary should always be applicable to English. We can have warnings about changes in a term's meaning as applied to other languages, but coverage may need to be in other Glossaries or the "about" page for such a language. DCDuring (talk) 16:40, 10 August 2019 (UTC)

@DCDuring: In youth, what does the following information mean? (uncountable, used in plural form) --Backinstadiums (talk) 09:16, 23 August 2019 (UTC)

The label conveys no useful information to me. I assume the intent was to convey some information about number agreement of youth with verbs, pronouns, and possibly determiners.
I don't think that youth is uncountable in that definition. I think it is a plural. A usage example would be "The youth of the cities don't understand agricultural metaphors. They are culturally impoverished." If I am right about this, then the inflection line is not accurate either. DCDuring (talk) 09:58, 23 August 2019 (UTC)
MWOnline has an entry for the youth with this meaning, but the is not essential. Other definite determiners can substitute: "New York's youth", "These cities' youth", "Those youth", the latter two possibly not for all speakers. DCDuring (talk) 10:09, 23 August 2019 (UTC)

Words with multiple inflection patterns: one inflection table or several?

When a word can follow multiple patterns of inflection, the alternative forms are often shown next to each other within a single inflection table. But there are also entries where multiple inflection tables are shown. I'm wondering which of these approaches works better in practice. Showing multiple forms together in one table takes up less space, but it is no longer easy for the reader to separate out the different inflection patterns.

There are extreme cases like Slovene strgati, which can inflect according to no less than four different patterns: two different present tenses, and two different accentuation patterns. Many of the forms are shared between the four, in particular the infinitive, so they could theoretically all be stuffed into one table, but it becomes very hard to make out. Two tables are also possible, but which aspect of the inflection should be combined, the present tense formation or the accent pattern? A case where all forms are in one table can be seen at kopati (to bathe), where there are no less than four distinct imperative forms. Then there are cases like gaziti where the majority of the inflection has one form and accent pattern (AP a in this case), but the l-participle can follow multiple accent patterns (both a and b). Having multiple inflection tables in this case seems like overkill. Where to draw the line, though? When is it clearer to put everything in one table, and when is it better to split them? —Rua (mew) 19:37, 10 August 2019 (UTC)

I think this should be handled on a case-by-case basis. If the different patterns are truly different paradigms, it is probably better to use multiple tables. But if we have a few variations within one paradigm, showing the alternatives side by side probably works better. Here is a sketch of an algorithm for computing a score on which to base the decision; I have no idea how it will work out in practice.
  1. Combine all forms in one table
  2. For every cell that contains N alternatives, where N ≥ 2, add N to the score
  3. For every cell that contains just one form, subtract 1 from the score
  4. If the score is positive, use multiple tables; otherwise, use a single table.
 --Lambiam 23:32, 10 August 2019 (UTC)
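The four-step heuristic above is easy to try out mechanically. Here is a minimal Python sketch, assuming a combined table is represented as a mapping from cell labels to lists of alternative forms; that representation is chosen for illustration and is not necessarily how the existing inflection modules store forms. The cell labels and forms in the demo are placeholders, not real Slovene data, and empty cells are simply ignored because the proposal does not say how to score them.

```python
from typing import Mapping, Sequence


def split_score(table: Mapping[str, Sequence[str]]) -> int:
    """Score a combined inflection table using the proposed heuristic.

    `table` maps each cell label to the alternative forms shown in that cell.
    """
    score = 0
    for forms in table.values():
        n = len(forms)
        if n >= 2:
            score += n   # step 2: a cell with N >= 2 alternatives adds N
        elif n == 1:
            score -= 1   # step 3: a cell with a single form subtracts 1
        # empty cells are ignored (assumption; not covered by the proposal)
    return score


def use_multiple_tables(table: Mapping[str, Sequence[str]]) -> bool:
    """Step 4: split into several tables only if the score is positive."""
    return split_score(table) > 0


if __name__ == "__main__":
    # Placeholder forms (hypothetical), not real Slovene data.
    mostly_shared = {
        "infinitive": ["f1"],
        "present 1sg": ["f2"],
        "present 3sg": ["f3"],
        "imperative 2sg": ["f4", "f5"],
    }
    mostly_split = {
        "infinitive": ["f1"],
        "present 1sg": ["f2", "f3"],
        "present 3sg": ["f4", "f5"],
        "imperative 2sg": ["f6", "f7", "f8", "f9"],
    }
    print(split_score(mostly_shared), use_multiple_tables(mostly_shared))  # -1 False
    print(split_score(mostly_split), use_multiple_tables(mostly_split))    # 7 True
```

A score of exactly 0 falls back to a single table, as step 4 requires a positive score. The +N / -1 weights are the obvious knob to tune if the heuristic turns out to split too eagerly or too reluctantly in practice.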
I know exactly zero about this language or its inflections, but, looking at kopati, where there are four patterns, is it not ambiguous which form corresponds to which pattern(s) in cases when there are two (or three, though this does not occur) entries in a cell? How do you tell which is which? If you can't tell, this layout seems flawed to me. However, if there are language-related clues such that it is always obvious to anyone with enough knowledge to use the table at all, maybe it is OK. Mihia (talk) 20:57, 11 August 2019 (UTC)
There are technically only two patterns, but each pattern allows for two possible imperative forms. —Rua (mew) 21:21, 11 August 2019 (UTC)
I see, thanks. So I guess it is obvious to anyone with any knowledge of the language. Fine then. Mihia (talk) 22:53, 11 August 2019 (UTC)
A slideshow that takes the user from 1/4 through 2/4 and 3/4 to 4/4 and then back to 1/4 as they press arrow keys (sliding horizontally for horizontal writing systems and vertically for vertical writing systems like Mongolian script). So you have one table that contains everything, but does not show it all at once. Fay Freak (talk) 23:03, 11 August 2019 (UTC)
@Fay Freak That's not a bad idea, but what would it do when JS is disabled? —Rua (mew) 08:02, 13 August 2019 (UTC)
HTML/CSS is so advanced nowadays that I did not presume it needs to be JS. However, it can be JS: the content can either be expanded at first and then formatted by JS, so that in a text-based browser one sees all the tables one after another, or otherwise: there is no reason people who surf without JS cannot make an exception for Wiktionary, and everyone at least sees one table they can apply. Fay Freak (talk) 12:20, 13 August 2019 (UTC)

unmarked idioms in the examples/quotations of senses

The third meaning of the verb cramp adds the example "You're cramping my style", yet without any reference to the idiomaticity of cramp someone's style, a tendency I've frequently come across. --Backinstadiums (talk) 09:24, 11 August 2019 (UTC)

I hope that when you come across instances, you’ll fix them. In this specific case, the sense and usex were added more than a year before the creation of the entry for the idiomatic expression; the creator was very likely unaware of the usex. So inasmuch as there appears to be such a tendency, it may be the result of editors not knowing everything rather than a lack of care.  --Lambiam 13:24, 11 August 2019 (UTC)
the solution being adding an external link, I suppose --Backinstadiums (talk) 13:44, 11 August 2019 (UTC)
I think a more complete solution would also include making sure that cramp someone's style is in the derived terms and that there is a better usage example, at least in addition to the one that includes cramp (someone's) style. DCDuring (talk) 20:44, 11 August 2019 (UTC)

passive transitive verb without agent vs adjective

The third sense of embarrass,

"(transitive) To involve in difficulties concerning money matters; to encumber with debt; to beset with urgent claims or demands. A man or his business is embarrassed when he cannot meet his pecuniary engagements"

adds an example in the passive without the passive agent, yet the entry of the adjective embarrassed does not show the meaning derived from the sense above, which "Microsoft® Encarta® 2009" defines as "adjective, short of money: in financial difficulties because of a lack of money" (However, "Microsoft® Encarta® 2009" does not offer the monetary meaning of the verb embarrass as Wiktionary does).

Is there any reason for this? --Backinstadiums (talk) 15:52, 11 August 2019 (UTC)

Microsoft and Wiktionary did not consult each other. We cannot have two-way consultation now because they are out of the dictionary business. DCDuring (talk) 20:46, 11 August 2019 (UTC)
M-W, Oxford, Collins, and AHD all have this sense; most of them list it as archaic (which I would agree with). Also, these questions are probably not Beer Parlour fodder: questions about word usage should go in the WT:Tea Room. The Beer Parlour is for discussing the project itself. - TheDaveRoss 15:56, 12 August 2019 (UTC)

Flowers wavered in the breeze

what is the rhetorical device which changes the subject of an active sentence, The breeze wavered the flowers, into an adverb of the passive counterpart, Flowers wavered in the breeze? --Backinstadiums (talk) 11:29, 12 August 2019 (UTC)

I don't think it is rhetoric, just two ways to express the same idea. You could say one element (breeze or flowers) is being foregrounded, or made the subject. Equinox 14:34, 12 August 2019 (UTC)
I don't think of waver as a transitive verb, though it may be or have been one for some speakers.
While attempting to address earlier questions of yours I reviewed classical (Greek/Latin) rhetorical devices at Silva Rhetorica. I don't recollect seeing anything that specifically covers that. There might be something in transformational grammar. DCDuring (talk) 14:37, 12 August 2019 (UTC)
Verbs that can be used transitively with an object, but also intransitively in which the original object becomes the subject (without resorting to the passive voice), are called ergative verbs. The standard example is the verb break: “he broke the glass” → “the glass broke”. This is not a rhetorical device but a grammatical concept. To me, the sentence “the breeze wavered the flowers” is ungrammatical, but its author apparently sees the verb waver as ergative.  --Lambiam 22:31, 12 August 2019 (UTC)
Rua (then CodeCat) and I had a long thread about that a while back. In short, I think the term ergative is overused in relation to English verbs. In "the glass broke", we just have an intransitive, where there is no semantic patient / grammatical object, and just a semantic agent / grammatical subject. "Breaking" doesn't need to be a transitive action, semantically speaking. With intrinsically transitive actions, however, such as eat or cook, using the semantic patient / grammatical object as the grammatical subject and leaving the semantic agent unstated gives us something that can be usefully described as "ergative": "moose meat eats well", "these eggs cook up nicely", etc.
Regarding the verb waver, however, I agree with DCDuring and Lambiam -- the verb, as I understand it, is consistently intransitive, so using it transitively doesn't make any sense, and thus there cannot be a passive. You'd have to use it causatively instead, or causatively-passively: "the breeze made the flowers waver", "the flowers were made to waver by the breeze". ‑‑ Eiríkr Útlendi │Tala við mig 23:31, 12 August 2019 (UTC)

Isn't it transitive in semantic/logical terms? --Backinstadiums (talk) 08:42, 13 August 2019 (UTC)

If you take a philosophical stance that nothing has an internal cause, then mere grammar is not important. But transitivity is a syntactic, not philosophical, term as commonly used. DCDuring (talk) 11:02, 13 August 2019 (UTC)
  • @Backinstadiums: No, the English verb waver is not transitive in any sense that I've ever encountered. Our entry at waver only lists intransitive senses, as does the Merriam-Webster entry, among many others.
  • @DCDuring: Depends on the language and how you define your terms. For Japanese, teaching materials describe a "transitive" verb as generally the same thing as a 他動詞 (tadōshi, literally other-moving word), where the transitivity is a semantic property, dependent on the underlying meaning of an agent applying the action to a patient: the verb doesn't change class due to the presence or absence of any explicitly stated object. For both 私は食べる (watashi wa taberu, I eat) and リンゴを食べる (I eat an apple), the verb taberu ("eat") is a tadōshi regardless of the presence or absence of the object. Meanwhile, there are "intransitive" verbs described as 自動詞 (jidōshi, literally self-moving word) where the action is purely a matter of the agent doing something on their own, without directly altering or affecting any patient, but these verbs can still take objects marked with を (o) in certain constructions, and that also doesn't change the class of the verb. For both 歩く (I walk) and 山道を歩く (I walk the mountain road), the verb aruku ("walk") is a jidōshi regardless of the presence or absence of the object.
When talking about "ergativity" as it applies to English verbs, in order to use the label in a meaningful and useful way, we have to look at the semantics of the verb: is the action something done by an agent to a patient, or something that the agent does on its own without affecting any patient? For verbs like break or melt or turn, these could be semantically transitive where an agent does something to a patient, but they could also be semantically intransitive, where the agent just does the action of the verb without requiring a patient. Describing these verbs as "ergative" is not very useful, and I think it's more likely to confuse users who instead learned these verbs as ambitransitive: either transitive or intransitive depending on context. For other verbs like eat or read or say, these can only be semantically transitive, as the actions inherently describe an agent doing something to a patient, even though they might be syntactically intransitive if a given context leaves the object unstated. Using the object of such a semantically transitive verb as the subject is a strange construction in English, almost a kind of passive. Semantically intransitive verbs have no sensible passive, while semantically transitive ones do. Similarly, semantically intransitive verbs have no sensible ergative. ‑‑ Eiríkr Útlendi │Tala við mig 19:22, 13 August 2019 (UTC)
I'd be in favor of banning ergative, ambitransitive, and ditransitive from use in definition-line labels (generally, but at least in English) on the grounds that the terms are not readily understood by normal users. My evidence is that if one compares usage patterns of "transitive verb"/"intransitive verb" with those of "ambitransitive verb", "ditransitive verb", and "ergative verb" the latter three occur almost exclusively in scholarly books and articles, whereas one can find the former in basic texts and even in newspapers and magazines. I can provide anecdotal evidence that college graduates are not familiar with "ergative" and its ilk, but at least dimly recollect "transitive" and "intransitive". DCDuring (talk) 19:49, 13 August 2019 (UTC)
I support that idea, FWIW. I note that, if the user enters tr=both in our headword templates, the resulting display similarly avoids overly technical terminology like ambitransitive or ditransitive in favor of the wordier-but-more-straightforward option of just stating transitive and intransitive. ‑‑ Eiríkr Útlendi │Tala við mig 22:45, 14 August 2019 (UTC)

pronunciation of foreign lemmas used in English

Rarely does the plural of a foreign lemma used in English show its pronunciation; for example, for fait accompli, the pronunciation of its plural faits accomplis doesn't vary at all. --Backinstadiums (talk) 16:14, 12 August 2019 (UTC)

In the plural I have heard a very un-French final /z/ while the first s, that of faits, remained silent. I don’t know how general this is.  --Lambiam 22:17, 12 August 2019 (UTC)
@Lambiam: According to Longman Pronunciation dict: BrE faits accomplis ˌfeɪz ə ˈkɒmp liː ˌfeɪts-, ˌfeɪt-, ˌfez-, -ˈkʌmp-, -liːz ǁ AmE ˌfeɪz ə kɑːɯ ˈpliː —French [fɛ za kɔ̃ pli] --Backinstadiums (talk) 22:24, 12 August 2019 (UTC)
I'd expect fait accomplis to be a readily found plural. (I haven't looked yet.) DCDuring (talk) 00:00, 13 August 2019 (UTC)
And here are a few of them:
a succession of fait accomplis, legitimised afterwards by the always favourable balance of military power
The inauguration of the world banking electronic network SWIFT in 1977 and slightly later that of the world airline network SITA were presented to the ITU as fait accomplis
The war was a test of how far overwhelming military power can impose fait accomplis that reshape international norms.
The military continued to accumulate fait accomplis that were disagreeable to the politicians
Down through the years, we've had other fait accomplis
I further note that there are numerous available uses of fait accomplis as singular.
Our entry for fait accompli#English does not do justice to the rich set of alternations that authors seem to permit. DCDuring (talk) 00:13, 13 August 2019 (UTC)
There are also plenty of plural uses of faits accompli, which we can label as a misspelling. Can we write in the usage notes of the prospective entry fait accomplis that the singular use is incorrect? I think we should also not assign pronunciations like /ˌfezəˈkɒmpliː/ or /ˌfeɪtsəˈkʌmpliː/ to fait accompli(s) but only to faits accomplis.  --Lambiam 18:17, 14 August 2019 (UTC)

Plural lemmas

How are plural lemmas with meaning of their own to be indicated in their corresponding "headword"? For example, nowhere in white is the user warned about the specific meanings of the noun whites. --Backinstadiums (talk) 09:24, 13 August 2019 (UTC)

Maybe a subsection “See also   whites”? I notice there is no category for plural lemmas.  --Lambiam 10:05, 13 August 2019 (UTC)
Many of the singular entries have a definition for the term marked as being used only in the plural. It is duplicative, but convenient for the user. DCDuring (talk) 11:07, 13 August 2019 (UTC)
Can you give a good example of that?  --Lambiam 18:29, 14 August 2019 (UTC)
I found a few using regex search: colour, bowel, good, remain, depth. DCDuring (talk) 20:14, 14 August 2019 (UTC)
Problems like this are why I never mix lemmas and non-lemmas. All definitions go on the lemma, with context labels indicating if the meaning only applies to specific grammatical combinations, such as plural. —Rua (mew) 18:24, 22 August 2019 (UTC)

Passive voice using the preposition "with" instead of "by"

Verb smite: 6. (figuratively, now only in passive) To strike with love or infatuation. Bob was smitten with Laura from the first time he saw her.

I wonder whether this smitten is rather an adjective, at least syntactically. --Backinstadiums (talk) 10:36, 13 August 2019 (UTC)

"John was beaten with a cudgel by Mary." DCDuring (talk) 11:10, 13 August 2019 (UTC)
@DCDuring: what's your point? --Backinstadiums (talk) 16:23, 13 August 2019 (UTC)
An instrument (object of with) is not the same as agent of passive verb (object of by). The sentence in no way illustrates passive using with. DCDuring (talk) 17:22, 13 August 2019 (UTC)
And John got so angry that next thing you know Mary was smitten with Bob by John.  --Lambiam 16:25, 13 August 2019 (UTC)
It does pass the test of being gradable: Bob was way too smitten with Laura. John was also more smitten with Laura than was good for him. But Bob was the most smitten I have ever seen a man with a woman as smart as Laura.  --Lambiam 16:25, 13 August 2019 (UTC) — BTW, the Beer parlour is not the right room for discussing such questions.  --Lambiam 16:28, 13 August 2019 (UTC)
Also, it can be used attributively: “a smitten man”.  --Lambiam 08:47, 15 August 2019 (UTC)

intransitive verbs which become transitive if their "goal/purpose" is accomplished

I've realized many verbs follow an interesting transitive pattern, which I illustrate with an example:

Webster's defines wrangle as either intransitive "dispute, argue" or transitive "to obtain by persistent arguing". Is there a linguistic term for this behavior? --Backinstadiums (talk) 15:33, 13 August 2019 (UTC)

The term resultative comes to mind, even though in a sentence like “I managed to wrangle a refund” the construction does not conform to any of the four classes described in the Wikipedia article.  --Lambiam 16:09, 13 August 2019 (UTC)
The pattern is not confined to English. For example, French accoucher in the intransitive sense means “to go into labour”. But in the transitive sense it signifies the result of the labour: “to deliver (a baby)”.  --Lambiam 16:14, 13 August 2019 (UTC)

Is the "transitive" label enough?

How can we indicate the need for a preposition, at, for verbs such as He smiled *(at) me, and still label them as transitive for examples such as He smiled a big smile? --Backinstadiums (talk) 11:48, 16 August 2019 (UTC)

There's nothing in the grammar or semantics that requires at after smiled a big/happy/lovely/coy/weak smile.
Please don't focus on "semantic" transitivity. In English entries we are concerned with syntactic transitivity. DCDuring (talk) 12:59, 16 August 2019 (UTC)
  • Yes, as DCDuring notes, English descriptive materials about the English language generally don't discuss semantic transitivity, unless it's a highly technical text. English-language dictionaries of the English language, to my knowledge, never touch on semantic transitivity, only focusing on syntax.
In your examples, "He smiled at me" is intransitive, albeit with an indirect object, whereas "He smiled a big smile" is transitive, and without an indirect object. Moreover, the presence or absence of an indirect object has no effect on the transitivity of the verb: only the presence or absence of a direct object. ‑‑ Eiríkr Útlendi │Tala við mig 16:31, 16 August 2019 (UTC)
I don't see an indirect object in that first case. I rather see a prepositional phrase. —Rua (mew) 17:10, 21 August 2019 (UTC)
Yes, you're correct. I used the wrong terminology there: the me is the object of the preposition at, not of the verb smile. ‑‑ Eiríkr Útlendi │Tala við mig 18:15, 22 August 2019 (UTC)

Homographs of verbal forms

I propose some kind of mention (maybe a "see also") be (automatically?) added in the section of verbs with forms homographic with other parts of speech, as the verbal entries are the ones looked up first. For example, devastate shows no hint at the adjective devastated "Extremely upset and shocked: a devastated widow". --Backinstadiums (talk) 16:05, 17 August 2019 (UTC)

I think devastate itself is missing a sense ‘to severely upset’, e.g. in ‘The news devastated her’, ‘His father’s death devastated him’, and so on. Although maybe that’s what our second sense is trying to get at; hard to tell without quotes. — Vorziblix (talk · contribs) 00:44, 18 August 2019 (UTC)
It is yet another deficiency in Wiktionary that we might all agree needs to be fixed. Almost every such homograph is deverbal and could be shown as a derived term on the verb lemma page. I wonder how many of the nearly 4,000 entries that have both {{en-past of}} and {{en-adj}} don't have a derived term for the adjective on the verb lemma entry. DCDuring (talk) 03:43, 18 August 2019 (UTC)
I suppose one would also like to make sure that all the supposed adjective homographs of past forms of verbs were actually true adjectives. DCDuring (talk) 03:49, 18 August 2019 (UTC)
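The count DCDuring wonders about could be approximated over a dump of page wikitext. This is a crude sketch, assuming pages are available as a title-to-wikitext mapping; `missing_derived_adjectives` is a hypothetical name, and the test for a derived term (the adjective's title appearing anywhere after the first "Derived terms" heading on the lemma page) is a deliberately rough heuristic that will give some false positives and negatives.

```python
import re
from typing import Dict, List

# Captures the lemma from e.g. "{{en-past of|devastate}}".
PAST_OF = re.compile(r"\{\{en-past of\|([^|}]+)")


def missing_derived_adjectives(pages: Dict[str, str]) -> List[str]:
    """List adjective entries whose verb lemma seems not to mention them
    under "Derived terms" (rough heuristic, see above)."""
    missing = []
    for title, text in pages.items():
        # Only pages that have both an English adjective and a past-form sense.
        if "{{en-adj" not in text:
            continue
        match = PAST_OF.search(text)
        if match is None:
            continue
        lemma_text = pages.get(match.group(1), "")
        # Everything after the first "Derived terms" heading on the lemma page.
        parts = lemma_text.split("Derived terms", 1)
        if len(parts) < 2 or title not in parts[1]:
            missing.append(title)
    return sorted(missing)
```

The output is only a worklist; as noted above, each supposed adjective would still need checking to see whether it is a true adjective.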

CGEL: conversion of gerund and past participle forms of verbs is extremely productive: It seems very promising; He looked devastated --Backinstadiums (talk) 08:21, 18 August 2019 (UTC)

The -ing form above is not a gerund (which behaves syntactically like a noun). I am not sure what point you (and CGEL) are making here. Did anyone here belittle the degree of productivity of conversion? Which CGEL is this?  --Lambiam 12:22, 18 August 2019 (UTC)
It's almost certainly the Cambridge. BTW. I've just RfVed the supposed adjective ground (from grind), which MWOnline, for example, doesn't consider an adjective. DCDuring (talk) 16:47, 18 August 2019 (UTC)
ground#Adjective survived RfV. DCDuring (talk) 23:37, 19 August 2019 (UTC)

I wouldn't mind a little something

Assume a reader who is unfamiliar with the expressions looking at the entries something or other and some something or other. It is completely invisible that in the first one the word “something” is a fixed and immutable part of the expression, while in the second one it is a placeholder, to be replaced by some something or other. The ambiguity has bothered me for some time. Can we do something about it? Yes we can, by giving such placeholders a distinctive appearance. After considering several possibilities, I think that using smallcaps is probably the best. The head of some something or other would then become:

some something or other (uncountable)

We could let the placeholder link to some page like Wiktionary:Placeholders which could explain (among several things) that our use of “something” may include animate beings, and that “one” and “one's” refer to the subject of a verbal phrase (as in lose one's cool (not followed consistently at present; e.g., the opinion holder of in one's opinion need not be the subject)). (There are some technical issues; for example, {{mention}} does not like {{smallcaps}} in the first parameter, but I am confident that our grease pitters can work such things out.)

The Free Dictionary uses parentheses for this purpose, at least in entries from Farlex Dictionary of Idioms, as seen in their entry “throw (something) in (one's) face”. We also do this occasionally, as in give (something) a go. A reason not to follow this is that we also occasionally use parentheses for indicating an optional part of a lemma (e.g. “(the) word is”). So does, in fact, The Free Dictionary in entries from McGraw-Hill Dictionary of American Idioms and Phrasal Verbs. I think it is not a good idea to cast out one source of ambiguity by introducing another one.

Before I put this up for a vote, I’d like to offer an opportunity for comments and suggestions. So go ahead; the floor is all yours.  --Lambiam 22:31, 19 August 2019 (UTC)

Parentheses or brackets make more sense visually than smallcaps to me. Why did you have a preference for smallcaps, @Lambiam? —Justin (koavf)TCM 22:56, 19 August 2019 (UTC)
Because (as I explained above) parentheses are also used for optional words or morphemes in the headword, so letting them serve two different purposes introduces ambiguity. Can you tell whether യിരി in (യിരി)ക്കുന്നുഎങ്ങനെക് is a placeholder or an optional part of the phrase? [The example is made-up to make a point; the phrase is meaningless.] If no ambiguity could result, parentheses would also be my first choice. But they are already commonly used to indicate optionality, so that seems the better use of the notation.  --Lambiam 12:39, 20 August 2019 (UTC)
We already often use parentheses for the purpose. I don't know of any actual instances in which the "optional item" interpretation of parentheses could be confused with the "placeholder" interpretation. Seeing some of those might help me see the problem. DCDuring (talk) 23:35, 19 August 2019 (UTC)
For a reader who is familiar with the language in question, it will almost certainly be immediately clear which interpretation is to be chosen, even if they are not familiar with the idiom in question. If they are not familiar with the language, I’m not so sure. Can you spot the placeholder in işin ucu birine dokunmak?  --Lambiam 12:39, 20 August 2019 (UTC)
Of course not: there are no parentheses! If I knew enough Turkish to use tr.wikt maybe I could, if the component terms were all in en.wikt (işin ucu birine dokunmak). I would look up each word and perhaps eventually tease it out. DCDuring (talk) 13:29, 20 August 2019 (UTC)
Bad question, and by the way it was phrased I have given away that the word in question is not optional but a placeholder. Here is another example: is कुछ in the idiom दाल में (कुछ) काला होना optional or a placeholder?  --Lambiam 14:13, 20 August 2019 (UTC)
I'm going to guess it's optional, but I take your point. At some level of ignorance of a language, one couldn't decipher the meaning of parentheses. DCDuring (talk) 16:51, 20 August 2019 (UTC)
It is indeed optional, but this is far from obvious, since Hindi कुछ (kuch) means “something”, a typical placeholder word.  --Lambiam 21:55, 21 August 2019 (UTC)
Before we make the decision, can we run through the actual affected entries and generate a list of how they would look in small caps (or bold, or whatever other distinguisher)? This should highlight any absurdities we may not spot in the abstract. Equinox 01:34, 20 August 2019 (UTC)
I cannot think of a systematic way of doing that. In sich bei jemandem lieb Kind machen, the word jemandem is a placeholder (e.g. “hat sich bei der Zarin dadurch lieb Kind gemacht”); likewise quelqu'un in de quel bois quelqu'un se chauffe (as in “Et de quel bois se chauffaient leurs femelles”). For the cases in which parentheses are used in the headword line, they can be implicit (from the PAGENAME) or an explicit head= or similar parameter in any of the numerous headword-line templates.  --Lambiam 12:39, 20 August 2019 (UTC)
I could generate lists of entries in each language that contain that language's placeholder words. I have full lists of entry titles in each language (based on headers), so would just need lists of placeholder words to search for. For instance, there are 53 German entries for phrases containing jemand. — Eru·tuon 16:37, 20 August 2019 (UTC)
That will be very useful (if the proposal is accepted). For each language we will need some editors who are sufficiently conversant with it to distinguish confidently and accurately between fixed and variable uses of placeholder words; we do not want to see a misclassification like [something] or other in any language.  --Lambiam 21:55, 21 August 2019 (UTC)
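Eru·tuon's approach (per-language lists of entry titles, searched for that language's placeholder words) can be sketched like this. This is a minimal sketch, assuming titles and placeholder words are already available as per-language lists and that a placeholder counts only when it is a whole space-separated token; `entries_with_placeholders` is a hypothetical name, not an existing tool.

```python
from typing import Dict, Iterable, List


def entries_with_placeholders(
    titles_by_lang: Dict[str, Iterable[str]],
    placeholders_by_lang: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """For each language, list entry titles containing one of its placeholder
    words as a whole space-separated token."""
    result: Dict[str, List[str]] = {}
    for lang, titles in titles_by_lang.items():
        words = set(placeholders_by_lang.get(lang, ()))
        if not words:
            continue  # no placeholder list for this language yet
        hits = sorted(t for t in titles if words & set(t.split()))
        if hits:
            result[lang] = hits
    return result
```

Such a search only produces candidates: something or other would be listed for English even though its “something” is fixed, which is why each list still needs review by editors who know the language.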
I see now that smallcaps is not a good way, since many scripts do not have capital letters at all, and therefore no smallcaps. What about square brackets?
some [something] or other: placeholder
(the) word is: optional part
 --Lambiam 14:13, 20 August 2019 (UTC)
Support. This issue has bothered me for a while, although I had resigned myself to the fact that there probably wasn't a perfect solution. This might work though. Andrew Sheedy (talk) 16:34, 20 August 2019 (UTC)
It would seem that we could use piped links for this. We could link to Appendix:Glossary for English placeholders and to language-specific Appendix-space pages for other languages. Colors could work for extra warning or indication for those visually capable. DCDuring (talk) 16:51, 20 August 2019 (UTC)
I wouldn't mind this solution either. Andrew Sheedy (talk) 17:00, 20 August 2019 (UTC)
I also support the (optional) and [placeholder] markup. - TheDaveRoss 16:53, 20 August 2019 (UTC)
For us and for repeat users in general, even normal ones, the difference between parentheses and square brackets seems adequate. Some questions remain, especially for new or infrequent normal users:
  1. Are parentheses better for optional text or for placeholders? I'm guessing there will be more headwords with placeholders than with optional text.
  2. How can we make it easy for a new or infrequent user to learn? One way is by piped links and the associated hover content for the placeholder terms. Is that enough? Would it be better to, say, highlight a relevant line under Usage notes using color, a box, etc.?
  3. Are there other uses of parentheses or square brackets, especially in the inflection line, in any script or language, that conflict with what we are contemplating? For example, are there characters or punctuation or combinations of these that can be easily confused with either of them. Ideally, the brackets and parentheses would work for all scripts and languages.
  4. Do we need rules of some kind to prevent overuse/abuse of either placeholders or optional words? Will contributors be tempted to use multiple optional terms or word-class names (determiner) or abbreviations (DET)?
I don't think we have to address all conceivable issues in advance, but we should try to avoid anything major and have ways of adapting to identifiable potential issues. DCDuring (talk) 17:43, 20 August 2019 (UTC)

@DCDuring – I agree, we do not have to address all conceivable issues in advance, but it is good to know of at least one acceptable approach to each issue we can identify now. Going through the above four points in order:

  1. Even children are probably familiar with the use of parentheses to indicate an optional suffix: “horse(s) / cow(s) / goat(s)”, “apple(s)”. Traditional textual use of square brackets is mainly confined to editorial stuff: editorial comments (such as “[sic]” or “[pen-corrected to Thraldome]”), editorial omissions (“[...]”), editorial alterations (mostly change of case, as in “[T]he next day”) and editorial insertions (seen mostly in transcriptions and translations: “ass[istant]”, “holy [place]”). This is unlikely to interfere with the intended use here.
  2. Most readers are probably not aware of any convention for marking placeholdership, so, whichever way we choose, readers will have to get accustomed to something new. But I expect this to be largely self-explanatory, just as the current practice is already largely self-explanatory. With this proposal, there will be an additional visual clue that a term like “[something]” is not to be taken literally, “as is”, which will be mainly useful for readers not familiar with the idiom. We may define a template {{placeholder}}, abbreviated {{ph}}, taking a language code and one of the placeholders for that language. Any decisions to link to a placeholder glossary or show hover-over content can then be delegated to a single place – the code for the template – and be revisited as desired. Editors should be encouraged to choose usage examples that illustrate the variability; in almost all cases that will obviate a need for specific usage notes.
  3. I am not aware of current uses of square brackets that may conflict with this plan. They occur, of course, in the IPA of pronunciation sections, but that should not present an issue. Round parentheses are used in the headword lines for grammatical info such as gender and inflectional suffix, but always in italics contrasting with the roman font of the terms. Round parentheses are also used for many purposes outside of headword lines, but as far as I know either for optionality (also in IPA) or for what may be considered parenthetical clarifications, such as accents in the pronunciation sections, labels and glosses in the definitions, and senses in synonym sections, again with a body in italics offsetting them from the roman of the terms.
    For other scripts, other square brackets may be needed than the conventional ASCII ones. For example, Chinese punctuation uses the full-width square brackets U+FF3B and U+FF3D, so we would have “使［某人］火冒三丈” rather than “使[某人]火冒三丈” as the Mandarin translation of make [someone's] blood boil. (That translation is a sum-of-parts, so it should not have its own entry, but it is still useful to identify the placeholder in that sum-of-parts.)
  4. I do not foresee the need for rules to prevent overuse/abuse and am generally not in favour of inventing rules before the need becomes apparent. What can be done is to have a data module with the standard placeholders (per language), so that a warning can be issued when an editor uses a non-standard one (e.g. “somebody” instead of “someone”), similar to the data module for acceptable ancestors in etymology. But this may be overkill.
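The data module from point 4, together with the per-script brackets from point 3, might look roughly like this. This is a minimal sketch in Python rather than an actual Wiktionary module; the placeholder lists, the language codes chosen, and the names `check_placeholder` and `bracket` are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical data module: standard placeholder words per language code.
STANDARD_PLACEHOLDERS: Dict[str, FrozenSet[str]] = {
    "en": frozenset({"someone", "someone's", "something", "one", "one's"}),
    "de": frozenset({"jemand", "jemandem", "jemanden", "jemandes", "etwas"}),
    "cmn": frozenset({"某人"}),
}

# Full-width brackets U+FF3B/U+FF3D for Chinese, ASCII brackets otherwise.
BRACKETS: Dict[str, Tuple[str, str]] = {"cmn": ("\uff3b", "\uff3d")}
DEFAULT_BRACKETS: Tuple[str, str] = ("[", "]")


def check_placeholder(lang: str, word: str) -> Optional[str]:
    """Return a warning message for a non-standard placeholder, else None."""
    standard = STANDARD_PLACEHOLDERS.get(lang)
    if standard is None:
        return f"no placeholder list for language {lang!r}"
    if word not in standard:
        return f"non-standard placeholder {word!r} for language {lang!r}"
    return None


def bracket(lang: str, word: str) -> str:
    """Wrap a placeholder in the bracket pair used for its language."""
    open_b, close_b = BRACKETS.get(lang, DEFAULT_BRACKETS)
    return f"{open_b}{word}{close_b}"
```

With this, “somebody” in an English headword would trigger a warning, while “someone” would pass silently; whether such checking is worth it is, as said, an open question.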

The best option, in my opinion, would be to include the square brackets in the page title, but the MediaWiki software does not allow that. The official reason is that they are needed for link syntax, although I do not see how that would be a problem, provided we do not use double brackets or URIs in page titles. I feel the code checking for illegal titles ought to be a bit smarter than it is. {{DISPLAYTITLE}} also does not allow one to add brackets. So, unfortunately, we won't see

give [someone] a piece of [one's] mind

(A distasteful but effective hack might be to use the Chinese full-width brackets instead, also for Latin-script titles.)

Here is a tentative definition for a template {{placeholder}}, abbreviated {{ph}}, taking a language code and one of the placeholders, meant for languages using an alphabet:

<span style="font-style:normal;padding-left:0.1em;color:#000;">[{{{2}}}]</span>

For the headline, code like

{{head|en|verb|head=[[give]] {{ph|en|someone}} [[a]] [[piece]] [[of]] {{ph|en|one's}} [[mind]]}}

will then come out as

give [someone] a piece of [one's] mind.

When using templates such as {{mention}}, code like

{{m|en|give someone a piece of one's mind|give {{ph|en|someone}} a piece of {{ph|en|one's}} mind}}

should come out as

give [someone] a piece of [one's] mind.

 --Lambiam 17:21, 22 August 2019 (UTC)
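To check what the head= code above should display, here is a rough Python model of the expansion. It is only an illustration under stated assumptions: it handles plain and piped links and the proposed {{ph}} with two positional parameters, ignores styling, and is not how MediaWiki actually parses wikitext; `render_head` is a hypothetical name.

```python
import re

# [[target]] or [[target|display]] -> display text
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# {{ph|lang|word}} -> [word]  (models the proposed template's visible output)
PH = re.compile(r"\{\{ph\|[^|{}]+\|([^|{}]+)\}\}")


def render_head(wikitext: str) -> str:
    """Approximate the plain-text display of a head= value."""
    # Resolve links first, so the single brackets produced for placeholders
    # can never be mistaken for link syntax.
    text = LINK.sub(lambda m: m.group(1), wikitext)
    return PH.sub(lambda m: "[" + m.group(1) + "]", text)


if __name__ == "__main__":
    head = ("[[give]] {{ph|en|someone}} [[a]] [[piece]] [[of]] "
            "{{ph|en|one's}} [[mind]]")
    print(render_head(head))  # give [someone] a piece of [one's] mind
```

Because the placeholder brackets are single and links use double brackets, the two notations do not collide in this model.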

See-also in different scripts

star says (at the very top) "See also: Star, står, Stär and стар". I read it as "crap" and was going to remove it as vandalism, but no, it's the same sounds in Cyrillic. Is see-also supposed to do this? I've never seen Arabic up there for instance, or Hebrew. Equinox 01:32, 20 August 2019 (UTC)

This seems like a bad idea: what is the purpose of this supposed to be, User:Hergilei? Are you putting a translation at the top of the page? Is this supposed to be a transliteration into Cyrillic characters? —Justin (koavf)TCM 04:48, 20 August 2019 (UTC)
It's a transliteration. You can see that by simply going to стар (underneath Adjective, a transliteration is provided for every language. стар is transliterated as star for four languages). I noticed a while back that transliterations are sometimes included in See alsos and "Appendix:Variations of". I didn't realize it would be controversial if I included them myself. Hergilei (talk) 16:31, 26 August 2019 (UTC)
@Hergilei: Can't blame you for doing things that you see at other entries: can you point out any others? —Justin (koavf)TCM 17:03, 26 August 2019 (UTC)
@Koavf: Appendix:Variations of "da", Appendix:Variations of "ma", Appendix:Variations of "ta", Appendix:Variations of "na", Appendix:Variations of "ba"
Hergilei (talk) 17:36, 26 August 2019 (UTC)
@Hergilei: But do you see these outside of the Appendix namespace? Things are a lot more forgiving there; it would be a different story if we had many transliterations in our proper dictionary itself. Do you have examples in the main namespace? —Justin (koavf)TCM 18:53, 26 August 2019 (UTC)
Couldn't we do something like that if we had IPA entries? DCDuring (talk) 13:38, 20 August 2019 (UTC)
What about lookalikes, like Cyrillic со- versus Latin-script co-?  --Lambiam 13:56, 20 August 2019 (UTC)
I've seen this a number of times before. I've also seen it with Japanese, which is even odder. Andrew Sheedy (talk) 16:31, 20 August 2019 (UTC)
Until we have a voted-on policy and/or a set of computer-readable rules for what the scope of {{also}} is, this is an entirely pointless discussion. DTLHS (talk) 18:01, 26 August 2019 (UTC)
I thought that User:OrphicBot might have added "see also" entries like this, but apparently it mainly added words that look rather than sound similar, cases like Cyrillic со- and Latin co- as mentioned above by Lambiam. (See the list of equivalences. However, some equivalences are more sound-based: w ↔ ƿ, th ↔ þ.) If "sound-alikes" are going to be included in our current entry layout, {{also}} or the variation appendices are the right places to put them. — Eru·tuon 18:24, 26 August 2019 (UTC)
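The difference between lookalikes (Cyrillic со- vs Latin co-) and soundalikes (стар vs star) can be illustrated with a small Python sketch. It folds a few Cyrillic letters onto the Latin letters they resemble and then compares the results; the table is a hypothetical fragment chosen for illustration, not OrphicBot's actual list of equivalences.

```python
# Hypothetical fragment of a lookalike table: Cyrillic letters (written as
# escapes so they cannot be confused with Latin ones) mapped to the Latin
# letters they resemble.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0441": "c",  # Cyrillic с
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0445": "x",  # Cyrillic х
    "\u0443": "y",  # Cyrillic у
})


def looks_alike(a: str, b: str) -> bool:
    """True if a and b are different strings that become identical once
    lookalike Cyrillic letters are folded onto Latin ones."""
    return a != b and a.translate(CYRILLIC_TO_LATIN) == b.translate(CYRILLIC_TO_LATIN)


print(looks_alike("\u0441\u043e-", "co-"))  # True: Cyrillic со- vs Latin co-
print(looks_alike("стар", "star"))          # False: similar sounds, different shapes
```

A soundalike pair such as стар/star is not caught by this kind of check, since т has no Latin lookalike; that is the distinction drawn in the comment above.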

PoS surviving in compoundsEdit

Microsoft® Encarta® 2009 adds the adverb swift: "very quickly", but the example offered is a swift-flowing river. Wiktionary labels it as "(obsolete, poetic)", yet swift-flowing is used in everyday language (for Collins it's "BrE"). How should situations like this one be dealt with? --Backinstadiums (talk) 09:14, 20 August 2019 (UTC)

I think something else is going on here. When used as the first part of a compound, English words in some cases revert to their root form, as seen in a walk of four miles, which is a four-mile walk (dropping the -s). Likewise, a narcissist who talks incessantly is an incessant-talking narcissist.  --Lambiam 14:28, 20 August 2019 (UTC)
@Lambiam: Searching for "swiftly-flowing" in Google Books, we find Don Lee Fred Nilsen's The swiftly flowing river, and Benilde Graña López's swiftly-flowing (river), smart(ly)-dressing (man), rapid(ly)-rising (river), fast-moving (train), often amusing (entertainer), strangely winning (smile), equally daring (suggestion), curiously sobering (thought) --Backinstadiums (talk) 14:37, 20 August 2019 (UTC)
They should ideally be dealt with in accordance with attestation. If you don't have time for attestation, consult lemmings, eg at swift at OneLook Dictionary Search. If you don't have time for that, do nothing. DCDuring (talk) 15:48, 20 August 2019 (UTC)

OED's 4.0 edition entry is divided into A. for adj and B. for adverb; B. then points out ¶ Hyphened to pres. pple. and occas. to a finite part of a verb, on the analogy of combs. in C.3. Section C.3 reads "Combs. of the adv. with pples., as swift-flowing" --Backinstadiums (talk) 17:53, 20 August 2019 (UTC)

Promoting Middle Scots (sco-smi) to a full-fledged languageEdit

In the Grease Pit there's a discussion about promoting Middle Scots to a full language so that it can have its own entries separate from Scots. A Middle Scots entry has been created (threschald), but it's using the language code and templates for Scots at the moment. There might be other entries (search query: insource:"Middle Scots") that could be migrated to this language code. I'm posting this here because the Grease Pit isn't the right place for this type of discussion.

What are people's opinions about this? — Eru·tuon 17:07, 20 August 2019 (UTC)

See also Wiktionary:Tea_room/2018/February#Middle_Scots. DTLHS (talk) 17:09, 20 August 2019 (UTC)
Should just be a Modern English label, i.e. {{lb|en|Middle Scots}} and the only thing that could change that is a vote. --{{victar|talk}} 17:51, 21 August 2019 (UTC)
Let the administrators here begin a vote for this. I am keen on seeing Middle Scots being recognised here, so that further entries may be created.—Lbdñk (talk) 17:16, 23 August 2019 (UTC)
You should create a vote yourself; being an administrator doesn't give one a knowledge of Middle Scots. DTLHS (talk) 17:21, 23 August 2019 (UTC)

New tools and IP maskingEdit

14:19, 21 August 2019 (UTC)

@Johan (WMF): Hate it. I think this change will inadvertently promote vandalism, thanks to the extra privacy it affords. I am in favor of grouping IP accounts using cookies though -- that would be fantastic! --{{victar|talk}} 18:02, 21 August 2019 (UTC)
victar: We're certainly aware of the possibility. That's why we need to build better tools for fighting vandalism first – to, in the end, make sure that we give vandal fighters at least the same or better chances of fighting vandalism. /Johan (WMF) (talk) 10:08, 9 September 2019 (UTC)
@Johan (WMF): If you're not taking people's opinions, you shouldn't have asked for them. --{{victar|talk}} 10:15, 9 September 2019 (UTC)
victar: We're taking note, and we do appreciate that people tell us their concerns and how it will affect their workflows. I just wanted to tell you that we are indeed aware of the general problem this will cause for anti-vandalism/spam/harassment measures – some people missed the "build tools first" part of the project. (: /Johan (WMF) (talk) 12:49, 9 September 2019 (UTC)

Norwegian Bokmål: classification and etymology issuesEdit

I originally started the discussion at Module_talk:languages/data2#Edit_request:_ancestors_of_Norwegian_Bokmål, but was asked to post here instead.

(Background info: The Bokmål standard of written Norwegian came about by branching off from the common Danish language used in Norway and Denmark after the written Middle Norwegian language died off. During the 19th century a Norwegian pronunciation of this language was developed (much like how standard German is “High German with Low German pronunciation”). A spelling reform in 1907 brought the Danish language written in Norway closer to this Norwegian pronunciation (e.g. -ede > -et) and also adopted some features of the native dialects of the capital area. After that, the language was definitely considered different from the Danish written in Denmark. In the mid-19th century the Nynorsk standard was created based on a “reconstructed base dialect” of all the native Norwegian dialects. The standard was slightly adjusted in 1901 and 1910 to reflect actual use. At this point the standards were clearly different languages. Then, in 1917, 1938 and 1959, radical spelling reforms aimed to merge the two. That project is now generally considered a failure and some of the changes have been scaled back. Despite this, none of the standards have been reversed to their pre-1917 counterparts, and Bokmål has adopted considerable amounts of Norwegian vocabulary, syntax and inflections. Because of this, Bokmål is no longer considered a dialect of Danish (many of its users consider it offensive to say that it still is) – and this is where the problems start.)

Wiktionary’s Module:languages is programmed to consider Bokmål a descendant of Middle Norwegian. While it can be argued that Middle Norwegian is one of its ancestors (given the large amount of Norwegian elements it has adopted, the fact that there exist multiple texts written in Norwegian with the common Danish orthography, and the fact that Norwegian authors incorporated elements from their own native Norwegian dialects throughout the whole period of Danish orthography in Norway), it still contains a lot of words inherited from Danish too (with or without changed spelling). One example is hellig, which has remained unchanged from the common Danish spelling. It was incorrectly listed as a descendant of Old Norse heilagr (I suppose that the source of this mistake is Bokmålsordboka), so I wanted to correct that, i.e. write that it was inherited from Danish hellig, which in turn is inherited from Old Danish hælægh. However, since Bokmål is not registered as a descendant of Danish, {{inh|nb|da|hellig}} produces an error message. Using {{bor}} (borrowed) instead is simply wrong; the word was not borrowed – it has been there all along. The current solution seems to be the {{der}} template, even though its documentation clearly states that {{inh}} and {{bor}} are preferred for words that are confirmed to be inherited/borrowed.

Because Bokmål has roots in both Danish and New/Middle Norwegian, I would like to request that the system reflect that, i.e. that Danish is added as an ancestor of Bokmål. While it is controversial to say that Bokmål is not a Norwegian language today, it is a historical fact that it has roots in Danish and the spoken Dano-Norwegian koiné (see w:Dano-Norwegian and also the language family tree in the infobox at w:Bokmål). Listing both Middle Norwegian and Danish as ancestors will allow correct etymologies to be entered and, as far as I can see, will cause no new problems. Hått (talk) 01:21, 22 August 2019 (UTC)
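The mechanism behind the error can be sketched roughly as follows: {{inh}} accepts a source language only if it appears somewhere in the target language's ancestor chain. This is a simplified Python illustration, not the actual Lua in Module:languages; the data layout is an assumption, and the codes gmq-mno (Middle Norwegian), gmq-oda (Old Danish) and non (Old Norse) are used for illustration only. The entry for nb shows the proposed change, with Danish added next to Middle Norwegian.

```python
# Simplified, hypothetical ancestry data (the real data is Lua in
# Module:languages). "nb" lists both parents, as proposed above.
ANCESTORS = {
    "nb": ["gmq-mno", "da"],  # Norwegian Bokmål
    "gmq-mno": ["non"],       # Middle Norwegian
    "da": ["gmq-oda"],        # Danish
    "gmq-oda": ["non"],       # Old Danish
    "non": [],                # Old Norse
}


def is_ancestor(candidate: str, lang: str) -> bool:
    """Walk up the ancestor graph from lang, looking for candidate."""
    return any(
        parent == candidate or is_ancestor(candidate, parent)
        for parent in ANCESTORS.get(lang, [])
    )


def check_inherited(lang: str, source: str) -> None:
    """Mimic the kind of check that makes {{inh|nb|da|...}} fail today."""
    if not is_ancestor(source, lang):
        raise ValueError(f"{source} is not an ancestor of {lang}")


check_inherited("nb", "da")   # passes once Danish is listed as an ancestor
check_inherited("nb", "non")  # passes via either branch
```

Because the graph allows several parents per language, a word can be traced through either branch without the two conflicting.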

I don't see why it's wrong to list it as borrowed. It wasn't there all along; it didn't exist in Norwegian until the Danes came along. —Rua (mew) 13:49, 22 August 2019 (UTC)
It is correct that it was borrowed into the spoken language, but not in the written Bokmål standard. Unlike Nynorsk, Bokmål was not created from scratch, but instead developed from the common written Danish. The transition from Danish to early Bokmål happened gradually and the last newspaper to adopt the 1907 spelling reform (Aftenposten, a major one) did not fully do so until 1923. Borrowing indicates that a foreign element enters the language. The form hellig was used during the entire transition period (and is still used today), so I do not see how it can be listed as borrowed (if it was, when did the borrowing happen?). The word bil, on the other hand, was coined by the Danes in the early 1900s and quickly borrowed into Swedish and Norwegian (both Bokmål and Nynorsk). Hått (talk) 01:59, 25 August 2019 (UTC)
@Donnanz: You edit Norwegian entries; do you have an opinion on whether Danish should be considered an ancestor of Bokmål? — Eru·tuon 19:41, 10 October 2019 (UTC)
@Erutuon: User:Hått as a Norwegian should know more than I do; generally I would say that modern Bokmål was influenced by Danish as Denmark occupied Norway for many years, it was also part of Sweden before it finally gained independence (which is celebrated every year on 17 May). On the other hand Danish accepted Norwegian words such as ski (in DDO, no Danish entry yet), so there has been a two-way traffic. As Hått says Danish lingered on in Norway until the 20th century, until it was Norwegianised. Many Bokmål words have different spellings to Danish, but others have the same spelling. Both languages took words from Old Norse and Middle Low German, the latter may have come via Danish. Even now there is an unofficial language called Riksmål which comes from Danish, it was officially replaced by Bokmål a long time ago. I would say that {{inh}} and {{bor}} were added by some non-Norwegian editors, I only used {{der}} when doing etymology cleanups. DonnanZ (talk) 20:26, 10 October 2019 (UTC)
I should add that while Norwegian got a lot (most?) of its Middle Low German words through Danish, Middle Low German (and Dutch) also had a direct influence on Norwegian. Notable examples include the usage of the possessive reflexive pronoun as a genitive particle (see w:His genitive) and words such as veldig (“very”, Danish retained native meget) and gutt (“boy”, Danish mostly uses native dreng). Riksmål is simply a continuation of the pre-1938 Bokmål (a spelling reform that year sought to bring Bokmål and Nynorsk much closer together by force, which attracted a great deal of resistance from both sides), since the 1950s(?) standardised by a non-governmental organization called The Norwegian Academy for Language and Literature (the name Riksmål comes from the predominant name of Bokmål from 1899 to 1929). As mentioned, the attempt to merge Bokmål and Nynorsk is generally considered to have failed, and since the 1980s, Bokmål and Riksmål have gotten closer and closer: the government abandoned the goal of merging Bokmål and Nynorsk, and following the resignation of w:Aftenposten’s editor in 1990 and a renewal of the Riksmål movement in the 1990s, most of the former “signal words” of Riksmål have fallen out of use. The current situation is that the Riksmål standard is de facto a subset of Bokmål, with a few insignificant exceptions.
I agree that there has been some two-way traffic, but words of Norwegian origin were often adapted to fit into the written Danish standard. An example is the Norwegian word molte (or molta), which was brought into the common written language as multe, and foss (“waterfall”) became fos. The core of the common language (pronouns, inflections, syntax, spelling “laws” etc.) used in Denmark-Norway was always Danish and even though it was heavily Norwegianised in Norway roughly a century after the dissolution of the union, that process happened gradually, which is why I think that Bokmål should have Danish as an ancestor. The core of Nynorsk, on the other hand, remains Norwegian even though that standard has been heavily influenced by Danish through the attempt to merge it with Bokmål.
Hått (talk) 00:47, 11 October 2019 (UTC)
This reminds me a lot of English: if you only looked at the relative numbers of words in the lexicon for each source, you might think Middle English was a dialect of Old French. The core vocabulary, though, is predominately inherited from Old English. The question is whether a similar pattern holds for Norwegian. Chuck Entz (talk) 04:36, 11 October 2019 (UTC)
I have often thought that Bokmål is a compromise between Danish/Riksmål and Nynorsk, even Nynorsk has alternative spellings which are the same as Bokmål for some words, but inflections can differ between the two. English words have also been taken into Bokmål independently of Danish, also many words are of French, Latin or Greek origin without an intermediate language being given as a source. DonnanZ (talk) 10:20, 11 October 2019 (UTC)

@wiktionaryWOTD on TwitterEdit

Does anyone know anything about this @wiktionaryWOTD account on Twitter? It seems to have tweeted the WOTD for a few months from 2010-2011, and then stopped. Is this something we own? It would be neat to revive it. There also seems to be a plain @Wiktionary account that did the same thing until 2012, and a @WiktionaryUsers account that did some (mixed with other comments) until 2013. I would think that this could be automated. bd2412 T 17:18, 22 August 2019 (UTC)

Wonderfool briefly ran a Wikt Twitter account just to annoy me because I had been commenting about how much I hated Twitter. Forgotten which one it was. He got bored with it. Equinox 18:22, 22 August 2019 (UTC)
Even so, I think this would be a good tool to draw more attention to Wiktionary. bd2412 T 20:35, 22 August 2019 (UTC)
However, do we want to draw more attention to Wiktionary? Specifically from the Twitter crowd? I confess I don't like the idea. ‑‑ Eiríkr Útlendi │Tala við mig 21:12, 22 August 2019 (UTC)
What is the point of Wiktionary if we aren't drawing attention to it? As for the "Twitter crowd", there are apparently over 300 million active users (defined as individuals who have tweeted within the past month), so that's a substantial portion of the Internet-connected Western world. bd2412 T 21:49, 22 August 2019 (UTC)
Twitter lies about their userbase incessantly, you can probably divide that number by 100. If someone wants to set up a "Word of the Day" thing, sure why not. I feel strongly that it should not be sanctioned as an official representative of the site. DTLHS (talk) 21:58, 22 August 2019 (UTC)
I agree it shouldn't be treated as official, if we have one, due to the "proprietariness": these walled gardens that can plaster ads around, collect your data, and kick users off at any time with no recourse are the opposite of the free open spirit that wikis are supposed to have. Equinox 23:55, 22 August 2019 (UTC)
  • I thought the point of Wiktionary is to, well, be a dictionary, not to clamour for attention? The use case for Twitter is so orthogonal to the use case for Wiktionary that I honestly worry that an influx of Twitter users suddenly editing here may overwhelm the established editor base with problematic edits. We've seen strange upticks in problematic anonymous edits in the past that were apparently traceable to this or that social media platform linking through to Wiktionary, so I don't think my concern is entirely unfounded. ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 23 August 2019 (UTC)
Hi! French Wiktionary has had a Twitter account for almost ten years, managed by Lyokoï. It is mainly used to like and retweet when someone quotes Wiktionary, and to give some answers about the way it works (descriptive, neutral, fancy, etc.). I think this is quite an effective way to give a better picture of Wiktionaries and to get some feedback from readers. For example, it was nice to know that some random people smiled at the picture on l’herbe est toujours plus verte ailleurs (the grass is always greener on the other side)   Noé 14:14, 23 August 2019 (UTC)
I am reasonably sure that the WMF doesn't like this sort of thing. - TheDaveRoss 14:23, 23 August 2019 (UTC)
WMF doesn't like individuals with Twitter accounts representing themselves as Wikimedia projects. Even Wikimedia has an "official" Twitter, as does Wikipedia. bd2412 T 02:19, 28 August 2019 (UTC)
Can confirm, re the Twitter handle @wikiquote. —Justin (koavf)TCM 02:30, 28 August 2019 (UTC)
I mean, that is what all of these are, individuals with Twitter accounts using the WMF marks. Even though we contribute we are still individuals who are not the WMF or Wiktionary.- TheDaveRoss 12:30, 28 August 2019 (UTC)
Oh yes, Wonderfool ran a Wiktionary Twitter account called WiktionaryUsers. He put the WOTD and FWOTD on there too, with occasional news probably like "we have 200000 entries" or "Equinox had another lame argument with SB, this time about the type of dashes to use in entries". He'd be able to restart it too. --Mélange a trois (talk) 23:58, 1 September 2019 (UTC)
    • Ironically, he was using that Twitter account longer than he'd ever managed to keep a WT account. --Mélange a trois (talk) 00:03, 2 September 2019 (UTC)
Good to hear from someone who knows WF. Please tell him to get another hobby (he said ironically). I now recall the associated user name was User:Wikt Twitterer. Equinox 00:06, 2 September 2019 (UTC)
You got it, Eq. And WF does have a new hobby - he has finally learned to play the harmonica. --Mélange a trois (talk) 00:19, 2 September 2019 (UTC)
Don't tell me... he learned it from that heiress he married. Equinox 00:29, 2 September 2019 (UTC)
Nope, SB taught him when he was down in his area a few months ago. You can find his "One for The Bristol City" version online somewhere. --Mélange a trois (talk) 07:49, 2 September 2019 (UTC)

Intransitive verbs with a specified prepositionEdit

Using the verb file as an example, the fourth meaning reads

(intransitive, with for, chiefly law) To submit a formal request to some office. 

However, the verb search shows

(intransitive, followed by "for") To look thoroughly 

I think a formal label should be added to Appendix:Glossary explaining the current wordings with for, followed by "for", etc. --Backinstadiums (talk) 18:09, 22 August 2019 (UTC)

I think it's an abuse of the label anyway. Context labels are meant to show meanings that occur in a specific context. But in this case it's not a meaning of the word in question, but rather the meaning of the word combined with another word. It's not that file means "to submit a formal request" whenever it's used with for but rather the combination file for that has this meaning. Secondly, we have the {{+preo}} template which is intended for this purpose. —Rua (mew) 18:17, 22 August 2019 (UTC)

Numerous etymologies in table of contents (ToC)Edit

For some entries with multiple etymologies (e.g. a-), the table of contents is only useful at one level: the languages. That's because it contains a list of mostly duplicate "Etymology 1"..."Etymology N" hierarchies with no way of distinguishing them. An alternative would be to replace N with a distinguishing summary, e.g.,

  • Etymology 1 - Middle English a- (“up, out, away”)
  • Etymology 2 - Old English an (“on”)
  • Etymology 3 - Middle English a- (“with”)

Or even briefer - just the senses. After all, we don't want the ToC to get too wide!

However, I'm not a linguist and unsure about solutions. Perhaps simply reducing the ToC limit is suitable in some cases. The general problem remains: I have found that the ToC is often less than useful.

(If the Etymologies headers are augmented, an {{anchor}} with their old name may be required to prevent breaking links.)

Dpleibovitz (talk) 04:43, 23 August 2019 (UTC)

Adding reference to the coalmine rescinding vote to CFIEdit

Would someone please add Wiktionary:Votes/2019-08/Rescinding the "Coalmine" policy to WT:CFI#Idiomaticity as an additional reference to the COALMINE policy? Admittedly, it has not been a vote approving a change, so far. Nonetheless, it would enable CFI readers to see the most recent state of consensus on the matter; the original vote traced to from CFI is from 2009. There was another vote, Wiktionary:Votes/pl-2012-03/Overturning COALMINE, but that does not need to be linked from CFI, IMHO; it is linked from the 2019 vote anyway. --Dan Polansky (talk) 07:01, 23 August 2019 (UTC)

No one has objected, but no one has added a link to the vote to WT:CFI#Idiomaticity either, next to the link currently numbered "[10]". Would @bd2412 or @Mx. Granger be interested in adding the link to WT:CFI? (I picked bd2412 since we successfully worked together before, and Mx. Granger for diff; more people came to mind, but I don't want to bother more people.) --Dan Polansky (talk) 17:09, 5 September 2019 (UTC)
Let's wait until the vote has been closed before adding it as a reference to CFI. —Granger (talk · contribs) 00:27, 6 September 2019 (UTC)
@Mx. Granger: The vote has been closed. Would you please add it as a reference? --Dan Polansky (talk) 07:44, 22 September 2019 (UTC)
  Done. Granger (talk · contribs) 13:41, 22 September 2019 (UTC)

Japanese: move kana and rōmaji to the pronunciation sectionEdit

In a Japanese entry, the kana and rōmaji forms are currently placed in the headword templates. However, a look into Module:headword reveals that these places are originally for inflections instead of alternative scripts:

使う (transitive, godan conjugation, ren'yōkei 使い, past 使った)

Transliterations are supported but handled separately (and apparently Latin-only):

使う (tsukau) (transitive, godan conjugation, ren'yōkei 使い (tsukai), past 使った (tsukatta))

In light of this, I would like to propose moving the kana and rōmaji to the pronunciation section, for the following reasons:

  • The standard entry layout requires the pronunciation section before the headword lines. This means that for kanji entries like 申す, the reading first appears in phonetic transcriptions (ーす[móꜜòsù] and [mo̞ːsɨᵝ]) and then in kana and rōmaji which more people regard as the "reading" (もうす and mōsu). This reverses the usual logic of kanji entries, because kanji spellings must first be "read" to get words and words then have phonological information. If we move the reading to the pronunciation section and display it prominently, then the structure of kanji entries can become more clear:
Hiragana もうす
Hepburn romanization mōsu
Kunrei-shiki romanization môsu
Historical hiragana まうす
  • For words having multiple parts of speech, the alternative scripts must be repeated in every headword template, which makes maintenance harder and more error-prone. If we change the format of headword templates to the first one above, we can eliminate all repetition. If we change it to the second one above, we can still eliminate repetition of the historical kana and kyūjitai.
  • Kyūjitai can be moved to {{ja-kanjitab}} and displayed in a larger font size. Historical kana will receive better support (support of "historical hiragana and katakana" for entries like 耶蘇教, multiple historical hiragana for entries like 向こう, etc.)
  • Headword lines can be more learner-friendly (as shown above).
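Part of what makes an automatically generated pronunciation box attractive is that the two romanizations shown above (Hepburn mōsu, Kunrei-shiki môsu) differ by regular spelling rules. Below is a rough Python sketch of converting an existing Hepburn romanization to Kunrei-shiki. The function name is hypothetical, it assumes lowercase modified-Hepburn input, and it deliberately ignores special cases such as the particle を and apostrophes after syllabic n.

```python
# Hepburn long vowels with macrons (ā ī ū ē ō) -> Kunrei circumflexes (â î û ê ô).
LONG_VOWELS = str.maketrans("\u0101\u012b\u016b\u0113\u014d",
                            "\u00e2\u00ee\u00fb\u00ea\u00f4")

# Spelling differences, applied in order; longer patterns come first so that
# e.g. "tchi" is handled before "tch" and "shi" before "sh".
REPLACEMENTS = [
    ("tchi", "tti"),  # itchi -> itti
    ("tch", "tty"),   # matcha -> mattya
    ("shi", "si"),
    ("chi", "ti"),
    ("tsu", "tu"),
    ("ji", "zi"),
    ("fu", "hu"),
    ("sh", "sy"),     # sha, shu, sho (incl. long vowels) -> sya, syu, syo
    ("ch", "ty"),
    ("j", "zy"),
]


def hepburn_to_kunrei(word: str) -> str:
    """Convert a lowercase modified-Hepburn romanization to Kunrei-shiki."""
    result = word.translate(LONG_VOWELS)
    for hepburn, kunrei in REPLACEMENTS:
        result = result.replace(hepburn, kunrei)
    return result


print(hepburn_to_kunrei("mōsu"))      # môsu
print(hepburn_to_kunrei("tsukau"))    # tukau
print(hepburn_to_kunrei("hitsuzen"))  # hituzen
```

Generating both romanizations from one stored form, rather than typing each into every headword template, is the maintenance gain the proposal describes.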

On the other hand, there are some disadvantages with such an approach:

  • It means a lot of work (possibly by bots). For example, entries lacking a pronunciation section would need one added, and entries with {{ja-pron}} that lack orthographical information (現代仮名遣い quirks, capitalizations, spaces and hyphens, etc.) would need to be fixed. Such a change would also make the pronunciation section mandatory for new entries.
  • If we go for the first format of headword templates above, then kana and rōmaji would be completely decoupled from POS or sense. This means that entries like お玉杓子 would require something like |orthn_note= (similar to |accn_note=) to indicate which spellings apply to which senses. It would also make auto generation of categories like Category:Japanese type 1 verbs that end in -iru or -eru impossible.

What do you think of such an approach?

(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 07:24, 24 August 2019 (UTC)

I haven't had time to fully think through the ramifications, but broadly speaking, I'm in support of this proposal. Some notes:
  • "...then kana and rōmaji would be completely decoupled from POS or sense." This should already be the case. If a specific sense has a different reading, then that sense should be split out to a separate etymology section. As such, all entries under a given etym should have identical kana and romaji. If a specific sense has usage quirks, such as using katakana in certain contexts, those should be explained in a usage section. Alternatively, we could follow the practice of various other monolingual Japanese dictionaries, and indicate specific spellings at the start of the line.
  • "...entries lacking a pronunciation section should be supplied one..." Under current practice, we (the JA editor community) already strive so that every entry, and indeed every etym section for every Japanese entry, should have its own pronunciation section, so there is no opposition from me on that score.
  • "It would also make auto generation of categories like Category:Japanese type 1 verbs that end in -iru or -eru impossible." I don't follow. We would have to continue adding kana to headword templates anyway for proper generation and sorting of the basic POS categories.
‑‑ Eiríkr Útlendi │Tala við mig 04:42, 26 August 2019 (UTC)
I support this approach (a box to display pronunciation). I like how the romanizations are automatically generated similar to {{ko-ipa}}, and that historical hiragana is listed below for a less cluttered appearance. However, I'm not too sure regarding what needs to be removed or eliminated. Can you elaborate more on this? KevinUp (talk) 21:16, 26 August 2019 (UTC)
@Eirikr: Thanks for your reply, but I think we have reached a consensus to eliminate sortkeys, haven't we? A look at Category:Japanese nouns reveals that the current sorting scheme works poorly, but even if it works well, the mass acceleration of kana forms of kanji entries will flood the category like this: あいいろ, 藍色, あいいん, 愛飲, あいうち, 相撃ち, あいえき, 愛液, etc. If we sort entries by their usual order, i.e. kana by kana and kanji by kanji, then there will be no harm to the kana part of the entry, and you get a new way to look up entries—by kanji.
@KevinUp: For example, 必然 has two parts of speech, so the reading ひつぜん is repeated in the two headword templates. If we move it to the pronunciation section, we (1) not only make the structure of the entry more logical (e.g. kana and rōmaji before accent and IPA), (2) but also make entries easier to maintain. If we retain it in the headword templates (while adding a pronunciation section like the one above), we get only the benefit of (1) but not (2). --Dine2016 (talk) 03:58, 27 August 2019 (UTC)
I support moving hiragana (also historical) and romaji to pronunciation sections. --Anatoli T. (обсудить/вклад) 11:34, 27 August 2019 (UTC)

Order of meanings according to their labelsEdit

Currently meanings follow no label order, so one or several (obsolete/archaic) meanings may come before ones still used in everyday speech, thereby hindering the user's lookup. --Backinstadiums (talk) 10:07, 24 August 2019 (UTC)

For some purposes. Some users like historical order. Some like frequency, of whom some like current frequency in speech, others frequency in writings, including older writings. Some like specialized senses last. Some like literal/physical senses first. Some like grouping by grammatical properties, eg, (in)transitive, (un)countable. Unfortunately, there are many entries where these preferences conflict. We should consider ourselves fortunate that no one seems to prefer alphabetical order of definitions or ordering by the length of the definitions.
I am unaware of any work done to support any one particular approach, though many opinions have been expressed. DCDuring (talk) 16:15, 24 August 2019 (UTC)
I don't much like the in/transitive grouping: I think it's more sensible to group semantically related meanings; syntax/grammar is really secondary to what the word actually means. I have a habit of swapping ety sections and moving senses down if something obsolete and totally unused is at the top (which is curiously often the case). We won't satisfy everyone until we have a sort (by date, frequency...) feature, but we lack the data to support it. The OED marks words with a star rating to show how common they are in modern texts, which is cool, but obviously must be generated by huge amounts of data we don't see. Equinox 16:41, 24 August 2019 (UTC)
Webster's 1913 Dictionary presents the oldest, original meaning first, even when it has become quite uncommon. For example, the sense of victim as “a living being sacrificed to some deity” is the first one given. A strong argument can be made for “most common meaning first”, but a strong argument can also be made for “original meaning first”.  --Lambiam 09:58, 25 August 2019 (UTC)
@Lambiam: as long as it follows an order, the temporal descending/ascending addition doesn't matter. victim intersperses its original sense in the second position --Backinstadiums (talk) 13:10, 25 August 2019 (UTC)
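For illustration, the ordering asked for at the start of this thread (current senses before archaic and obsolete ones) amounts to a stable sort keyed on labels, which leaves the existing order untouched within each group. The ranks below are a hypothetical choice; a historical or frequency ordering would need dates or counts that, as noted above, we lack.

```python
# Hypothetical ranking of labels; senses with none of them rank as current (0).
LABEL_RANK = {"archaic": 1, "obsolete": 2}


def rank(sense):
    """sense is a (labels, definition) pair; the highest-ranked label wins."""
    labels, _definition = sense
    return max((LABEL_RANK.get(label, 0) for label in labels), default=0)


def order_senses(senses):
    # sorted() is stable, so senses with equal rank keep their original order.
    return sorted(senses, key=rank)


senses = [
    (["obsolete"], "sense A"),
    ([], "sense B"),
    (["archaic"], "sense C"),
    (["figuratively"], "sense D"),
]
for labels, definition in order_senses(senses):
    print(definition, labels)
```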


I asked user Rua, who did a lot of the coding, and he suggested I ask here.

Could we change the display of e.g. an infix ka from -ka- to ⟨ka⟩? That's the standard format (e.g. Leipzig glossing rules). Otherwise it looks like a suffix followed by another suffix. And could we make param 1 optional, as it is with template:suffix? It's useful to be able to say 'may contain an infix ⟨ka⟩' and link to the category without providing the stem, especially if the stem is obscure.


We are not in the business of interlineal glossing, so I don’t see how the Leipzig glossing rules would apply. I expect that most (> 99.9%) of our casual users would not understand the significance of such angle brackets and be more likely to be confused by them than enlightened. See how the British infix -bloody- is presented here in print: “a-bloody-gain”. As to the second request, I assume param 2 is meant (the base into which the infix is inserted). There would still be a + sign.  --Lambiam 10:28, 26 August 2019 (UTC)

Need a uniform policy on handling of Latin specific epithets

(Notifying Fay Freak, Brutal Russian, JohnC5, Lambiam): User:Marontyan has been manually editing various Latin adjectives and nouns that are used as specific epithets and converting them into Translingual entries. I'm not opposed in principle to this, but

  1. We need a specific, uniform policy for handling these before editing them piecemeal.
  2. We need to agree on a format that preserves as much info as possible. Currently I include a full declensional table, which may be overkill but does clearly show the masculine, feminine and neuter nominative singular forms. For these terms I've created a template {{la-epithet}} meant to go into the Usage Notes section that indicates as far as I can tell whether the term is exclusively, mostly, or additionally used as a specific epithet, and what grammatical type it is (adjective, noun in the lemma form, noun in the genitive singular, or noun in the genitive plural). It also adds the term to Category:Specific epithets.

The tricky thing here is that a lot of specific epithets also are used at least occasionally in other Latin. This goes especially for -ēnsis terms (e.g. diplomas from Princeton are or were in Latin and used Princetoniēnsis, and the seal of the University of Arizona says "SIGILLUM UNIVERSITATIS ARIZONENSIS" on it), but may apply to other terms as well, so we need to be careful before we conclude that a given Latin term is exclusively used as a specific epithet and hence should maybe be converted to Translingual. For this reason my instinct is to leave these as Latin terms (they -- mostly at least -- follow Latin grammatical rules, after all), but others may disagree. Benwing2 (talk) 06:46, 26 August 2019 (UTC)

Per the codes of the international bodies regulating biological nomenclature, taxon names have to be grammatically correct Latin. So why classify them as “translingual”? Is that a language with a grammar? How do we know that some epithet is a translingual adjective, and not some other translingual part of speech? What about the epithets that change under gender agreement; are these Latin while the gender-invariant ones are not? Also pinging User:DCDuring.  --Lambiam 09:50, 26 August 2019 (UTC)
That’s why we shouldn’t imagine a strict distinction between Latin and Translingual. This distinction is merely a product of the minds of Wiktionary editors. The language users just use the words and do not think first which language their words are in but whether they understood, or whether they conform to any rules; or the Latin words they use can be “thought” only and code-switched to without ever having been used in Latin text. As I have mentioned elsewhere, the inflection tables – or if we are at it also the head templates – should just have a parameter that makes the tables Translingual, so not linking the forms as Latin but as Translingual or not at all (for it is then still effective for SEO if the forms are there but not linked). Then it is six and two threes whether an epithet we create is Latin or Translingual because the content will be the same. This is probably also easy to implement for @Benwing2. Like I did on situliformis. No spectacular layout, just one parameter lacking at two places. Fay Freak (talk) 12:11, 26 August 2019 (UTC)
Taxonomic Latin originated as actual Latin, and (for plants, algae and fungi at least), is occasionally still used as such for the formal description of a taxon. Mostly, though, you have nouns in the nominative and genitive as well as adjectives in the nominative and sometimes in the genitive (for parasitic species named after the host). When names aren't used in Latin sentences, accusative, dative, locative and vocative are completely absent. Generic names are nouns in the nominative singular. Names of higher taxa are generally formed by replacing the ending of the genitive of a generic name with a standard ending denoting the rank. Epithets for species and below are mostly adjectives agreeing in gender with the generic name, but also nouns in the nominative singular or plural, and nouns in the genitive singular or plural agreeing in gender and number with the referent. So taxonomic names use a subset of Latin morphology, but that subset follows Latin rules. Chuck Entz (talk) 13:59, 26 August 2019 (UTC)
After edit conflict. Some duplication of Chuck's contribution.
I can live with almost any resolution of this, including no resolution. I use {{epinew}} to allow links from inflection lines to the lemma form of specific epithets in whatever language I choose for the main entry for the epithet. Usually the epithet has an existing Latin or Translingual entry. But I sometimes declare a term that is identical in spelling to a word in, say, Old Tupi to be Old Tupi instead of creating a vacuous Latin or Translingual L2 section for the term.
Specific epithets that are adjectives need only nominative singular inflected forms. Some specific epithets are nouns and can either be nominative singular or genitive, sometimes singular, as in eponyms, sometimes plural, as in nouns referring to a habitat or host of an organism. Full declension tables are not necessary to provide information for the proper use of specific epithets.
That said, as noted above, some terms used as specific epithets have had other use. That use includes use by the Roman Catholic church in some internal communications; use in running Latin text for scientific purposes, including use outside of Linnaean taxonomy; and use in various mottos, inscriptions, and formal documents. It is usually time-consuming (and often impossible) to try to attest to such use in contexts other than taxonomic names, just as it is time-consuming (and often impossible) to attest to some rarer inflected forms of classical Latin words, especially verbs.
I would appreciate it if any resolution did not lead to time-consuming attestation efforts or revisiting each taxonomic entry. DCDuring (talk) 14:14, 26 August 2019 (UTC)
What about the following. If some case form is unattested, but no doubt can exist that this is what a native Latin speaker would have used, we just list it without qualms. So the dative plural of brooklynensis is simply brooklynēnsibus. If we need to guess but have one or a few plausible forms, we list them with an asterisk and add “ (?)”, separating alternative conjectures with slashes. If we have no idea, we just put a question mark.  --Lambiam 15:06, 26 August 2019 (UTC)
The pattern only needs to be known. The correct inflection of a word is not reconstruction; any more than it is a good idea to hunt down all Latin forms that have not yet been used because the word is only attested a few times – which starring would be random and of little value. “So diplomarius is attested but diplomariorum is not? What a great discovery!”
What is “needed” is a weak cause for restriction here. Fay Freak (talk) 20:42, 26 August 2019 (UTC)
WT:CFI, Talk:albifrons for example, and WT:Translingual are quite clear about this:
  1. If not used in Latin, it isn't Latin (see WT:CFI, Talk:albifrons).
  2. Even if attested translingually, there can still be a Latin entry (see WT:Translingual#Other languages).
    This should also go vice versa: If there is a Latin entry, there can also be an entry for a translingually used term. (Somewhat) similar to how there is football in many languages.
  • Template:la-epithet is (often) wrong in Latin entries: "and thus not inflected except in ..." isn't correct for Latin. It applies to English, French etc. which isn't Latin. And even if a term isn't attested in Latin, the note can be wrong as shown in Translingual ruderalis and by the plural of Translingual Homo sapiens.
  • riobamba is wrong: It's translingually used riobambae - in Latin it would be *Riobamba as it is a proper noun. It's more obvious when comparing for example Translingual fleischmanni and Latin Fleischmanni, form of Fleischmannus.
    (Of course, the translingually used form could also occur in Latin, but then it's science-speak, rather something similar to a chemical formula like H₂O, and not usual Latin.)
  • "separating alternative conjectures with slashes": Well, bad or too dubious conjectures shouldn't be listed at all. And even the inflection of Latin amethystizon looks dubious in many ways: Most forms lack a star, and gen. sg. could also be *amethystizontos, acc. sg. m.&f. *amethystizonta, nom. & gen. & voc. pl. n. *amethystizonta, compare the Greekish acc. pl. amethystizontas instead of a Latinate *amethystizontes. And even if amethystizon would have Latin forms, why shouldn't it still be amethystizonta instead of amethystizontia similar to vetus with vetera? Anyhow, the entry lacks Greekish conjectures and might be better with no conjectures.
  • "taxon names have to be grammatically correct Latin": Which doesn't mean it is grammatically correct Latin or Latin at all. German Handy, French foot and other pseudo-anglicisms aren't English either, even if native German or French speakers might reject that such terms are classified as German, French etc.
  • "So why classify them as “translingual”?":
    • Many taxonomic terms are used in multiple languages, that is, they are used translingually.
    • "Translingual" covers international science-speak and codes (see Wiktionary:Translingual#Accepted).
    • Many taxonomic terms aren't used in Latin, and thus aren't Latin, just like pseudo-anglicisms aren't English.
  • "What about the epithets that change under gender agreement; are these Latin while the gender-invariant ones are not?": Gender and gender-agreement in case of taxonomic terms also occur translingually, even in English.
    What might be more interesting: Does the taxonomically implied gender always agree with the language's gender, is it for example always "der Homo sapiens" m or does also an incorrect "die Homo sapiens" f or "das Homo sapiens" n occur? In case of Translingual Nix Olympica the gender of Latin nix and of the translingually used term do not always match. Would be funny, if Translingual Homo sapiens is male by origin and taxonomism, but female or neuter in some language, e.g. if foreign terms (usually) become neuter in that language.
--Marontyan (talk) 20:59, 26 August 2019 (UTC)
By the mechanistic logic expressed we should redesignate as Translingual all the Latin terms ever used in a taxonomic name that has been ever used in running text in multiple languages. Limiting ourselves to adjectives, we would start with albus, niger, ruber, cyaneus and continue through the specific epithets used in the entries in Category:Species name using Latin specific epithet. We could then start on the ones in Category:Species entry using missing Latin specific epithet and then proceed to the specific epithets for the millions of other species names that we haven't entered. That should lighten the burden of adjectives that our Latinists have to maintain. DCDuring (talk) 04:17, 27 August 2019 (UTC)
@DCDuring: See WT:About Translingual#Other languages: "The classification of a term as Translingual does not prevent the article having sections for other languages". Besides Latin albus there can be Translingual albus and vice versa. --Marontyan (talk) 07:36, 14 September 2019 (UTC)
Reducing, not increasing, duplication of semantic content is the direction we generally go. Nobody voted on the content of "About Translingual". DCDuring (talk) 12:28, 14 September 2019 (UTC)

Transliteration for Mandarin, Japanese, Korean - No italics or italics depending on situation?

I've been using {{zh-l}}, {{ja-r}}, {{ko-l}} much too often, which led me to think that romanizations in Chinese/Japanese/Korean are always written in italics.

My recent request at the Grease Pit made me realize that italics are for mentions, whereas romanizations in lists are usually unitalicized, as explained by User:Rua.

However, Chinese, Japanese and Korean entries using {{zh-l}}, {{ja-r}}, {{ko-l}} have romanizations in italics by default, regardless of whether they appear as a mention or in a list.

Should we apply the usual convention in Wiktionary, i.e. italics for mentions, and no italics for lists? I think making them all appear as italics by default (for Mandarin, Japanese, Korean romanizations) would be more manageable. KevinUp (talk) 17:31, 26 August 2019 (UTC)

Yes. —Suzukaze-c 19:10, 26 August 2019 (UTC)
  • Personally, I dislike "modal" display, as I find the inconsistency jarring. I would prefer that we be consistent. My ideal state for the various {{ja-r}}, {{m}}, and {{l}} templates would be for Japanese transliterations to always be italicized, and for translations to always be non-italicized.
That said, if we are to adopt different formatting for Japanese transliterations in different templates, then I would accept italics in {{ja-r}} and {{m}}, as these are commonly used in running text, and non-italics in {{l}}, as this is commonly used in lists.
‑‑ Eiríkr Útlendi │Tala við mig 16:49, 27 August 2019 (UTC)
I've noticed that most languages use {{l}} or {{der3}} for derived terms, which gives transliterations without italics. However Japanese entries use {{ja-r}} instead for derived terms, which gives transliterations in italics. This is because {{ja-r}} can also be used in running texts.
The same situation also occurs in Chinese and Korean entries, where transliterations of derived terms appear in italics even though they are lists, not mentions. KevinUp (talk) 20:49, 27 August 2019 (UTC)
I will say that the situation with Japanese is different as Romaji is actually fairly common there. No one really composes anything in Japanese with Latin characters but the Latin alphabet is definitely one of the writing systems used by the Japanese for their language, whereas that is not true with Chinese languages or Korean. Hence, italicizing to show "this isn't really what they do over there--it's foreign" is not really applicable to Japanese. —Justin (koavf)TCM 19:19, 26 August 2019 (UTC)
@Koavf: ...the Latin alphabet is definitely one of the writing systems used by the Japanese for their language - it's a completely wrong statement caused by the fact that we allow romaji entries for disambiguations. The Latin script is not a writing system for Japanese! I don't know why you keep pushing this agenda. --Anatoli T. (обсудить/вклад) 01:31, 27 August 2019 (UTC)
@Atitarev: "Keep"? "Agenda"? It is definitely true that Japanese use Latin characters regularly and not only for brand names. All schoolchildren use Romaji and it's very common for Japanese to use it. Not sure what you're going on about here. —Justin (koavf)TCM 03:11, 27 August 2019 (UTC)
Latin as a writing system for Japanese existed as far back as the 16th century as well. --Dine2016 (talk) 03:20, 27 August 2019 (UTC)
Transliterations and a helping tool for students is not a writing system. It's a great fallacy. --Anatoli T. (обсудить/вклад) 03:29, 27 August 2019 (UTC)
Tell it to the Japanese. —Justin (koavf)TCM 04:01, 27 August 2019 (UTC)
I'm telling you who made this statement. The Japanese write using the Japanese writing system and don't make such statements. --Anatoli T. (обсудить/вклад) 04:15, 27 August 2019 (UTC)
Somehow, you are under the impression that I control the Japanese education system or that I have been responsible for keyboard inputs in the 21st century. I am not. Why adopting Latin characters is "fallacious" whereas adopting Chinese ones wasn't is a mystery to me... ¯\_(ツ)_/¯Justin (koavf)TCM 04:22, 27 August 2019 (UTC)
You're only responsible for your own words. Your incorrect statements don't change what the Japanese or the Chinese use as their writing system. Both the Japanese and the Chinese (can) use Latin letters to type in the native scripts in various input systems. The Japanese adopted a mixture of Chinese and native/modified writing systems but never the actual Latin script, and it's not a fallacy but just the way it is. The fallacy is asserting that they use Latin letters as one of their writing systems, rather than a native script. --Anatoli T. (обсудить/вклад) 05:15, 27 August 2019 (UTC)
Okay. I'm not going to argue about whether or not Romaji is a fallacy. All Japanese use that system, it's very common, and therefore that is unlike Chinese and Korean populations. —Justin (koavf)TCM 05:55, 27 August 2019 (UTC)
The fallacy is your statement, not the romaji - a romanisation system for Japanese. Yes, it's one of the romanisation systems (a transliteration, an IME input, a learning medium for foreigners) but it's not a Japanese writing system. Yes, the Japanese type "t-o-u-kyo-u" on their keyboards to enter kana とうきょう and then convert to 東京 but they don't write or type "Tōkyō" or "Tokyo", it's not Japanese. "Tōkyō" or "Tokyo" are used for the foreigners. You made another mistake saying "unlike Chinese ...". The Chinese also use many romanisation systems and can use Latin letters as one of many input methods - e.g. type "beijing" and convert the Latin letters to "北京". --Anatoli T. (обсудить/вклад) 11:05, 27 August 2019 (UTC)
Diglossia and digraphia --Backinstadiums (talk) 16:14, 27 August 2019 (UTC)
@Koavf: We italicize non-English Latin script mentions too so I don't really see what point you're trying to make.
Also, I definitely think non-Mandarin transliterations should get their own templates too, it will be useful for etymologies and descendants lists particularly when the word doesn't even appear in Mandarin. —AryamanA (मुझसे बात करेंयोगदान) 11:24, 27 August 2019 (UTC)
For me, I would prefer to see the transliteration in italics if it appears right after the main form in Chinese/Japanese/Korean. So romaji listed as a headword or romanizations in {{zh-pron}} shall remain unitalicized. KevinUp (talk) 19:34, 26 August 2019 (UTC)

I found the previous discussion at Wiktionary:Beer parlour/2018/March#Unifying the display of romanisations in links and headwords: italicise romanisations by default and Wiktionary:Votes/2018-03/Showing romanizations in italics by default. KevinUp (talk) 20:51, 26 August 2019 (UTC)

I would oppose that vote as well, because it's too general, and I prefer the output of {{l}} and {{m}} to be distinct. I think that romanizations in italics ought to be considered on a case-by-case basis, depending on the language. Also, italicizing romanizations of headwords makes them less legible. KevinUp (talk) 20:51, 26 August 2019 (UTC)
@Dan Polansky Would it be possible to start a fresh vote, confined only to Mandarin, Japanese and Korean, something such as "Italic forms for romanized forms of Mandarin, Japanese, Korean which are listed after the main forms in the respective languages". KevinUp (talk) 20:51, 26 August 2019 (UTC)
Main form for Mandarin: (1) hanzi
Main form for Japanese: (1) kanji or (2) hiragana or (3) katakana or (4) combination of kanji and hiragana.
Main form for Korean: (1) hangeul or (2) hangeul followed by hanja
I'm only considering these three languages because this is the convention used by {{zh-l}}, {{ja-r}}, {{ko-l}} by default. KevinUp (talk) 20:51, 26 August 2019 (UTC)
Is there some reason I'm missing here why you have separated Mandarin from other Chinese languages (e.g. Cantonese)? Is there a reason in principle why they would be treated differently? —Justin (koavf)TCM 21:40, 26 August 2019 (UTC)
The reason is that many Chinese languages share the same headword and the default romanization of {{zh-l}} (zh for Chinese language) is currently in Mandarin. An editor was interested in having Cantonese, Min Nan, Hakka romanizations for {{zh-l}} but that discussion did not go well - see Talk:吃飽.
Currently, we don't have templates such as {{yue-l}} or {{nan-l}} or {{hak-l}} to display transliteration of these languages. My opinion is that the choice of italicization ought to be done on a case-by-case basis. KevinUp (talk) 22:53, 26 August 2019 (UTC)
@KevinUp: And would there be any outstanding reason why things would be different for Hakka, Min-Nan, Yue, etc.? —Justin (koavf)TCM 23:02, 26 August 2019 (UTC)
I suppose we could do the same for Cantonese, Min Nan, Hakka, etc. It's just that currently we don't list transliterations for words in these languages after its hanzi form. See the synonyms section of 吃飽吃饱 (chībǎo) for example. 食飽食饱 is written there without any transliteration because the lemma could be either Cantonese, Min Nan, Min Dong or Hakka and editors are unsure which should be listed and in what order. KevinUp (talk) 23:33, 26 August 2019 (UTC)
What order to list things in is a pretty easy thing to resolve, especially in a dictionary: alphabetical order (Hakka, Mandarin, Yue...). Thanks. If I were more knowledgeable about Chinese languages, I'd have something more valuable to add but in principle, I think we should definitely have all the standard romanization/transliteration schemes and for the spoken languages that are encoded as written Chinese, these will all be dramatically different. —Justin (koavf)TCM 00:22, 27 August 2019 (UTC)
Alphabetical order can be very subjective. Yue could also be Cantonese. Min Nan could also be Hokkien. --Dine2016 (talk) 03:12, 27 August 2019 (UTC)
@Dine2016: Then you just choose one as the name and alphabetize that. Alphabetical order itself is not subjective. —Justin (koavf)TCM 03:14, 27 August 2019 (UTC)

I've noticed that most languages use {{l}} or {{der3}} for derived terms, which gives transliterations without italics. However Japanese entries use {{ja-r}} instead for derived terms, which gives transliterations in italics. This is because {{ja-r}} can also be used in running texts. The same situation also occurs in Chinese and Korean entries, where transliterations of derived terms appear in italics even though they are lists, not mentions. KevinUp (talk) 12:14, 28 August 2019 (UTC)

Relax SOP a bit

Rather than rescinding WT:COALMINE (which has been proposed, and for which the proposal has drawn overwhelming opposition), we should move more in the direction of relaxing SOP a bit, and including more common collocations that don't fit neatly into current SOP exceptions. Obviously we shouldn't have an entry like yellow floppy disk to define "a floppy disk that is yellow", but if a phrase occurs for which some reasonable argument can be made that, for example, it relies on unintuitive senses of the words involved, or has component words using one of a number of possible senses, we should include it. Basically, we should spend less time discussing these at RfD. I'm sure someone is going to suggest that this would impose an increased burden of maintaining those entries, but with over six million entries, the cost of maintaining any additional harmless entry is less than 1/6,000,000th the cost of maintaining the dictionary as a whole. Cheers! bd2412 T 02:14, 28 August 2019 (UTC)

I agree that we ought to consider allowing an SoP entry once we agree on the revised criteria for including that entry. DCDuring (talk) 02:24, 28 August 2019 (UTC)
@BD2412: Agreed in principle but do you have any specific examples? That will help others understand if they want to agree with what you're proposing. —Justin (koavf)TCM 02:31, 28 August 2019 (UTC)
Just from the current list at WT:RFD, I think prison gang, race traitor, bamboo suit, lower lip, evil spirit, and snap election would be spared the trouble of being litigated. bd2412 T 02:38, 28 August 2019 (UTC)
Based on those examples I think the current policy is just fine. The ones which I think ought to be kept are likely to be kept, and the ones I think ought to be deleted are likely to be deleted. I would suggest that our current policy is probably too lax, or really that it has a bunch of loopholes that allow for entries which provide little value (or even detract). Just like in the COALMINE discussion/vote I would be in favor of amending the policy, but without an alternative being proposed I couldn't say how I might vote. I am not sure that any (reasonable) change to the policy would actually reduce the amount of RFD discussion, it would just shift the border and we would have all of the discussions about a different class of entries. - TheDaveRoss 12:46, 28 August 2019 (UTC)
Actually, in this case, we have a zero-sum game. If there are a thousand entries that could reasonably be proposed for deletion under the current rule, and relaxation of that rule reduces the number of entries possibly subject to deletion to, say, 750, then we would not merely be debating the deletion of a different thousand entries. Presuming that potential problems are found and considered for deletion at the same rate as they are now, those actually proposed for deletion would drop by a quarter, as would all the work that goes into discussing, closing, and archiving those additional 250 proposals. bd2412 T 02:02, 30 August 2019 (UTC)
The problem being that, if we relaxed the rule, people would create entries that they would not create today, and those would be added to the 750 to get us back to 1000. Obviously there may be more or less than right now, but just because some portion of current debates wouldn't happen that doesn't mean that new debates wouldn't arise. - TheDaveRoss 13:28, 30 August 2019 (UTC)
I don't get the sense that people look at the rules at all before creating entries. I think it's a non-factor. bd2412 T 00:04, 31 August 2019 (UTC)
But we look at the rules when we do RfD until we internalize them. The RfD process and its references to CFI transmit it to newer contributors and probably serve to discourage those who don't have the required attitude and values for this kind of activity. DCDuring (talk) 02:36, 31 August 2019 (UTC)
That is possible, but some of the terms brought up for discussion were created years ago (prison gang, for example, was created in 2005), so there have to be other factors involved besides the inside baseball of our deletion processes. This should not discourage us from explicitly allowing more common collocations, rather than arguing over them and making exceptions to allow them. bd2412 T 03:16, 31 August 2019 (UTC)
How does the fact that many entries have not had contributors' eyes set on them for a decade or more have any bearing on this? At most that says that it is useful to take advantage of Recent Changes to catch bad entries when they are first made. DCDuring (talk) 14:02, 31 August 2019 (UTC)
The entries we argue about have often been here for years. These are not being newly minted in response to changes in the rules. bd2412 T 18:00, 31 August 2019 (UTC)
@BD2412: So what? DCDuring (talk) 03:46, 1 September 2019 (UTC)
It's hard to tell whether entries that haven't been "set eye on by contributors for a decade" are useful to passive users who never voice a comment. I sincerely hope any entries I make, SoP or not, are useful to that potentially large group. DonnanZ (talk) 18:29, 31 August 2019 (UTC)
@Donnanz: I wasn't speaking of users, I was speaking of contributors, ie, editors. I don't know about whether SoP entries are useful to any users at any stage of the language learning process, though I think they are less essential than non-SoP entries. DCDuring (talk) 03:46, 1 September 2019 (UTC)
"I don't get the sense that people look at the rules at all before creating entries. I think it's a non-factor." I can't believe a real-world lawyer said this. Ignorance of the law is no excuse! -- etc. I think it's very important that people can bitch about how they dislike the rules, but the rules are the gold standard. If your only point at RFV/RFD is "well this rule sucks" then really you ought to arrange rediscussion of the flawed rule, not deliberately and knowingly vote contrary to it. (Okay I've probably done that once or twice myself, since votes are so time-consuming, but I didn't feel great. StackOverflow ate me alive when I posted a comment as an "answer" because their nutso system didn't let me post comments yet.) Equinox 23:25, 1 September 2019 (UTC)
I oppose this in advance. Canonicalization (talk) 12:34, 1 September 2019 (UTC)
I also oppose, but I would like to see some way of handling common collocations better. I think we should include them, but not in the mainspace. Andrew Sheedy (talk) 18:55, 7 September 2019 (UTC)

Categories: what kinds of entries should they go in?

I've been chewing on this question in the back of my mind for a while.

Japanese entries almost always have multiple forms, due to the complexities of the writing system. As one example, take 重箱読み (jūbakoyomi, a specific kind of reading of a Japanese kanji compound). The lemma form is 重箱読み, the historical (pre-spelling-reform) kanji form is 重箱讀み, the hiragana form is じゅうばこよみ, the historical hiragana form is ぢゆうばこよみ, the romaji form is jūbakoyomi (or possibly jūbako-yomi, depending on convention), the katakana form is ジュウバコヨミ, and the historical katakana form is ヂユウバコヨミ.

The thing itself is labeled as {{lb|ja|linguistics}}. But which of the entries for these various forms should be included in Category:ja:Linguistics?

  • Only the lemma form?
In general, all entry details should be consolidated into the entry for the lemma form, and entries for alternative (non-lemma) forms should be bare-bones soft redirects, as I've understood things so far.
  • All forms?
In the interests of usability and discoverability, the argument can be made that all forms should be categorized, to aid users who might look up a given form in a category index, without knowing which form is the lemma.
  • Some other selection of forms?
I can't think of a use case myself, but others might have ideas about including only some forms in categories, but not all of the forms.

Any insights appreciated. ‑‑ Eiríkr Útlendi │Tala við mig 22:23, 28 August 2019 (UTC)

I would like all of the forms (minus the rōmaji forms) to be included for discoverability and also because the determination of what is the lemma form (main entry) and alternative form in Japanese can be rather subjective especially when it comes to rare or technical terms. KevinUp (talk) 22:56, 28 August 2019 (UTC)
@Eirikr: If we include in Category:ja:Linguistics only the lemma forms, then it makes sense to sort the entries by reading. If we include all forms, then it makes sense to eliminate sortkeys and sort kana by kana and kanji by kanji. I was (and still am) in favor of the latter approach so I made {{ja-see}} copy all categories. If we use the former approach it will probably make things more difficult (for example, we need more complex rules to determine which categories to copy and which not). --Dine2016 (talk) 03:47, 6 September 2019 (UTC)
@Dine2016: One concern I have about indexing the categories by kanji is collation on the one hand, and lookup on the other. How are kanji entries ordered? And is it possible to add some kind of lookup for kanji, alongside one for kana, alongside one for the Latin alphabet? ‑‑ Eiríkr Útlendi │Tala við mig 19:25, 6 September 2019 (UTC)

Should definitions be formatted like sentences?

Hello! Do you like worms? And do you like convenience foods that can be kept on the shelf for years until you need them, without the hassle of refrigeration? Then you'll love CAN OF WORMS. Let's open it.

Should definitions be formatted like sentences? This doesn't mean they must grammatically be sentences ("A fruit that grows on apple trees" is not a sentence) but it means they would start with a capital letter and end with a full stop/period. I have been pondering this because I keep seeing people taking an edit as an excuse to impose their favourite style. For example, for months, my evil twin User:SemperBlotto and I have been removing and adding the full stop (I add, he removes): it's become a sort of joke to me now, although the subject has never been aired between us. I've seen hundreds of edits of this kind. Likewise, User:Embryomystic seems to take any opportunity to change "A fruit." to "a fruit", and has sometimes revised basic words to change dozens of sense lines to that style.


  1. Does it matter at all? Well yeah, I think consistency matters, in the same way that a newspaper or a software development company has a "house style". Partly it gives us more of a brand and more consistency, rather than looking like some vague shit that was slapped together by randoms (ahem), and partly agreeing on this stuff means that we can stop wasting our time on it and work on the actual important things.
  2. What about definitions longer than one sentence? These are rare and can often be rephrased, but they do occur: something like a techy maths entry that says "A member of a subgroup X such that Y=Z. X may be either A or B." Writing these without caps would be a very weird violation of English style, regardless of our own conventions. So if we are going to support multi-sentence definitions, then for consistency we should probably make all definitions sentence-like.
  3. What about non-English stuff and translations? This is a biggie, since usually our English definitions are complex phrases ("the purple fringe of a king's cloak") whereas foreign defs are often one-word translations like just "apple". There is some argument for separating them, though, since if you look at (say) fr.wikt, it does the same thing for its own language: words in the default lang (French) get long detailed definitions while others mainly just get quick translations. Basically these do not serve the same purpose: the primary language is trying to explain X in language X, in detail, whereas the secondary language is trying to give you the right word for conversation, phrasebook-style.

Who wants a vote on this? Actually that's premature. Opinions and thoughts would be welcomed.

Equinox 10:40, 30 August 2019 (UTC)

  • I think it is a matter of style. The OED does capitals and full stops, but it also puts hyphens in where they are not needed. Nobody's perfect. I don't go out of my way to change things but I remove full stops (if I remember) if I am adjusting the entry for any other reason. SemperBlotto (talk) 10:50, 30 August 2019 (UTC)
  • It is a matter of style but I'm proposing that we should have a house style rather than keep reverting each other. With computer technology we could, I suppose, present entries capitalised however viewers want them, but that would involve more markup and how many users would actually bother changing settings from the default? Equinox 11:21, 30 August 2019 (UTC)
  • IMHO as an extremely experienced user, it doesn't matter at all. For the second point, I'd like to mention that Simple English Wiktionary has complete sentences in all their entries - and that is a nice style. --Mélange a trois (talk) 11:11, 30 August 2019 (UTC)
  • I assume that we are talking only about English definitions. FL "definitions" typically have neither initial capitals, nor terminal periods. But they also are not definitions, but rather suggested single-term translations, often without a disambiguating reference to a particular definition of a polysemic English word.
I vastly prefer English definitions to be formatted as sentences.
I also find it is never impossible (though sometimes impolitic) to reduce a multi-sentence definition to a single-clause definition. I take multiple sentences as an indication of probable encyclopedic content. DCDuring (talk) 15:22, 30 August 2019 (UTC)
Definitions without periods are an extremely convenient way to find SemperBlotto's entries that have never been touched by anyone else. DTLHS (talk) 16:25, 30 August 2019 (UTC)
  • As a (possible?) counterpoint to DCDuring's description of foreign-language definitions, sometimes a one-word gloss is all that's needed -- say, for a simple concrete term like Japanese (inu, dog). However, for less-concrete terms, the concept of the term in one language may not have any direct analogue in English. I ran into that just yesterday with Japanese (nata), which is a general term for a kind of one-handed thick-bladed tool of any of a wide array of shapes and sizes. This could be called various things in English depending on the specific type of nata, including such varied terms as hatchet, billhook, machete, froe, and probably a few others that didn't occur to me.
IFF we adopt the style of "full sentence format for all sense lines", we run into some issues:
  • Do we have to rewrite single-word glosses? For Japanese (inu, dog), how do we make a sentence out of the "dog" sense without making a mess of it?
  • Do we keep single-word and other very-short glosses as non-sentences, and only use sentence format for long senses, as at Japanese (nata)?
That might make some sense, but then we have inconsistent formatting even within a single language's entries.
IFF we don't adopt this style for non-English terms, and maintain the status quo, then we have inconsistent formatting between English and non-English entries. This is what we've had for years, so I don't anticipate any serious issues arising, other than the usual new-editor confusion. ‑‑ Eiríkr Útlendi │Tala við mig 17:30, 30 August 2019 (UTC)
We would write the "dog" sense as "Dog." But I oppose sentence style for FLs. Andrew Sheedy (talk) 17:09, 2 September 2019 (UTC)

These dots were invented to separate sentences from each other. Since the glosses are already kept separate by their formatting, the full stops are excess information that distracts from the actual content. Think of glosses as list items: should list entries end with dots? No. The dots are a childhood disease of Wiktionary, arising at a time when people had less experience with how content on the internet is presented; that is why English entries, which were begun earlier, have them, while foreign-language entries do not. – Another purpose of the dot is to signify that the sentence ends here, or, depending on the language, that it is not an exclamation or question; this is why messages like tweets commonly end with it. But dictionary glossing is not writing letters or texting. Just think: what do I want to signify with that mark? I don’t see anything: to signify “done”, I save. Any entry should as a rule be in a state of doneness; no dot is needed to show it off.
And starting with a capital letter actually causes information loss, since English sometimes distinguishes words by capitalization; the message gets lost not only because of the distraction but also because the capitalized form of a word not lexicalized with a capital letter is less iconic. For the dictionary content to be apprehended as fast as possible, we should not capitalize or add dots mechanically, and the same goes for the sake of new editors, whom we want to spare arcane distinctions. Those discussions about whether something “is a sentence” are intrinsically vain – the presentation should be utilitarian. Again the question: what do you, who defend the dot and the capital letter, want to signify with them? Fay Freak (talk) 23:23, 30 August 2019 (UTC)

Your preferences could be reconciled with standard practice in English entries by having a copy editor bring your work into conformity with the standard, such as it is. DCDuring (talk) 20:09, 2 September 2019 (UTC)
This isn't purely a question about "dots". And you might know when you finished typing and saved, but we don't, so the dot shows us that you really did mean to finish, and it wasn't an accidental save or a corrupted entry. Equinox 00:35, 2 September 2019 (UTC)
That might be so in print, but on a computer one writes something, then goes back to the middle and adds something; also, edits do not only contain glosses. The glosses are not like programming statements that end with a semicolon. And even in programming there are languages whose statements end merely with line breaks. Also, it does not show anything, (1) since one is constantly confused because in foreign languages one does not add dots, and (2) because one only adds the dot because it is done so elsewhere, so it does not signify anything. Don’t tell me what it shows when I put the dot: it shows decidedly nothing. I do not understand it to have any meaning, hence it has none. And it cannot have any meaning. It can never acquire any meaning. I will never use it to signify anything. Fay Freak (talk) 19:46, 2 September 2019 (UTC)
I strongly support having some sort of consistency, especially sentence style for English (and maybe Translingual) and lowercase/no period for other languages. One of the things that drew me to Wiktionary to begin with was a desire to increase consistency in style, since inconsistency drives me nuts (I'm very detail-oriented and borderline OCD). I was disappointed to learn that I couldn't enforce any one style. Andrew Sheedy (talk) 17:09, 2 September 2019 (UTC)
I on the other hand oppose treating English differently from other languages. There is no reason to have one style for English and another style for all other languages. If the same definition were written in both an English and a German entry, it should be formatted the same way too. —Rua (mew) 19:54, 2 September 2019 (UTC)
But evidently most contributors to FL entries intend only to provide translation terms, not extensive definitions. Do you think we should replace the offending non-definition glosses with {{rfdef}}? DCDuring (talk) 20:04, 2 September 2019 (UTC)
Those translation-only intentions are cancerous. Apart from being shallow and, being underdefined, inaccessible to verification, such glosses are constantly duplicated and triplicated, spreading over five lines what is only one meaning, based on some Anglo-centric version of reality according to which there are different meanings whenever the translations vary.
I cannot overstate how oblivious people are in their contributions, and can only affirm that what many people do is just wrong. For a word that just means “zero, naught”, people “gloss” in five lines:
# cipher
# dot
# nought
# naught
I sometimes add {{rfdef}} or {{gloss-stub}}, and with luck ping the editors, because there is nothing but a polysemic “translation” that does not reveal even the rough context. For instance, breastplate has four distinct meanings, but our Georgian editor gave only “breastplate” for გულსაფარი (gulsapari), and it turned out to have the one of the four meanings I least expected.
There shouldn’t be different conceptual approaches in working-language and foreign-language entries; the definition lines are to give an idea, or ideas, of what the term means. If you have hitherto seen these kinds of lines as “translations”, you have thought wrongly: they often look like translations only because English equivalents (“translations”) often convey, unambiguously enough, the meaning the dictionary editor is meant to give an idea of; but this is not to be elevated to a principle or to an essential opposition to the ways in which English entries are glossed. Fay Freak (talk) 23:18, 2 September 2019 (UTC)
The way I usually handle FL entries is "translation (explanation of translation, with language-specific information included, i.e. not just taken straight from the English entry)". Most FL entries are not formatted like sentences (the vast majority aren't, in fact), so I oppose drastically changing the status quo. I don't think there's any intrinsic issue with it; it helps signal that most FL entries are primarily giving translations, whereas English entries are primarily supplying definitions. Andrew Sheedy (talk) 01:35, 3 September 2019 (UTC)
  • Capitalize and punctuate just like any other complete sentence. bd2412 T 02:50, 3 September 2019 (UTC)
    Very few English definitions are complete sentences. They are typically phrases, sometimes with subordinate clauses. DCDuring (talk) 21:14, 6 September 2019 (UTC)