Wiktionary:Beer parlour

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019


January 2019

WARNING! The title you are using may be wrong.

Very annoying and shouty. Do we still need it? If we do, it would be nice to have some option to stop showing it, or maybe disable it for non-anons. – Jberkel 13:47, 1 January 2019 (UTC)

Where is that message configured? DTLHS (talk) 16:11, 1 January 2019 (UTC)
Yes, is there any list of where all these messages are located? — SGconlaw (talk) 17:56, 1 January 2019 (UTC)
I did a search for mediawiki: insource:"the title you are using may be wrong" and found MediaWiki:Newarticletext. — Eru·tuon 19:13, 1 January 2019 (UTC)
To remove the "new article text" on any page, you can use CSS: .mw-newarticletext { display: none; }. But that will remove other messages besides the shouty case-insensitivity message. I could also add another class around the shouty message so people can selectively remove it. — Eru·tuon 05:50, 2 January 2019 (UTC)
It's still needed. The alternative is even more entries that need privileged access to remove. --RichardW57 (talk) 05:20, 2 January 2019 (UTC)
To make it less shouty, we could replace the text “WARNING! The title you are using may be wrong.” by the friendlier “Are you sure this is the right title?” Also, in the second next sentence, instead of “You probably want to edit the lowercase version of your word”, it is probably better not to presume the editor is “probably” mistaken; we may replace that by “You may want to edit the lowercase version of your term”.  --Lambiam 09:31, 2 January 2019 (UTC)
+1 to replacements proposed by Lambiam. No icons please. I can see the discussed text in Amadabacra, e.g., when I try to create it. --Dan Polansky (talk) 11:17, 5 January 2019 (UTC)
I can't seem to see this warning: I probably ad-blocked it if it was annoying. "Warning" seems good for Richard's reason above. Perhaps we could use a warning icon (yellow triangle etc.) instead of the word WARNING in caps? Equinox 20:07, 3 January 2019 (UTC)
There's already a warning triangle in the notification UI (above the notice), but it's blue instead of yellow and not very prominent. Changing the wording to be less aggressive/patronising as suggested above would be a first step to a friendlier interface. – Jberkel 21:07, 3 January 2019 (UTC)
This is not a popular opinion but I think that being aggressive is a good thing. We have certain rules and standards and we are not going to help ourselves by easing newbies into creating unusable entries. ("Patronising" is another matter...) Equinox 15:50, 6 January 2019 (UTC)
This is what I get in Amadabacra:
WARNING! The title you are using may be wrong.
Remember that Wiktionary is case sensitive. You probably want to edit the lowercase version of your word: amadabacra.
And this is what would be an improvement without reducing the warning effect much:
WARNING: Did you intend to edit amadabacra, starting in lowercase? Wiktionary is case sensitive.
--Dan Polansky (talk) 16:08, 6 January 2019 (UTC)
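
For reference, the CSS route Erutuon describes above could look roughly like the sketch below, placed in a personal stylesheet such as Special:MyPage/common.css. Option 1 uses the `.mw-newarticletext` class named in the thread. Option 2 relies on a wrapper class that does not exist yet: the name `.case-sensitivity-warning` is purely hypothetical and would first have to be added around the message in MediaWiki:Newarticletext.

```css
/* Option 1: hide the entire "new article text" notice.
   Uses the class named in the thread; note that this also hides
   every other message shown inside that notice. */
.mw-newarticletext {
    display: none;
}

/* Option 2 (use instead of option 1): hide only the case-sensitivity
   warning. ASSUMPTION: ".case-sensitivity-warning" is a hypothetical
   class name; it has no effect until such a wrapper is actually added
   to MediaWiki:Newarticletext. */
.mw-newarticletext .case-sensitivity-warning {
    display: none;
}
```

Only one of the two rules is needed. The second is the narrower choice, since it leaves the rest of the notice visible for editors who still want it.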

When do we include capitalized versions of words

I came across Zumeendar and zumeendar, and it made me wonder what the policy is around including both letter-case versions of nouns. With older nouns it is often trivial to find alternative letter-case examples since it used to be quite fashionable to capitalize lots of words (perhaps to Impress your Friends), but that may be the equivalent of YELLING in modern internet communications. What is the right thing to do? - TheDaveRoss 16:47, 3 January 2019 (UTC)

  • I suppose that the normal rules apply - do they actually occur in print? Of course, not everybody agrees with that (I tried to add The some time ago, but it got deleted). SemperBlotto (talk) 16:52, 3 January 2019 (UTC)
Yeah, I'd disagree that the only criterion should be whether it is verifiable by our rules. As mentioned by Dave, in the past it used to be the norm to capitalize nouns, so I wouldn't be surprised if many common-or-garden nouns would be found to be verifiable in a capitalized form. Something more is needed – perhaps it is an eponym and so occurs in both capitalized and uncapitalized form. — SGconlaw (talk) 16:58, 3 January 2019 (UTC)
Isn't this the same word as zamindar? Basically any function that bestows some authority will also be found in capitalized form: “the Protector”, “the Taxman”, “the Constable”. I think we need more than the occasional occurrence of a capitalized form before we decide it is an alternative spelling and not some instances where the custom of spelling honorifics and titles showing rank or prestige, usually only used when referring to a specific person, spilled over (possibly by ignorance of the customary rules) to a generic use.  --Lambiam 21:46, 3 January 2019 (UTC)
This is a bit of a grey area. My sense is that most of us agree that not just any capitalization should be included, since any common word is sometimes capitalized (hence Semper's The got deleted). I think that when a capitalized form does have a distinct sense, then it should be included: e.g. Native vs native (even though older works probably have instances of "Belonging to one by birth" capitalized as Native, and there are some uses of "Indian" written in lowercase as native), and likewise aboriginal/Aboriginal, black/Black, white/White, etc, where even if we host the definitions all in the lowercase entry, we have an {{altcaps}} soft redirect at the uppercase form. I think I'd rather exclude capitalizations of titles (unless they're almost always capitalized), but the current situation is haphazard; we have King, but not Secretary (it's blue only because it's a town), President but not Viceroy. I suppose you should RFD it. (I would also have deleted e.g. He, She, Me, Who, You, etc, but other people wanted to keep them.) - -sche (discuss) 22:30, 3 January 2019 (UTC)

Users Luxipa and BigDom

Hello. I want to bring your attention to User:Luxipa, whose username clearly indicates that the primary function of that account is adding phonemic transcriptions of Luxembourgish words to Wiktionary. This is confirmed on his user page.

To me, this seems to be User:BigDom that could be using that account in a (somewhat) deceptive manner. This user has put hundreds if not thousands of incorrect Luxembourgish transcriptions on Wiktionary. Incorrect, because he put phonetic transcriptions such as */bəˈdʀekən/ ([ə] and [e] belong to one phoneme) to bedrécken (see [1]).

I corrected a few hundred of them (see the last ~600 edits of Mr KEBAB, my previous account), which then already got him a bit mad - notice that this edit itself contains an error, because the correct phonemic transcription of bedrécken contains either /e/ or /ə/ for all three vowels (depending on which symbol you choose to represent the phonological mid front vowel in phonemic transcription). But notice that on my user talk page, he was nothing but nice to me.

A year later, an anon appears and writes on my old user talk page (notice that was over 5 months after I switched accounts). The tone of that message is similar to the tone of the edit summary I've linked above. Similar style of interacting can be seen in this reply to my message (I also thought it unfortunate that you had already made so many edits when you did. Particularly in a language that you don't know the first thing about. I can offer your to simply revert all of our edits and go back to system we had.) Notice the last sentence: he never brings up BigDom's mistakes and blames the whole thing on me, offering to go back to the system we had - the system that had so many mistakes. Not only that, in the next sentence he writes Your system, as I said, is "halfbaked" and therefore not only unnecessary but confusing - but that's the thing - it's not my system. 99% of it is based on the JIPA article on Luxembourgish. In that reply, he also just dismisses the whole article we have about Luxembourgish phonology, most of which is based on said JIPA article.

Also, compare the latest 100 edits of Luxipa with those of Bigdom. There's also this anon who (presumably) is Luxipa as well.

I also want to bring your attention to the fact that many of BigDom's phonetic transcriptions of Icelandic words are also wrong - see e.g. brúsi - vowel length isn't phonemic in Icelandic (see [2]), and so the transcription should read either [ˈpruːsɪ] or /ˈprusɪ/, but not */ˈpruːsɪ/. Kbb2 (talk) 06:16, 4 January 2019 (UTC)

@Kbb2 Sorry to disappoint you, but I am not the new user you are saying I am, and I hope a CheckUser confirms that for you. I simply haven't been editing over the holidays as I have been travelling - I was only alerted to this as I got an email saying you had commented on my talk page (which I will reply to separately).
I admit that in the past I didn't react well in the edit summary, but as you've pointed out on talk pages I always endeavour to be nothing but polite and courteous. Having seen the way Luxipa was responding to you on his/her talk page, I am genuinely dismayed that you even considered that I would converse with you or any other user in that way.
Having looked at the edits of this new user, I too have some ideas about who it may be but without proof I don't wish to speculate on this public forum. In any case, I hope that these allegations won't stop us working together going forward - I realise we might not always agree in our methods but I do think we both want what's best for the dictionary. BigDom 07:19, 4 January 2019 (UTC)
For what it is worth, there isn't much evidence to suggest that BigDom and Luxipa are related from a checkuser perspective. Even so, it is not against policy to have multiple accounts, only to abuse them. None of that takes away from your concerns about whether or not there are patterns of incorrect edits being made, but at least that aspect of the discussion can be put to rest. - TheDaveRoss 14:55, 9 January 2019 (UTC)
@BigDom, TheDaveRoss I apologize for not replying sooner.
BigDom, this is fair enough. I also remember that I compared the hours you and Luxipa edited on Wiktionary and some of them overlap, meaning that it's unlikely that you two are the same person. I apologize if I made you feel unfairly accused of something you didn't do.
Still, I find it interesting that Luxipa stopped editing altogether pretty much immediately after I posted here.
TheDaveRoss, thanks for the clarification. Kbb2 (talk) 16:25, 6 February 2019 (UTC)

Phrases comprised of words with multiple meanings should be kept

Since that's an argument regularly invoked in RFD, we might as well include it in the CFI, right? That will allow us to keep hollow victory of course, but also hollow vessel, hollow quest, hollow city; anything that can be hollow, in fact. Per utramque cavernam 18:15, 5 January 2019 (UTC)

And dear old brown leaf, which may indeed be a burned piece of paper as well as autumn foliage. Equinox 18:17, 5 January 2019 (UTC)
Ambiguity is part of the language, and there's no way we can make a dictionary ambiguity-proof without making it too wordy to use. Most words have multiple senses- wouldn't that make any random phrase containing those words includable, if it happens to be used often enough? Should we have an entry for "go to the bank" because it could refer to either a financial institution or the shore of a river? How about "go to the dry cleaner's to pick up a suit"? After all, "dry cleaner" could refer to a cleaner that's dry (not allowing alcohol?), "pick up" could refer to physically lifting something or to getting someone to go on a date, and "suit" could refer to a person in management, to cards or to a legal action. Think about the old joke where the Buddhist says to the hot dog vendor "make me one with everything", or just about any play on words- dictionary material? Chuck Entz (talk) 22:04, 5 January 2019 (UTC)
Sometimes people can't realise what a bad idea something is until you actually let them try it. I took Per's remark in that satirical vein. I might be wrong (in which case gawd help us all). Equinox 22:12, 5 January 2019 (UTC)
Yes, I'm being sarcastic. SemperBlotto isn't, though, so here we are. Per utramque cavernam 22:22, 5 January 2019 (UTC)
My interpretation was that PUC was stating the assertion positively for rhetorical purposes so we could discuss it and come to an explicit consensus. I knew he doesn't actually agree with it. Either way, I felt that the fact that people have been seriously using that argument meant that we should take the opportunity to point out its flaws. Chuck Entz (talk) 23:46, 5 January 2019 (UTC)
It would be my chance to sneak the German translator in.  --Lambiam 22:19, 5 January 2019 (UTC)
All entries of the form "[X] [Y]", where [X] and [Y] are entries (note the possible recursion), should be kept, but with the sole definition {{&lit|[X]|[Y]}} => Used other than with a figurative or idiomatic meaning: see X,‎ Y. to allow for all possible combinations of the polysemic Xs and Ys. It wouldn't be fair to compel the advocates of such entries to insert all the possible combinations, even just the attestable ones. To make implementation more gradual we should restrict this initially to combinations of single words. Further I would recommend automating the process and imposing some grammatical restrictions to eliminate automated entries like the of. Automating attestation would be a help. DCDuring (talk) 23:26, 5 January 2019 (UTC)
Wouldn't it be better if the software just made reasonable suggestions and we didn't create trillions of pages which provided no actual information? - TheDaveRoss 13:23, 7 January 2019 (UTC)
I think the real utility of a dictionary comes from collecting terms with meanings that are unexpected for its users. If we could define somehow which senses are rare, we should imo add amply attested compounds containing them, such as hollow victory, which might not be idiomatic, but is useful in a way the headlessness isn't. Crom daba (talk) 15:01, 6 January 2019 (UTC)
None of the lexicographers who control the references at OneLook seem to share your opinion. Perhaps the OED? DCDuring (talk) 22:02, 6 January 2019 (UTC)
Yeah, maybe no one does, thinking in expectations is not very common but I believe it is very useful. My suggestion might not be practically implementable or attractive for anything, but it describes well what I intuitively feel is a useful entry as opposed to clutter. Crom daba (talk) 03:17, 7 January 2019 (UTC)
These unexpected combinations belong to {{uxi}}, in this case in hollow, where it is already since yore. The SOP rule has to be understood as implying that an entry should not be created in cases when even though the meanings of the parts used need additional mental strain for recognition one rather expects a comparatively infrequent meaning than an idiomatic use of the whole, even if this expectation is slanted by the experience of dictionaries restricting themselves in their coverage of composed expressions, since why assume the fulfillment of the inclusion criteria by idiomaticity if even an averagely astute learner is expected to look up the parts. Fay Freak (talk) 03:48, 7 January 2019 (UTC)
Conversational understanding is mostly derived from context and metaphor. You don't have to "know" that hollow means "empty" (also metaphorical) to understand hollow victory. The dictionary mostly helps some learners move something from the category of understood to that of able to be used by confirming the meaning, which can be done by looking up all the terms, though usually it is clear which term has the less certain meaning. DCDuring (talk) 14:21, 7 January 2019 (UTC)

@SemperBlotto: I see you making that argument again, so could you please spell out the logic? Are you okay with having an entry for empty glass (9 senses x 10 senses)? Per utramque cavernam 18:32, 6 February 2019 (UTC)

Imagine the fun of citing 90 possible definitions of the NP entry. DCDuring (talk) 23:39, 6 February 2019 (UTC)
I suppose RfVing each and every sense could be fun. DCDuring (talk) 23:40, 6 February 2019 (UTC)
@DCDuring, don't test Kiwima; s/he will absolutely do it. - TheDaveRoss 16:07, 7 February 2019 (UTC)
I was counting on a desire not to squander her? efforts. There are much better things for us to cite. The more terms and definitions we have that other dictionaries don't, the more effort we need to put into citations for those terms and definitions. DCDuring (talk) 16:15, 7 February 2019 (UTC)
Thanks, @DCDuring: - I think the effort of citing every sense would drive me away! It's like when someone in 2018 added every cliché they could think of to the requests for definition, I stopped supplying requested definitions. Kiwima (talk) 19:23, 7 February 2019 (UTC)
On a more serious note, I think the point behind this suggestion is that phrases which involve rare or surprising secondary meanings of words seem worth keeping. If the meaning is obvious from context, then there isn't really a point to it. Kiwima (talk) 19:27, 7 February 2019 (UTC)

PageNotice extension

I've finally posted on the Phabricator ticket titled "Review the PageNotice extension for deployment" that we would find it useful because it would allow {{reconstruction}} to be automatically transcluded at the top of pages in the Reconstruction namespace.

See also the previous discussions at Wiktionary:Grease pit/2018/September § {{reconstruction}}, Wiktionary:Beer parlour/2017/September § Proposal: install mw:Extension:PageNotice, and at Wiktionary:Grease pit/2017/June § Citations at citations. — Eru·tuon 00:07, 6 January 2019 (UTC)

@Erutuon I'm thinking it might be better to open a new ticket. --{{victar|talk}} 01:51, 31 January 2019 (UTC)
@Victar: Why? If the ticket is on the same topic, they'll just merge or close it. — Eru·tuon 03:34, 31 January 2019 (UTC)
@Erutuon: It might be for the same extension, but a completely different purpose. More to the point though, that ticket has too much baggage and is just going to be ignored. --{{victar|talk}} 05:55, 31 January 2019 (UTC)

Thesaurus:sexually frustrated

The entry for sexually frustrated was deleted as SOP per consensus at RfD. That leaves behind Thesaurus:sexually frustrated, which could get the same treatment, or could be moved to one of its provided synonyms. bd2412 T 21:24, 6 January 2019 (UTC)

The names of Thesaurus entries don't have to be valid dictionary terms. They're supposed to be descriptive and unambiguous, which often means SOP. Chuck Entz (talk) 21:56, 6 January 2019 (UTC)
I have adjusted the thesaurus header to eliminate the red link. Further tweaking may be needed. Cheers! bd2412 T 05:13, 7 January 2019 (UTC)

Competition finished

So, the Christmas competition has finished. It was a resounding success with one and a half entries. Apparently, now the winner is to be decided democratically. I'm expecting a massive turnout for voters too. --Wonderfool Dec 2018 (talk) 10:58, 8 January 2019 (UTC)

Straw polls on criteria for including chemical formulas

Previous discussions: Talk:AsH₃, Talk:CO₂, Talk:LiBr, WT:RFDN#SiGe (will become Talk:SiGe)

To gauge what criteria for including or excluding chemical formulas/formulae might have consensus, probably as a precursor to a vote, let's straw poll some possibilities. This also allows for problems with proposals to be pointed out.
For example, some people previously suggested including only formulas which would be read by letter, like "aitch two oh", but AFAICT all formulas can be read as letters and unfamiliar ones are necessarily read as letters. Other people proposed including only formulas that have unformulaic common names, but e.g. AlF₆Na₃ would meet that criterion as cryolite while CO₂ would fail as carbon dioxide, which seems opposite to what most people would expect. (As a result, I didn't list those ideas below.)
- -sche (discuss) 02:27, 9 January 2019 (UTC)

Include all attested chemical formulas

Please indicate if you support or oppose including all chemical formulas, such as BaCO₃, H₂O, Al(NO₃)₃, HArF, and CH₃(CH₂)₂₄-COOH, if they are attested.

  •   Support - treat them just like any other "word". SemperBlotto (talk) 07:34, 9 January 2019 (UTC)
  •   Oppose Per utramque cavernam 10:43, 9 January 2019 (UTC)
  •   Oppose - TheDaveRoss 14:31, 9 January 2019 (UTC)
  •   Oppose  --Lambiam 16:08, 9 January 2019 (UTC)
  •   Oppose Equinox 16:12, 9 January 2019 (UTC)
  •   Oppose Rua (mew) 16:41, 9 January 2019 (UTC)
  •   Support if it is pronounced as a noun in a sentence. There are many attested sentences where CO₂ is pronounced as cee-o-two and functions as a noun. If CH₃(CH₂)₂₄-COOH appears only in formulae, in tables, or in lists, it is just a symbol and we cannot be sure whether it is a part of a natural language. — TAKASUGI Shinji (talk) 12:31, 12 January 2019 (UTC)
  •   Abstain. I don't really know; probably exclude some chemical formulas. The opening of this poll does not provide any relevant facts, or links to where to find them, such as a rough estimate of the number of attested chemical formulas. --Dan Polansky (talk) 15:38, 13 January 2019 (UTC)

Exclude all chemical formulas

Alternatively, indicate if you support or oppose excluding all chemical formulas. Indicate if you would prefer to exclude them all without exception, or just exclude them by default but with the possibility for individual formulas (such as perhaps H₂O, which passes LEMMING) to be included on a case-by-case basis via consensus (presumably at WT:RFD, which is where requests for un-deletion are normally handled, and where consensus has occasionally been reached to keep other unidiomatic, non-translation-hub entries).

  •   Oppose Some formulas intrude in material intended for broader than technical audiences, such as consumer protection, worker safety, and environmental literature. I don't see why we would limit STEM content. DCDuring (talk) 16:10, 9 January 2019 (UTC)
  •   Oppose. I think there's obvious value in including formulas like H₂O and CO₂, which have a lot of currency, and I see no reason not to include attestable formulas that are used outside of chemistry-related subjects. In scientific contexts, we can probably expect readers to understand them and not need to look them up, but in non-scientific contexts, many people probably wouldn't know what they mean. Andrew Sheedy (talk) 22:47, 10 January 2019 (UTC)
  •   Oppose. Some are so basic that excluding them would put us on the wrong side of being a dictionary. bd2412 T 14:13, 11 January 2019 (UTC)

Without exception

  •   Support excluding them all without exception. DTLHS (talk) 02:29, 9 January 2019 (UTC)
  •   Support excluding all without exception. There are formats better suited as chemical formula databases than Wiktionary for ways of display and interaction. That is to say it seems to me that MediaWiki is an inefficient software for them: but even if one wants them on MediaWiki and with Wikimedia, the user is still better off if they are contained on Wikipedia or other projects. Fay Freak (talk) 07:41, 9 January 2019 (UTC)
  •   Oppose excluding all chemical formulas without exception. --Dan Polansky (talk) 15:21, 13 January 2019 (UTC)
  • I tend to   Support this. Per utramque cavernam 18:08, 19 January 2019 (UTC)
  •   Oppose DCDuring (talk) 21:08, 7 February 2019 (UTC)

By default, with exceptions

  •   Support - with the ability to override with compelling justification (e.g. lemming or common usage outside of scientific works). - TheDaveRoss 14:36, 9 January 2019 (UTC)
  •   Support - exclude by default; I agree with TheDaveRoss above about possible exceptions (which should be very rare). Equinox 16:37, 9 January 2019 (UTC)
  •   Support Rua (mew) 16:42, 9 January 2019 (UTC)
    • @Rua I've split the section in two. Is your vote still at the right place, or should it be moved above? Per utramque cavernam 21:33, 10 January 2019 (UTC)
      • I'm not a fan of exceptions, but things like "CO2" are so widespread and part of normal vocabulary, it would be a disservice not to include them. —Rua (mew) 22:59, 10 January 2019 (UTC)
  •   Support, would allow numerous previously raised exceptions for inclusion (attestation in a non-scientific text; chemical with a trivial name in use; element and non-IUPAC constituent abbreviations such as Me). — As an aside though, we already have WikiSpecies, maybe WikiChemicals should exist as well? --Tropylium (talk) 18:47, 9 January 2019 (UTC)
  •   Support – exclude by default but allow exceptions for terms used freely in, e.g., MSM news reports (such as “CO2”).  --Lambiam 20:31, 10 January 2019 (UTC)
    • This is just moving the goalposts. What exceptions? What does "used freely" mean? This section seems useless since it includes "the possibility for individual formulas". It seems this is what most people support but we probably need to be more granular. DTLHS (talk) 20:48, 10 January 2019 (UTC)
Arguing here whether there shall be exceptions allowed “on a case-by-case basis via consensus” is nonsensical because the option always remains and posterior consent switch cannot be excluded by consent (a contradiction like “consensual non-consent”, “voluntary slavery” etc.), or it would not be allowed by rules we cannot decide. So, as long as I see an outline of an exception I desire exclusion without exception since I do not know any exception. Indeed “freely” does not mean anything so far and won’t probably. You can only argue for certain exceptions, not if there shall be exceptions or for exceptions of indeterminable meaning. Fay Freak (talk) 21:22, 10 January 2019 (UTC)
Indeed, I must say I don't see much difference with the "Include only formulas that are attested in non-scientific contexts" option below. Per utramque cavernam 21:33, 10 January 2019 (UTC)
I assume that if we adopt the rule exclude but, the exceptions will be stated as part of the rule, like we have done for WT:BRAND and WT:FICTION. My current preferred rule for exceptions is stated below at #Include only formulas that are attested in non-scientific contexts – which should not be a surprise, considering that this is essentially the rule I have proposed myself. I can imagine, though, that we can live with other versions. By “used freely”, I meant, “used without further explanation”. If a news report mentions that vats were labelled with C3H8NO5P, but then goes on to explain that this is the chemical formula of glyphosate, it would not count as free use. This is similar to what we have at WT:BRAND. The commonality is whether the author assumes the reader is familiar with the term.  --Lambiam 22:57, 10 January 2019 (UTC)
  •   Support – because (as Rua says) there are definitely some formulas like CO₂ and H₂O that are so commonly used that it would be bizarre to exclude them. — Eru·tuon 23:20, 10 January 2019 (UTC)
  •   Support for contextual uses aimed at non-experts. bd2412 T 14:14, 11 January 2019 (UTC)
  •   Support: Only formulas used in layman's speech, ex. CO2, H2O, etc. --{{victar|talk}} 22:02, 11 January 2019 (UTC)
  •   Oppose I support inclusion of some chemical formulas and exclusion of others, but I do not support any defaulting as proposed. That is to say, I do not think inclusion of a chemical formula should pass the bar of 2/3 majority absent agreed-on criteria. --Dan Polansky (talk) 15:34, 13 January 2019 (UTC)
  •   Oppose defaults at this time. DCDuring (talk) 21:09, 7 February 2019 (UTC)

Exclude formulas with more than a certain number of symbols (how many?)

Please indicate if there is a cutoff beyond which you think formulas should be excluded; for example, if you would exclude any formulas with more than eight element-symbols (like CH₃CH₂OCH₂CH₃). Perhaps we will be able to agree on (and then vote on) an upper bound.

Exclude formulas with parentheses

Please indicate if you would support or oppose excluding chemical formulas which have parentheses in them, like Al(NO₃)₃. A rationale is that these are more clearly formulas of which the component parts should be looked up separately.

  •   Oppose What's so especially excludable about parentheses? DCDuring (talk) 21:11, 7 February 2019 (UTC)

Include only formulas that are attested in non-scientific contexts

For example, a scientific paper or popular-science magazine article on the synthesis of carbon compounds would not attest CO₂, but a murder mystery saying "the air in the scuba tank had been replaced with CO2" could.

Comment: deciding whether some works are "scientific" or not will be a bit fuzzy, but we have other fuzzy policies, most notably deciding whether or not something is WT:SOP (and to some extent WT:BRAND, in deciding exactly how much can be said about a product, e.g. that someone drank it, before the product counts as having been "identified" within the text). - -sche (discuss) 02:39, 9 January 2019 (UTC)
I think that we should exclude usages in textbooks (where such formulae will be used more than normal). SemperBlotto (talk) 07:37, 9 January 2019 (UTC)
  •   Support, we can niggle over the details of what counts as we go, but I like the spirit of this option. If the formula is so common that it is being used without explanation in fiction or general news stories then it makes sense to define it. - TheDaveRoss 14:35, 9 January 2019 (UTC)
  •   Support. This is similar to other exceptions to general exclusion rules, like for brand names or entities from fictional universes.  --Lambiam 16:11, 9 January 2019 (UTC)
  •   Oppose We should certainly have formulas that are attested in popular science books (eg, Napoleon's Buttons) and journals (eg, Scientific American, Popular Science). DCDuring (talk) 16:34, 9 January 2019 (UTC)
  •   Support. Andrew Sheedy (talk) 06:55, 10 January 2019 (UTC)
    To elaborate, I'll repeat what I said above: "In scientific contexts, we can probably expect readers to understand them and not need to look them up, but in non-scientific contexts, many people probably wouldn't know what they mean." Andrew Sheedy (talk) 22:47, 10 January 2019 (UTC)
  •   Support. A formula worth including would likely be one that would occur outside of a technical context. bd2412 T 21:18, 10 January 2019 (UTC)
  •   Oppose We shouldn’t introduce inclusion criteria by literary genres. Even more so then we shouldn’t include by a scientificity criterion which is an epistemic and not a formal criterion and of dubious identification, this being aggravated by teleological interpretation giving the concept of science another twist and thus adding even more confusion. Fay Freak (talk) 21:36, 10 January 2019 (UTC)
    We allow names from fictional universes, but do not accept the fiction in which they occur for attesting citations. Instead, we require citations that are independent of reference to that universe. This is not meant to discriminate against fiction as a literary genre (although it does). It does ensure that the author of the citation assumes that the term in question has entered the lexicon. The exception proposed here serves the same purpose.  --Lambiam 11:40, 12 January 2019 (UTC)
  •   Oppose. Any time a chemical formula is used there's a scientific context. DTLHS (talk) 16:21, 11 January 2019 (UTC)
    If someone says "Drink lots of H2O", that's not a scientific context, but it is a chemical formula. Andrew Sheedy (talk) 17:12, 11 January 2019 (UTC)
  • Weak   Support: Looks not too bad. I have posted other candidate criteria to "General discussion" section below. I think this discussion would have better started with exploration of candidate criteria. --Dan Polansky (talk) 15:30, 13 January 2019 (UTC)

Soft-redirect (Template:no entry) any excluded formulas to Wikipedia

Rationale: this way, for any formula which we exclude, people can still type the formula into the search bar and find content.

  •   Oppose, this should be handled by the software automatically if it is to be handled at all, otherwise we could be creating a very large number of "soft redirect" entries with no content. DTLHS (talk) 18:27, 9 January 2019 (UTC)
  •   Oppose Per utramque cavernam 18:57, 9 January 2019 (UTC)
  •   Oppose DCDuring (talk) 21:12, 7 February 2019 (UTC)

While we're on this topic: should we lemmatize regular or subscript numbers?

For any chemical formula with numbers that we do include, please indicate if you'd rather lemmatize the form with regular numbers (H2O) or the form with special Unicode subscript numbers (H₂O). (We can create hard or soft redirects from the other form.)

I would prefer lemmatizing the forms with subscript numbers and creating hard redirects from the other form (unless it's citable, in which case it should be a soft redirect). Andrew Sheedy (talk) 04:36, 9 January 2019 (UTC)
As has been enough noted on other occasions, citations or usage are a bad guide in finer Unicode matters. In this case it is easily an editorial decision to have entries only in one form and always hard-redirect to the other. Even if chemical formulae are included – hopefully not –, then I doubt anyone wants to pursue attesting such typographic details. Then we would also want to display structural formulae in quotation templates and many other nasty things just to quote materials/books as they display content. Fay Freak (talk) 07:41, 9 January 2019 (UTC)
That reminds me: Wikipedia seems to mostly use regular numbers with <sub> tags, and it might often be impossible to tell whether a book was typeset using a mechanism like that, or using Unicode's special subscript numbers. That said, using Unicode subscript numbers to represent subscript numbers in books would seem(?) to be technically valid/sound, unlike using ʳ in Mʳ, so it's just a question of whether we want to do it or not. - -sche (discuss) 11:14, 9 January 2019 (UTC)
I would prefer regular numbers (I believe we can change the actual displayed headword with some kind of template). Wiktionary supports all kinds of formatting (bold, superscript, etc.) so we can rely on those capabilities and not on the rather hacky variant and legacy forms that Unicode is full of. Equinox 16:14, 9 January 2019 (UTC)
It raises the question, though, are there any chemical formulae that differ only in whether a number is subscripted? Equinox 16:14, 9 January 2019 (UTC)
Numbers occurring in chemical formulas are always subscripted or superscripted, but superscripts are used for specific purposes. For example, the formula for the phosphate ion containing radioactive phosphorus-32 is [³²PO₄]³⁻. Formulas with superscripts are unlikely to pass muster for lexical purposes. Disregarding superscripts, moving subscripts to the baseline is a lossless transformation. If superscripts have to be taken into consideration, regularizing all to the baseline might create an ambiguity, although it will be difficult to construct an example, and almost certainly impossible to find a realistic example.  --Lambiam 16:51, 9 January 2019 (UTC)
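To illustrate the "lossless transformation" point for the superscript-free case, here is a minimal Python sketch. The function names are hypothetical; only the Unicode subscript digits U+2080 to U+2089 are standard. It assumes every digit in the formula is a subscript, so it does not handle isotope or charge superscripts.

```python
# Hypothetical helpers; the digit mapping itself is plain Unicode.
SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"
BASELINE_DIGITS = "0123456789"

_TO_BASELINE = str.maketrans(SUBSCRIPT_DIGITS, BASELINE_DIGITS)
_TO_SUBSCRIPT = str.maketrans(BASELINE_DIGITS, SUBSCRIPT_DIGITS)


def to_baseline(formula: str) -> str:
    """Move Unicode subscript digits down to ordinary ASCII digits."""
    return formula.translate(_TO_BASELINE)


def to_subscript(formula: str) -> str:
    """Turn every ASCII digit into a subscript digit.

    Assumption: the formula contains no superscripts, so every digit
    really is a subscript.
    """
    return formula.translate(_TO_SUBSCRIPT)


print(to_baseline("H₂O"))
print(to_subscript("C6H12O6"))
```

Under that assumption the two functions are inverses of each other, which is what would let one form be a mechanical redirect to the other.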

Exclude all attested chemical formulas except for H2O and CO2

Rationale: this is what people keep giving as examples of things that we "should" include. DTLHS (talk) 17:15, 11 January 2019 (UTC)

Ha. But a good policy incorporates reasons for what it is doing, and isn't just a sort of black box. Equinox 17:20, 11 January 2019 (UTC)
What about excluding chemical formulae that aren’t also attestable from poetry (including raps)? There could be also the exclusion ground “this poem has mainly been invented to promote chemistry”, for cases like rapping professors, but else it sets a natural limit to used formulae by meter and consonance limits. Fay Freak (talk) 20:10, 11 January 2019 (UTC)
Blackalicious would like a word with you. They use Ca(OH)2 and NO2 at the very least. - TheDaveRoss 20:21, 11 January 2019 (UTC)
Not bad. Though myself I, more radically inclined, am against these as SOP my suggestion seems practicable, like I wouldn’t care to add them but it also cuts the sharp edges. Fay Freak (talk) 20:35, 11 January 2019 (UTC)

Exclude chemical formulae except those (attested in running text) that people may reasonably mistake for acronyms or for other non-chemical-formula words

This would allow KCN if attested in running text but not H₂O. Rationale: It's reasonable to look up KCN in a dictionary if it's found in running text. It's unreasonable to look up in a dictionary something that's obviously a chemical formula.​—msh210 (talk) 21:26, 14 January 2019 (UTC)

  •   Support.​—msh210 (talk) 21:26, 14 January 2019 (UTC)
  •   Support with reservation. I like this one as an inclusion criterion, not an exclusion criterion. Thus, a chemical formula would be excluded unless it has one of multiple redeeming qualities, and the proposed criterion would be one of those redeeming qualities. --Dan Polansky (talk) 17:59, 19 January 2019 (UTC)
  • Tentative   Support. An interesting option. Per utramque cavernam 18:07, 19 January 2019 (UTC)
  •   Oppose It seems silly to treat chemical formulae differently from, say, railway wheel-configuration terms (2-6-0, 4-6-6-4, 1A-A1) DCDuring (talk) 00:08, 8 February 2019 (UTC)

General discussion

Make comments here, or add additional proposals above this section. :) - -sche (discuss) 02:27, 9 January 2019 (UTC)

  • I note that most relatively popular works that have chemical formulas have them in or with structure diagrams. I believe we should favor entries for attestable formulas for which we can provide a graphical ostensive definition and for which there is a name found in running text, however technical the source. I suppose this could be considered a "value-added" criterion. If we can add sufficient value to a potential entry, it should become an actual entry. DCDuring (talk) 16:45, 9 January 2019 (UTC)
    Isn’t this more of an encyclopedic than a lexicographic task? A name found in running text, would that include “DOTA-E{E[c(RGDfK)]2}2”, as in the sentence “The structural formula of DOTA-E{E[c(RGDfK)]2}2 is shown in Fig. 1c.”?  --Lambiam 22:32, 9 January 2019 (UTC)
    It's a matter of providing useful definitions. Definitions need to break out of the cycle of words to establish contact with the physical world from time to time. That's why we have pictures and diagrams in entries and should have more. If we are going to have some chemical formulas or even tedious chemical names, we might want to make sure that we are adding value by having them. Image availability is a consideration. It is also a form of attestation. DCDuring (talk) 23:35, 9 January 2019 (UTC)
  • I hate to throw in a new option while there are so many votes already but this is a perfect use of the Appendix: space. We shouldn't delete valid information--we should store it appropriately. —Justin (koavf)TCM 18:50, 9 January 2019 (UTC)
As you say, we should store it appropriately. Even though trigonometric formulas are valid information, we don't host them here, even in the appendix. Per utramque cavernam 18:54, 9 January 2019 (UTC)
A formula isn't a word, term, or name. Dictionaries record the latter but not the former. —Justin (koavf)TCM 19:28, 9 January 2019 (UTC)
H2SO4 (formula) is to O (oxygen) as 6+3=9 (equation) is to 3 (digit) or + (operator). We can cover the components but IMO should not attempt to include the virtually limitless "sentences" spelled out with them. Equinox 19:30, 9 January 2019 (UTC)
Correct, also imho H₂O, CO₂, NaCl should be deleted because of being SOP. (Do you think I joke? Why?) This would also solve the problem of there being both Translingual and English entries. Remarkably enough, for the constituent parts there aren’t English entries.
I wanted to pun about “sum formulae” but I see that they are called so in languages other than English and English calls them molecular formula. But look at German Wikipedia “Summenformel” which has a nice table of projections. Should we include all these projections? Sure we can’t include the structural projections because of technical reasons, but the linear projections aren’t of different nature. Wiktionary does not mean “include everything that is linear”. If Wiktionary were to grow to include chemical formulae in a remarkable extent I am surprised if there isn’t any policy by which some Wikimedia bureaucrat is obliged to bust up this project because of this luxury. In Germany it would be 1) for Wikimedia trustees embezzlement by omission to tolerate Wiktionary adding chemical formulae 2) lead to liability for additional expenses caused by these measures perhaps for anyone who voted for a policy allowing it. Fay Freak (talk) 20:45, 10 January 2019 (UTC)
"H2O" means or refers to or is equivalent to "water" or "dihydrogen monoxide". "3 + (5/8)" doesn't stand for anything. Also, there can be an infinite number of mathematical statements but not an infinite number of chemical compounds, so it would be inherently foolish to start making pages like Appendix:15+8+87+9. —Justin (koavf)TCM 19:39, 9 January 2019 (UTC)
There is in fact literally an infinite amount of potential chemical compounds. DTLHS (talk) 22:28, 9 January 2019 (UTC)
I may have to defer to your expertise here (I don't see how that's possible with ~118 elements, many of which are ephemeral) but even so, there is no H48580, H48590,H48600... but there is 1+1, 1+2, 1+3, ... —Justin (koavf)TCM 22:33, 9 January 2019 (UTC)
There is CH4, C2H6, C3H8, C4H10, C5H12, C6H14, C7H16, C8H18, C9H20, C10H22, and so on, the simplest representants of which are the linear alkanes. If the universe is infinite, there may even be an actual infinity of extant chemical compounds.  --Lambiam 22:45, 9 January 2019 (UTC)
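The series listed above follows the general alkane formula CₙH₂ₙ₊₂, so it can be generated for any n. A minimal Python sketch (the function name is hypothetical):

```python
def alkane_formula(n: int) -> str:
    """Return the molecular formula of the alkane with n carbon atoms (CnH2n+2)."""
    if n < 1:
        raise ValueError("an alkane needs at least one carbon atom")
    # By convention a count of 1 is left unwritten: CH4, not C1H4.
    carbon = "C" if n == 1 else f"C{n}"
    return f"{carbon}H{2 * n + 2}"


# Reproduces the ten formulas listed in the comment above.
print([alkane_formula(n) for n in range(1, 11)])
```

Since n has no upper bound, the set of formulas that could in principle be written down is unbounded, which is the point being made.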
Good to know. But the other good thing is that the citable chemicals are finite and by definition documented. So we don't need to guess if somewhere there is silicon-based life that makes 5H785L somehow, whereas I can "document" all kinds of new and perfectly valid mathematical statements all the time that have never existed before (e.g. "−194850329328230932238239238*(1893349834710138103823/10935038430583503498)"). I think the difference is frankly obvious and germane. —Justin (koavf)TCM 23:09, 9 January 2019 (UTC)
I think this discussion should have better started with exploration of putative criteria. In another discussion, I mentioned the following ones:
1) Keep a chemical formula only if it involves no more than 3 chemical elements and no more than 10 atoms.
2) Keep a chemical formula only if the chemical it denotes has a CFI-meeting name: e.g. H₂SO₄ has sulfuric acid or AsH₃ has arsine. This criterion ensures that the inclusion of chemical formulas no more than doubles the number of items in the dictionary.
I think especially 2) is worth considering. --Dan Polansky (talk) 15:28, 13 January 2019 (UTC)
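Criterion 1) is mechanical enough to sketch in code. The following Python is only an illustration under stated assumptions: the function names are hypothetical, the parser handles element symbols, counts and simple parentheses (as in Ca(OH)2) but not charges, isotopes, hydrates or bracketed complexes, and the limits of 3 elements and 10 atoms are taken from the wording above. Criterion 2) cannot be automated this way, since it depends on whether a name meets CFI.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"([A-Z][a-z]?)|(\d+)|([()])")
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def count_atoms(formula: str) -> Counter:
    """Count atoms per element in a simple formula such as H2SO4 or Ca(OH)2."""
    formula = formula.translate(_SUBSCRIPTS)
    stack = [Counter()]
    last = None  # element or closed group that a following number multiplies
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unsupported character at position {pos}")
        element, number, paren = match.groups()
        if element:
            last = Counter({element: 1})
            stack[-1].update(last)
        elif number:
            if last is None:
                raise ValueError("a count must follow an element or a group")
            # The unit was already counted once; add the remaining copies.
            for symbol, count in last.items():
                stack[-1][symbol] += count * (int(number) - 1)
            last = None
        elif paren == "(":
            stack.append(Counter())
            last = None
        else:  # closing parenthesis
            if len(stack) == 1:
                raise ValueError("unbalanced parentheses")
            group = stack.pop()
            stack[-1].update(group)
            last = group
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    return stack[0]


def meets_size_criterion(formula: str, max_elements: int = 3, max_atoms: int = 10) -> bool:
    """Criterion 1: no more than 3 chemical elements and no more than 10 atoms."""
    atoms = count_atoms(formula)
    return len(atoms) <= max_elements and sum(atoms.values()) <= max_atoms


print(meets_size_criterion("H₂SO₄"))  # 3 elements, 7 atoms
print(meets_size_criterion("C4H10"))  # 2 elements, 14 atoms
```

Note that the sketch does not check that the symbols are real elements; it only counts whatever looks like one.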

Rename non-lemma categories to match the format approved in vote

This vote changed the naming scheme of categories for comparatives and superlatives, but in a way that does not match any existing non-lemma categories. The standard name for such categories was "POS xxx forms" (such as Category:English noun plural forms, Category:Northern Sami noun possessive forms, Category:Armenian verb passive forms, Category:English verb simple past forms, Category:Arabic adjective plural forms, Category:Bulgarian adjective feminine forms and so on). Meanwhile, the naming scheme for subcategorised lemmas was "xxx POSs" (such as Category:Dutch diminutive nouns, Category:English uncountable nouns, Category:German reflexive verbs, Category:Armenian diminutive adjectives etc.). In some cases, an entirely different POS term is used for non-lemmas, such as "participles" and "infinitives"; these are implicitly non-lemmas by virtue of the part of speech, i.e. a participle and an infinitive are always a non-lemma and cannot be a lemma.

Now that the vote has passed, however, there are two categories which do not fit this naming scheme anymore. Category:English comparative adjectives has the name of a lemma subcategorisation, suggesting that a comparative adjective is a kind of adjective lemma like "diminutive noun" is a kind of noun lemma, but it is now categorised as a non-lemma. The same for Category:English superlative adjectives. Since this suggests that there are no longer separate naming schemes for lemma and non-lemma categories, I propose to realign all other existing non-lemma categories with these two new names. Thus:

This will make the naming consistent with the vote. The only categories that will retain the word "forms" are the base-level categories without a qualifier, e.g. Category:English noun forms and Category:English adjective forms. —Rua (mew) 13:15, 11 January 2019 (UTC)
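As one possible reading of the proposal, the rename from "POS xxx forms" to "xxx POSs" could be sketched in Python as below. This is an assumption-laden illustration, not the list from the proposal: the function name is hypothetical, the resulting names are inferred from the pattern of "comparative adjectives", and only the three parts of speech that appear in the examples above are handled.

```python
import re

# Assumption: only the parts of speech used in the examples of the post.
_PATTERN = re.compile(
    r"^Category:(?P<lang>.+) (?P<pos>noun|verb|adjective) (?P<qualifier>.+) forms$"
)


def proposed_name(category: str) -> str:
    """Map 'LANG POS xxx forms' to 'LANG xxx POSs'; leave anything else unchanged."""
    match = _PATTERN.match(category)
    if match is None:
        # Base-level categories such as 'Category:English noun forms'
        # have no qualifier and keep the word 'forms'.
        return category
    return f"Category:{match['lang']} {match['qualifier']} {match['pos']}s"


print(proposed_name("Category:English noun plural forms"))
print(proposed_name("Category:English noun forms"))
```

Whether names produced this way (for instance for "simple past") are actually the ones intended would need confirming against the proposal itself.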

I'm going to link a thread on RFM and another one on RFC to see the context. For the record, I do not support this change, as I consider comparatives and superlatives to be similar to participles yet different enough from other forms for them to be exactly considered comparable. — surjection?〉 13:21, 11 January 2019 (UTC)
The vote aligned "comparative adjectives" and "superlative adjectives" with "participles", by not including the word "form" anymore. This proposal aligns all the other categories with "participles" as well, by not including the word "forms" anymore. —Rua (mew) 13:30, 11 January 2019 (UTC)

Proposal: Japanese Classical (文語体) conjugation/inflection table for Japanese entries

This is how it is supposed to look:

{{#invoke:User:Huhu9001/000|japanese_classical_conjugation|kanji=過|stem=す|ctype=2u-g}} {{#invoke:User:Huhu9001/000|japanese_classical_conjugation|kanji=得|ctype=2d-a|suffix_in_kanji=}} {{#invoke:User:Huhu9001/000|japanese_classical_conjugation|lemma=ぬ|kana_adv=ず<br>ん|kana_ter=ぬ<br>ん|kana_adn=ぬ<br>ん|kana_rea=ね}}

-- Huhu9001 (talk) 15:02, 11 January 2019 (UTC)

I like this, thank you. One concern: our readership consists of English-language readers, so listing conjugational info only in Japanese, such as ガ行上二段活用, seems inappropriate. Even more so when that potentially-illegible string isn't even included on the Appendix:Japanese_verbs page, leaving users unable to search easily. What about something like, g- stem, upper bigrade? Appendix:Japanese_verbs would also need updating to describe the situation for classical verbs. ‑‑ Eiríkr Útlendi │Tala við mig 21:25, 11 January 2019 (UTC)
To Eiríkr Útlendi: Appendix:Japanese_verbs#Classical_Japanese -- Huhu9001 (talk) 04:57, 12 January 2019 (UTC)
To Eiríkr Útlendi: How about 上二段活用 or ガ(ga)行上二段活用? -- Huhu9001 (talk) 04:59, 12 January 2019 (UTC)

It's done. {{ja-conj-bungo}} -- Huhu9001 (talk) 13:51, 13 January 2019 (UTC)

  • There are still usability issues here, presenting avoidable barriers to our English-reading users. I feel somewhat strongly that we cannot provide the conjugational type only in Japanese.
Although the Appendix:Japanese_verbs#Classical_Japanese section does describe classical conjugations, as previously noted, the strings ガ行上二段活用 and even 上二段活用 are nowhere to be found. Linking to is unfortunately of no apparent utility for explaining ガ行 in this context. While linking through to the JA entry for 上二段活用 is slightly better than plain text, it still hides the English rendering from the user, forcing them to click through. As currently (2019-01-14) implemented at {{ja-conj-bungo}}, ガ行上二段活用 links through to 上二段活用, leaving the ガ行 portion unexplained.
Could we not present this information in English instead? ‑‑ Eiríkr Útlendi │Tala við mig 20:16, 14 January 2019 (UTC)
To Eiríkr Útlendi: I suggest you give a list of the inflection names you want to apply to this template. -- Huhu9001 (talk) 13:31, 15 January 2019 (UTC)
To POKéTalker: Fixed. -- Huhu9001 (talk) 08:52, 20 January 2019 (UTC)
  • Add categories for classical conjugations? If so, the category names should be determined. -- Huhu9001 (talk) 07:43, 14 February 2019 (UTC)

Unnoticed request for unblock

Hello! I write here because I haven't found a local equivalent of Wikipedia:Arbitration Committee. If it is my mistake, please direct me to the right place.

Today my semi-static IP was unblocked by timer after a one-month block. Soon after I found that I was blocked, on 23 December, I wrote an unblock request, but until now nobody has commented on it, whether to confirm that the block was unjust and should be removed or, on the contrary, that it was a proper punishment for my deeds and should remain as is. No comment, no action. Is indifference to such a request normal here? --109.252.109.37 17:41, 13 January 2019 (UTC)

Can we get a Russian speaker to look into this please? Equinox 18:06, 13 January 2019 (UTC)
@Equinox What was written on Atitarev's talk page? It's hard to judge without that. But the block seems harsh. Per utramque cavernam 18:20, 13 January 2019 (UTC)
@Atitarev added a Russian translation to kick-ass a few years ago. 109.252.109.37 lately disagreed with the translation and he/she had the usual options of changing the translation, adding another translation, and/or leaving a note on Anatoli's talk page explaining his/her POV about the translation (using civil, nonconfrontational language). 109.252.109.37 chose not to add a different translation, but instead left this offensive message on Anatoli's talk page:
Your translation. Were you able to add the non-obscene term instead, which exists at least in the Russian Wiktionary? For example: наглый, задиристый, крутой etc. Or the Russian obscene lexicon is your primary dialect?
Anatoli initially ignored the attack and deleted it. 109.252.109.37, refusing to be ignored, came back with this comment:
That rollback is not an error, it's your moral position. Feel free to revert this post as well. Good luck in translations. --109.252.109.37 11:49, 13 December 2018 (UTC)
I would have blocked 109.252.109.37 as well. There is no excuse for this unprovoked attack. In my opinion, the block was appropriate. —Stephen (Talk) 20:32, 13 January 2019 (UTC)
I don't think that's particularly confrontational; it might just represent annoyance. Equinox 20:51, 13 January 2019 (UTC)
Anatoli's Russian translation was correct. 109.252.109.37 disagreed with the register, but there is no Russian term that means the same thing and also matches the register of the English word. Accusing Anatoli of only being able to speak in obscenities was a deliberate and disingenuous insult. There was no reason to be annoyed. If the Anon disagreed with the translation or register, he or she could have suggested another translation. Because of his or her aggressive and combative comment, he or she deserved to be blocked. —Stephen (Talk) 21:03, 13 January 2019 (UTC)
Banning instead of stating the main point that Atitarev wasn’t obliged to add any translations, and indeed an obscene word is a possible translation and particularly if the translated English term contains a mildly vulgar word, and that morality claimed by the IP does not make sense since adding a vulgar translation is better than adding none? Better teach the IPs what they miss instead of blocking them. Having strange morality is not a ban reason. Worsening the dictionary is, but it does not seem to happen if an IP asks for better translations, be it with strange arguments and an insolent rhetorical question or be it without. Assume good faith. Some people are just bad at being flattering. That “insult” is a conjecture. Why do you think he wanted to insult? What is “deliberate”? Anything written here is deliberate since we try to think before posting, but it does not get good completely anyway even if we try. I don’t see the “accusation”, it is a rhetorical question the answer of which was implied as “no”, plus even if it wasn’t an insult an insult is not a ban reason: We have defined insult as being insensitive, but people just are insensitive. Maybe he hasn’t learned well to be sensitive but he still is concerned about a good dictionary and actually moved towards this goal, so what? No bannable “annoyance” (which could harm the lexicon by siphoning off attention) is there if simultaneously honest questions are asked (apparently he wasn’t so smart to see that his argument was faulty and thus asked honestly), since no plan suitable for harming is there. People let themselves be insulted too much by concluding insults: Really, if the IP was there to insult he could have written the insult and it would have been an insult, otherwise it was just maladroit. Else everything that is written anywhere is annoyance, there isn’t anyone around here that doesn’t annoy me, myself included.
One could wish people wouldn’t post on talk pages and make best edits without them but somehow people need to talk on talk pages with rare avail. If we banned everyone who said objectively wrong or immoral or useless things … it’s about prognosis guys. Mojshahmiri (talkcontribs) has been banned for promising to add crank theories (his prognosis has been that it is not worth to groom him), this IP would have done what without a ban? Would it have learned to be sensitive? Man, Wiktionary is a minefield if sensitivities count. Feelings on tight reins please. Reason must prevail! Fay Freak (talk) 21:44, 13 January 2019 (UTC)
Blocking somebody and then deleting the history of what they did is damnatio memoriae at least, and Orwellianism at worst. Is one rude comment on a talk page worth this? I think that's disturbing and wrong. Equinox 21:48, 13 January 2019 (UTC)
Any blocks for actions other than blatant vandalism deserve explanation, especially if requested by the blocked party. Based on Stephen's translation I don't think the block was warranted, and certainly not without some form of communication. - TheDaveRoss 02:09, 14 January 2019 (UTC)
Anatoli DID communicate the reason for the block: Intimidating behavior/harassment. And many of us block anons all the time for the same or similar reasons, and with the same or less communication. Hardly a day goes by that I don't see some IP complaining about some admin abuse without any communication. As far as deleting the history, every one of us admins could still see it and could have looked at it just as easily as I did. This was a common action. If you want to take Anatoli to task, then let's pillory all the other admins who have done the same or worse. It's a tempest in a teapot. —Stephen (Talk) 05:17, 14 January 2019 (UTC)
I meant communication with the user prior to the block, i.e. about the translation issue and perhaps about their manner of raising their concerns, but I can see I wasn't clear. For the rest of it, the root problem is a lack of assumption of good faith in borderline cases. Even if the tone of the communication was poor, the content is perfectly reasonable and on topic, no reason to delete it and block the person for asking. I don't think Anatoli deserves to be punished for this or anything, but when issues such as this come up I think it is worth sharing how we each would handle it so that we can be more evenhanded going forward. - TheDaveRoss 13:47, 14 January 2019 (UTC)

FileExporter beta feature

Johanna Strodt (WMDE) 09:41, 14 January 2019 (UTC)

Banning Altaic reconstructions

At present there are no non-controversial reconstructions of Proto-Altaic; in fact the Altaic theory itself is a controversial hypothesis. In practice, allowing reconstructed Proto-Altaic entries means copying from the Etymological Dictionary of the Altaic Languages (EDAL) by Starostin, Dybo and Mudrak.

EDAL reconstructions are based on ad-hoc soundlaws justified by semantically dubious comparisons, lack of strictness in lower-level languages, faulty philology and generally too many researcher degrees of freedom; it is not merely a controversial representation of an Altaist tradition, it is not homotopic to an earlier body of knowledge regarding sound correspondences within the proposed language family (such a thing exists only in fragments), rather it creates a completely new reconstruction using very tenuous soundlaws, with no prior precedent, to fit cognate sets which are also not traditionally accepted and which by themselves can only be called 'doubtful' at best.

I would like us to ban Altaic as a language family completely, since its only function seems to be smuggling lousy comparisons along with promising ones (usually Turkic-Mongolic, Mongolic-Tungusic or Korean-Japanese) and as a shorthand for "it appears in a Mongolic and a Turkic language and I can't be bothered to investigate the etymology further", but I would be fine with reducing it to an etymology-only language to stop further proliferation of garbage copy-pasted entries. Crom daba (talk) 22:59, 15 January 2019 (UTC)

  Support --{{victar|talk}} 23:25, 15 January 2019 (UTC)
  Support Tom 144 (𒄩𒇻𒅗𒀸) 11:58, 16 January 2019 (UTC)
The earlier vote for this was Wiktionary:Votes/2013-11/Proto-Altaic. Personally, I feel that the fact that the reconstruction template itself needs to display a notice about how controversial the Altaic hypothesis actually is serves as good evidence that it isn't exactly our most useful content. — surjection?〉 08:28, 16 January 2019 (UTC)
Of course it was Ivan who pushed for that. @Crom daba, if you want that vote reversed, I think you'll need to create a new vote. --{{victar|talk}} 17:53, 16 January 2019 (UTC)
I really know nothing about this so I will not vote on it, but it seems to me that even if Altaic is so controversial, the content should still be archived somewhere in an appendix - it seems like a waste to just erase it altogether. If we can have an Appendix:A Clockwork Orange, surely we can have an appendix for controversial reconstructions. (Perhaps that appendix should not be linked to from mainspace, however.) — Mnemosientje (t · c) 18:20, 16 January 2019 (UTC)
It's already archived by people who promulgate these reconstructions (StarLing). We are not and do not need to be a repository of all knowledge that is tangentially related to linguistics. DTLHS (talk) 18:23, 16 January 2019 (UTC)
  Support, but I too think that we should rightly have a vote in order to overturn a previous vote. —Μετάknowledgediscuss/deeds 20:15, 16 January 2019 (UTC)

Okay, here's the vote. Not sure if I set it up correctly though. Crom daba (talk) 23:03, 16 January 2019 (UTC)

@Crom daba: I think it would be better to keep option 1 only for now. If it doesn't pass, we can put option 2 to the vote later. Per utramque cavernam 23:34, 16 January 2019 (UTC)
Okay, sounds good. Crom daba (talk) 23:45, 16 January 2019 (UTC)

Project proposal: Enrichment of multilingual STM terms

Hallo all,
I would like to propose a research project aimed at enriching the Wikitionary in the STM (scientific, technical, medical) domain.
As a starting point lays the observation that many terms (typically named-entities) are present in scientific literature sources, but they do not still have an entry even on the English Wikitionary, which has the best coverage. This situation is even worse for some "new" terms, which are certainly of interest, and for non-English Wikitionaries.
On the other hand, it has to be observed that some of the information which is not available on the Wikitionary can be extracted from Wikipedia. Hence the project objectives are:
a) the Wiktionary will be extended for STM relevant terms in English and Italian as well, for thousands of terms.
b) The whole process will be validated for two languages (English and Italian) having different coverage and characteristics between Wikitionary and in Wikipedia.
The result would be very useful for who works in the research field.

Tasks:
1) I will identify, from the STM English literature in the sampled areas, including hot topics (e.g. Artificial Intelligence), some new terms which are not present in the English Wikitionary;
2) Then, I will create such new English Wikitionay entries with a semi-automatic supervised process which will include as much as possible what can be inferred from Wikipedia (e.g. term disambiguation, different translations, etc.).
3) Then, I will validate this entry process for the italian language also, which is my native language: in this case, I will directly enrich manually the entries in the cases when the algorithm identifies names which can not be inferred from Wikipedia.
4) Then, I would document this (multi-language) process in detailed pseudo-code, resulting in an open-access paper as a further project. I think that this result is preferable to delivering a language-specific implemented piece of code, since creating/maintaining software should be further tasks.

To support the project proposal please leave a comment at the bottom of the project page.
Thank you,
Best
--Marco Ciaramella (talk) 16:30, 17 January 2019 (UTC)

How will you ensure that these terms meet WT:CFI? Wikipedia tends to invent words in order to translate concepts between languages. DTLHS (talk) 16:41, 17 January 2019 (UTC)
@DTLHS Good point. Briefly, since any translations into Italian of an existing word from the English Wikitionary could be problematic, this is the reason why such entries must be validated manually (as stated at the point #3). --Marco Ciaramella (talk) 19:46, 17 January 2019 (UTC)
I think this sounds great! My only concern is about the nature of the semi-automatic process. What part of entry creation do you see as being automatic? Andrew Sheedy (talk) 04:24, 18 January 2019 (UTC)
@Andrew Sheedy Thank you for the feedback ! :-) The semi-automatic fashion would involve the English terms enrichment process (however, it is intended that the analysis of the input sources and the generated names are one part of the project), referenced at the point 2. --Marco Ciaramella (talk) 08:31, 18 January 2019 (UTC)
I'm not too keen on using Wikipedia as a prime source of technical terms. Wouldn't it be better to harvest them from online scientific journals? I'm currently working my way through PLOS ONE, finding hundreds of words we would never otherwise have. And I can't see how you are going to arrive at definitions in a semi-automatic manner. Perhaps you could start in a very slow way so we can see some examples. SemperBlotto (talk) 07:13, 18 January 2019 (UTC)
@SemperBlotto This is another interesting really to-the-point feedback, thank you. One intended task of my proposal is the discussion about the use of Wikipedia as a bootstrap source for the seeds of new (related) terms, and some related topics (e.g. how the generated terms are related each other, etc.). The human supervision at this stage is aimed mainly to assess the results of such process. However, this can obviously not exclude from the discussion what can be generated from (open) literature, which is considered a primarily source for Wikitionary too - and often cited as reference or in the Wikitionary meaning examples. I have some knowledge about your project and I would like to include some of eventual related-results or at least paper/project reference about PLOS in my final (also, open) publication. --Marco Ciaramella (talk) 08:31, 18 January 2019 (UTC)
@Marco Ciaramella: It's spelled Wiktionary, not Wikitionary :) You mention that typically named entities are missing in Wiktionary. But this is probably the least interesting type of entry, as the English and Italian translations will likely be identical? In general I don't see a problem with using Wikipedia as a source, especially for multilingual work it will be more useful. – Jberkel 23:05, 23 January 2019 (UTC)

Hyphens and dashes in entry titles

Hi, An editor just pointed out to me that dashes are not currently used for entry titles on English Wiktionary. Unfortunately, the only justification they managed to quote was this page: Wiktionary:Entry titles. This page currently doesn't say anything about not using dashes, and neither does the List of unsupported characters. So is there an actual policy on whether dashes could/couldn't be used in the entry titles? There are quite a few legitimate cases for them (as well as for hyphens obviously – as each of them has their own specific usage rules). But blindly advocating for using hyphens for everything resembling them (incl. en-dashes and em-dashes) seems to be an unnecessary simplification of typographic conventions. Cherkash (talk) 02:39, 23 January 2019 (UTC)

@Cherkash, see Wiktionary:Entry_titles#Punctuation: this section has clearly stated, since December 30, 2010, that:

In most languages, the HYPHEN-MINUS is used for the hyphen, not any of the dashes.

I suggest we continue with this common practice. ‑‑ Eiríkr Útlendi │Tala við mig 00:11, 26 January 2019 (UTC)
So just to be clear, @Eirikr: the way I read this (arguably, a badly phrased) passage is: "Don't use HYPHEN-MINUS for anything but the hyphen; and certainly don't use it for any of the dashes." Is this what you meant as well, just to reaffirm that hyphen-minus sign shouldn't be used for anything else but the hyphen? Cherkash (talk) 00:55, 26 January 2019 (UTC)
What are the "legitimate cases for them"? It would seem that we would just need to exercise or add search rules that fold all the hyphens and dashes into one. DCDuring (talk) 02:52, 23 January 2019 (UTC)
Or we could create redirects. — SGconlaw (talk) 03:27, 23 January 2019 (UTC)
Looks like the search engine does not automatically redirect "Tay–Sachs disease" with en-dash to Tay-Sachs disease with hyphen-minus, but it does return the hyphen-minus version at the top of the list of results because it considers punctuation characters as word separators. — Eru·tuon 04:25, 23 January 2019 (UTC)
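The "fold all the hyphens and dashes into one" idea can be sketched as follows. This is only an illustration: the set of dash-like characters and the function names `fold_dashes` and `same_entry` are assumptions made for this example, not anything taken from the actual search configuration.

```python
# Dash-like characters folded to HYPHEN-MINUS. This list is an assumption
# chosen for illustration, not a list used by any real search engine.
DASH_LIKE = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
}
_FOLD_TABLE = str.maketrans(DASH_LIKE)


def fold_dashes(title: str) -> str:
    """Return the title with every dash-like character replaced by '-'."""
    return title.translate(_FOLD_TABLE)


def same_entry(a: str, b: str) -> bool:
    """True if two titles differ at most in which dash-like character they use."""
    return fold_dashes(a) == fold_dashes(b)
```

With such a fold, a query for the en-dash spelling of "Tay-Sachs disease" would compare equal to the existing hyphen-minus title, without any redirect having to exist.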
@Cherkash: Can you give me an example of an endash- or emdash-appropriate title? —Justin (koavf)TCM 03:16, 23 January 2019 (UTC)
@Koavf: En-dash–appropriate example: Tay-Sachs disease. Cherkash (talk) 03:26, 23 January 2019 (UTC)
Good point. Perfectly valid name--please do create it in the correct form. —Justin (koavf)TCM 03:51, 23 January 2019 (UTC)
Do not create it, and don't tell people to create it. If you absolutely insist on using a special dash character it can go in the headword line. DTLHS (talk) 03:52, 23 January 2019 (UTC)
Agreed, unnecessary. --{{victar|talk}} 04:08, 23 January 2019 (UTC)
@DTLHS: Why? Why would we be opposed to proper typography, especially when we can trivially source it? —Justin (koavf)TCM 04:10, 23 January 2019 (UTC)
So put it in the fucking headword line. Why the fuck should we waste our time creating entries with trivial punctuation differences when instead we could just say every entry uses a hyphen and be done with it. DTLHS (talk) 04:15, 23 January 2019 (UTC)
Calm down, @DTLHS! Why so much anger? Please keep it civil. Dashes are no more special than hyphens. And they are standard punctuation in their own right, which has its own usage patterns. What's your reason to insist to avoid them? Cherkash (talk) 04:20, 23 January 2019 (UTC)
@DTLHS: No one is asking you to do anything and you don't have to be rude to me. —Justin (koavf)TCM 04:22, 23 January 2019 (UTC)
@Cherkash: You're wrong, em-dashes are more "special", as you say, because they aren't typically intended to be used in URLs and may even cause some encoding problems. They would definitely be f**k-all annoying for people typing in the URL. It's a bad idea all around. --{{victar|talk}} 04:31, 23 January 2019 (UTC)
Evidence for your claims, @victar? As far as I know, URL encoding schemes, as well as the Wiki engine, handle dashes and any other non-ASCII characters gracefully. Other Wikis (e.g., Wikipedias in many different languages) also have no problem with them. Cherkash (talk) 04:57, 23 January 2019 (UTC)
URLs have to encode en-dashes and em-dashes as %E2%80%93 and %E2%80%94, respectively, so you're actually suggesting Tay%E2%80%93Sachs_disease as opposed to Tay-Sachs_disease. One concern I have with it, other than it being a total hassle to type for zero benefit, is all the places where Lua mw.ustring.find functions, etc., might not include these special characters. --{{victar|talk}} 05:22, 23 January 2019 (UTC)
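The percent-encoded forms quoted above can be reproduced with Python's standard library. This is a minimal sketch; `wiki_path` is a hypothetical helper written for this example, and the space-to-underscore step is an assumption about how the title is turned into a path.

```python
from urllib.parse import quote, unquote


def wiki_path(title: str) -> str:
    """Replace spaces with underscores, then percent-encode the UTF-8 bytes.

    Hypothetical helper for illustration only. HYPHEN-MINUS and underscore
    are left as-is by quote(); non-ASCII characters are percent-encoded.
    """
    return quote(title.replace(" ", "_"))


en_dash_path = wiki_path("Tay\u2013Sachs disease")   # en dash, U+2013
hyphen_path = wiki_path("Tay-Sachs disease")         # hyphen-minus, U+002D
round_trip = unquote(en_dash_path)                   # decodes back to the en dash
```

Decoding with `unquote` restores the original character, so the encoding is lossless; the difference is only in what appears in the raw URL.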

──────────────────────────────────────────────────────────────────────────────────────────────────── On Wikipedia, for titles with a dash, the title with a hyphen instead usually redirects to the proper title; for example, Tay-Sachs disease redirects to Tay–Sachs disease. We can do the same here. A bot could create such redirects where they do not already exist.  --Lambiam 12:13, 23 January 2019 (UTC)
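A rough sketch of what the planning step of such a bot might look like, assuming it is given the dashed titles and the set of titles that already exist. The function name `redirect_pairs` and its inputs are hypothetical; only the `#REDIRECT [[...]]` wikitext is standard MediaWiki syntax.

```python
def redirect_pairs(dashed_titles, existing_titles):
    """Return (hyphenated_title, redirect_wikitext) for each missing redirect.

    Hypothetical planning step only: nothing here edits a wiki.
    """
    existing = set(existing_titles)
    pairs = []
    for title in dashed_titles:
        # Fold en dash (U+2013) and em dash (U+2014) to HYPHEN-MINUS.
        hyphenated = title.replace("\u2013", "-").replace("\u2014", "-")
        if hyphenated == title or hyphenated in existing:
            continue  # no dash in the title, or the hyphenated page already exists
        pairs.append((hyphenated, "#REDIRECT [[" + title + "]]"))
    return pairs
```

Skipping titles whose hyphenated form already exists matters here, because on Wiktionary the hyphenated page may be a full entry rather than a free slot for a redirect.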

It looks like other dictionaries tend to normalize em- and en-dashes to hyphens. I think we should include entries for both the hyphenated and dashed versions when both exist, but for sanity's sake I think we ought to keep all content at the hyphenated version (using the "proper" punctuation in the header). No reason to exclude the dashed versions, the URL concerns are a red herring, we already have lots of page titles which don't play well with URLs (see: the majority of the world's languages) and even so the em- and en-dashes are valid in URLs anyway (https://en.wiktionary.org/wiki/Tay–Sachs_disease). - TheDaveRoss 13:55, 23 January 2019 (UTC)
Most of the entries get there in the first place because someone types in a search term and clicks on the redlink, and most people have no clue about how to produce any kind of dash on their keyboard. That means that most new entries will be "wrong" in a very subtle and non-obvious way, and end up being moved. It's bad enough that we have case sensitivity to mess people up, but at least there we have a very practical reason that people can understand. I think the reason for the depth of emotion on this issue is that it represents a type of prescriptivism, which the community here instinctively distrusts. I don't really want people going around changing double quotes to Smart Quotes and apostrophes from straight to curly ones, and this feels like the same sort of thing. Chuck Entz (talk) 14:55, 23 January 2019 (UTC)
What supports the assertion an en- or em- dash is the "correct" typography? Is it just printers' esthetics?
If I cut-and-paste some text that contains a dash from the results page of a search engine, how often will that contain a dash that is not a hyphen?
I don't really see the point of having long dashes even in the inflection line, where their presence might cause on-page (browser-based) searches to miss occurrences of characters that normal (ie, benighted — like me) contributors might expect to be included. DCDuring (talk) 15:31, 23 January 2019 (UTC)
“Is it just printers esthetics?” – Yes, and also there are rules or guidelines when to use which to specify these aesthetical requirements, though I don’t know where normal people learn them now. They are also language-specific, even of different distribution in various English-language countries though I am not sure whether in what lexically matters (like em dashes more often used in the UK for parentheses).
“how often will that contain a dash that is not a hyphen” – Suddenly now search engines count which list everything of any quality on the web. It depends on the search engine how Unicode confusables are handled. Well I automatically write en dashes when appropriate and everybody should use a layout where correct punctuation are comfortably at hand. But there seems to be no guide anymore. Now only practicability counts. Because people wrongly assume their keyboard layout has all the signs it actually should have.
Not sure why case-sensitivity would mess anything up. Case-sensitivity is everywhere in the Unix world, and users who presume Windows qualities should check themselves. Case-sensitivity is normal and smart quotes and en dashes and em dashes are also normal on keyboard layouts. People who insist on their default layouts being correct also leave IoT devices with the default password “admin”.
The difference between a hyphen-dash and an en dash is also easily seen by the accustomed eye. We can also have ‐ U+2010 HYPHEN and − U+2212 MINUS SIGN if we abolish - U+002D HYPHEN-MINUS which is a character that doesn’t exist in any language (only in programming languages). But we won’t, because Unicode has gone too far. As I have demonstrated Wiktionary:Grease pit/2018/November § U+2019 in notWordPunc there is neither a correct way to implement apostrophes in Unicode correctly since the standard is contradictory. URL standards and Lua standards are also bad because character encoding and input is in the desolate state it is in.
This is to say what can be considered. I do not lean to any of the solutions, though I must point out that if we have a whole sentence like e. g. a proverb entered into the dictionary the hyphen difference is really visible and thus I would expect the larger hyphen used in the dictionary that is used in the respective language. I mean one should not be bullied for assuming that a dictionary uses reasonable typography! Not sure what is expected for these disease names. Anglos are responsible for Unicode and bad keyboard layouts being widespread and for ASCII-centric computer languages and schemes and for typography being lost in 2019 so they shall make the mess themselves clear – they should know. When are en dashes or em dashes totally necessary? Maybe make a list of occasions that could appear as dictionary headers? One could even make a long vote where each use case gets voted upon (but it is per language: writing Tay–Sachs disease in English does not mean German shouldn’t use Tay-Sachs-Syndrom). It is an editorial decision and one cannot make it right for everyone; anything technically works. Fay Freak (talk) 16:39, 23 January 2019 (UTC)
As a privileged, old Anglo with poor vision, I accept full responsibility for all defects of life as we know it. Applications for reparations will be duly considered.
My concern is solely with English and Translingual entries, the English etymologies, usage notes, and definitions of FL words, and other English-language text in other namespaces. I don't see the point of adding redirects or in any way impeding the use of English Wiktionary by any passive user capable of using a computer or smartphone or any contributor who doesn't want to bother with knowing the first thing about variant dashes/hyphens, etc. DCDuring (talk) 19:42, 23 January 2019 (UTC)
@Victar: I don't think percent encoding is a problem; it happens with lots of entry names already. I thought of some places where en dashes might have effects on our Lua infrastructure. For some languages, Module:headword would automatically add entry names with en dash to "spelled with" categories; I think that could be changed easily by adding en dash to the PUNCTUATION variable in several language data modules or to individual languages' standardChars fields. And Module:headword has a pattern that matches a punctuation character that cannot appear inside a word (notWordPunc) that is used in the automatic linking in headwords, but I think that wouldn't need to be changed. Currently, "Tay–Sachs disease" would be automatically linked as [[Tay]]–[[Sachs]] [[disease]], which seems right. (If en dash were added to the list, it would be linked as [[Tay–Sachs]] [[disease]].) — Eru·tuon 21:42, 23 January 2019 (UTC)
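The two linking outcomes described above can be modelled with a small sketch. This is not the logic of Module:headword; `autolink` and its `separators` parameter are hypothetical and only show how the result changes depending on whether the en dash is treated as a character that splits words.

```python
import re


def autolink(headword: str, separators: str) -> str:
    """Wrap each run of non-separator characters in [[...]].

    Hypothetical illustration: separator characters are kept as they are,
    everything between them becomes one link.
    """
    if not separators:
        return "[[" + headword + "]]"
    pattern = "[^" + re.escape(separators) + "]+"
    return re.sub(pattern, lambda m: "[[" + m.group(0) + "]]", headword)


# En dash splits words: each surname gets its own link.
split_links = autolink("Tay\u2013Sachs disease", " \u2013")
# Only the space splits words: the dashed compound is one link.
joined_links = autolink("Tay\u2013Sachs disease", " ")
```

In the first call both surnames are linked separately around the dash; in the second the whole dashed compound is linked as a single unit.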
Will it create any problems if (for example) the page Bose-Einstein condensate is moved to Bose–Einstein condensate, leaving a hard redirect? (Assuming that we can agree not to consider these to be “different hyphenation forms”, but instead “entries using alternative punctuation marks”.)  --Lambiam 18:55, 25 January 2019 (UTC)
This approach is problematic, as we must engage in different handling for any such hyphenated EN term that shares its spelling with terms in any other languages.
I'm baffled by this interest in changing the typographics of our headwords. This brings zero value as the new form is lexically equivalent to the existing form. Rather, this change arguably imposes *negative* value as we're having to spend time hashing this out, moving things, redesigning things, and all for a headword form that is demonstrably more difficult for our users to accurately input. Why, for the love of Wiktionary itself, are we wasting time with this? ‑‑ Eiríkr Útlendi │Tala við mig 00:04, 26 January 2019 (UTC)
It can potentially serve to disambiguate, e.g. if Mr Fotheringay-Smythe and Ms Jones find a disease then it might be Fotheringay-Smythe–Jones syndrome. Many sources (including Wikipedia) seem to use en dash, not hyphen, to separate surnames in such terms. I've never really bothered mainly because my keyboard lacks en dash. Equinox 00:13, 26 January 2019 (UTC)

Someone needs to go through all contributions of Erminwin

Obviously gibberish content on ngựa and many other pages - now blocked. Wyang (talk) 00:07, 26 January 2019 (UTC)

"Many other pages"? @Erminwin's edits on the relationship between (ine) and (yone) in OJP appears to be likely; and he corrected my edits on (kami). Are there any problems? If it regards Chinese/Vietnamese I'm out of this. ~ POKéTalker) 02:36, 26 January 2019 (UTC)
@Wyang: I've been monitoring their edits, and they don't seem to be as bad as you make them out to be. Sure, they could be making some mistakes along the way, but I think they're quite conscientious about their edits. — justin(r)leung (t...) | c=› } 02:57, 26 January 2019 (UTC)

Wyang edit-warring to unilaterally remove an attested entry

User:Wyang is edit-warring to unilaterally remove an attested entry, while refusing to participate in the ongoing RFV discussion. Normally I would block the user in this situation, but Wyang is an administrator, so a block would be useless. I'm at a loss for how to deal with the situation and would appreciate input from others. —Granger (talk · contribs) 01:09, 26 January 2019 (UTC)

Again a non-native speaker thinking that he understands the language better than native speakers - the situation of Wiktionary:Beer parlour/2017/September#Modern Greek terms spelt with Latin characters played again, where non-native Greek speakers tell native Greek speakers that "marketing" is Greek. Chinese people interpret any three- or four-letter English word as acronym, and write it, pronounce it in such manner, probably because the average level of English there remains on the level of the alphabet in school. "app" is 'ei-pi-pi', "ugg" is "u-gi-gi", "doc" is "di-o-si", "jpeg" is "jei-pi-e-gi", "ppt" is "pi-pi-ti", ... you name it. Wyang (talk) 01:14, 26 January 2019 (UTC)
Thank you for finally engaging with the discussion. What you're saying is not always true—according to our entry, one counterexample is man#Chinese. I know I've encountered other examples as well and I'll mention them if I think of them. —Granger (talk · contribs) 01:18, 26 January 2019 (UTC)
You are missing the point. How does pronouncing it as 'ei-pi-pi' and writing it in ignorant capitals mean it is Chinese? Are all Category:English three-letter words Chinese words? Do Chinese people think it is Chinese and do Chinese dictionaries include it as a Chinese word? Wyang (talk) 01:21, 26 January 2019 (UTC)
立flag is another example.
No, three-letter English words are not Chinese words. I'm not sure how you got that from what I'm saying. I've given several reasons for thinking this is a Chinese word at the RFV discussion; I think the most convincing are that it is used as a word in running Chinese text and that it's the most common Chinese word for app. —Granger (talk · contribs) 01:27, 26 January 2019 (UTC)
Why would native speakers know better than non-native speakers about whether something is a word in the language for Wiktionary's purposes? Having a distinctive pronunciation in a language and having a distinctive spelling are good signs that something is a word in a language, and it's not unheard of for a foreign word to be adopted with a new meaning and pronunciation, even when native speakers might counterfactually claim that the word is not part of their language.--Prosfilaes (talk) 01:58, 28 January 2019 (UTC)

In any case, discussion about the entry belongs at RFV. Could someone give me advice about how to deal with the edit-warring situation? —Granger (talk · contribs) 01:28, 26 January 2019 (UTC)

The reasons that you gave for proving it is Chinese, not English used in Chinese text were that it is written in capitals and it is pronounced as if it is an acronym, but neither of these is a telling argument because all English three- or four-letter words are read and used by the English-incompetent population of China as if they are acronyms. Why are you insisting on verifying something in the wrong language and pretending that you know the language better than people who speak it natively? You are wasting people's time, mate. Wyang (talk) 01:34, 26 January 2019 (UTC)
I stepped away from the computer to take a walk, and on the walk I decided this entry isn't worth the stress. I'm taking it and the associated discussions off of my watchlist, so please ping me if my input is needed. I hope that you will read my argument more carefully, including my comments above, and that someone will restore the entry. Otherwise its removal will be a loss for our readers. —Granger (talk · contribs) 02:17, 26 January 2019 (UTC)
Good. Wyang (talk) 02:21, 26 January 2019 (UTC)
  • This and above are behaviors just so unbecoming for an admin. (@Chuck Entz) --{{victar|talk}} 19:48, 31 January 2019 (UTC)
    Absolutely. That said, until WMF invests in genetic research to come up with the perfect admin, we're stuck with the imperfect humans that we have. Some of our most problematic admins have made tremendous contributions: Ivan Stambuk, Rua, Liliana60, and Vahagn Petrosyan, to name the first ones that come to mind.
    Most of Wyang's interactions are out of your sphere, but regular contributors in east Asian languages will tell you that he's been unfailingly helpful, patient, and generous with his time. That's in addition to his unmatched expertise in those languages and great technical skill with modules and templates. The problem arises when someone from outside his territory does something that affects it- he morphs from kindly Grandfather Wyang into a ruthless warrior against what he sees as outside meddling. That combination of well-intentioned virtue and disruptiveness makes it really hard to come up with an appropriate response. Chuck Entz (talk) 00:58, 2 February 2019 (UTC)
    @Chuck Entz: Yet then on the other hand, we're very quick to hand out 3-day blocks to users for the same unbecoming behavior. It seems to me like a double-standard, in a world where admins are immune from recourse. To echo a point I brought up in a current vote, I would like to see more admin tools broken up into roles -- roles that can be more easily given and taken away. In that way, we can also hold admins to a higher standard of interpersonal skills while not taking away needed tools from the less socially inclined. --{{victar|talk}} 01:30, 2 February 2019 (UTC)

Moving Translation Hubs (and Perhaps All Translations) Out of Mainspace

Buried deep in an interminable deletion discussion (Wiktionary:Requests for deletion/English#address using the formal pronoun) was a proposal by User:Per utramque cavernam that I think deserves more attention and full consideration here. I've taken the liberty of extracting the parts of his message and my reply that aren't specific to the discussion:

[]

I still don't think cluttering the mainspace with entries such as "address using the formal pronoun" or "address with the formal pronoun" is a good idea. What happened to the first THUB provision: "The attested English term has to be common; rare terms don't qualify"?

I propose we create an appendix on the V-form / T-form, and put the translations there. Per utramque cavernam 10:03, 26 January 2019 (UTC)

[]

Whether you put it in mainspace or somewhere else, a translation hub is more like an appendix or a footnote rather than an entry- it's not really English, though it claims to be, it violates the spelling-first organization of the dictionary as a whole, and being based on a concept rather than a specific term in a specific language makes it rather encyclopedic. Since no one arrives at it directly, there's no practical reason for it to be in any specific namespace that can't be fixed with a tweak or two to the code. Chuck Entz (talk) 13:26, 26 January 2019 (UTC)

I think his proposal should be generalized to all translation hubs, and, given that translation sections have had to be moved to subpages in several entries because of their use of system resources, we also might consider moving all translations out of mainspace, perhaps something along the lines of the Thesaurus namespace. This would be, in effect, like replacing every translation table in the entries with {{trans-see}}.

Having a separate namespace would make it easier to avoid dueling translation sections in synonyms or regional variants, and allow foreign language entries to have translations as well (everything would be a translation hub). It would take some work to figure out the best way to name them and organize them and to deal with duplication, but I think it would be worth it.

It would also take a major draw on system resources out of the entries. Since each translation table would be its own page, there would be more capacity available in most cases.

The main drawback is that they wouldn't be tied in as tightly with the entries, especially where there are multiple, subtly differentiated senses. I suppose they could be transcluded into the entries, but that's technically impossible for some of the larger ones.

What does everyone think? Chuck Entz (talk) 15:07, 26 January 2019 (UTC)

I don't like it. "They wouldn't be tied in as tightly with the entries" is an understatement. You are now basically creating two entirely separate dictionaries complete with definitions and parts of speech, but now as an "appendix" namespace to be forgotten about and diverge. DTLHS (talk) 17:31, 26 January 2019 (UTC)
I like the idea of reconsidering how translation hubs are handled, I am less excited about moving translations out of entries without a really strong demonstration of how it could be done in a way that improves usability. - TheDaveRoss 17:51, 26 January 2019 (UTC)
Couldn't such a namespace also be used to speed loading of entries with a very large number of translations, like [[water]]?[point made by Chuck]
Do those users who use Wiktionary as a translating dictionary like to see redlinks in translation tables in search results for every search or most searches that they do? Or would they rather only see full entries?
Can't users, at least registered users, somehow (JS?) specify multiple namespaces for their default searches? AFAICT we don't have anything in preferences that facilitates this. (This would also have value for incorporating all sorts of things into search results that probably shouldn't clutter most users' search results, such as snowclones, reconstructions, collocations (sometimes proposed).) DCDuring (talk) 18:44, 26 January 2019 (UTC)
I agree with DTLHS, and for the specified reasons. I would rather see translation hubs kept in the mainspace wherever possible, moved to subpages only as necessary to avoid out-of-resources errors. Benwing2 (talk) 19:26, 26 January 2019 (UTC)
@DCDuring: Both the regular and the advanced search interfaces on the search page allow selecting the default namespaces for searches. The CirrusSearch documentation claims the place to select namespaces is in the Search tab of preferences, but that seems to be out-of-date information. — Eru·tuon 21:30, 31 January 2019 (UTC)
Thanks. I hadn't noticed how that worked. But I was really thinking about the default for unregistered and new users. For English speakers principal namespace is a good default, with a possible future collocations space being a good addition. For others principal + any translation namespace would seem better. Do we consider it against our privacy and anonymity principles to read a user's preferred language to set such a default? DCDuring (talk) 21:55, 31 January 2019 (UTC)
Isn't that the concept of omegawiki? I believe they are not good when it comes to usability.Matthias Buchmeier (talk) 14:28, 31 January 2019 (UTC)
I think a Translations namespace would be a better solution for dealing with Lua memory problems than moving translations sections to /translations subpages as I have been doing. There are a fair number of entry titles that legitimately contain slashes and it's neater to reduce the number of slashes that mark subpages in the mainspace. The Unsupported titles subpages would still remain, though. I'm uncertain about the idea of moving all translations to a new namespace.
We could avoid separate pages if someone designed a way to maintain non-Lua versions of the translation templates ({{t}}, {{t+}}), not only for Latin terms with a limited number of parameters (as a script I created sort of does) but for terms with other parameters and non-Latin terms as well, but I think that would be a fairly complex task and neither I nor anyone else has volunteered to work on it yet. — Eru·tuon 01:40, 1 February 2019 (UTC)
I think we should keep separate translation pages as a workaround. The number of translations is growing very slowly, so that for the next ten years we would likely only have to use translation subpages on a couple of entries. And the Lua memory problem will eventually be fixed in the near future as server memory gets cheaper and larger. Matthias Buchmeier (talk) 15:13, 1 February 2019 (UTC)
The syncing problem is a very good point, so I've withdrawn the "all translations" part. That's not relevant to the translation hubs, though, since there's nothing to sync with. They're not really terms in English, they're concepts- the only reason we treat them as English is because we don't put translation tables under other language headers. In effect, they're already appendices- we just put them in mainspace and stick an "English" header on them.
I'm skeptical that being out of reach of the default search settings is such a big deal. After all, who actually searches for "consecrate a Buddha image"? It should be enough to link to the translation hub from Buddha and perhaps consecrate.
If we really have to have them in mainspace, an alternative might be to create a "language" header for them like we do for Translingual. Chuck Entz (talk) 23:54, 1 February 2019 (UTC)
  • I would support having a "Translations" tab similar to the current "Citations" tab, and moving all tables of translations to the namespace associated with the tab. We have a number of citations pages for terms that do not have entries (for example, terms deleted as SOP), and it is my understanding that these are permissible. I see no reason why we could not have uncoupled translations pages for terms that do not have entries for lacking an English meaning, so long as they are searchable, and are linked from some relevant pages in entry space. bd2412 T 01:03, 3 February 2019 (UTC)
    I would support having a Translations tab as you suggest, but absolutely oppose moving all translation tables to it. Translation tables grow slowly enough as it is...this would be a surefire way to get it to grind to a halt. I think the purpose of a Translations tab/namespace would be to house current translation hubs and the translations for the few entries (like water and cat) that are too numerous for them to be in the main entry. The goal would be to have all our translation tables look like that, so we could simply work towards making them all big enough to move to the Translations namespace. I just don't think we're ready to do it now. Andrew Sheedy (talk) 02:21, 3 February 2019 (UTC)

Inline "Imperfective:" and "Perfective:", similar to inline "Synonym:" etc.

(Notifying Atitarev, Cinemantique, Useigor, Wikitiki89, Stephen G. Brown, Guldrelokk, Fay Freak, Tetromino, Per utramque cavernam): Russian (and other Slavic-language) verbs come in perfective/imperfective pairs. Sometimes there is a one-to-one correspondence, but sometimes not. For example, a given imperfective verb may have 0, 1 or many corresponding perfectives, and it might differ from meaning to meaning when a verb has multiple meanings. As an example, the verb коло́ть (kolótʹ) is defined like this:

  1. to split, cleave, break (sugar), crack (nuts), chop (firewood), pf - расколо́ть (raskolótʹ)
  2. to stab, thrust, pf - заколо́ть (zakolótʹ)
  3. to kill, slaughter, pf - заколо́ть (zakolótʹ)
  4. to prick, sting, pf - уколо́ть (ukolótʹ), кольну́ть (kolʹnútʹ)
  5. to have a stitch, to feel a prick or a stab, pf - кольну́ть (kolʹnútʹ)
    У меня́ ко́лет в ле́вом боку́.
    U menjá kólet v lévom bokú.
    I have a stitch in my left side.
  6. to taunt

This indicates that e.g. in the meaning "to split" it has perfective расколо́ть (raskolótʹ), while in the meaning "to stab" or "to kill" it has perfective заколо́ть (zakolótʹ) and in the meaning "to prick" or "to sting" it has perfective either уколо́ть (ukolótʹ) or кольну́ть (kolʹnútʹ). We have taken to indicating the perfective correspondences on the same line, but I find this confusing and would rather put them on a following line just like the new format for synonyms/antonyms/etc. It gets especially annoying when you have both perfectives and synonyms listed, e.g. this example from пере́ть (perétʹ):

  1. (colloquial) to steal, to pinch, pf - спере́ть (sperétʹ) or упере́ть (uperétʹ)
    Synonym: тащи́ть (taščítʹ)

I am planning on creating templates {{perfectives}} and {{imperfectives}}, with short forms {{pf}} and {{impf}}, which work just like e.g. {{synonyms}} with short form {{syn}}, so you instead write this:

# {{lb|ru|colloquial}} to [[steal]], to [[pinch]]
#: {{pf|ru|спере́ть|упере́ть}}
#: {{syn|ru|тащи́ть}}

and get this:

  1. (colloquial) to steal, to pinch
    Perfectives: спере́ть (sperétʹ), упере́ть (uperétʹ)
    Synonym: тащи́ть (taščítʹ)

I'll then use my bot to convert existing entries to the new format. Comments? Benwing2 (talk) 19:04, 26 January 2019 (UTC)

Makes sense. There don’t seem to be many options to make such distinctions clear, and this is a known format already. Better than the format “{{lb|ru|colloquial}} to [[steal]], to [[pinch]], {{g|pf}} — {{m|ru|спере́ть}} or {{m|ru|упере́ть}}” at least. This format is what you mean to convert by bot? Fay Freak (talk) 19:54, 26 January 2019 (UTC)
@Fay Freak Yes. Benwing2 (talk) 00:23, 27 January 2019 (UTC)
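For illustration, the kind of rewrite such a bot run performs might be sketched like this. This is not Benwing2's actual bot code: `convert_line` and both regular expressions are assumptions based only on the old and new formats quoted in this thread, and they handle only Russian `{{m|ru|...}}` links with a single positional term.

```python
import re

# Old tail: ", {{g|pf}} <em dash or hyphen> {{m|ru|X}} or {{m|ru|Y}}".
# Assumed pattern, derived from the format quoted in the discussion.
OLD_TAIL = re.compile(r",\s*\{\{g\|pf\}\}\s*[\u2014-]\s*(?P<links>.+)$")
LINK = re.compile(r"\{\{m\|ru\|([^|}]+)\}\}")


def convert_line(line: str) -> str:
    """Move inline perfectives of a definition line onto a '#: {{pf|ru|...}}' line.

    Lines that do not match the assumed old format are returned unchanged.
    """
    hashes = re.match(r"#+", line)
    tail = OLD_TAIL.search(line)
    if not hashes or not tail:
        return line
    terms = LINK.findall(tail.group("links"))
    if not terms:
        return line
    definition = line[:tail.start()]
    return definition + "\n" + hashes.group(0) + ": {{pf|ru|" + "|".join(terms) + "}}"
```

Returning unmatched lines unchanged is a deliberate choice in this sketch, so that anything in an unexpected shape is left for manual review rather than being rewritten incorrectly.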
It's okay with me. —Stephen (Talk) 08:18, 27 January 2019 (UTC)
Seems okay to me too. Per utramque cavernam 12:47, 27 January 2019 (UTC)
(Notifying Atitarev, Cinemantique, Useigor, Wikitiki89, Stephen G. Brown, Guldrelokk, Fay Freak, Tetromino, Per utramque cavernam): This is done, please use the new format from now on. Benwing2 (talk) 22:27, 27 January 2019 (UTC)
Note that User:Ungoliant MMDCCLXIV/synshide.js needs to be updated since it just says "undefined" in this case (can we make this more robust?) DTLHS (talk) 00:44, 28 January 2019 (UTC)
@DTLHS: Jberkel's version, semhide.js, works properly. He and I have been working on necessary changes to the visibility toggle framework (see MediaWiki talk:Gadget-visibilityToggling.js) and at this point I just need to double-check the scripts and install them, which I've put off for a while. — Eru·tuon 20:06, 28 January 2019 (UTC)
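The bot conversion discussed in this thread can be sketched roughly as follows. This is only an illustration of the rewrite from the old inline format quoted by Fay Freak to the new {{pf}}/{{impf}} line, not Benwing2's actual bot code. The function name `convert_definition_line` and both regular expressions are hypothetical, and the sketch assumes the old format always ends the definition with {{g|pf}} or {{g|impf}} followed by {{m|ru|...}} or {{l|ru|...}} links.

```python
import re

# Old inline format, as quoted in the thread:
#   # {{lb|ru|colloquial}} to [[steal]], to [[pinch]], {{g|pf}} — {{m|ru|X}} or {{m|ru|Y}}
# Assumption: the aspect marker and its terms always come last on the line.
ASPECT_TAIL = re.compile(r",?\s*\{\{g\|(pf|impf)\}\}\s*[—-]*\s*(.+)$")
TERM = re.compile(r"\{\{[ml]\|ru\|([^|}]+)\}\}")


def convert_definition_line(line):
    """Return the output lines for one wikitext line.

    Definition lines carrying an inline aspect partner are split into the
    bare definition plus a '#: {{pf|ru|...}}' (or '{{impf|...}}') line.
    Every other line is returned unchanged.
    """
    head = re.match(r"(#+)\s", line)  # '#:' and '#*' lines do not match
    if not head:
        return [line]
    tail = ASPECT_TAIL.search(line)
    if not tail:
        return [line]
    aspect, rest = tail.groups()
    terms = TERM.findall(rest)
    if not terms:
        return [line]
    definition = line[:tail.start()].rstrip()
    partner_line = "%s: {{%s|ru|%s}}" % (head.group(1), aspect, "|".join(terms))
    return [definition, partner_line]
```

Lines that already use the new format, such as `#: {{syn|ru|...}}`, pass through untouched, so the sketch can be run repeatedly over the same page without changing it further.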

Novial entries are all lacking citations

There are currently 572 entries in the category Category:Novial lemmas. All of them are lacking citations. Maybe it would be a good idea if those entries are moved to the appendix instead. Or is there a big Novial corpus somewhere to attest those words? Robin van der Vliet (talk) (contribs) 00:44, 28 January 2019 (UTC)

Nothing about Novial is currently in the public domain due to age in the US, and thus they aren't accessible from Google Books or HathiTrust. Otto Jespersen died in 1943, so his works are PD in the EU, and they're available online. I can't find any evidence of other writers in the language. Unless we want to let Wikipedia attest things (an idea which I don't think would get much success, which is why I haven't seriously proposed it), I think attesting anything is going to be hard.--Prosfilaes (talk) 02:24, 28 January 2019 (UTC)
Are there other published dictionaries other than the "Novial Lexike" as mentioned in the Wikipedia article? DTLHS (talk) 02:31, 28 January 2019 (UTC)
Neither http://web.archive.org/web/20120719040016/http://www.rickharrison.com/language/bibliography.html nor w:Novial mention anything.--Prosfilaes (talk) 03:17, 28 January 2019 (UTC)
The solution is to move all Novial entries to the Appendix, as we did with Lojban. Do we need a vote for that, or can we gather consensus to do so in this discussion? Also @Mx. GrangerΜετάknowledgediscuss/deeds 06:13, 28 January 2019 (UTC)
Moving looks boni to me.  --Lambiam 12:40, 28 January 2019 (UTC)
Looks good to me too. Robin van der Vliet (talk) (contribs) 20:01, 28 January 2019 (UTC)
Sorry to say, but the inclusion of Novial in the main space is baked into CFI, so the ensuing modification of CFI to exclude Novial would require a vote. This, that and the other (talk) 01:54, 31 January 2019 (UTC)
I created a voting page here to strike it from WT:CFI#Constructed languages. Robin van der Vliet (talk) (contribs) 02:03, 31 January 2019 (UTC)

Deletion of own userspaces

I've always wondered why we can't delete our own userspaces. That seems kinda silly to me. Is there any technical solution around this, and if so, is that something people agree with? --{{victar|talk}} 01:46, 31 January 2019 (UTC)

Just put {{delete}} on your user page and an administrator will delete it shortly afterwards. Robin van der Vliet (talk) (contribs) 01:57, 31 January 2019 (UTC)
Obviously. That's not what I was asking. --{{victar|talk}} 07:12, 31 January 2019 (UTC)
I agree it should be possible. I'm curious to hear the rationale for it, besides “it increases code complexity / introduces potential security problems”. Maybe it stems from the general anti-deletist culture in MediaWiki. – Jberkel 07:04, 31 January 2019 (UTC)
From a programming standpoint, it seems easy enough: if (isAdmin or ownsPage) { showDeleteButton}. --{{victar|talk}} 07:12, 31 January 2019 (UTC)
Not sure, but could there be some objection to editors deleting talk pages to hide evidence of uncordial behaviour? Of course, in such cases an administrator could still view or undelete the pages. — SGconlaw (talk) 08:18, 31 January 2019 (UTC)
Yeah, I thought about that scenario, or maybe some joint project multiple people are contributing to, but a) edit other users' userspaces at your own risk, and b) exactly, an admin could just restore it. In truth, I think it's more dangerous allowing users to create userspaces than it is allowing them to delete them. --{{victar|talk}} 08:23, 31 January 2019 (UTC)
There are multiple paths to deletion, so the code changes would be more extensive than that, but I doubt that is the reason. Another possible reason is that not all installations of MW are going to have the same concept of what a user page is and who "owns" it; just because we are generally fine with people controlling their userspace doesn't mean everyone will be. I bet the actual reason is that nobody felt it was that important, and the vast majority of the people who design MW and suggest changes are administrators or developers on their primary wikis. Plus it is the status quo. - TheDaveRoss 13:39, 31 January 2019 (UTC)
Yeah, I didn't think it was that simple; I was just laying out the logic. You make a good point in that Wikt is different in allowing for userspaces, so this issue is unique to us. --{{victar|talk}} 19:53, 31 January 2019 (UTC)
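The logic victar lays out above (`isAdmin or ownsPage`) can be made concrete with a small sketch. This is not MediaWiki code and does not reflect any real MediaWiki API. The names `User`, `owns_page` and `can_delete` are hypothetical, and "ownership" is assumed here to mean a page titled `User:<name>` or one of its subpages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical minimal user record, for illustration only."""
    name: str
    is_admin: bool = False


def owns_page(user, title):
    """True if the title is the user's own page in the User namespace.

    Assumption: ownership covers 'User:<name>' and 'User:<name>/<subpage>'.
    Talk pages ('User talk:...') are deliberately not treated as owned.
    """
    namespace, sep, rest = title.partition(":")
    if not sep or namespace != "User":
        return False
    root = rest.split("/", 1)[0]
    return root == user.name


def can_delete(user, title):
    """The 'isAdmin or ownsPage' check from the discussion."""
    return user.is_admin or owns_page(user, title)
```

Leaving `User talk:` pages out of `owns_page` is one possible way to address SGconlaw's concern about editors deleting talk pages. As TheDaveRoss notes, a real change would also have to cover every other path to deletion.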

February 2019

Proposal: Separate namespace for entries in Category:Chinese terms written in foreign scripts

The main issue: Various discussions from the past:

Chinese loanwords written in foreign script were originally limited to technical terms such as α粒子 (ā'ěrfā lìzǐ) and σ鍵/σ键 (xīgémǎ-jiàn), but the advent of globalization has introduced terms such as 卡拉OK (kǎlā'ōukèi), NG (ēnjī), man#Chinese into the Chinese language. Many of these are in colloquial use, but appear to be unregulated.

As a dictionary that aims to describe all words of all languages, it would be useful to include such entries, particularly entries such as fighting#Chinese which has a different meaning from what one would usually expect.

However, of late, this has turned into a rather contentious issue. The main arguments were (1) Chinese terms should be written in Chinese script, not foreign script (2) Chinese terms written entirely in foreign scripts are code-switched. KevinUp (talk) 02:48, 1 February 2019 (UTC)

Prelude: It seems that KTV#Chinese, which had passed RFV in 2014, was recently removed from Wiktionary for being "not Chinese". I wish to point out that KTV#Chinese was among the 39 pioneer entries listed in 现代汉语词典 (Xiandai Hanyu Cidian, 3rd edition, 1996) under its appendix for lemmas that begin with the Latin script (西文字母开头词语).

The following was listed after the definition for KTV:
K,指卡拉OK;TV,英television的缩写  ―  Kēi, zhǐ kǎlāOK; TV, yīng television de suōxiě.  ―  K refers to karaoke while TV is an abbreviation of English television.
I'm not sure whether "K (kēi)" is an abbreviation of Chinese 卡拉OK (kǎlā'ōukèi) or Japanese カラオケ (karaoke), but it should be consistent with the "K" used in 唱K (chàngkèi). Note that we already have an entry for K#Chinese.

On the other hand, the appendix for lemmas that begin with the Latin script in 现代汉语词典 (Xiandai Hanyu Cidian) has expanded from 39 entries (3rd edition, 1996) to 239 entries in the 6th edition (2012). Of the 239 entries, 226 entries were capitalized, while only 7 entries (e化 (yìhuà), e-mail, hi-fi, pH值, Tel, vs, Wi-Fi) were not fully capitalized; the remaining 6 entries contained Greek α, β, γ. For comparison, the original 39 entries found in the 1996 edition are listed below: KevinUp (talk) 02:48, 1 February 2019 (UTC)

Of these, I found that the following entries were not found in the 6th edition (2012)

The volatile nature of such entries (note the removal of Internet#Chinese in the 6th edition) prompted me to come up with the following proposal:

Proposed solution: A separate appendix for Chinese loanwords (外來語/外来语 (wàiláiyǔ)) that are written either partially or fully in foreign script will be created. These "entries" will have full etymology, pronunciation etc., similar to what we have for English snowclones such as "X is the new Y" and "have X, will travel", which are listed in a separate appendix. The {{zh-see}} template will then be used to redirect entries such as 卡拉OK#Chinese to a separate namespace such as Appendix:Chinese terms written in foreign scripts/卡拉OK or Appendix:Foreign words used in Chinese/卡拉OK; the choice of name is up to the community to decide. KevinUp (talk) 02:48, 1 February 2019 (UTC)

Comments

I don't like the idea of moving these to an Appendix; the Appendix has poor findability. I stand by what I wrote in the 2018 BP thread. —Suzukaze-c 03:13, 1 February 2019 (UTC)

(I am partial towards User:Fay Freak's idea of adding "code-switching quotes". —Suzukaze-c 03:27, 1 February 2019 (UTC))
I agree with Suzukaze-c for the most part. Also, putting all these words of varying degrees of acceptance into Chinese (e.g. 卡拉OK vs. part-time) in the appendix seems to sweep everything under the rug and would not be dealing with the core of the issue. — justin(r)leung (t...) | c=› } 03:30, 1 February 2019 (UTC)
It seems that "Appendix:Snowclones/X is the new Y" has the proper categorization (Category:English lemmas, Category:English phrases). The only difference is the title of the page looks different. Yes, it's a bit hard to search for "X is the new Y", but for Chinese entries we'll use {{zh-see}} so it is still searchable.
The reason for moving such entries into an appendix is obvious: Chinese entries are generally not written using foreign scripts, unless it is a transliteration. This does not solve the issue of whether or not an entry is part of code-switching. For me, code-switching holds true for overseas communities, but most of the people in mainland China speak little or no English. KevinUp (talk) 04:05, 1 February 2019 (UTC)
Why attach so much significance to whether an entry is in the appendix namespace? Are ordinary users supposed to somehow know what that means? As you say the categories are the same. DTLHS (talk) 04:07, 1 February 2019 (UTC)
One reason for this is that we want people to view Wiktionary as a serious project. It does feel awkward to have iPhone#Chinese among Danish, French, Portuguese, Spanish, etc. Another reason is that some of these lemmas come and go, e.g. Internet#Chinese, which was found in the 3rd edition (1996) of 现代汉语词典 but removed in the 6th edition (2012). If we have a separate namespace we can better monitor such entries. I'd like to mention that KTV#Chinese was recently removed without any formal discussion (despite passing RFV in 2014), and the etymology of KTV#English contains errors (MTV = Movie TV?) KevinUp (talk) 04:27, 1 February 2019 (UTC)
I say add usage notes to all entries of this type explaining the situation or just link to an explanatory page, which explains how the situation is controversial. 'iPhone' is used by Chinese people all the time. I don't care if it is considered Chinese or not, but there's no reason for Wiktionary to ignore the fact that Chinese people use that word in amongst Chinese speech in just the same kind of way that 'taco' is used freely in English. --Geographyinitiative (talk) 04:54, 1 February 2019 (UTC)
Turns out some scholars disagree with the inclusion of such entries in w:Xiandai Hanyu Cidian#Controversies. See also news report here.
 

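《人民日报》高级记者傅振国说:“《现汉》第6版在‘正文’中收录了英语缩略词等词汇之后,等于将汉语汉字的标准规范擅自改变为英语等外语可以进入汉语,英文可以代替汉字。”

Translation: Fu Zhenguo, a senior reporter at People's Daily, said: “By including English abbreviations and similar vocabulary in its main text, the 6th edition of Xiandai Hanyu Cidian has in effect arbitrarily changed the standard norms of the Chinese language and script, so that English and other foreign languages may enter Chinese and English may replace Chinese characters.”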

 
KevinUp (talk) 12:38, 1 February 2019 (UTC)
It would be serious enough if there were “code-switching quotes”, foreign language quotes in English sections displayed as “code-switching ▼” instead of “quotations ▼”; in the citation namespace perhaps we change the {{citations}} template so the wording is not “English citations” but “Citations for English”, and perhaps with an extra parameter for subsections like “Citations for English [x] in [language y]”. The findability is good. One cannot expect anyway that a term in a text one searches is found on Wiktionary in the language of the text. If I have a word in Slovenian a section in Serbo-Croatian will usually help, and Persians make use of the Arabic sections for Persian texts. If something is in Latin script in a Chinese text, one expects people to search it as English more than as Chinese. So you editors just need to get over the novelty of this view.
It also works trans-script btw – instead of ridiculous Sanskrit-as-English entries we can put quotes for the terms used in esoteric English texts on the citation pages of the Sanskrit entries. This is like Serbo-Croatian entries in Cyrillic can contain quotes in Latin script since one intends to have mirrored Latin and Cyrillic entries, and like one does not always quote every alternative form on its own page when it would be more useful to centralize to showcase the meaning for example and when variants depend on readings of manuscripts (zancha actually contains a quote for zanca, for culullus the readings are all uncertain …), and Azerbaijani is now using {{spelling of}} like on اۆز‎, and if you quote from audio-records you can’t quote spellings anyway – but I might be too liberal here and you don’t make this second step though assenting to this code-switching quoting. Point is editors need to open their minds for the first step. Sweeping normal quotes into the appendix is caitiff. Fay Freak (talk) 13:35, 1 February 2019 (UTC)
I think this would be a better way of dealing with the current situation (to have foreign language quotes in English sections displayed as “code-switching ▼” and “Citations for English [x] in [language y]”). This is usually encountered for proper nouns (personal names, placenames, etc). The Vietnamese Wikipedia often uses the original Latin spelling without converting it to the Vietnamese alphabet. KevinUp (talk) 21:05, 1 February 2019 (UTC)

What about creating a dummy "language"/language code/Language header for cross-linguistic terms- that is, terms that are used in a given language, but don't really belong to that language. We already have "und", which displays as "undetermined". We even have entries: see Category:Undetermined language. We would have to set some ground rules so that we wouldn't be basically duplicating our coverage of a term for every language that might use it in running text, and we would still have to weed out translingual terms and genuine borrowings. Figuring out what to do about script support might be tricky, though. Chuck Entz (talk) 05:07, 1 February 2019 (UTC)

The scope of this is a bit too wide. We're currently looking at Chinese loanwords that retained part of their foreign script and whether or not such entries can be considered as Chinese lemmas. KevinUp (talk) 12:38, 1 February 2019 (UTC)
If there are two languages to ascribe a word to, its language is no more “undetermined” than the etymology of a word is “unknown” ({{unknown}}) when we have two etymologies we cannot choose between. It’s not undetermined, it is underdetermined, a thing that usually isn’t a problem in language. ΖΩΑΠΑΝ is an example of what a word of undetermined language is. If a word is sorted as of undetermined language or an etymology as unknown, there is hope that at some point the language will be determined or the etymology resolved (we even categorize pages with und language links), for the state of things was known at some time to someone. In the code-switching examples it is no issue to leave it unresolved. They have arisen in a state of ambiguity. Fay Freak (talk) 13:47, 1 February 2019 (UTC)

Another solution

As suggested by User:Fay Freak, I think the inclusion of foreign language quotes in English sections which are displayed as “code-switching ▼” instead of “quotations ▼” as well as “Citations for English [x] in [language y]” in the Citations namespace would be a better solution. @Justinrleung, Suzukaze-c, any further comments? KevinUp (talk) 03:42, 2 February 2019 (UTC)

I totally oppose including a quotation in which someone uses one English word in what is otherwise a Chinese sentence under an ==English== section, no matter the formatting. If the sense used in the quotation exists in English, use English quotations to cite it; if the sense doesn't exist in English, then it shouldn't be in an ==English== section and it also strongly suggests the string does deserve a ==Chinese== (or whatever) section. (I would make exceptions for extinct languages attested in Greek manuscripts and things of that sort. But using code-switching to attest, or provide as quotations to illustrate the use of, a WDL? No.) - -sche (discuss) 23:24, 2 February 2019 (UTC)
From a wholesome view, this is language and thus to be covered. It is an extra one can have. I do not deem it likely that Wiktionary can overflow with code-switching quotes in any fashion that could significantly give offence. Note that one has the offence already. One has the quotes already but sorted in a crude fashion. This is only about showing the quotes as what they are: Multilingual text. Fay Freak (talk) 22:56, 3 February 2019 (UTC)
Yeah, some users might object to having foreign language quotations, because it feels weird to see Greek or Chinese text popping up within an English entry. Perhaps a new template similar to {{seemoreCites}} in the English section for code-switched/foreign language quotes would be more appropriate. KevinUp (talk) 04:35, 4 February 2019 (UTC)
Why add code-switching quotations at all? Our mission is to define all words, not to record all quotations. We can cite and define English words using English quotations. AFAICT the only reason to bring up a Chinese- or German- or whatever-language quotation in which one "English" word has been embedded is if the "English" word (or sense) can't be attested via English quotations . . . in which case, shoehorning it into an ==English== section in any fashion is wrong, on a basic WT:CFI level, and furthermore strongly suggests the word is in fact a word in the language of the surrounding quotation. - -sche (discuss) 05:26, 4 February 2019 (UTC)
Yes, it's not our mission to record such quotations. If we were to create a section for "code-switched quotations", this will have to be restricted to lemmas written in a nonnative script. The main reason for having this section is to prevent entries such as iPhone#Chinese or iPhone#Vietnamese from appearing. KevinUp (talk)
Why/how would preventing/banning iPhone#Chinese require adding code-switching quotations? Anyone who runs across a code-switching quotation and wants to know what any of the words in it means can look up each of them, and find Chinese entries (with Chinese quotations) defining the Chinese words and iPhone#English (with English quotations) defining the English word, enlightening them on the meaning of all of the words in the quote. I don't see why the code-switching quote itself would need to be recorded. - -sche (discuss) 06:29, 4 February 2019 (UTC)
Good point. The thing is, there's currently a loophole in our system. It is not impossible to find quotations for APP#Chinese, の#Chinese or iPhone#Chinese (no quotations yet), but are these words considered actual lemmas in their respective languages? What can we do to prevent users from creating such entries? Guidelines are needed to identify whether quotations provided to lemmas written in a nonnative or nonstandard script qualify as code-switching. KevinUp (talk) 07:41, 4 February 2019 (UTC)
Consider also Talk:hiam. I also used google:"ai swee mai mia" and google:"ai sui mai mia" as the basis for creating 愛媠莫命, but really I could not find usage of the same phrase in Chinese characters, excluding Standard Chinese calques. —Suzukaze-c 06:11, 4 February 2019 (UTC)
Interesting. Now we're looking at Min Nan terms code-switched into English. I disagree with the creation of hiam#English. If it's Singlish or Singaporean English, it should at least be found here: http://eresources.nlb.gov.sg/newspapers/ (Singapore newspaper archive). I think it's Min Nan code-switched into English, because non Min Nan speakers might not be able to catch its meaning (Singapore is a fairly diverse society).
As for 愛媠莫命/爱媠莫命, I think we can create POJ entries such as ài-súi-mài-miā rather than poorly transcribed "ai swee mai mia" or "ai sui mai mia". Hokkien is often transcribed without tone marks by the locals, but when read it's still pronounced exactly like Hokkien. KevinUp (talk) 07:41, 4 February 2019 (UTC)

Citations namespace

This seems perfect for the citations namespace instead of an appendix. It's linked automatically from the entry, and there's no need to specifically label it as a particular language. DTLHS (talk) 23:31, 2 February 2019 (UTC)

The important lexicographic information concerning the Chinese entry, like pronunciation or the measure word it takes, cannot be placed on the citations page, so there would be a loss of information. Also, citations pages are standardly labelled by language using {{citations}}. —Μετάknowledgediscuss/deeds 19:04, 3 February 2019 (UTC)
Yeah, but foreign terms which haven’t passed are arbitrarily adapted to the sound system of a language. This has also been shown for “APP” in Chinese, and is well known of code-switching in general: when multilinguals switch languages it is not unusual for them to carry the pronunciation of one language over into the other, and even in conscious speech there are no standards. Even the pronunciation of “passed” words is rather arbitrary, dependent on educational background, and sometimes intentionally ridiculed.
I have suggested the citation namespace, but with proper earmarking of multilingual quotes. Fay Freak (talk) 22:44, 3 February 2019 (UTC)
  Support the use of the citations namespace for entries such as APP#Chinese, which have the exact same definition as the corresponding English entry (app). As for its pronunciation, there isn't a proper guideline for that. Xiandai Hanyu Cidian states the following in its appendix for lemmas that begin with the Latin script:
在漢語中西文字母一般是按西文的音讀的,這裡就不用漢語拼音標注讀音 [MSC, trad.]
在汉语中西文字母一般是按西文的音读的,这里就不用汉语拼音标注读音 [MSC, simp.]
Zài hànyǔ zhōng xīwén zìmǔ yībān shì àn xīwén de yīn dú de, zhèlǐ jiù bùyòng hànyǔpīnyīn biāozhù dúyīn. [Pinyin]
Translation: In Chinese, Western letters are generally read based on their pronunciation in the Western language. Here, there is no need to mark the pronunciation in Hanyu Pinyin.
I hope we can make a decision on this soon, i.e. (1) entries in Category:Chinese terms written in foreign scripts such as size#Chinese, part-time#Chinese, iPhone#Chinese, which have the same meaning as in English, are to be moved to the citations namespace. (2) Only entries such as man#Chinese, fighting#Chinese, NG#Chinese, which have a meaning different from what one would expect from the usual English definition, are to be included in the Chinese section. KevinUp (talk) 04:25, 4 February 2019 (UTC)
The loss of pronunciation information in entries such as size#Chinese (Cantonese: saai1 si2) is regrettable, but that information belongs to 晒士/嘥士, not size#Chinese. Until a lemma has been properly lemmatized into Han script (e.g. cheese → 芝士 (zhīshì)), its pronunciation is often unclear and varies depending on each individual.
Then again, I have no idea why APP#Chinese is pronounced like an initialism in mainland China, but I think this information can be included in the usage notes of APP#English instead. KevinUp (talk) 04:25, 4 February 2019 (UTC)

A specific low memory template for compounds of Japanese kanji, Korean hanja, Vietnamese Han characters

The page for 水 (shuǐ) is currently out of Lua memory. Even after memory consuming templates such as {{Han etym}} were removed, the same problem persisted. This may have something to do with Module:columns. I think we may need to rely on an older version, such as Module:columns/old. I found that {{der-top3}} uses less memory compared to {{der3}}.

Also, a few months back, a user was confused by the many derived terms in the Japanese section (Wiktionary:Tea_room/2018/August#者,_difference_between_derived_terms_under_Kanji_vs._under_suffix?), so whatever template used for compounds or derived terms needs to have a customizable title ({{der3}} doesn't have a title anymore). KevinUp (talk) 02:48, 1 February 2019 (UTC)

@KevinUp:

Test title

I just made a der3 with a custom title right here. Or was there a discussion about deprecating it ASAP? mellohi! (僕の乖離) 19:00, 1 February 2019 (UTC)

Yes, the customized title of {{der3}} has been deprecated. Prior discussion can be found at Wiktionary:Beer parlour/2018/November#Titles of morphological relations templates. Take a look at wine#Derived terms. The unboxed title looks out of place. KevinUp (talk) 21:06, 1 February 2019 (UTC)
Ah, you weren't referring to the unboxed titles. Speaks of me being out of the loop. mellohi! (僕の乖離) 21:09, 1 February 2019 (UTC)
No worries. Basically, the Lua memory used for {{zh-der}} (Chinese compounds) and {{der-top3}} (JKV compounds) needs to be reduced. KevinUp (talk) 03:42, 2 February 2019 (UTC)

Update: seems to have enough memory now. @Erutuon, do you know which template/module was using up the memory? I just realized {{der-top3}} does not use Lua memory. KevinUp (talk) 04:49, 4 February 2019 (UTC)

Kanji compounds for Japanese given names

Previous discussion: Wiktionary:Tea room/2018/August, User talk:Shāntián Tàiláng#Given name request

I'm interested to know what the community thinks about creation of kanji compounds such as 亜実利 that are only used in given names. There are up to 148 possible kanji combinations listed at あみり. Are we going to create entries for all of these? Readings for Japanese given names (known as nanori) are often arbitrary and there are no strict rules on which kanji to use.

When I look at pages such as Category:Japanese terms spelled with 実 read as み, most of the entries appear to be given names. To isolate actual kanji compounds, one would have to search for incategory:"Japanese terms spelled with 実 read as み" -incategory:"Japanese proper nouns" intitle:実 to obtain the 14 entries of 実 with reading み that are not proper nouns.
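The search above follows a pattern that can be reused for other kanji and readings. The sketch below only assembles the query string and a search URL; it does not run the search. The function names `build_query` and `search_url` are hypothetical, and the base URL is assumed to be the usual English Wiktionary search entry point.

```python
from urllib.parse import urlencode


def build_query(category, excluded_category, title_fragment):
    """Build a search string: in one category, not in another, title contains a fragment."""
    return 'incategory:"%s" -incategory:"%s" intitle:%s' % (
        category, excluded_category, title_fragment)


def search_url(query, base="https://en.wiktionary.org/w/index.php"):
    """Return a search URL for the query (the base URL is an assumption)."""
    return base + "?" + urlencode({"search": query})


query = build_query(
    "Japanese terms spelled with 実 read as み",
    "Japanese proper nouns",
    "実",
)
print(query)
print(search_url(query))
```

Swapping in another category name and kanji gives the equivalent search for a different reading.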

We could perhaps include only the top 5000 kanji compounds used for given names. I think listing the possible kanji forms for Japanese given names at hiragana pages such as あみり is good enough. To find out how to pronounce a person's name written in kanji, we could just use the search box, or check the nanori readings listed at individual kanji pages.

On an unrelated note, most South Koreans still use hanja for their given/personal names, but we don't have any entries for hanja given names. There are only 3 entries in Category:Korean given names, compared to the 6229 entries we have in Category:Japanese given names, while Category:Chinese given names was recently deleted, because most Chinese given names are sum of parts, and any combination is possible as long as it's not a lewd word.

So the question is, should we continue to create such entries, or should we limit this to something like 5000 most popular kanji compounds (I have no idea where to find this). KevinUp (talk) 03:42, 2 February 2019 (UTC)

For me, all Japanese given names should be lemmatized at the hiragana form, with the kanji spellings being soft redirects to the hiragana lemmas, where an exhaustive list of kanji spellings can be added. Subsequently, Category:Japanese given names and all its subcategories should be purged of any kanji spellings of given names, leaving only the hiragana lemmas. mellohi! (僕の乖離) 04:45, 2 February 2019 (UTC)

Pinging @Eirikr, Poketalker, Dine2016, 荒巻モロゾフ, Suzukaze-c over here for their thoughts. mellohi! (僕の乖離) 04:45, 2 February 2019 (UTC)

At the least, it might be necessary to separate out entries for personal names, which can be coined without limit. Any that have no actual usage should be deleted.--荒巻モロゾフ (talk) 06:30, 2 February 2019 (UTC)
My only opinion is that we should not rely on the EDICT names dictionary, and should do at least the minimum effort to make sure that a name or its kanji spelling is actually used. —Suzukaze-c 23:44, 4 February 2019 (UTC)
I’d like to rename all Japanese first name entries to hiragana. One can very freely choose kanji for a given pronunciation. — TAKASUGI Shinji (talk) 00:44, 5 February 2019 (UTC)
In response to the original question, speaking as a beginner reader of Japanese, I would like as many kanji personal "first" names and family names as possible to be look-uppable. Sometimes for beginners it may not even be clear that a kanji compound is a personal name. Generally, if I see incongruous characters, e.g. for topographical features or "beautiful flower"-type meanings, then I tend to guess that a personal name is meant, but sometimes it is not obvious for beginners. Mihia (talk) 01:18, 15 February 2019 (UTC)
In Japanese texts, especially those that are beginner friendly, given names are often written with the suffix さん (-san). In addition, Japanese personal names (full names) often consist of four to five character kanji compounds (usually two characters for surname, followed by two or three characters for given name), so it is not that hard to identify a Japanese personal name while reading more advanced texts. I don't think it is practical to have as many kanji personal "first" names as possible, because many different kanji variations are possible for the same Japanese given name written in hiragana. Family names, on the other hand, are fixed when it comes to kanji choice, due to strict rules in the koseki system, so kanji compounds for family names can be included as they fulfil our attestation requirements. KevinUp (talk) 03:05, 17 February 2019 (UTC)
Erm, well, thanks, I know さん, and that names are character compounds! The thing is that these characters usually have literal meanings too. When these are obviously incongruous to the subject matter it is not too bad, but this isn't always the case. Also there are no capital letters to help, of course. Mihia (talk) 00:04, 20 February 2019 (UTC)

Taxonomic names in individual languages

In Dutch, but I'm sure also in other languages, there are terms for taxonomic clades as well as members of them, that are different from the scientific/Latin-based translingual names. For example, the normal term for Felidae is katachtigen and for Mustelidae it's marterachtigen. These are plural forms of nouns, and the singulars katachtige and marterachtige refer to individuals of these groups.

There doesn't seem to be any kind of category tree for such names currently. We have a big set of categories for the translingual taxonomic names, but they don't seem to have equivalents in other languages, only translingual. Given that both the group and its members are part of a single lemma in Dutch, how should these be categorised? Only marterachtigen refers to a group, but it's not a lemma, so it shouldn't have any categories. Should the lemma have something like {{lb|nl|in the plural}} ''[[Mustelidae]]'' as a second definition? —Rua (mew) 13:27, 2 February 2019 (UTC)

In English, too, a mustelid is a member of the Mustelidae, and the whole group could be referred to by the plural of that word. But if you say (and I tend to agree) the plural isn't a lemma—if its use to refer to the Mustelidae isn't so lexical it needs to be given as a definition on the page marterachtigen / mustelids—why isn't it sufficient to define the singular as "a member of the family Mustelidae"? Is it not comparable to how "humans" in the plural can mean "humanity" / "humankind", but we probably don't need to add a sense to "human" (or "humans") that says "(in the plural) Humanity / humankind", or a sense at "elf" for "(in the plural) elfkind", "dog" "(in the plural) dogkind", etc? As far as categorization, what would you suggest would be needed beyond putting marterachtige in Category:nl:Mustelids the way mustelid is in Category:en:Mustelids? - -sche (discuss) 17:04, 2 February 2019 (UTC)
I don't really think mustelid should be in Category:en:Mustelids, based on WT:Beer parlour/2018/December#Should set-type categories also contain their namesake?. But that aside, we have a lot of categories specific to taxonomic names, but only in Translingual, not in any specific language. My question was more related to whether we should replicate this structure in all languages that have terms referring to species/taxonomic groupings (like English and Dutch, as you showed). That is, should mustelid be in a to-be-created Category:en:Taxonomic names or Category:en:Taxonomic names (family), the way that Mustelidae already is? —Rua (mew) 18:23, 2 February 2019 (UTC)
Aha, I see what you mean. Hmm...if marterachtige(n) / mustelid(s) is categorized as Category:foo:Taxonomic names (family), would witbandgierzwaluw and black swallow-wort be categorized as Category:foo:Taxonomic names (species)? Would birds / vogels be categorized as a taxonomic name for a class? And then, would cohosh also be in Category:en:Taxonomic names (species) although it refers to two species? I guess I'm not opposed to that, though the birds/vogels (clearly just a common name/word) and cohosh (ambiguous / two species) examples seem like evidence these aren't truly taxonomic (unambiguous) names. (It seems related to the question of whether mul taxonomic names can have translations, to which the de jure answer may be no but the de facto answer—looking at Navajo, for example—is yes. On that note, I suppose marterachtigen and mustelids should be added to Mustelidae#Translations...) - -sche (discuss) 23:17, 2 February 2019 (UTC)
Perhaps languages written in non-Latin scripts can give answers. How are taxonomic names rendered in Russian or Chinese? I cannot read Chinese, but w:zh:鼬科 is the interwiki for w:Mustelidae and it has a name in Chinese characters, with the 学名 (scientific name) given after it in Latin letters. w:ru:Куньи is likewise in Russian, and gives the scientific name but labels it "Latin". Would "scientific name" and "taxonomic name" be the same thing? What is the term for native-language equivalents of taxonomic names, like 鼬科 (鼬科) and куньи (kunʹi)? Should we give them their own categories or just place them in the regular lifeform set categories? —Rua (mew) 21:07, 3 February 2019 (UTC)
“Would ‘scientific name’ and ‘taxonomic name’ be the same thing?” In the context of discussing a taxon: yes. See 学名 and scientific name.  --Lambiam 02:11, 4 February 2019 (UTC)

Format of custom header text in new {{der4}}

@Erutuon: Can you please change the formatting of the custom header text in the new {{der4}}? It has the same bold text as Derived terms and this does not make it obvious to readers that the multiple tables under Derived terms still belong to this section. They seem like separate sections. I tried to get used to it but every time I see it, I find it confusing. See fárad. I'd prefer text in italics and parentheses, with closing colon, e.g. (Compound words): Thanks. Panda10 (talk) 20:07, 3 February 2019 (UTC)

A change like that needs consensus, though admittedly I didn't get input on what the header text in {{der4}} and similar templates should look like when I chose the style. But you can change it to the style you propose just by adding the following to your common.css:
.term-list-header {
	font-style: italic;
	font-weight: inherit; /* remove this line if you would like the header to still be bolded */
}
.term-list-header:before {
	content: "(";
}
.term-list-header:after {
	content: "):";
}
Eru·tuon 20:40, 3 February 2019 (UTC)
@Erutuon: I really appreciate the script but I'm not sure if modifying my common.css is the correct solution. I think it is better to see the entries as a Wiktionary reader would see it. Panda10 (talk) 21:46, 3 February 2019 (UTC)
@Panda10: Well, okay. I agree that the current style is confusing. I don't like the combination of parentheses and colon myself, but if others like it, I can implement it. In the meantime I should probably make the header not use inline CSS though. — Eru·tuon 22:01, 3 February 2019 (UTC)
Why not just hard-format it in a more satisfactory way? Let a thousand flowers bloom and then pick from among them. DCDuring (talk) 00:22, 4 February 2019 (UTC)
Well anyway, DTLHS added the CSS a few days ago. — Eru·tuon 23:38, 9 February 2019 (UTC)
  • Because of the extremely poor presentation of the modified templates ({{der3}}, {{der4}} etc.) I have stopped using them completely. I now use {{der-top3}} for new conversions; it's not perfect but the presentation is far better. Progress, huh? DonnanZ (talk) 17:28, 10 February 2019 (UTC)
I agree. I'm also using {{der-top3}} for Han character entries. At least it doesn't use any Lua memory. KevinUp (talk) 08:42, 11 February 2019 (UTC)
I'm kind of disappointed, but can't blame you. The new layout doesn't look very good, and it's annoying to have the toggle button at the bottom, because you can't collapse the list when you're reading through a page. I am open to ideas for improvement. I am not great at graphic design or whatever this is. It would be nice to at least bring it to the level where nobody hates it so much that they can't bear to use it. — Eru·tuon 22:56, 11 February 2019 (UTC)

Usage of kanji in Ryukyuan languages besides Okinawan

Unfortunately, is once again out of Lua memory, even though it was working yesterday. I would like to know whether the following languages: (1) Miyako, (2) Northern Amami-Oshima, (3) Oki-No-Erabu, (4) Southern Amami-Oshima, (5) Yonaguni, (6) Yoron, are actually written using kanji (historical or modern times) by native speakers.

The sections at appears to have been added by the following two users: Special:Diff/25636005/25750073. Should these languages be lemmatized using kana instead of kanji? KevinUp (talk) 11:46, 5 February 2019 (UTC)

(are they written by native speakers at all, in the first place? 🤔 —Suzukaze-c 07:03, 6 February 2019 (UTC))
I don't think so. The entry for 海豚 even has (7) Kikai and (8) Kunigami, added by User:Nibiko in this 2016 edit. Some of these languages appear to have test wikis at Wikimedia Incubator, but I'm not sure about the script used. KevinUp (talk) 10:24, 6 February 2019 (UTC)
They use kanji for the purpose of writing the lyrics of their traditional songs (examples: [4][5][6][7]). Note that those spellings are not necessarily phonologically strict, and are not linked to the convenience spellings prepared by researchers. Modern Ryukyuan languages don't have any official orthographies defined.--荒巻モロゾフ (talk) 14:54, 11 February 2019 (UTC)

2018 ISO code changes

The changes the ISO made to codes in 2018 were posted. They:

  • split and retired ais Nataoran Amis, merging the Amis part into ami "Amis" and creating a new code szy for Sakizaya (commentary).
  • merged asd Asas into snz Sinsauru (Sensauru), and renamed it Kou (alternative spelling: Kow), on solid grounds.
      Done. - -sche (discuss) 18:25, 9 February 2019 (UTC)
  • split dud Hun-Saare into uth ut-Hun and uss us-Saare.
  • retired lba Lui as spurious, citing Wikipedia, which cites that ISO document. But ISO cites other sources too, so it's not just citogenesis.
  • merged llo Klor / Khlor into ngt Kriang, saying: "In January, 2018 I happened to be sitting next to a man from Sekong province. I asked him about Klor and to my shock and his, he reported that he himself was Klor. He confirmed that it is pronounced [klɔːr] with no aspiration and that the langauge is spoken only in Ko' [kɔʔ] village. He reported that Klor is completely mutually intellibile[sic] with Kriang and that he considers the Klor to be Kriang. We counted to ten together and it was indeed the same as Kriang. This leads me to propose that Khlor [llo] be retired and Klor (note spelling difference) be added as a dialect of Kriang [ngt]."
  • merged myd Maramba into aog Angoram.
  • retired myi, which we already retired.
  • merged nns Ningye into nbr, which they renamed Numana (from Numana-Nunku-Gbantu-Numbu).

They also added codes: xsj Subi (a lect previously merged with Shubi; we merged Shubi into Rwanda-Rundi, but Subi is said to not be closely related and only often associated by confusion), lvi Lavi (which we currently encode as mkh-law), lsv Sivia Sign Language, cey Ekai Chin (WP prefers just "Ekai"); the Australian languages wkr Keerray-Woorroong, tjj Tjungundji, and tjp Tjupany, about which see WP; pnd Mpinda, lsn Tibetan Sign Language, and tvx Taivoan (Taivuan). If anyone has a reason we should not follow suit on these code deprecations and creations, please speak up. (They also made a number of name changes we could look into.) - -sche (discuss) 06:43, 6 February 2019 (UTC)

Thanks, @-sche. --{{victar|talk}} 07:03, 6 February 2019 (UTC)

Tocharian B

The entries in Category:Tocharian B lemmas are all written in the Latin script. Is this correct? SemperBlotto (talk) 07:31, 8 February 2019 (UTC)

They were written in the w:Tocharian alphabet (also see https://www.unicode.org/L2/L2015/15236-tocharian.pdf) and in the w:Manichaean alphabet. —Stephen (Talk) 08:45, 8 February 2019 (UTC)
We cannot write them differently until Unicode encodes the Tocharian script; we have a similar situation with Sogdian and Old Uyghur. Crom daba (talk) 20:07, 8 February 2019 (UTC)
@Crom daba, SemperBlotto: Sogdian was already added to Unicode 11.0, as was Manichaean (back in 7.0), so technically you could be creating Tocharian B entries in Manichaean when attested. But yes, alas, Tocharian has yet to be encoded. --{{victar|talk}} 21:08, 9 February 2019 (UTC)

Use of the term "West Frisian"

On Wiktionary, the Frisian language as spoken in the Netherlands is always referred to as "West Frisian". This is its usual name outside the Netherlands, contrasting with East Frisian and North Frisian. However, in Dutch "West-Fries" usually refers to a dialect spoken in the province of North Holland. This variety is Dutch, not Frisian, but is called "West-Fries" because it is spoken in the historical region of West-Friesland.
So far, so good. We all know what is meant by it, and we usually don't add words from Dutch dialects. However, there is an actual variety of Frisian that is extinct today but was spoken in (pockets of) West-Friesland until about 1700. Not much has survived of this language, but I would love to add those words that have. But to do so, we would have to settle on names. I can't call these entries "West Frisian", since that name is already in use for the living language that us Dutchpeople call "Westerlauwers Frisian". My proposal would be to adopt Dutch terminology: rename all existing West Frisian lemmas to "Westerlauwers Frisian" and reserve the name "West Frisian" for this language. I admit it would be cumbersome, but at least it would be unambiguous. What do you think? Steinbach (talk) 12:46, 11 February 2019 (UTC)

Another option would be use a geographical description like "Noord-Holland" or the historical term "Noorderkwartier". ←₰-→ Lingo Bingo Dingo (talk) 15:52, 11 February 2019 (UTC)
Do linguists treat this West-Frisia Frisian as a separate language from Westerlauwers Frisian? — Ungoliant (falai) 17:12, 11 February 2019 (UTC)
That's a difficult question. As you know, linguists tend to stay away from the arbitrary distinction between "language" and "dialect". The two varieties were clearly distinct, however. A 17th century Frisian poem could with certainty be identified as being from North Holland, not Friesland, by its text alone. At least one defining feature that sets Westerlauwers Frisian apart from East Frisian, the words sa and ta rather than so and to, did not occur in West Frisia Frisian. Some innovations relative to Old Frisian are shared, some aren't. In combination with the geographical and political separation, a solid case can be made to treat the two varieties as separate languages. Steinbach (talk) 20:24, 11 February 2019 (UTC)
@Steinbach Could you give a pointer to literature about this Frisian lect? The proposal is to move away from the usual terminology in English, so it would be useful to see how others deal with it. ←₰-→ Lingo Bingo Dingo (talk) 08:02, 12 February 2019 (UTC)
Give me some time. I'm not in that stage myself, I've been inspired to this proposal by an article in mainstream press. For the time being, here's a link to the sole surviving longer text in this dialect. It can give you an impression of how it differs from Westerlauwers Frisian. Steinbach (talk) 08:19, 12 February 2019 (UTC)
What I understand from the article is that the language of this sole surviving text of 331 words (160 different words) was known to be a quaint variant of Westlauwers Frisian, and has about a year ago been identified as being specifically a North-Holland variant (not quite surprising, seeing as it is one of the song texts in a collection titled d'Amsteldamsche Minne-zuchjens). Interesting, but hardly a reason to upset the Frisian language classification. And redefining “West Frisian” to mean neither West Frisian Dutch nor the West Frisian language as the term is commonly understood by linguists, but to reserve it for this variant, will be utterly confusing. Just like guv has a label {{lb|en|British}}, we can use some label like {{lb|fy|North Holland variant}} for words found only in this variant.  --Lambiam 13:25, 12 February 2019 (UTC)
This article (in an issue of De Vrije Fries from 1906) discusses possible printing errors in the text – apparently not considering the possibility that the language may be a variant of West Frisian. (BTW, pejeer may be an attempt to render pear.)  --Lambiam 13:42, 12 February 2019 (UTC)
I agree, setting apart a new language code goes too far for this. Any words can be included as obsolete West Frisian. ←₰-→ Lingo Bingo Dingo (talk) 08:09, 13 February 2019 (UTC)
It is definitely more than obsolete West Frisian. It differed greatly from seventeenth-century Westerlauwers Frisian, too. The work of Gysbert Japiks already looks rather similar to modern day Frisian, something that can't be said of this text. Steinbach (talk) 09:00, 13 February 2019 (UTC)
I wouldn't claim it was merely obsolete Westlauwers, just that it should be included under West Frisian and labelled as obsolete, in addition to a geographical tag. So something like {{lb|fy|North Holland|obsolete}} should in my view do the trick. I am also curious about the extent that the similarity of Japicx's Middle Frisian to modern-day West Frisian is due to his orthography influencing later orthography. ←₰-→ Lingo Bingo Dingo (talk) 13:44, 13 February 2019 (UTC)
Technical discussions aside, that's a hilarious poem. Soo molle bolle Femke! — Mnemosientje (t · c) 14:59, 12 February 2019 (UTC)
I suppose that is one way of dating it to the seventeenth century, beside the title page and the spelling. Maybe nice for use on Valentine's Day? ←₰-→ Lingo Bingo Dingo (talk) 08:09, 13 February 2019 (UTC)

English-based creoles of Suriname

The Surinamese creole languages Sranan Tongo, Aukan and Saramaccan currently do not have any ancestors recognised by Wiktionary's classification. For Sranan and Aukan it is uncontroversial that these are English-based creoles (some consider Saramaccan a Portuguese-based creole instead); they in many ways resemble Guyanese Creole (which also has no ancestor languages in the categorisation) and Jamaican Creole (which is recognised as a descendant of English). Several scholars also posit a common creole ancestor for those varieties. Implementing that latter view might go too far now, but it seems a good idea to at least enable Sranan Tongo and Aukan to have terms as inherited from English. ←₰-→ Lingo Bingo Dingo (talk) 15:47, 11 February 2019 (UTC)

(Note the prior discussion at Talk:dofu.)
Adding English as an ancestor of Sranan at least seems sensible per our earlier discussion, the other languages I can't well judge. But if Aukan is so obviously English-based, then I personally don't see the harm. — Mnemosientje (t · c) 15:55, 11 February 2019 (UTC)
My feeling is that it is misleading to say that a word like wroko was inherited from English. It suggests that some branch of the English language tree evolved in some way so as to morph into Sranan. But these words were incorporated into the creole language as it was crystallizing out of a pidgin that was not a language in the usual sense of that term but an unstable mishmosh varying from plantation to plantation. For English to be an ancestor, there should be intermediate versions of the language that are closer to English than modern Sranan is, while also closer to modern Sranan than English is.  --Lambiam 21:46, 11 February 2019 (UTC)
I too disagree that we should be marking lexifiers as ancestors of creoles; {{der}} is the best template to use. See this discussion, among others. —Μετάknowledgediscuss/deeds 21:59, 11 February 2019 (UTC)

Another matter is whether the Surinamese creoles should be linked in a similar way. Aukan is generally considered to descend from (very) Early Sranan or Proto-Sranan, and the same is often considered for Saramaccan. ←₰-→ Lingo Bingo Dingo (talk) 15:50, 11 February 2019 (UTC)

The book Pidgins and Creoles: An Introduction contains a chapter on Sranan in which the authors write: “As far as the shared histories of [the Atlantic group of English-based creole languages] are concerned, we may point to such aspects as the common supplier of the vast majority of the imported slaves — the Dutch, and the history of colonization, whereby a new colony was founded by groups from one or more existing colonies. Surinam, for instance, was first settled from Barbados, St. Kitts, Nevis and Montserrat. In this way [Sranan] is linked to the other Caribbean English-based creoles. [...] Within this group Sranan belongs to a clearly defined Surinam subgroup. This subgroup can be demonstrated in historical linguistic terms (with languages Sranan, Ndjuka-Aluku-Paramaccan-Kwinti, Saramaccan-Matawai). Outside this subgroup Sranan has a particular relationship with Krio, and other similar languages on the West African coast, as well as with the Maroon Spirit Language of Jamaica (Bilby 1983).” (Ndjuka is another name for Aukan.) Unfortunately, Google does not allow me to view most of the section entitled “History and current status”.  --Lambiam 12:43, 12 February 2019 (UTC)
You can find it on Library Genesis. Not sure if we're allowed to link it here. Some other comments of interest:

So we cannot say that Sranan (the major English-lexifier creole of Surinam; see chapter 18) derives in any gradual fashion from Early Modern English – its most obvious immediate historical precursor. [...] we are dealing with two completely different forms of speech. There is no conceivable way that Early Modern English could have developed into the very different Sranan in the available 70 or so years. [...] So creole languages are different from ordinary languages in that we can say that they came into existence at some point in time. [...] we have to reckon with a break in the natural development of the language [...] The parents of the first speakers of Sranan were not English speakers at all, but speakers of various African languages, and what is more important, they did not grow up in an environment where English was the norm.

From the section you couldn't see on the Google preview, here are some comments of interest:

The origins of Sranan (see also chapters 2 and 10) must be sought in the seventeenth century. Surinam started its post-Amerindian history as an English colony in 1651. The period of English occupation only lasted officially until 1667. English influence can be considered to have become negligible by 1680. So the period in which the direct linguistic influence of English can be assumed to have been operative was less than thirty years. [...] How precisely English functioned in the development of Sranan is highly controversial. In for instance the bioprogram hypothesis of Bickerton (see chapter 11), English lexical items and language universals combined to produce Sranan. In the substrate approach the African language(s) of the early slaves had a decisive influence (chapter 9).

From this, I have to agree with Metaknowledge that perhaps a simple {{der}} may be best, at least in the case of Sranan. — Mnemosientje (t · c) 14:50, 12 February 2019 (UTC)
Thanks. I didn’t know about LG. Not having access to a research library, it looks like a useful addition to my research tools.  --Lambiam 00:00, 13 February 2019 (UTC)
Yes, in that case {{der}} also looks like the best choice for Aukan terms deriving from seventeenth century English. ←₰-→ Lingo Bingo Dingo (talk) 07:57, 13 February 2019 (UTC)

Blocker role

What are people's thoughts on creating a blocker role so that non-sysops can issue short-term blocks to be reviewed later by an admin? --{{victar|talk}} 21:59, 12 February 2019 (UTC)

What would be the point of "reviewing" them if they were short? Most blocks are short anyway. What action would a "reviewer" take? What happens if they aren't reviewed? Why would an admin want to review blocks that they could have done themselves?   Oppose. DTLHS (talk) 22:30, 12 February 2019 (UTC)
@DTLHS: The point would be to have more users to catch vandals in the act. If we don't need more people to do that, why are we even having admin votes for that role? You could think of them more like blocker-bots, and the second they're doing a poor job of it, you decommission them and take away the role. --{{victar|talk}} 02:16, 13 February 2019 (UTC)
I have no problem with giving more people vandalism fighting abilities. My main issue is with the "reviewing" that probably wouldn't happen. DTLHS (talk) 02:44, 13 February 2019 (UTC)
That's fair. At the very least, the blocker role users could issue their block and request a perma block when needed. --{{victar|talk}} 02:55, 13 February 2019 (UTC)
Also oppose. If there was actually a time that no admin was active, there are emergency options (stewards). In my mind this is the function of admins which is most "powerful", so anyone I would want having this ability I would be happy to have as an admin. - TheDaveRoss 00:12, 13 February 2019 (UTC)
@TheDaveRoss: You yourself were questioning the quality of admins we have these days. In this way, we can have people stopping vandals in their tracks, while still holding admins to a higher standard. --{{victar|talk}} 02:16, 13 February 2019 (UTC)
I am saying I would hold the blockers to the same standard that I hold admins. - TheDaveRoss 02:53, 13 February 2019 (UTC)
Which, IMO, ensures sub-quality admins and not enough vandal blockers. --{{victar|talk}} 03:02, 13 February 2019 (UTC)
Do you feel that vandals frequently go unblocked for long periods of time? In my experience blocks happen within minutes if not seconds of vandalism taking place. If there are times that vandals are able to persist for longer periods I would be interested to hear about that, since it would be happening while I am unaware. My opinion remains unchanged about the need for a distinct role, I would not vote to approve anyone as a blocker if I would not also vote to approve them as an admin. My bar to become an admin is fairly low, I would guess I have voted yes in well above 90% of admin votes in which I have voted at all. - TheDaveRoss 15:19, 13 February 2019 (UTC)
@TheDaveRoss: I don't really feel any one way about it, but you see the conflict in the statements "judgement [...] has been a problem with existing admins of late" and "my bar to become an admin is fairly low", right, haha? --{{victar|talk}} 04:48, 14 February 2019 (UTC)
@Victar: Just because the bar is low doesn't mean everyone clears it. I think it is fairly easy to be civil, e.g., which is one of my criteria for voting yes on an admin vote, and yet there are some current admins who place very little value (seemingly) on civility. Most admins (and other editors, and proposed admins) easily demonstrate the level of civility that I hope for in an admin. I think I have a similar viewpoint about judgment and the other criteria which I value, most people easily surpass my expectations, some few fall short. I don't see a conflict with having a low bar and not always determining that everyone clears it. - TheDaveRoss 13:38, 14 February 2019 (UTC)
My feeling is that the need for this arises from not having admins in a certain time zone. "No admins are awake, but 4chan is attacking us, and creating a zillion stupid pages! Luckily, a blocker is here!" This raises the issues that (i) it really just means you don't have enough admins, or not a wide enough geographical spread of admins, and (ii) even if you had a special "blocker" role it would be susceptible to the exact same issue that maybe all blockers are asleep too. Equinox 02:51, 13 February 2019 (UTC)
Anyway oppose because it's easy to be whitelisted (by creating 100 entries in some under-loved language) but a lot harder to get admin status, and the ability to stop people from editing is a very significant and powerful one. (Mostly unrelated thought: what if admin responsibilities included dealing with x-percent of untouched anon edits in your language? Sometimes I find stuff I did two months ago not logged in that still hasn't been reviewed.) Equinox 05:53, 13 February 2019 (UTC)

"Eskimos have 50 words for snow"

https://popula.com/2019/02/11/white-words/ Justin (koavf)TCM 07:39, 13 February 2019 (UTC)

David Robson (January 14, 2013), “There really are 50 Eskimo words for ‘snow’”, in The Washington Post[8], The Washington Post. The article originally appeared in The New Scientist of 18 December 2012 under the title “Are there really 50 Eskimo words for snow?”. Instead of 50 you also find other numbers like 40, 52 and even 100, so “Eskimos have X words for snow” is a snowclone.  --Lambiam 09:25, 13 February 2019 (UTC)
Eskimos have 50 snowclones. —Justin (koavf)TCM 01:47, 14 February 2019 (UTC)

Ideophones as ur-language

https://aeon.co/essays/in-the-beginning-was-the-word-and-the-word-was-embodied Justin (koavf)TCM 07:57, 13 February 2019 (UTC)

On crafting scientific language in Zulu

https://www.theopennotebook.com/2019/02/12/decolonizing-science-writing-in-south-africa/ Justin (koavf)TCM 01:48, 14 February 2019 (UTC)

Text's here; both sources aren't durably archived, thus no sources for WT (would be nicer if the article appeared in print). --Brown*Toad (talk) 07:32, 17 February 2019 (UTC)

Layout of "of" qualifiers

I see "of" qualifiers written in two different ways, as these definitions, respectively from wet and fast, illustrate:

  1. Of weather or a time period: rainy.
  2. (of photographic film) More sensitive to light than average.

Is the second style preferred? Should the first style generally be converted to the second style when encountered? Mihia (talk) 20:36, 14 February 2019 (UTC)

I think the second is much more common. It also makes some sense to mark such qualification for searches. {{lb}} with of within the label serves as a fairly natural marker. DCDuring (talk) 21:54, 14 February 2019 (UTC)

Proto-Bantu Verbs

Currently, all Proto-Bantu verb entries have the default suffix -a. However, I think it would be better if this suffix were removed from the lemma forms of PB verbs, as it's not part of the verb root, and not all Bantu languages make use of this suffix. Smashhoof2 (talk) 06:24, 15 February 2019 (UTC)

We've taken the route of trying to reconstruct what PB actually looked like (so putting the final vowel on verbs, putting noun class prefixes on nouns), which is contrary to the BLR style, which just shows lexical roots. I don't know what's better, but reconstructing words rather than roots is more in keeping with Wiktionary being a dictionary and attempting to treat languages similarly when possible. —Μετάknowledgediscuss/deeds 19:23, 16 February 2019 (UTC)
That's fair. Smashhoof (talk) 21:32, 16 February 2019 (UTC)

Stub entries and minimum required content

My talk page contains a post to the effect that there exist some additional requirements for minimum content of entries that I am unaware of. Such requirements can be created if desired, so let's have an amicable conversation about it.

My understanding of minimum content of an entry is as follows. The entry needs:

  • 1) Language header
  • 2) Part of speech header
  • 3) Somewhat controversially, a definition, translation or, for non-lemma entries, the required content for a definition line. I say controversially since some people thought that it would be a good idea to create many definitionless entries, but there was no consensus either way, from what I remember. Furthermore, a dump analysis can show that nearly all English Wiktionary lemma entries have a definition line with a definition or a translation.
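
As an illustration only (not part of any policy or of the original post), the three items above could be checked mechanically on raw wikitext, for instance as one step of a dump analysis. The following is a minimal sketch under stated assumptions: language headers are level-2 (`==Czech==`), part-of-speech headers are level 3 or deeper and come from a hypothetical, incomplete list, and a definition line starts with a single `#` and contains more than a bare `{{rfdef}}`. The function name and the header list are invented for this example.

```python
import re

# Hypothetical, deliberately incomplete list of part-of-speech headers (an assumption).
POS_HEADERS = {
    "Noun", "Proper noun", "Verb", "Adjective", "Adverb",
    "Pronoun", "Preposition", "Conjunction", "Interjection",
}

# Level-2 header such as "==Czech==" (exactly two equals signs on each side).
LANGUAGE_RE = re.compile(r"^==([^=\n][^\n]*?)==[ \t]*$", re.MULTILINE)
# Header of level 3 or deeper, such as "===Noun===" or "====Noun====".
HEADER_RE = re.compile(r"^={3,}[ \t]*([^=\n]+?)[ \t]*={3,}[ \t]*$", re.MULTILINE)
# Top-level definition line: "#" followed by text, excluding "#*", "#:" and "##" lines.
DEFINITION_RE = re.compile(r"^#[ \t]*([^#*:\s][^\n]*)$", re.MULTILINE)


def check_minimum_content(wikitext):
    """Report which of the three minimum items appear to be present."""
    has_language = LANGUAGE_RE.search(wikitext) is not None
    has_pos = any(name in POS_HEADERS for name in HEADER_RE.findall(wikitext))
    # A line consisting only of a definition request does not count as a definition.
    has_definition = any(
        not text.strip().startswith("{{rfdef")
        for text in DEFINITION_RE.findall(wikitext)
    )
    return {
        "language_header": has_language,
        "pos_header": has_pos,
        "definition": has_definition,
    }


if __name__ == "__main__":
    sample = "==Czech==\n\n===Noun===\n{{head|cs|noun}}\n\n# {{rfdef|cs}}\n"
    print(check_minimum_content(sample))
```

A real dump analysis would need the full set of headers allowed by WT:EL and would have to split pages into per-language sections first; this sketch treats the whole page text as one entry.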

The above seems consistent with WT:EL#A very simple example except that the example speaks of references, which are demonstrably lacking in an overwhelming majority of en wikt entries.

I am not aware of any further requirements on minimum entry content. In particular, as far as I know, there is no requirement on provision of pronunciation and inflection. During my time of contribution of Czech entries to the English Wiktionary, I mostly avoided entering pronunciation and inflection, focusing rather on semantics.

What do you think? Should there be increased requirements on minimum content beyond the three items above? Should such requirements be specified on a per-language basis? If so, should the decision be delegated to a small group of editors of a particular language, say 3 editors if there are no more? Thus, should the English Wiktionary be split into small oligarchies rather than there being One English Wiktionary?

--Dan Polansky (talk) 19:20, 16 February 2019 (UTC)

We don't need a legalistic framework or "small oligarchies". Dan, nobody I can think of wants to institute strict rules about what entries need to have at minimum. We were just asking you to put in a slight amount of effort, like putting in the gender of a noun when the very dictionary you're referencing gives the gender, or even just using a template like {{be-noun}} rather than {{head|be|noun}}. That's it. —Μετάknowledgediscuss/deeds 19:21, 16 February 2019 (UTC)
On my talk page, it says that Russian entries need to 1) include the accent, [...], 3) include the declension or conjugation, and 4) include the pronunciation. I ask the editors if they would be so kind and indicate whether they want to establish minimum content above my three listed items. --Dan Polansky (talk) 19:23, 16 February 2019 (UTC)
One of WF's tricks is to create rfdef entries with nothing but a lazy quotation from the sports news, sometimes SoP. But the worst ones I can remember were the dozens of [name_of_country] Sign Language entries with no definition and often not meeting CFI. I don't think it's been a big enough problem to need policy yet (in English anyway). Equinox 19:31, 16 February 2019 (UTC)
By the way, I was one of those vehemently opposing volume creation of definitionless entries. Semantics is the life and soul of a dictionary, by my lights. --Dan Polansky (talk) 19:39, 16 February 2019 (UTC)
I think this whole thing has gotten seriously out of hand. Meta maybe came across as a bit officious, but it was a reasonable request- as a request. Dan interpreted it as more of an order, and got defensive- after which it escalated. There are legitimate issues about burdening editors in specific languages with fixing up terms that they wouldn't have created themselves and hijacking their priorities- but that's a matter of courtesy, and far too complex to reduce to rules. We've all created entries that needed work by others, and the dictionary would be a fraction of what it is now without that. We need empathy and consideration, not arguments and battles- it's too easy to drive away good editors over such things. Chuck Entz (talk) 22:13, 16 February 2019 (UTC)
You see, in User_talk:Dan Polansky/2018#κλινικός, I received the following order from Metaknowledge: "do not create entries in languages you do not know and have not studied". I think interpreting communications from the same contributor on the same subject as orders in disguise is pretty reasonable. But this thread is about policy, not about me in particular, and is merely triggered by certain posts on my talk page. The key question is, shall small subcommunities be able to increase the requirements for minimum entries per language, and therefore, should the English Wiktionary be understood as a collection of oligarchies, small ruling groups? --Dan Polansky (talk) 08:04, 17 February 2019 (UTC)
The key is cooperation. You shouldn't say that you refuse to do it and say the entries are fine as they are if they are not for editors of that language. You can simply add {{rfinfl}} and {{attention}}. The request for a higher standard of entries based on existing entries is legitimate, even if it gets harder to keep up when the minimum level of quality of entries is already high. You can make simpler stubs for languages with little content but you can still mark them with {{attention}} so that other editors can at least find entries that require attention. As for Russian entries, it takes more effort, knowledge and time but it's not that Russian inflections and genders are unavailable. Russian is not a poorly documented language. But, since it can also be error-prone, editors with less knowledge of a language shouldn't be completely discouraged from editing but are asked to mark them incomplete. Everybody does it. I did too for languages I wasn't confident in and when I knew what was required, for example, languages with complex scripts. It's strange that you vehemently opposed definitionless entries with {{rfdef}} for otherwise great entries for high frequency words. It often takes much less effort to add a definition than reformat headers and add inflections. --Anatoli T. (обсудить/вклад) 09:07, 17 February 2019 (UTC)
The subject of this thread is minimum content, not marking. To address your subject (out of scope of this thread) of marking entries with {{attention}}: no such marking is required since if there is consensus that entries with {{head|ru|noun}} need to be in a convenient category, {{head}} can be instructed to place such entries into a maintenance category automatically. Czech entries without inflection are not marked with {{attention}} and as far as I know, such marking is not a common practice for most languages, and I can make a dump analysis to check the actual facts; "everybody does it" is easily verified to be false. Here again, the general question that I saw no clear answer to so far is, should small groups make up their own rules for other editors to follow? --Dan Polansky (talk) 09:19, 17 February 2019 (UTC)

You said “I want my undivided attention to be channeled toward making sure that the semantic information I am entering is correct” but I deny that using the appropriate templates excludes it. The templates all have the same names and you can even take care of it after you have added the glosses. And of course for Russian the stress is one of the main reasons why one consults the dictionary, so unless one has no information about the pattern because it is a kind of archaic word nowhere included, one can already give the complete information in the headword and in the table, which latter is important because Benwing’s bot creates the non-lemma forms and users like to look into the tables.
About adding pronunciations: For English it is not easily predictable, so editors don’t add it because they don’t know it (English is the only example for “irregular” in orthographic depth). In most other languages the pronunciation sections have indeed the character of clutter we only add because we have unlimited room, but the stress mark in the head-word or declension-table is what you need to know the pronunciation already, unless such a case like со́лнце (sólnce) which you could guess wrongly either if you know the stress is on the beginning but have not heard the word. For languages like Arabic and Aramaic where multiple pronunciations can be on the same page I am for avoiding adding IPA pronunciations because it only makes the layout complicated without adding additional information (because as I said the full vocalization or transcription gives all information already) and indeed I create the pages faster and with better overview if I do not add the pronunciation, so I think they divert me and the reader. For Russian, perhaps the bot can add IPA pronunciations since the со́лнце (sólnce) cases should be all included already.
Just ask yourself what the reader would like to know from Wiktionary: It is the stress pattern and the gender for the languages that have such, and the meaning, and even if you have the pattern you have the gender already most likely in Slavic languages and it is only one letter, all if only you know it, so the demands are really low. There being links to other dictionaries is a bad argument to omit stresses and patterns, since copying over the stresses and patterns is what you should do, and for the languages in question many web searches can confirm. “Accuracy combined with verification” does not stop you to tell people what you already know. Also add surface derivations, if you have reasonable ideas of them, else others have to add it.
BTW {{be-noun}} is an incomplete wrapper of {{head}}, some times I used it I had to use {{head}}, because it does not support |m= / |f= (Wiktionary:Grease pit/2018/October § Missed masculine and feminine counterpart parameters in some headword templates). Fay Freak (talk) 13:48, 17 February 2019 (UTC)

Dan, my main concern is that you work *with* the main contributors in a given language. Overall, I completely agree with what Atitarev (talkcontribs) said. This is not a matter of enforcing rules but of (a) keeping up the overall quality of Wiktionary by attempting to follow the example of existing entries, and of (b) maintaining harmonious relationships with others. In this case, if you had tried to figure out the prevailing structure and templates of a Russian entry, and found it too complex, and instead inserted {{attention}} or {{rfinfl}} or a similar request template, I'd have no problem with this. But you seem to have made no such attempt, and in general appear to show little interest in working with others or maintaining consistency. If everyone did this, the whole project would descend into chaos. Benwing2 (talk) 19:59, 17 February 2019 (UTC)
Is it your position that a Russian noun entry must contain pronunciation and inflection as a minimum, or is it not your position? I am puzzled. --Dan Polansky (talk) 20:26, 17 February 2019 (UTC)
As for what I am doing, which is out of scope of this thread per its title, I am interested in using the generic tools for setting up an entry, which is {{head}}, since I am basically little like a slow-moving Tbot working with a plethora of languages, using general human intelligence to verify semantics in applicable sources. Since I work in so many languages, I am not interested in learning any template peculiarities that various language groups may have set up. I need the minimum entries as places to attach verification artifacts and further reading goodies, which happen to be the same thing. As much must be pretty clear to anyone who saw my recent batch of contributions. I am not acting out of malice or disregard for wishes of particular groups, but my enterprise can only work economically if I can work with generics, or non-demanding templates such as {{be-noun}}, which I am now starting to use. I am absolutely not interested in pronunciation or inflection. I am no worse than Tbot, and in fact, I am better in multiple ways: Tbot checked in other Wiktionaries whereas I am checking in external sources even for entries that not a single other Wiktionary has, and I do human checking of semantics, not just checking for existence. I will run out of gumption pretty soon, I guess, and return to creating Czech entries; my best hope is that other editors will pick up the work, including new editors. --Dan Polansky (talk) 20:48, 17 February 2019 (UTC)
@Dan Polansky: I would discourage using {{ru-noun}} and language-specific templates because this can produce incorrect results - a wrongly detected gender, animacy and a stress/inflection pattern (many things are automated) without your knowledge. It also requires a correct word stress. All we ask for is adding maintenance templates, so that appropriate editors could bring the entry to the required standard. When I said "everybody does it", I meant everybody who is asked to do it. E.g. people know that Chinese entries require traditional and simplified forms. What if you don't know? You need to ask people who know. --Anatoli T. (обсудить/вклад) 22:38, 17 February 2019 (UTC)
I would add that Tbot added a maintenance category to every entry it created, so that others would know to go back and check on it- in that respect, these current entries are inferior to Tbot's. Chuck Entz (talk) 01:35, 18 February 2019 (UTC)
@Dan Polansky Thank you for adding the {{attention}} template to трактористка. That allowed me to find it and fix it up. Benwing2 (talk) 16:30, 18 February 2019 (UTC)

Module:la-headword

There is an (AFAICT) undiscussed removal of valuable information going on, leading to incomplete and incomprehensible head lines. --Hamator (talk) 11:47, 17 February 2019 (UTC)

Classical Malay?

I changed the meaning of the worklang= param in {{quote-book}} etc. Formerly it took either a single language code or an arbitrary string like "French and Latin" or "Classical Malay". I changed it so it takes one or more comma-separated language codes, but doesn't allow arbitrary text. I fixed up all the resulting errors except for two, which are in -kah and -kan, which have quotations in Classical Malay, for which we don't have any language code. Could someone add this? I'm not sure if it should be an etymology-only language or a proper language in its own respect. (And what about Old Malay?) Benwing2 (talk) 19:15, 17 February 2019 (UTC)
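
To make the described change concrete, here is a minimal sketch of the new behaviour: a value is split on commas and every piece must be a known language code, so free text such as "Classical Malay" is rejected. This is only an illustration and not the actual template or module code; the function name `parse_worklang` and the small `KNOWN_CODES` set are assumptions invented for the example.

```python
# Hypothetical, tiny set of language codes used only for this illustration.
KNOWN_CODES = {"en", "fr", "la", "ms", "sco"}


def parse_worklang(value, known_codes=KNOWN_CODES):
    """Split a comma-separated worklang value into a list of language codes.

    Raises ValueError if any piece is not a known code, which mirrors the
    described rule that arbitrary text is no longer accepted.
    """
    # Split on commas and trim surrounding whitespace from each piece.
    codes = [part.strip() for part in value.split(",")]
    # Collect every piece that is not a recognised code.
    unknown = [code for code in codes if code not in known_codes]
    if unknown:
        raise ValueError("not a language code: " + ", ".join(repr(u) for u in unknown))
    return codes


if __name__ == "__main__":
    print(parse_worklang("fr, la"))
    try:
        parse_worklang("Classical Malay")
    except ValueError as err:
        print("rejected:", err)
```

Under this sketch, "fr, la" is accepted as two codes, while "Classical Malay" fails until a code for it exists, which is the situation described for -kah and -kan.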

Thank you for bringing this up. A few months ago I brought up a similar suggestion at Wiktionary:Beer parlour/2018/September#Suggested outcome. Currently, Classical Malay (14th to 18th century) and Old Malay (7th to 14th century) do not have proper language codes defined for them. However, because there is a lack of effort to digitize texts from Classical Malay (written in Jawi script) and Old Malay (written in Pallava script or Rencong script) in their original script form, I think we can wait for ISO 639 to define a proper language code for these languages.
Currently, only two Classical Malay works are available on Wikisource: Hikayat Hang Tuah and Hikayat Bayan Budiman. Modern transcriptions of Classical Malay works are often written in the Latin script, so it is slightly problematic to figure out its original orthography in the absence of an original manuscript.
By the way, Classical Malay and Old Malay are the missing link between the Proto-Malayic language and the modern Malay language. KevinUp (talk) 22:36, 18 February 2019 (UTC)
I have removed the |worklang parameter in -kah and -kan because the texts have been translated and modified to suit readers proficient in modern Malay, rather than transcribed word-for-word based on the original manuscript. KevinUp (talk) 22:36, 18 February 2019 (UTC)

Should I suppress the "(please add an English translation of this quote)" message for Scots?

Many of the Scots quotations given are so close to English that they are readily understandable without any "translation". Example:

"Och, it's the lassies will be the pleased ones, coiling the blankets round them; it's Auld Kate that kens," and then she gave a screitchy hooch and began to sing in her cracked thin voice-- 'The man's no' born and he never will be, The man's no born that will daunton me.'

Not surprisingly no translation is given, but if this is tagged with |lang=sco, you'll see "(please add an English translation of this quote)". Given the predominance of this situation, should I special-case Scots to remove this message? Benwing2 (talk) 02:51, 19 February 2019 (UTC)

No. If Scots isn't to be translated it shouldn't be a separate language. DTLHS (talk) 02:52, 19 February 2019 (UTC)
Agreed. I'm not 100% on what this means. Per Wiktionary:About Scots, we consider it a separate language instead of a dialect of English. If we consider it English, it's a different story. —Justin (koavf)TCM 03:59, 19 February 2019 (UTC)

Constrduction namespace

Has it been suggested that constructed languages, like Esperanto, be moved to a "Construction" namespace, ex. Construction:Esperanto/eburo? --{{victar|talk}} 11:15, 19 February 2019 (UTC)

Seems reasonable. We already have a "Reconstruction" namespace. SemperBlotto (talk) 11:24, 19 February 2019 (UTC)
@Victar Is the title of this section meant to be something else? - TheDaveRoss 13:27, 19 February 2019 (UTC)
This is a great idea. Fay Freak (talk) 14:07, 19 February 2019 (UTC)
Ooh... it's a game! Reconstruction>Construction>Contraction>Distraction>Destruction... it almost worked...
Seriously, though, I'm not impressed by the name: Reconstruction: houses reconstructions, but Construction: would house terms in constructioned languages. Chuck Entz (talk) 14:42, 19 February 2019 (UTC)
And I would have gotten away with it too, if it wasn't for you meddling kids! --{{victar|talk}} 19:30, 19 February 2019 (UTC)
I think what Chuck is saying is he'd prefer "Constructed:" instead of "Construction:". Personally I'm on the fence as to whether this is needed at all, under any name. Benwing2 (talk) 15:39, 19 February 2019 (UTC)
@Chuck Entz, Benwing2: I'm not married to the name; I just thought it in keeping with the "Reconstruction:" namespace. I'm fine with "Constructed:" or simply "Construct:", but it could be "Conlang:" or "Artificial:" for all I care. I just think if we keep reconstructions off the main namespace, why should conlangs be shoehorned into natural (if you will, non-constructed if not) languages? I think it's confusing to the reader, as they might mistake Esperanto, for example, for some inherited descendant of Latin, as we have no indicator that it's a constructed language, like we do for reconstructions. I also find that they clutter up entries and every few months there seems to be some vote on allowing another (forgive the hyperbole, but you get my point). --{{victar|talk}} 19:30, 19 February 2019 (UTC)
@Victar I see your point. I don't find it especially confusing but I imagine it might be different for users who haven't heard of Esperanto, Interlingua, Lojban, etc. (OTOH a page like a already has a huge number of random languages on it, and the average user isn't likely to have heard of Kalasha, Mandinka, Lower Sorbian, or Mezquital Otomi, to name a few on that page, and won't get any more confused by the additional presence of Esperanto, Interlingua, Ido, Novial, etc. on the same page.) Benwing2 (talk) 19:56, 19 February 2019 (UTC)
Exactly. What's stopping layman users from thinking Esperanto and Kalasha are categorical equivalents? --{{victar|talk}} 20:09, 19 February 2019 (UTC)
So instead they're supposed to think Na'vi and Esperanto are equivalents? There are clearly more nuanced distinctions to be drawn than "constructed" vs "not constructed". DTLHS (talk) 21:54, 19 February 2019 (UTC)
Yes, I would say more so than Esperanto and Kalasha. --{{victar|talk}} 22:53, 19 February 2019 (UTC)
  • Oppose. Esperanto has become too big to be cordoned off, and unlike nearly every other constructed language, people are going to be looking for it where they look for other languages. —Μετάknowledgediscuss/deeds 20:24, 19 February 2019 (UTC)
    @Metaknowledge, and what would stop them from finding them in a namespace other than main? --{{victar|talk}} 21:18, 19 February 2019 (UTC)
    Let's see... they won't come up in search, they won't be in translation tables... you'd have to be looking for them to find them, which is good for Novial, but bad for Esperanto. —Μετάknowledgediscuss/deeds 21:24, 19 February 2019 (UTC)
    And by "won't come up in search" you mean in the search dropdown, because reconstructions certainly show up in search results. There may be a technical solution for that, ditto for translation tables, though I'm less familiar with the problem there. --{{victar|talk}} 21:32, 19 February 2019 (UTC)
  • Oppose. I am opposed to deciding we don't want to include languages, and then including them anyway in a roundabout way. Conlangs that aren't in mainspace shouldn't be included anywhere at all, not in any namespace. —Rua (mew) 20:30, 19 February 2019 (UTC)
    @CodeCat, I don't think anyone is suggesting moving poorly attested conlangs like Lojban and Novial to this namespace, as those are being relegated to Appendix:. I'm chiefly referring to Esperanto and Interlingua. --{{victar|talk}} 21:16, 19 February 2019 (UTC)
  • Oppose. If a constructed language is so unused it should be banished to an appendix or deleted, do that. But e.g. Esperanto is more widely used than at least several hundred of the natural languages we include, and even has some native speakers; I don't see a reason to segregate it into a separate namespace away from e.g. Mbariman-Gudhinma or Berbice Creole Dutch just because we can identify who coined most of Esperanto's words and not the other two languages'. - -sche (discuss) 23:29, 19 February 2019 (UTC)

Constellation name definitions

Considering that the IAU has recognized 88 constellations, should the constellation names be defined as translingual, and should the English definitions be moved there? -Mike (talk) 23:39, 19 February 2019 (UTC)

Scots again, and Middle English

A number of entries have quotations from Template:RQ:Dictionary of the Scottish Language used to illustrate English terms. An example is forspeak, for which definition #1 says "(transitive, dialectal, Northern England and Scotland) To injure or cause bad luck through immoderate praise or flattery; to affect with the curse of an evil tongue, which brings ill luck upon all objects of its praise." Should we allow this? If so, what language should I use to tag the Scots portions of the quoted text? en (English) or sco (Scots)? If not, what should happen to these quotes? (Move to a Scots L2 section? But what about the "Northern England" label?) Note that on the same page is also a Scots entry for forspeak, defined as "To bewitch or cast a spell over, especially using flattery or undue praise; to seduce." Examples like this make me think that the entire decision to include Scots as a separate language may have been wrong, because (a) most terms (like this one) that exist in Scots and don't exist in Standard English also exist in Northern England dialects; (b) in general there's no way to make a clear distinction between Scots and nearby dialects of English. Note that if Scots had a standard literary form the situation would be different, because then we could define the nucleus of Scots as consisting of that literary form.

A related issue: A number of English entries have illustrative quotations from Middle English. An example is ashame, which has a quote from Wycliffe's Bible dated to 1390, complete with translation: "Ashame thou, Sidon, seith the se, the strengthe of the se, seiende, I trauailide not with child, and bar not, and nurshede not out ȝung childer, ne to ful waxing broȝte forth maidenes." (translated as "Be ashamed, Sidon, says the sea, the strength of the sea, saying, “I did not travail with child [give birth], and did not nurse boys, nor to full waxing bring forth maidens.") Should we allow this? If not, what should happen to these quotes? (Move to a Middle English L2 section?)

Benwing2 (talk) 23:42, 19 February 2019 (UTC)

Regarding Middle English quotations: my inclination is to move them to Middle English sections, but some editors have argued they are tolerable in English sections to demonstrate age/continuity of use. (They don't count towards attesting the term, obviously, but neither do e.g. quotations from websites, which are nonetheless infrequently included alongside ATTEST-satisfying citations if they are particularly good illustrations of a term.) - -sche (discuss) 23:54, 19 February 2019 (UTC)
Regarding Scots quotations: right now, they should be moved to Scots entries. Since both Scots and English are WDLs, merging them might need a vote, or at least strong consensus support. But it would certainly simplify distinguishing Scots from Scottish English at RFV if we, erm, didn't distinguish them. And we already include several rather divergent dialects (e.g. Geordie) under English, so I don't expect the issues of e.g. different inflected forms and the like to be much harder to handle than for those dialects. And other (monolingual) English dictionaries tend to include Scots as English. - -sche (discuss) 01:43, 20 February 2019 (UTC)
I considered it odd that Scots is separated from English on Wiktionary, but I've held back from raising the topic myself (it's potentially an emotive subject!). I'm a Geordie and have lived in Scotland, and I find that most of the Scots and Geordie terms are the same (supporting Benwing2's "(a)"). There's work to do in adding Northern English terms to Wiktionary; it's something I've been avoiding so far. Having had a look through, I'd come to the conclusion that the bulk of the job would be to take Scots entries and duplicate them as English (Geordie) entries. As a simple example, User:Stelio/Tyneside Songs has a bunch of orange links (if you've turned on that gadget) most of which are Scots terms, and the songs themselves could easily be misidentified as Scots to someone unfamiliar with their context. -Stelio (talk) 11:00, 20 February 2019 (UTC)
As someone who has spent a significant time around drunken Geordies, my vote is for the Geordie lect being considered its own language. XD --{{victar|talk}} 17:46, 20 February 2019 (UTC)
Mebbees like, but how man, dinna fash yersel'. That's wark, reet? ;-) -Stelio (talk) 14:25, 21 February 2019 (UTC)
Why aye man, canny wark! --{{victar|talk}} 14:53, 21 February 2019 (UTC)

Talk to us about talking

Trizek (WMF) 15:01, 21 February 2019 (UTC)