Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019


March 2019

Mozilla releases 1,400 hours of voice recordings

https://venturebeat.com/2019/02/28/mozilla-updates-common-voice-dataset-with-1400-hours-of-speech-across-19-languages/ —Justin (koavf)TCM 02:44, 1 March 2019 (UTC)

It's whole sentences, rather than isolated words, so I think it's not especially useful for us at this point. —Μετάknowledgediscuss/deeds 04:35, 1 March 2019 (UTC)
Perhaps we could use them as usexes that have audio, if the license is compatible. —Suzukaze-c 03:05, 4 March 2019 (UTC)
@Suzukaze-c: It's CC-0. —Justin (koavf)TCM 03:27, 4 March 2019 (UTC)
Just had a look at the Italian portion of the dataset and it's definitely good usex material, lots of natural-sounding language. The recording quality varies, some have audible background noise. – Jberkel 22:36, 4 March 2019 (UTC)

Limit the table of contents to language names

In the table of contents I don't think there's a lot of use in listing subsections beyond the different language entries. In the vast majority of entries you can see all the different subsections within the space of a screen anyway. Personally, I have never ever tried to go to a specific subsection of an entry through the table of contents, whereas I have spent a surprising amount of time scrolling through long and messy tables of contents trying to find the language I'm looking for. The exception would be articles not in the main namespace. ─ ReconditeRodent « talk · contribs » 00:11, 3 March 2019 (UTC)

I tend to agree, but is there an easy way to do that? DTLHS (talk) 00:12, 3 March 2019 (UTC)
I find it useful to be able to click to the etymology when there are a number of different ones. When there are 5 or 6 different homographs, I find it easier to navigate using the ToC. Andrew Sheedy (talk) 00:37, 3 March 2019 (UTC)
Fair enough, though I'm assuming this is for when you're already familiar with an entry(?) (since "Etymology #" isn't very descriptive otherwise.) ─ ReconditeRodent « talk · contribs » 01:31, 3 March 2019 (UTC)
The following CSS should hide all but the top-level headings in the ToC in the main namespace: .ns-0 .toclevel-1 ul { display: none; }. Add to your common.css page, or try it out by entering mw.util.addCSS('.ns-0 .toclevel-1 ul { display: none; }') in your browser's JavaScript console. — Eru·tuon 00:47, 3 March 2019 (UTC)
I'm with Andrew. Has RR considered using the right hand side placement of the table of contents, achieved by a gadget? I also wonder whether a gadget could accomplish selective repression of the offending parts of the ToC. DCDuring (talk) 00:49, 3 March 2019 (UTC)
@Erutuon I take it that one could specify .toclevel-[2,3,4,etc] with the corresponding reduced display. DCDuring (talk) 00:52, 3 March 2019 (UTC)
@DCDuring: Yep, that works too if you want to show more header levels. I imagine getting it to look consistent (for instance, to always show part-of-speech headers even when they are at different header levels) would require JavaScript, though. — Eru·tuon 01:00, 3 March 2019 (UTC)
Hey, wow, that's cool! Thanks!
Well I guess I'm happy then, though instinctively I still feel this would be a better default. Would it be impertinent to suggest a vote/!vote? ─ ReconditeRodent « talk · contribs » 01:31, 3 March 2019 (UTC)
@ReconditeRodent: It's a good idea to make sure that the vote has some chance of passing first. For my part, I am not in favor. — Eru·tuon 01:41, 3 March 2019 (UTC)
I like to see how many etymologies there are. Equinox 01:43, 3 March 2019 (UTC)
Through the table of contents the editor can see if he has sorted the headings wrongly or used unbalanced equal signs and the like, which former cannot be easily seen since level 4 and level 5 are of the same size. But I do not even peruse this advantage since I use tabbed browsing. Fay Freak (talk) 12:27, 3 March 2019 (UTC)
I prefer seeing the other headers, although I don't know if that means it should be the default for everyone as opposed to just something users like me opt-out of changing. In any event it might be useful to make the code for hiding non-language headers a gadget users could find in their Gadgets tab. - -sche (discuss) 17:45, 3 March 2019 (UTC)
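
A minimal sketch of the rule discussed above, generalised to any number of visible heading levels. The helper name tocDepthCss is hypothetical and not part of MediaWiki; the only real calls assumed are mw.util.addCSS and the .ns-0 / .toclevel-N classes already quoted in this thread.

```javascript
// Sketch only: tocDepthCss is a hypothetical helper, not a MediaWiki function.
// It builds the CSS rule quoted above for a chosen number of visible ToC levels.
function tocDepthCss(visibleLevels) {
  if (!Number.isInteger(visibleLevels) || visibleLevels < 1) {
    throw new RangeError("visibleLevels must be a positive integer");
  }
  // Hiding the list nested inside .toclevel-N leaves levels 1..N visible.
  return ".ns-0 .toclevel-" + visibleLevels + " ul { display: none; }";
}

// In the browser console on a wiki page (mw only exists there):
if (typeof mw !== "undefined" && mw.util) {
  mw.util.addCSS(tocDepthCss(2)); // show language names plus one level below
}
```

With visibleLevels set to 1 this yields exactly the rule Erutuon posted; as noted above, it cannot make part-of-speech headers show consistently when they sit at different levels.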

This thread inspired me to make a super compact TOC CSS work again, and here it is:

/* Use simple horizontal TOC */
/* Appearance: Language names are laid out as a horizontal list and are the only items
   shown in the TOC; borders are only horizontal ones; the result is very compact
   and minimalistic. */
.ns-0 div#toc ul ul { display: none; } /* Reduce the depth of shown headings in TOC */
div#toc span.tocnumber { display: none; } /* Hide numbers in TOC */
.ns-0 .toclevel-1 ul { display: none; }
.ns-0 div#toc li { display: inline; }
.ns-0 div#toc li + li:before { content: ' · '; }
.ns-0 div.toctitle { display: none; }
.ns-0 div#toc { border-color: #DDDDFF; border-right: none; border-left: none; background-color: white; padding-top: 0px; }

In kilo-, it produces approximately this:

English · Czech · Danish · Dutch · Finnish · German · Hungarian · Italian · Latvian · Norwegian Bokmål · Norwegian Nynorsk · Polish · Portuguese · Romanian · Slovak · Slovene · Spanish · Swedish · Turkish

--Dan Polansky (talk) 18:48, 3 March 2019 (UTC)

Wow! That's lovely. It's amazing what can be done with CSS. I would use it if I didn't sometimes want to find subheaders from the ToC. — Eru·tuon 20:18, 3 March 2019 (UTC)
I might try to figure out how to make a compact layout that uses two levels of headings, but I am no CSS guru; the core ideas of the posted code were provided by someone else on en wikt. Incidentally, Wiktionary:Votes/2012-10/Enabling Tabbed Languages passed, and the super compact TOC is no worse than tabbed languages as for availability of subheaders. I used to use the compact TOC CSS before the tabbed language vote, and I am using it right now. --Dan Polansky (talk) 20:41, 3 March 2019 (UTC)
As an aside, thank you for that mw.util.addCSS hint; it is very nice for finetuning CSS. --Dan Polansky (talk) 20:47, 3 March 2019 (UTC)

For what it's worth, my personal CSS gives me this sort of experience. It's not nearly as wonderfully minimalist as Dan's CSS; it keeps all headings. —Suzukaze-c 03:02, 4 March 2019 (UTC)

@Suzukaze-c: I like that because I can still see all the headers, but it requires less scrolling. I've enabled my own slightly modified version (not yet saved on-wiki). — Eru·tuon 03:23, 5 March 2019 (UTC)
How about this? It is quite hacky though. It hides all subsections but still keeps the numbered etymology sections visible as numbers after the language. — surjection?〉 15:56, 11 March 2019 (UTC)

Edit

From Okinawan onwards I keep getting the error message: “Lua error: not enough memory”. ---> Tooironic (talk) 10:02, 3 March 2019 (UTC)

I noticed the same thing with "me". Happens for every template. ─ ReconditeRodent « talk · contribs » 13:02, 3 March 2019 (UTC)
Some of the modules used (via templates) on the page use a lot of memory, more memory than pages are allotted. This has also hit e.g. water and man in the past (and discussions can be found in the archives of this page and the Grease Pit) and led to translations being moved to subpages. Transliteration (whether generating it or just checking a manually input one) seems to be among the things which are "expensive". Ultimately, we're going to have to do fewer "expensive" things with Lua, or at least (as we did with {{t-simple}}) have a set of much simpler or even Lua-less templates for use on large pages like this; for example, pages like this could use simpler headword templates that would just have the romaji input manually as a parameter and not invoke Lua to generate or check it. - -sche (discuss) 16:47, 3 March 2019 (UTC)
(edit conflict) See CAT:E. I've cleared all the module errors except for this one, so, for the moment, it doubles as a list of pages with this problem. This is a recurring problem with large entries: each template that calls a module uses memory for that module, and no entry is allowed to use more than 50 MB of module memory. The location where it starts getting the error isn't all that significant, since it results from the system's order of executing the modules. Generally it's not any specific item, but the total number of them that causes the problem.
The only solution is to reduce the total module memory used by all the templates in the entry. This is not easy, but here are a few tips (I'm sure @Erutuon can expand on/correct this):
  1. The easiest step is adding the entry to the opt-out list at {{redlink category}}. This template is called by every linking template such as {{l}}, {{m}}, {{t}}, and {{t+}}, so its module use is multiplied over the entire number of linking templates in the entry. That's already been done for the 6 entries in question.
  2. Get rid of any unnecessary module-using content, such as duplication.
  3. Replace linking templates with ones that use less memory. Linking templates do a lot of work behind the scenes to check things and get the information needed to produce the correct link and display the text properly. The {{t-simple}} template does only the bare minimum of this for a translation template, and may be substituted for the regular ones where the loss of functionality isn't a problem.
  4. Replace linking templates with hard-coded wikitext: plain wikitext doesn't use module memory. For links to English entries, there's no need to look up the language information for displaying the text, since English is the default language here: {{l|en|word}} is functionally equivalent to [[word#English|word]].
  5. Move the largest blocks of linking templates such as translation tables or derived-term/compound lists to a subpage or appendix and provide a link to it in the entry. Moving quotes to the citation tab is another variant. Some of the most intractable memory-hogs have required this.
Chuck Entz (talk) 17:03, 3 March 2019 (UTC)
All good tips. Another way to reduce memory usage, which I just did for do, is to replace a bunch of individual linking templates ({{l}}) with a column template (in this case, {{rel2}}). Each template invokes a module, and each invocation uses a certain amount of "startup" memory, so fewer module-based templates tend to use less memory, when they are doing roughly the same job. — Eru·tuon 20:13, 3 March 2019 (UTC)
Thanks for the suggestions, but three CJKV characters, , , have been placed in Category:Pages with module errors for a very long time. Sometimes the memory works fine, but after a few days (even though no one has edited the entry), the page runs out of memory again. And after a few days it works again. This has been going on for a long time. Any idea what is the main cause of this issue? KevinUp (talk) 13:31, 4 March 2019 (UTC)
You can preview each section and look at the parser profiling data at the bottom of the page to see how much memory is used by that section (some memory use is shared between sections, so the numbers may add up to more than 50MB). The basic issue is that those entries have lots and lots of templates which call lots and lots of modules. When you consider all the things that these modules do, it's not surprising that they're pushing the limits of what the system will allow. Chuck Entz (talk) 14:48, 4 March 2019 (UTC)
  • This seems to be a significant issue that has a real impact on users' experience of Wiktionary, at least those who want to look up simple words with multiple definitions in multiple languages. I would recommend we use language-specific soft-redirects (i.e. sub-pages) for words like , , etc. Surely this would free up enough Lua memory so that all the information can be displayed for users, even if a second click is required. As someone with no IT expertise, I can only hope one of our talented editors can facilitate such a solution. ---> Tooironic (talk) 05:32, 9 March 2019 (UTC)
Update: Thanks to User:Erutuon, Lua memory in is now reduced to 44.98 MB by substituting {{ja-r}} with {{ja-r/multi}} and {{ja-r/args}} as well as {{zh-der}} with {{zh-der/fast}}. KevinUp (talk) 09:19, 11 March 2019 (UTC)
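
Tip 4 in the list above (replacing {{l|en|word}} with plain wikitext) can be sketched as a text transformation. This is only an illustration under stated assumptions: simplifyEnglishLinks is a hypothetical helper, it handles just the two-parameter English form quoted in the tip, and it deliberately leaves every other template untouched.

```javascript
// Sketch only: simplifyEnglishLinks is a hypothetical helper, not an existing
// Wiktionary script. It rewrites {{l|en|word}} to [[word#English|word]], the
// equivalence stated in tip 4, so that no module is invoked for those links.
function simplifyEnglishLinks(wikitext) {
  // Match only the bare two-parameter form; anything with extra parameters
  // (glosses, alternative display text, and so on) is left as it is.
  return wikitext.replace(
    /\{\{l\|en\|([^|{}\[\]]+)\}\}/g,
    "[[$1#English|$1]]"
  );
}
```

Templates for other languages are skipped on purpose, because, as the tips explain, they do extra work (script detection, display formatting) that plain wikitext would lose.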

Pseudo-X-isms by language

There's CAT:Pseudo-anglicisms by language (which incidentally should perhaps use a capital A for consistency with other such terms). There are pseudo-Latinisms like noli illegitimi carborundum, which could go in a subcat of CAT:Pseudo-Latinisms by language. There are pseudo-Gallicisms like quoi ci quoi ça and double entendre. There's also CAT:Pseudo-Italianisms by language (and some English pseudo-Italianisms discussed here). There must be others. I think these should be grouped into a category for "Pseudo-X-isms by language", similar to "Borrowed terms by language". What should they be called? "Pseudo-borrowings", "pseudo-loans"? - -sche (discuss) 17:20, 3 March 2019 (UTC)

I thought of "pseudo-foreignisms" at some point. Per utramque cavernam 17:38, 3 March 2019 (UTC)
Now that I've paged through Google Books results, I think "pseudo-loans" is most common, followed by "pseudo-borrowings", followed by "pseudo-foreignisms". - -sche (discuss) 19:12, 3 March 2019 (UTC)
I set up Category:Pseudo-loans by language. At the moment both "Portuguese pseudo-loans"-type and "Pseudo-anglicisms by language"-type categories are subcategories of it; somebody may want to change that or privilege the latter to be at the start of the list (with *) or the like. - -sche (discuss) 22:46, 7 March 2019 (UTC)

Stress of compound words

Why isn't the stress(es) of compound words and the like reflected in their entries? E.g. ,bee's knees vs 'bee sting --Backinstadiums (talk) 19:24, 4 March 2019 (UTC)

@Backinstadiums: What do you mean by "reflected"? Stress is usually given in the Pronunciation section. You can include the stress by adding pronunciation transcriptions to bee's knees and bee sting. — Eru·tuon 22:47, 4 March 2019 (UTC)
@Erutuon: I meant addition. Is there any compound of the like that shows it? They're lexicalized sometimes and I am not a native speaker --Backinstadiums (talk) 01:50, 5 March 2019 (UTC)
@Backinstadiums: You can find examples of English words that are categorized as compounds and have a stress mark in an IPA template by searching: incategory:"English compound words" hastemplate:"IPA" insource:/\{\{IPA\|[^|}]+ˈ/. Some are very old compounds that are not felt as compounds anymore (like island), but others are probably more "fresh" compounds. — Eru·tuon 01:57, 5 March 2019 (UTC)
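
To see what the insource: pattern above does and does not match, it can be tried as an ordinary JavaScript regular expression. This is a sketch under assumptions: the search engine's regex dialect is not identical to JavaScript's, and the sample strings are hypothetical lines written for illustration, not copied from real entries.

```javascript
// The pattern from the search above, as a JavaScript regular expression.
// It requires the primary stress mark (U+02C8) inside the FIRST template
// parameter, i.e. before any further "|" or the closing "}".
const stressedIpa = /\{\{IPA\|[^|}]+ˈ/;

// Hypothetical sample lines for illustration only.
const samples = [
  "{{IPA|/ˈaɪ.lənd/|lang=en}}", // transcription first, primary stress present
  "{{IPA|/kæt/|lang=en}}",      // transcription first, no stress mark
  "{{IPA|en|/ˈaɪ.lənd/}}",      // stress mark only in the second parameter
];

function hasPrimaryStress(wikitext) {
  return stressedIpa.test(wikitext);
}
```

Only the first sample matches. The third shows the pattern's main limitation: it finds the stress mark only when the transcription is the first parameter of the template.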

@Erutuon: According to the Longman Pronunciation Dictionary, compound words usually have early stress and phrases late stress. Yet among the grammatical compounds pronounced with late stress are those where the first element names the material or ingredient (except with the terms cake, juice, water, so ˈorange juice): a ˌpork ˈpie, a ˌrubber ˈduck, or a ˌpaper ˈbag (bag made of paper) but ˈpaper bag (bag for newspapers). --Backinstadiums (talk) 17:20, 6 March 2019 (UTC)

Other Germanic languages have the same stress distinction. The Dutch equivalents of those phrases all have stress on the same part as in English, but in each case the distinction is also visible in spelling because late stress has a space while early stress does not: ˈsinaasappelsap, ˌrubberen ˈeend (but ˈrubbereend has early stress), paˌpieren ˈzak (but paˈpierzak has early stress). Late stress is associated with adjective-noun phrases like wooden box, which suggests that "rubber duck" and relatives are in fact also syntactically an adjective-noun phrase and not compounds. —Rua (mew) 17:15, 9 March 2019 (UTC)

What is the policy on suprasegmental prosody? --Backinstadiums (talk) 17:08, 9 March 2019 (UTC)

affricates: / d͡ʒ͜ɹ , ʃ͡ɹ /

Pondering the pronunciation of words such as imagery /ˈɪmɪ.d͡ʒ͜ɹɪ/ or dangerous, I infer that the IPA should recognize affricates such as / d͡ʒ͜ɹ / (and even / ʃ͡ɹ / in shrub) as such, just as / t̠ɹ̠̊˔ / currently is. What are the guidelines on this issue in Wiktionary? https://www.youtube.com/watch?v=mH5FbbusdkI --Backinstadiums (talk) 01:52, 5 March 2019 (UTC)

If the IPA doesn't recognize it, why should we blaze the trail? What's the distinction between /ʃ͡ɹ/ and /ʃɹ/? /ʃ͡ɹ/ is certainly going to confuse people, and I don't see a value add.--Prosfilaes (talk) 05:22, 5 March 2019 (UTC)
[d͡ʒ͜ɹ] and [ʃ͡ɹ] (they should be in square brackets because they are not phonemes) don't really look like affricates. Affricates are basically stops with a fricative release. They don't qualify as affricates: [d͡ʒ͜ɹ] has an extra approximant at the end (d, ʒ, ɹ) and [ʃ͡ɹ] has a fricative and an approximant, not a stop and fricative. About guidelines, guidelines for what? — Eru·tuon 06:10, 5 March 2019 (UTC)

Read-only mode for up to 15 minutes on 19 March 15:00 UTC

Hi everyone, a short notice. On 19 March 15:00 UTC your wiki will briefly be in read-only mode. That means that you’ll be able to read it, but not edit. This is because of network maintenance. It will last up to 15 minutes, but probably shorter. You can read more on Phabricator (phab:T217441, phab:T187960), or write on my talk page if you’ve got any questions. /Johan (WMF) (talk) 14:52, 5 March 2019 (UTC)

Unprotection of user scripts

User:Yair rand/newentrywiz.js is currently admin-protected. It's unnecessary (at least now, maybe not back when it was protected) because only admins and interface admins can edit user JavaScript pages. Could it be unprotected so that lowly interface administrators like me can edit it? — Eru·tuon 01:35, 6 March 2019 (UTC)

Done. It should be moved to MW namespace. Dixtosa (talk) 19:03, 6 March 2019 (UTC)

Order of etymologies

Does Wiktionary have a policy for the order etymologies should go in? I noticed that on fly, the first etymology listed is an obscure (relative to the other ones) dialectal word meaning "wing." It seems to me that when some etymologies are significantly more notable than others, they should go first. Is there a policy I'm unaware of that makes the current order correct, or should it be changed? Nloveladyallen (talk) 00:20, 7 March 2019 (UTC)

The Japanese entry ない has the same problem. Etymology 1 is an unproductive suffix only found in a small number of words. --Dine2016 (talk) 05:00, 7 March 2019 (UTC)
I am not aware of a policy, but I think you can be bold and re-order the sections to what you deem the most logical. If someone else disagrees they can bring it up for discussion. - TheDaveRoss 13:25, 7 March 2019 (UTC)
When I see obscure/dialect things as the first ety for an everyday word, I swap them around. Equinox 16:45, 7 March 2019 (UTC)
I agree, though I also tend to move up older etymologies, provided they still have at least one definition in common, widespread use. DCDuring (talk) 17:58, 7 March 2019 (UTC)
As to why it's like that on some entries: some people, at least historically, preferred to put the oldest / first-attested etymologies/words first. (And some users, at least historically, straight-up put uncommon or obsolete or dialectal etymologies/words first even when more common ones are equally old...) Please do re-order them. - -sche (discuss) 22:52, 7 March 2019 (UTC)
  • It's a matter of the editor's personal preference. AFAIK the order of etymologies (and definitions) has not been enshrined in Wiktionary policy. IMO we should put the most common usages first, and that's the way I have been editing. ---> Tooironic (talk) 05:27, 9 March 2019 (UTC)

[Japanese] Should historical Kanji readings always be noted whenever applicable?

For example, the historical inscription of 川 in Kunyomi is かは (which has since been reformed to かわ), so should every instance of 川 in a word being read as かわ have かは as a historical hiragana? I noticed that most entries do not bother adding it but a few of them do. --Four-fifths (talk) 04:29, 9 March 2019 (UTC)

You could start adding them, of course. It's just missing information. —Suzukaze-c 04:38, 9 March 2019 (UTC)
I would prefer for historical kanji readings/spellings to be added only if the historical reading is attestable in historical literature. KevinUp (talk) 08:19, 9 March 2019 (UTC)
A number of historical texts do not follow 歴史的仮名遣い (historical kana orthography). Here is an example where 全(まと)う, reduced from 全(まっと)う, is spelled またふ despite the correct historical spelling being またう. --Dine2016 (talk) 10:59, 9 March 2019 (UTC)
IFF you're transcribing some text that includes 川, and other parts of that text use historical spellings, then sure, add かは as furigana for 川.
However, "should every instance of 川 in a word being read as かわ have かは as a historical hiragana?" -- no, there's no value in doing so, and instead you risk confusing users who might think the historical kana spelling is still in use. And while 川 has only one possible historical kana spelling, as Dine2016 notes, various words could be rendered in kana in multiple ways to achieve the appropriate reading. For the example of 全う, both まとう and またう resulted in the same pronunciation from around the 1700s or 1800s as the /ɔː/ vowel converged with /oː/. There are many such instances of historical kana spellings that could technically be called "misspellings", so it is not always safe to assume that the "correct" historical kana spelling was the one always used for a given word. ‑‑ Eiríkr Útlendi │Tala við mig 05:07, 2 April 2019 (UTC)

Language family trees in category pages

Hi, JohnC5 and I would like to add language family trees (generated by Module:family tree) to language categories, probably at the bottom of the text, directly above the lists of subcategories and pages in the category. This has been a plan for a while, but thanks to some HTML and CSS work by Suzukaze-c (and some work by me), the tree is finally in a presentable state.

As an example, the following tree would be added to Category:Proto-Germanic language. It shows the descendants of Proto-Germanic, based on the language data that is used in our entries. Click "Expand" to see the tree.

Some aspects of the tree are confusing. Etymology languages (such as American English) are shown as children of the languages, language families, or language variants that they belong to (in this case, English). This does not mean that they are descendants (like English is a descendant of Middle English); we simply don't have a better way to display them in the tree.

Currently, language families have a tree emoji after them (🌳) and etymology languages have a speech bubble (💭). This could be changed.

Some aspects of the style of the tree are not set in stone. One disagreement is the position of the tree icon: on the left side or the right side of the language family name. Currently it's on the right so that all the language and language family names line up. If you have an opinion on this either way, please let us know.

Is there any opposition to this idea? Also, any ideas for improvements? — Eru·tuon 08:14, 9 March 2019 (UTC)

Since nobody objected, the trees have been added to language categories. The icons have changed, though. Suggestions still welcome. — Eru·tuon 05:19, 17 March 2019 (UTC)
@Erutuon I've been looking at this and noticed some room for improvement. Families are defined by being descended from a particular ancestral language. The Frisian languages are those descended from Old Frisian, the North Germanic are those descended from Proto-Norse, and so on. I think it makes more sense to make families and their corresponding proto-languages a single node in the tree rather than two. Otherwise, every family would in theory have exactly one child: its proto-language. —Rua (mew) 19:10, 4 April 2019 (UTC)
@Rua: There are currently undocumented display options |protounderfam= and |famunderproto= that do something similar to what you describe: show proto-languages directly below the family that they are the parent of, if they belong to that family, and the reverse. — Eru·tuon 19:16, 4 April 2019 (UTC)
What does it do if neither of those options is given? —Rua (mew) 19:20, 4 April 2019 (UTC)
You can see what it does in the tree above, with North Germanic and Proto-Norse. — Eru·tuon 19:22, 4 April 2019 (UTC)
That's the same as what |protounderfam= is supposed to do though, right? —Rua (mew) 19:24, 4 April 2019 (UTC)
Never mind, I see the difference now. I think my preference would be this format if the family has a proto-language: protolanguage (code) [family (code)]. —Rua (mew) 19:27, 4 April 2019 (UTC)
I'll implement that at some point. If there's ever some kind of vote on this, people will have to see all the possible options. — Eru·tuon 08:21, 6 April 2019 (UTC)
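
The single-node label format proposed above, "protolanguage (code) [family (code)]", could look roughly like this. It is a sketch only: the node shape and the function name formatNode are hypothetical and do not reflect how Module:family tree (which is written in Lua) actually stores or renders its data.

```javascript
// Sketch only: a hypothetical node shape, not the real Module:family tree data.
// A family node may carry the proto-language it descends from.
function formatNode(node) {
  const own = node.name + " (" + node.code + ")";
  if (!node.proto) {
    return own; // plain language, or a family without a proto-language
  }
  // Merge family and proto-language into one label, proto-language first.
  return node.proto.name + " (" + node.proto.code + ") [" + own + "]";
}

const germanic = {
  name: "Germanic",
  code: "gem",
  proto: { name: "Proto-Germanic", code: "gem-pro" },
};
```

Under these assumptions formatNode(germanic) gives a single label for the family and its proto-language, which avoids the two-node layout where every family has exactly one child.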

Encoding of apostrophe-like palatalisation marks in various languages

There are various languages written in the Latin alphabet that use a mark resembling an apostrophe or prime to indicate palatalization. The exact Unicode code point to use is often not specified in the language, or used haphazardly without regard to the function that Unicode designates for the character. As a result, there are many variations in use, often within the same language as well. The difficulty of producing the correct mark often leads language users to use the simple ASCII apostrophe ('), which is not well suited for that purpose. More generally, Unicode characters that are designated "punctuation" are often used as well, even though the palatalization mark is not a punctuation character and is sometimes considered a proper letter of the alphabet in question.

As far as the orthography of Skolt Sami is concerned, however, the codepoint to use is actually standardised: ʹ (U+02B9 MODIFIER LETTER PRIME). This character is intended in Unicode for use in linguistics to represent palatalisation, and we use it in our transliterations of Russian as well. More importantly, because it's considered a letter and not punctuation by Unicode, applications will not use it to separate words and will select it along with the rest of a word when you doubleclick on it. Therefore, this seems like the character we should use, and I hereby propose we make this the standard for all such cases across languages. This would affect various Finnic languages (Veps, Võro and Votic), but I'm sure there are others that I'm not familiar with. Spellings with alternative palatalization signs can become redirects to the spellings using the proposed symbol. —Rua (mew) 15:06, 10 March 2019 (UTC)

Seems right. Fay Freak (talk) 16:24, 10 March 2019 (UTC)
For people who can read French, you may be interested in Wiktionnaire:Apostrophes. We try to record all the apostrophe-like marks we should use for any language. Pamputt (talk) 09:51, 11 March 2019 (UTC)
Please don't change the transliteration for Russian or other languages, which transliterate "ь" as "ʹ" (not a plain apostrophe), e.g. мать (matʹ, mother): "matʹ"! (Asking just in case). In Ukrainian and Belarusian a plain apostrophe is also a standard letter (different from "ь", which is also used) and Uzbek seems to use "ʻ".
Czech and Slovak use a symbol which is merged with the letter it palatalises: e.g. ť as in mať (mother). --Anatoli T. (обсудить/вклад) 11:55, 11 March 2019 (UTC)
I think you misunderstood. —Rua (mew) 19:31, 11 March 2019 (UTC)
Yes, I did. --Anatoli T. (обсудить/вклад) 23:20, 11 March 2019 (UTC)
Seems OK, yes. Per utramque cavernam 19:37, 11 March 2019 (UTC)
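
The proposal above (spellings with other marks become redirects to the spelling with U+02B9) implies a normalisation step, sketched here. The set of look-alike characters and the function name are assumptions for illustration, not an agreed list, and the sample word is given only as an example of a Skolt Sami spelling.

```javascript
// Sketch only: the list of look-alike characters is an assumption.
const MODIFIER_LETTER_PRIME = "\u02B9"; // ʹ, the code point proposed above

// After a letter, fold these into U+02B9:
//   U+0027 apostrophe, U+2019 right single quotation mark,
//   U+2032 prime, U+02BC modifier letter apostrophe
const LOOKALIKE_AFTER_LETTER = /(\p{L})['\u2019\u2032\u02BC]/gu;

function normalizePalatalizationMark(word) {
  // Keep the preceding letter ($1) and swap only the mark itself.
  return word.replace(LOOKALIKE_AFTER_LETTER, "$1" + MODIFIER_LETTER_PRIME);
}
```

This is meant for single page titles in the languages concerned; applied to running text it would also rewrite ordinary apostrophes, and, as noted above, it must not be applied to languages such as Ukrainian or Belarusian where the apostrophe is itself a standard letter.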

: When Japanese Kyujitai and Traditional Chinese shapes for the same codepoint differ.

I've noticed that the Japanese Kyujitai form and the Traditional Chinese form while sharing a Unicode codepoint, differ in that the Japanese form has an extra stroke joining the two stacked rectangles whereas the Traditional Chinese form does not. (Do they officially differ by stroke count?)

I'm assuming this is systematic across the majority of Japanese vs Traditional Chinese fonts.

I know we have some mechanisms for documenting when Simplified and Traditional Chinese forms have differing appearances but a shared codepoint? Do we do the same with Kyujitai vs Traditional? If so should we add that to this entry? If not should we start doing so? And what to do if such variation is not systematic but varies from font to font? — hippietrail (talk) 06:32, 11 March 2019 (UTC)

@Hippietrail: perhaps like 蝉#Translingual? where the usage notes describe the difference, or 浅#Translingual, where the IDS describes the difference. @KevinUp —Suzukaze-c 07:35, 11 March 2019 (UTC)
No, both Traditional Chinese (Taiwan/Hong Kong standard) and Japanese kyūjitai of have the same stroke number and glyph appearance. The difference occurs due to Xin Zixing (新字形) in mainland China, which substitutes all characters containing with which is one stroke less. KevinUp (talk) 08:40, 11 March 2019 (UTC)
The revised form of (containing rather than ) can be found in books published in mainland China, such as calligraphic books and modern dictionaries such as Xiandai Hanyu Cidian. The different glyph forms have been noted in this edit. KevinUp (talk) 08:40, 11 March 2019 (UTC)

Very interesting - thanks everyone! Should we do more to document these differences? Especially in the translingual section that shows the forms? This is a kind of variant form after all, even though sharing a codepoint. Do we have a category for characters affected by Xin Zixing? Are there cases where the Xin Zixing does get its own Unicode codepoint due to differences being regarded more significant?

  • The differences can be noted using IDS at the translingual section. I sometimes add detailed verbose descriptions under the "alternative forms" header (See for example). We don't have a category for characters that are affected by Xin Zixing.
  • We could create something such as Category:Traditional Chinese characters with Xin Zixing form, but characters have to be added on a case-by-case basis, because many of our IDS are incorrect (please check the official Unicode chart at https://unicode.org/charts/PDF/U4E00.pdf etc.) and some obsolete characters are encoded only in Taiwan and not mainland China.
  • Yes, there are many cases where Xin Zixing forms get their own Unicode codepoints. These are well documented at the Chinese Wikipedia article for 新字形. KevinUp (talk) 11:45, 11 March 2019 (UTC)

Also this means my photo is a bit of a quirk. I'm in Taiwan and I've taken photos of three different Japanese style "open for business" signs and all three actually use this Xin Zixing form rather than the Kyujitai or Traditional form. If anyone knowledgeable would like to alter the captions and descriptions here and/or over on Commons that would be great. I might upload the pictures of the other two signs too. — hippietrail (talk) 09:45, 11 March 2019 (UTC)

It turns out my photos are one of each form! Two are already at 営業中. I'll upload the third and I might make cropped versions of each too... — hippietrail (talk) 09:57, 11 March 2019 (UTC)
The characters in "commons:File:Japanese "open" sign in traditional characters.jpg" are written in nonstandard form:
  • The character (with instead of the orthodox ) is recorded as A02453-025 in 教育部異體字字典 (Dictionary of Chinese Character Variants).
  • If you look closely at the second character the bottom component of is written ⿳䒑一木 rather than its correct form ⿱䒑未.
  • I've modified the description of the file at Wikimedia Commons.
(By the way, this discussion belongs to the Tea Room) KevinUp (talk) 10:42, 11 March 2019 (UTC)
Thanks once more! Sorry I've been away from beer parlours and tea rooms for so long I've forgotten. Please feel free to move the discussion in case I do it the wrong way and make a mess. I've just uploaded the third variant too:
 
營業中
hippietrail (talk) 11:17, 11 March 2019 (UTC)

Wiktionary:Votes/2019-03/Defining a supermajority for passing votes

I've drafted this vote to define the supermajority we use, as well as what "fail" and "no consensus" mean. Please give me feedback, particularly regarding the higher standard for modifications to WT:CFI and WT:EL. —Μετάknowledgediscuss/deeds 00:29, 12 March 2019 (UTC)

Reminder to contribute to the discussion at Wiktionary talk:Votes/2019-03/Defining a supermajority for passing votes, particularly on issues like whether only admins should close votes, and the higher standard for CFI and EL mentioned above. —Μετάknowledgediscuss/deeds 14:44, 15 March 2019 (UTC)

Standardizing some template shortcuts

Can we pick a standard for the shortcuts of alternative forms templates? Currently we have:

I am constantly forgetting which ones use a hyphen, a space, neither, or some combination thereof. Personally I would prefer a space with no hyphen for all of them: alt sp, alt form, and alt caps (without of) seem like the clearest option to me. Ultimateria (talk) 17:44, 12 March 2019 (UTC)

I take it back, I would keep of to be consistent with {{form of}}, {{synonym of}}, etc. Ultimateria (talk) 19:43, 12 March 2019 (UTC)

Page-deleter role?

It's been suggested several times in the past few months that we break up admin responsibilities into smaller roles. Personally, I think one such role could be that of "page-deleter".

As I've written here, I see the blocking tool as "the most powerful tool, and the one requiring the most discernment"; this means that someone trusted with it can easily be trusted with all the rest and be made an admin (I'm not the only one thinking as much). The reverse is not necessarily true: one could be trusted to make a good job as a page-deleter, but not as a blocker. That's why I think having the possibility of granting page-deleting rights as separate from adminship could be useful.

A user entrusted with that role would be able to delete entries that failed RFD, vandalistic entries, spam entries or spam user pages, empty categories tagged for deletion, wrong bot entries, unwanted redirects, etc.

If this is accepted, another question would arise: on what basis should it be granted: a vote? A whitelisting-like nomination?

What do you think? Per utramque cavernam 18:14, 12 March 2019 (UTC)

While I feel slightly less strongly about this idea than the blocker version, it still feels like a solution in search of a problem, and I would still probably prefer that we just have a single role for blocking/deleting. Deletion is also a multifaceted function, which of the following functions would be included: page delete, page undelete, revision delete, revision undelete, view deleted/hidden revisions, delete logs entries, delete tags, mass delete? If it were to be a new role, I would suggest it be voted on. - TheDaveRoss 12:47, 13 March 2019 (UTC)

WMF proposes rebranding Wiktionary as a "Wikipedia project"

WMF conducted a study discovering that "Wikipedia" is the most recognized name and project, while "Wikimedia" is less recognized. It proposes rebranding Wiktionary as a "Wikipedia project". For public feedback, you should go to meta:Talk:Communications/Wikimedia brands/2030 research and planning/community review; for private, email to brandproject wikimedia.org. --George Ho (talk) 21:12, 12 March 2019 (UTC)

My feedback is that Wikipedia is awful and I'd hate to be affiliated with it. DTLHS (talk) 21:16, 12 March 2019 (UTC)
Nothing like a needless "rebrand" to suck up volunteer donations :( Equinox 21:54, 12 March 2019 (UTC)

You may now become 'Wiktionary — A Wikipedia project'

According to this discussion at Meta, Wikimedia Foundation is considering rebranding. This means for you, that rather than Wiktionary being a Wikimedia project, it would become a Wikipedia project.

The proposed changes also include

  • Providing clearer connections to the sister projects from Wikipedia to drive increased awareness, usage and contributions to all movement projects.

While raising such awareness in my opinion is a good thing, do you think classifying you as a 'Wikipedia' project would cause confusion? Do you think newcomers would have a high risk of erroneously applying some of Wikipedia principles and policies here which do not apply? If so, what confusion? Could you please detail this. I have raised a query about that HERE in general, but I am looking for specific feedback.

Please translate this message to other languages. --Gryllida 23:05, 12 March 2019 (UTC)

@Gryllida: This is a terrible idea. We frequently have newcomers, both with and without actual experience editing Wikipedia, attempting to apply English Wikipedia policies like notability (which has a local, but very different lexicographical equivalent) and 3RR (which does not exist here). We try to patiently point them toward noticing the name of the website they are currently editing, and to acknowledge that they are separate projects. I can only imagine how much more confusion there will be if this were to go through. —Μετάknowledgediscuss/deeds 00:30, 13 March 2019 (UTC)
Thank you for these clarifications. Three questions:
  • Apart from notability and 3RR, is there anything else that is different?
  • Would you be willing to give examples of these confused newcomers and the communication with them?
  • I've found Wiktionary:Wiktionary_for_Wikipedians. It talks about the differences. Is it up to date? Is there any other relevant documentation that you would share in response to this question? Gryllida 01:09, 13 March 2019 (UTC)
    I find it confusing that in this discussion Wikipedia appears to stand for the English Wikipedia, and Wiktionary for the English Wiktionary. For each language, these have their own policies and customs. As to the respective English-language projects, it is easier to list the commonalities: (0) Like all Wikimedia projects, both use MediaWiki software; (1) Anyone can edit Wiktionary (but, unlike on Wikipedia, also anonymous IPs can create pages); (2) Users who are apparently not there to contribute to the project will soon find themselves blocked. (3) Only administrators, who get that role only after having been approved by the user community, can block users and delete pages. That’s about it.  --Lambiam 10:15, 13 March 2019 (UTC)
Currently we can speak of "Wikipedia policies" and "Wiktionary policies" (or "...votes", "editors", etc.). How are we supposed to distinguish these things, in speech and writing, after the word "Wikipedia" subsumes Wiktionary? Equinox 00:49, 13 March 2019 (UTC)
If the rebranding is approved, the name 'Wiktionary' will remain. As I understand, it will become named 'a Wikipedia project' (the new branding) instead of 'a Wikimedia project' (the current branding), that is all.
While at the moment we see the 'a Wikimedia project' only at certain pages (the main page; {{sisterprojects}}; documentation; these places are pretty hard to discover), if the rebranding is approved, the belonging of the project to the family of Wikimedia (to-be Wikipedia) projects may be featured more prominently. Gryllida 01:05, 13 March 2019 (UTC)
Today I can say "he edits Wikipedia but not Wiktionary". How would I say that afterwards? Equinox 01:07, 13 March 2019 (UTC)
The same phrasing and names would apply. Their first name (Wiktionary, Wikipedia, etc) would remain the same and their last names ('a Wikimedia project', which nobody sees now, but after the rebranding they may become 'a Wikipedia project' and become more prominently shown to readers) would change, so to speak. Gryllida 01:12, 13 March 2019 (UTC)
How would you think of renaming 'Wikimedia' to 'Wikimania'? To name it 'a Wikimania project'? Perhaps Wikimedia Foundation likes this brand, and it does not cause as much confusion as 'a Wikipedia project'. It is probably not too bad that there is a conference with this name, it is about the same movement anyway. Gryllida 01:16, 13 March 2019 (UTC)
  • The problem is that most of the world is confused about everything in WikiWorld except Wikipedia. So for outward-facing presentation purposes we probably benefit from a more explicit connection with WP. This seems to me to be a lot like what I have to do when I explain this project which has consumed much of my time for more than a decade. I have to say "Wiktionary is like Wikipedia, except it's a dictionary. It's supported by the same foundation that supports Wikipedia." Two sentences; two mentions of Wikipedia. To me this re-branding is almost a non-event. It seems like a simple recognition of where we stand in the eyes of the world. DCDuring (talk) 02:29, 13 March 2019 (UTC)
    Thank you DCDuring. Since your position is that Wikipedia brand would not cause harm, how do you think about the Wikimania brand? Do you think naming Wiktionary 'a Wikimania project' would make any harm? Do you think this change would be as good as the 'a Wikipedia project' name? Gryllida 03:29, 13 March 2019 (UTC)
I agree that this isn't really a thing. Really what is happening is that Wikimedia is rebranding itself as Wikipedia. We wouldn't need to change anything around here. - TheDaveRoss 03:06, 13 March 2019 (UTC)
I'd also welcome the opportunity to know your opinion about the 'Wikimania' brand as well. It is not confirmed by Wikimedia at this stage but knowing your views about it would be nice. Gryllida 03:30, 13 March 2019 (UTC)
I thought you were joking. The Wiki movement has worked quite hard to be taken seriously and has finally achieved the objective for many audiences. 'Wikimania' would undermine all that progress IMO. It seems to convey the image of the lunatics running the asylum. DCDuring (talk) 03:47, 13 March 2019 (UTC)
Yeah, I find "Wikimania" a bit harder to take seriously. Equinox 03:54, 13 March 2019 (UTC)
I am not an expert but my gut feeling is that the way things are now is fine. There is the saying: If it's not broken, don't fix it. As a frequent editor, if this change was made, it would not affect me too much. Geographyinitiative (talk) 05:20, 13 March 2019 (UTC)
It aint broke - so don't try to fix it. SemperBlotto (talk) 06:55, 13 March 2019 (UTC)
While I agree with some sentiments that this could marginalize smaller projects the decision to rebrand makes sense. As a word, Wikimedia is just too close to Wikipedia and gets easily confused, both in reading and speaking. The choice of Wikimedia as an umbrella term was unfortunate in the first place. – Jberkel 11:59, 13 March 2019 (UTC)
I think the branding is broken. The proposed change seems reasonable.
I still get confused navigating among Wikimedia Foundation, MediaWiki, and Meta-Wiki. I hope that my confusion is not an indicator of the confusion of others. DCDuring (talk) 12:26, 13 March 2019 (UTC)
The MediaWiki vs Wikimedia naming is unfortunate. Back in 2003 the naming committees really got stuck on a theme. I don't think Wikimania is a better brand or name than any of the alternatives, I don't think it has any cachet outside of a subset of the Wikimedia community and I don't think it is strictly worse at indicating what the thing is that it is naming. Wikicon would be a better name for Wikimania to begin with, at least that follows the form of the thousands of other conventions. - TheDaveRoss 12:39, 13 March 2019 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

Assuming that the purpose of having a unified brand is facilitating publicity for all projects, a major consideration is how evocative and easy-to-remember the brand name is. While currently Wikipedia is the best-known name associated with Wikimedia, with the right approach any well-chosen name can quickly become widely recognized; it is just a matter of generating publicity. I agree that Wikimedia was an unfortunate choice: not appropriately evocative (“media” is not a unifying focus), and easily confused with Wikipedia or MetaWiki. Replacing it by Wikipedia will raise the confusion to an unmanageable level. Wikimania may seem cool but has bad connotations that are just too strong and is irresistibly inviting of the derived term Wikimaniac, which is fine for internal use, but we would not be able to keep its use contained. Why does the WMF not open up a contest for a unified brand name in the style of WikiXXX for some suitable term replacing XXX, with (after a preliminary selection producing a shortlist) the user community selecting the winner? My submission: Wikiworld. That certainly covers everything and has a nice alliteration. (I know there used to be a WikiWorld, but that has now been defunct for over 10 years.)  --Lambiam 14:52, 13 March 2019 (UTC)

I'm skeptical that any change will improve whatever perceived problem there might be. If Dine Brands Global Inc. changed their name, would people eat at Applebee's or IHOP more often? I doubt anyone would notice. -Mike (talk) 16:51, 13 March 2019 (UTC)
I'm boycotting Google because they aren't Google on the stock exchange anymore. DCDuring (talk) 17:05, 13 March 2019 (UTC)
There is a proposal at meta to have a brainstorming for different names. (Thanks Lambiam) The names proposed so far are 'wikipedia', 'wikiworld', 'wikimania', 'wikiweb'. Please share your proposals either here or there, at your convenience. Gryllida 00:43, 14 March 2019 (UTC)

Not only will it cause confusion because of an old sense competing with a new sense, if you rebrand Wikimedia to Wikipedia or any other project’s name, but it will also be factually untrue if you call Wiktionary “a Wikipedia project”. Wiktionary isn’t a Wikipedia project, won’t become one, shouldn’t become one, even if you do hold the Wikipedia brand in higher esteem, which I do not, thinking that the abyss would stare back; the confusion and separation issue is enough of a reason. If you do a rebranding do it only if that is worth it and don’t mingle projects in so much as they are intentionally separate.
Currently your issues are that Wikimedia is not distinctive enough, being only different in one grapheme or phoneme, though this issue is minor and can be ignored, as it has been until this proposal, and that on the other hand the merits of Wiktionary, as a project being as much of higher quality as it works distinctly, – analogously with other projects like Wikispecies – are not highlighted enough. If you show an attachment of Wiktionary to Wikipedia you will pull it down and achieve the opposite of what you want to achieve. The message must be and stay: Wiktionary will give you an experience that is well above that on Wikipedia. Wikipedia has lost its chances to be taken seriously, I am sorry to blackpill you, though the usefulness of Wikipedia is of course not debated by anyone, and Wiktionary is currently above it, as is Wikispecies, but people do not know the difference, only know Wikipedia. It is important to make known for those who have, rightly, lost hope in Wikipedia, that Wiktionary is 1. made by other editors 2. editors working pursuant to dissimilar principles and workflows, even if they also edit Wikipedia 3. describes a wholly unlike subject matter, hence the resulting project should not be put on one level with Wikipedia. Fay Freak (talk) 04:27, 15 March 2019 (UTC)

Who is the “you” implied in “your issues”, used above? Are you addressing the Wikimedia Foundation? I don’t expect them to be monitoring the discussion on this page.  --Lambiam 09:16, 15 March 2019 (UTC)
Yes, Wikimedia’s, and also like one’s, the editors who try not to confuse when explaining; though I am not sure if they don’t even monitor this where they have posted, this being a Wikimedia project. Well I could repost it under meta:Talk:Communications/Wikimedia brands/2030 research and planning/community review#Wikipedia I guess since I now do not discern a different place for it; it would be a comparatively long answer there though. Fay Freak (talk) 14:38, 15 March 2019 (UTC)

Μετάknowledge, Fay Freak, , Μετάknowledge, DCDuring: As an alternative to saying 'a Wikipedia project' there is the possibility of saying 'a sister of Wikipedia'. This in my opinion may reduce confusion: it makes the sister project stand out as a separate project more clearly. That's what I commonly do when speaking with people about one of the sister wikis, when asking them to release an image under a free licence. They usually understand quickly. Do you think this option can reduce confusion here caused by people misinterpreting Wikipedia policies as Wiktionary's own? --Gryllida 18:25, 25 March 2019 (UTC)

People believe that WP rules apply here now, even though our name is distinct. I'm not sure that any plausible renaming will change that. DCDuring (talk) 19:59, 25 March 2019 (UTC)
At the same time, I don't see how labelling everything a "sister of Wikipedia" is any good either. It's like your only claim to fame is being a sibling of someone famous, rather than on your own merit. —Rua (mew) 20:11, 25 March 2019 (UTC)
Right, and therefore the effect of nivellating the image of Wiktionary would be the same. Fay Freak (talk) 15:09, 26 March 2019 (UTC)

Deleting Template:redlink_category and Module:redlink_category

FYI, I started a discussion about deleting this feature on the talk page of the template. Noting here in case not everyone notices the discussion there. - TheDaveRoss 15:55, 14 March 2019 (UTC)

I hope this doesn't need a vote. It seems to me to merit a BP discussion, especially because the idea behind it is potentially of wider application. DCDuring (talk) 16:04, 14 March 2019 (UTC)
Although this is a really awful hacksaw-and-bailing-wire-and-duct-tape way of doing this (run a module from every linking template on every page that has linking templates, every time any such page loads, with an expensive parser function run every time if the template is linking to one of the target languages- really? To populate todo lists?), there are people who find the information it generates very useful, and no one seems to want to spend the time and effort to generate it by other, more sensible methods. Chuck Entz (talk) 03:19, 15 March 2019 (UTC)
It's not too much effort to implement something like this by analyzing the dumps. It would work for all languages, and could take other ways of linking into account (plain wikilinks), and perhaps even indicate orangelinks. – Jberkel 10:30, 15 March 2019 (UTC)

category: silent t

Please could somebody create a category for words with a silent t: moisten, often, thistle etc. --Backinstadiums (talk) 14:45, 15 March 2019 (UTC)

Pronunciation trivia of this sort seems more suited for an appendix really. --Tropylium (talk) 09:47, 18 March 2019 (UTC)
This category would be very useful to learners. — Ungoliant (falai) 15:40, 26 March 2019 (UTC)

Translations in languages you don't know

In User talk:Panglossa#Translations in languages you don't know, we read "Please avoid adding these. It is very easy to make mistakes, and even if you get the content right, you may end up adding it in the wrong way, as you did at walk, thus requiring someone else to clean up after you."

The original poster to the user talk page added e.g. Czech pivní sýr, and admitted they do not know Czech.

As far as I know, multiple established editors add translations in languages they do not know. A very recent example is diff, where at least Estonian and Greek do not match any Babel box.

Do we want users to receive such messages on their talk pages? Do we want to introduce a policy or recommendation to the effect of that message on the user talk page? --Dan Polansky (talk) 09:59, 17 March 2019 (UTC)

The way the message is phrased makes it sound like the boilerplate language of Wikipedia warning templates (“When moving pages, please remember to fix any double redirects”). Warning users when they make a kind of mistake that they are likely to repeat is by itself a good thing. It would have been better (I think) if the mistake had been specified more, and I think I might have phrased the warning like “please be extremely careful when...”. Perhaps a readable essay for new editors with positive advice on how one can contribute to Wiktionary (focus on languages you are familiar with) works better than introducing guidelines on what to avoid.  --Lambiam 12:08, 17 March 2019 (UTC)
Elsewhere, I made the following proposal:
Editors can contribute new entries even for languages that they do not know and have not studied. However, in such case, they are strongly encouraged to work very carefully with sources, and get acquainted with the lemmatization practice of the English Wiktionary for the language. For instance, for Latin, some dictionaries use e.g. stare as the lemma while Wiktionary uses sto as the lemma.
Whether that should have the status of policy, guideline or advice is a little less important, I think. --Dan Polansky (talk) 12:17, 17 March 2019 (UTC)
The status is less important than how easy a read it is.  --Lambiam 13:56, 18 March 2019 (UTC)
I agree that there shouldn't be an absolute prohibition from editing or adding translations in languages one doesn't know. I think in such cases one should be very careful, but certainly there are cases when being pretty much sub-A1 level in a given language doesn't preclude one from being able to consistently add correct and useful content in that language. There is in my experience a gradation in the degree to which one can be unfamiliar with languages not listed on one's Babel. For example: I am absolutely lost when confronted with Chinese or Nahuatl texts, but if you give me a Romanian word I am confident that I could with some effort find out whether it is in use, whether it is SOP or what its lemma form is. Perhaps we could for convenience create a Wiktionary namespace page or a new section on a relevant extant page with advice and warnings regarding possible pitfalls when editing/translating in languages one doesn't know (with a shortcut à la WT:ATTEST or WT:EL like idk, WT:UNFAMILIAR or w/e), but it'd be undesirable imo to prohibit such editing entirely (not least because proficiency is self-reported anyway, making such a rule difficult to enforce). — Mnemosientje (t · c) 13:08, 17 March 2019 (UTC)
Again, Dan, this isn't a matter of policy. I don't leave these messages for everyone doing it, and I wasn't going to for Panglossa until they made a mistake that I had to clean up. At that point, their contributions became a slight waste of another editor's time, and I therefore wanted them to stop doing that. It's that simple. —Μετάknowledgediscuss/deeds 15:09, 17 March 2019 (UTC)
I add such entries from time to time, and while I usually make sure the entry is correct, either by checking a dictionary or by asking a native speaker, I understand I can make mistakes, especially regarding the form of the entry. I welcome Μετάknowledge's warning about the correct procedure, but I also understand this is a collective project, we contribute what we can and more knowledgeable peers will correct it if necessary. I will certainly be more careful from now on, but whenever I find something worth including, I will do so. Panglossa (talk) 15:19, 17 March 2019 (UTC)
@Panglossa: Thank you. If you're willing to put in the care to check both correctness and that the lemma/spelling/etc. meets with Wiktionary's standards, then I am perfectly satisfied. —Μετάknowledgediscuss/deeds 17:08, 17 March 2019 (UTC)
What about adding these under "Translation to be checked"? Panglossa (talk) 15:22, 17 March 2019 (UTC)
That's not really the purpose of the "Translations to be checked" sections, which are for translations where it's not known which of the translation sections a translation belongs to. Instead, you should use the template {{t-check}} where you would use {{t}}. This automatically tags it for checking by someone who knows the language, and also displays a message saying that it needs to be checked. This will alleviate much of the problem, though it still requires someone spending time later to clean up.
The biggest potential problem is that someone may add a translation that's wrong and that goes unnoticed. We don't have a really convenient way of finding all the translations in a given language, so it could be a long, long time before it's fixed. Translations are very hard to patrol, since they involve language-specific knowledge that no one person has for every language, and there's no way to check where the contributors got them.
If I see someone add or change translations in a large number of unrelated languages, that immediately raises my suspicions. Yesterday, a Canadian IP completely reworked the translation tables at middle in a single edit, with changes in multiple languages that I don't know. Fortunately, one involved changing an uppercase German noun to lowercase, which no one who knows anything about German would ever do, so I reverted all their edits and blocked them. They could have been mostly right, but the difficulty of sorting through all of their changes in all of those languages made throwing all of it out the only practical option once I knew they were seriously wrong on one aspect. Chuck Entz (talk) 15:59, 17 March 2019 (UTC)

Pronunciation respelling for English

I propose adding the corresponding entries for the graphemes of some Pronunciation respellings for English, especially the one used by Wikipedia, that is WIK-ih-PEE-dee-ə. --Backinstadiums (talk) 14:51, 17 March 2019 (UTC)

  Oppose. These aren't words, nor do they have any meaning in a language. —Rua (mew) 15:12, 17 March 2019 (UTC)
If you mean adding WIK-ih-PEE-dee-ə, then no. If you mean adding to e.g. ee that it's used to represent /i/, then maaaybe, but pronunciation respelling schemes are possibly too varied for us to want to try to include them all; they are as Rua says not words. (And in many works that use them, they're explained in appendices already.) - -sche (discuss) 19:43, 19 March 2019 (UTC)

Attestations of native toponyms mentioned in Latin texts

Many old toponyms of Europe are found only in the form of mentions within Latin texts. Because the text itself is Latin, it seems that our CFI would treat these words as Latin. However, they are generally not Latin grammatically (i.e. they lack Latin endings), and are by and large written down by native speakers of the area in question, not native speakers of Latin. Thus, it can be argued that this is simply code-switching, inserting for example an Old Dutch name in its native form into an otherwise Latin text. If they are considered an attestation of the native language, we can include them in etymologies of modern place names, which is great. It wouldn't make sense to say that a modern Dutch place name is descended from a Latin name merely because the Old Dutch name was quoted in a Latin text. There really isn't anything Latin about these other than the language of the text they happen to appear in.

My question is whether these toponyms count as attestations for the local language, rather than Latin. I'm not sure if CFI says anything about this either way, but it certainly seems like it would be desirable to be able to include these. —Rua (mew) 15:12, 17 March 2019 (UTC)

If an undeclinable Latin form and an Old Dutch form (code-switched into Latin) are indistinguishable by form then according to some one and the same occurrence attests this form for both languages, since people seemingly still fail to see criteria for the lexicographical quality of an occurrence with regard to code-switching.
I’d argue for a “favour for the smaller language.” If you say it is Latin one expects a bit clearer evidence that shows that these are names used in Latin, otherwise one could add place names without end because they somewhere appeared in Latin, which would be insipid. Whereas if you see such a thing for Old Dutch, one naturally can’t expect to pull out more.
In my view toponyms and personal names should not even get their own language sections. They should be under L2 headers called “Name” or similar, other spellings being soft redirects like سميث‎ being “Arabic spelling of Smith” for instance; also using own linking templates perchance. Things like Timișoara and its argument “is this Spanish?” will get on everybody’s wick at some point. Why do we need hundred entries for Srebrenica only because history books about the Yugoslav Wars have been written in hundred languages? Why is Karadžić according to Wikipedia an English name spoken /ˈkærədʒɪtʃ/? I don’t believe in the “pronunciation information” arguments. If a Turk lives in Germany his name will stay bare Turkish for seven generations and beyond. Eindeutschung according to peculiarities of law won’t help. Kowalski is still not a German name. And yeah, all the entries in Category:English surnames from German are German lexemes used in English discourses, if not the German spellings of Slavic names etc. Kaufman is German and not English. People just don’t realize that they don’t talk English any more when they use these names. No, this is not code-switching. Names work differently. In other words languages are sets that do not contain proper nouns, since, rightly observed, these stay if you switch the language. Fay Freak (talk) 17:30, 17 March 2019 (UTC)
Why don't English speakers need to know how Karadžić is pronounced by English speakers? Why is Kaufman German--as our entry points out, and w:Kaufman (surname) shows, it's not a name used in Germany. According to Kaufmann, the basic form is attested back to Old High German, so if it's not English, it's not German either.
Proper nouns do not necessarily stay if you switch language, as the translation table for Rome makes clear. Even a new city like Las Vegas has six Latin-script Wikipedias that chose a name for their article other than "Las Vegas" (or "Las Vegas, Nevada"), with Navajo ranking in as the most unusual with Naʼazhǫǫsh Hatsoh. Tokyo is named Tokyo, Tokio, Tòquio, Tokyô, Tōkyō, Tóquiu, Tókio and Tang-kiaⁿ-to͘. To go at it from another direction, Perth may be spelled the English way in many languages, but it is not pronounced the English way in most of them, the dental fricative being rare among the world's languages. It's a complex mess, and your rant bluntly ignores all the hard details.--Prosfilaes (talk) 09:22, 18 March 2019 (UTC)
“Perth may be spelled the English way in many languages, but it is not pronounced the English way in most of them, the dental fricative being rare among the world's languages” – does not seem so. This name does not appear in discourse in German and if a German tries to pronounce it he tries a dental fricative. There is currently no place in Australia or Scotland having a lexicalized German pronunciation. How would a Russian pronounce it? It would likely also be with a dental fricative, if the speaker knows about its existence in other languages.
Las Vegas being primarily inhabited by English speakers, it would of course be notable in its section. Or in general, if we have “Name” sections, we can put the English pronunciations first; it would also include a German pronunciation, which also is lexicalized, /las ˈveːɡas/. But I would be wary of conjecturing any, as you do for “Perth”.
The spelling or pronunciation, or inflection, apparently does not say anything about nativization; it is not constitutive and can not be taken as an indication for a name being included in a language, not related to what lexicalization means. “Kaufman” is a German name, only and even though used in Germany in a different spelling. But the pronunciation information would not get lost, as I said. It is important to see that one can’t just talk about “names used in Germany”, “a name used in the United States” and the like. “Names” aren’t “used” this way. They aren’t used because they belong to a language but because they belong to a specific entity referred to; with rare exceptions. Only one in England for German, a few more in Italy (Rom, Mailand, Venedig, Florenz, Turin, Padua, Genua, Neapel and then it ends for any speaker, if I haven’t missed one, other places are perceived and spoken as if bare Italian, ignoring those in the now or once German-settled areas), and for Poland in former German-settled places both compete. Is Szczecin German because it is sometimes used in this form and not Stettin in German newspapers and the like? No, this is a wrong question, not even Stettin is German in this sense: “Rome” being different by language, even names calqued does not say anything, since names are changed even for one language: Са́нкт-Петербу́рг (Sánkt-Peterbúrg), Ленингра́д (Leningrád). See, place names and people can just be “renamed” pursuant to the law, this also shows how names work differently: This question “is this of the language X” does not arise in such a form in nature for names but you ask this only because on Wiktionary you group all under a language, by grouping names independently you avoid such questions which are wrong.
I also want to emphasize that place names and personal names slant the statistics in the categories “X terms borrowed from Y”. One could go around in Germany and quote the local Russian journals for any commune in Germany, we get 11,000 “terms” borrowed from German into Russian this way (the approximate number of communes in Germany). No, this is underplayed, since the towns also have districts, so the number is actually higher, even if we count stretches of land of which seemingly no Russian has ever heard. Fay Freak (talk) 14:26, 18 March 2019 (UTC)
A German who knows no English who tries a dental fricative does not in fact produce a dental fricative. Vowels will consistently get mangled, as you point out with Las Vegas. Names get mangled, both spelling and pronunciation, into various languages, particularly when the place or person doesn't speak the original language. Every major city of Europe has one or more cities named after it in the US, and all of those cities have their pronunciations anglicized. One of my friends grew up near Venice, Missouri, and it took her years to realize that the city was named after a place in Italy, as the cities, even in English, were not pronounced anywhere near the same.
Again, why is Kaufman German? If the Anglicization doesn't make it English, then the modernization doesn't make it other than Old High German.
Compare w:en:List_of_sovereign_states_and_dependent_territories_in_Europe, w:de:Liste_der_Länder_Europas, w:az:Avropa_ölkələrinin_siyahısı and w:lv:Eiropas_valstu_un_atkarīgo_teritoriju_uzskaitījums. Comparing the first and second lists makes it clear that English and German disagree on the names of about half of the nations of Europe. An examination of the third and fourth lists shows that Latvian and Azerbaijani make a habit of changing the spelling of names.
Spellings are changed all the time by law, and language regulators like Académie française change the words for things. Places have different spellings and names depending on the language, and even pronunciations for the same name: /ˈkaʊ̯fˌman/ versus /kaʊfmæn/ for the name you brought up.
I understand that most place names just get adopted as is, with no real nativized pronunciation. But I don't think we can deal with that without recognizing that place names can be as intertangled with their language as any other noun.--Prosfilaes (talk) 01:43, 19 March 2019 (UTC)
Avoiding for a moment the question of what language to consider them, I'd note that they can still be mentioned in etymologies even if they're considered Latin, without saying the e.g. Dutch name is derived from Latin. Lüneburg#German uses the "First mentioned in 956, in Latin, as Luniburc"; a similar approach would be to say something like "from Old High German *Foo,"—(or Old Dutch, or whatever)—"attested in Latin in 632 as Fou". - -sche (discuss) 19:50, 19 March 2019 (UTC)
I suppose so, but the Lüneburg example is exactly what I'm referring to in this question. To me, it seems weird to treat Luniburc as Latin, it doesn't look at all like Latin to me and has no Latin grammatical endings. —Rua (mew) 19:55, 19 March 2019 (UTC)

Old Gutnish

I don't know if this has been discussed before, but I am wondering what people think of adding Old Gutnish as an etymology-only language, with its parent language being either Old Norse or Gutnish. It shows up in descendants sections of Old Norse entries, and in Gutnish etymologies. However, as I understand, it is a dialect of Old Norse as are Old East Norse and Old West Norse, which do not have their own codes, so I'm on the fence. Julia 04:44, 18 March 2019 (UTC)

Old West Norse and Old East Norse are already mentioned in entries using {{label}} and have categories, so I think it wouldn't hurt to add etymology language codes for them. Jonteemil suggested adding them last year, but nothing came of it. I don't think their absence is a good reason not to add Old Gutnish. — Eru·tuon 05:59, 18 March 2019 (UTC)
Added. I also added it as a label, like OEN and OWN. The only concern I have is that while not including it at all was clearly problematic, including it as Old Norse may or may not go far enough: some references treat it as its own language. - -sche (discuss) 03:17, 23 March 2019 (UTC)

Words whose letters are in alphabetical order

Do we have a category for words (such as "biopsy, almost, chintz") whose letters are in alphabetical order? SemperBlotto (talk) 13:22, 19 March 2019 (UTC)

I'd expect such a category to be in Category:English terms by orthographic property, but there's only Category:English words that use all vowels in alphabetical order. — Eru·tuon 17:53, 19 March 2019 (UTC)
I don't know if there is a name for these, I call them "alphagram words". Here is a list of a bunch of terms we already have which qualify. - TheDaveRoss 19:16, 19 March 2019 (UTC)
Created a list, though not restricted to English entries if that was what you were thinking of, from the latest dump. — Eru·tuon 19:25, 19 March 2019 (UTC)
Your list is a lot more permissive than mine. You have a slutty list. - TheDaveRoss 19:29, 19 March 2019 (UTC)
Interesting, your list is more permissive in another way, because it allows letters to be repeated. — Eru·tuon 19:53, 19 March 2019 (UTC)
Not hard to do this in Module:en-headword. Would Category:English words with letters in alphabetical order be an okay category name? I suppose Category:English words whose letters are in alphabetical order is clearer. The function here is the one I used for the list above. Might want to exclude words with uppercase letters or with at least two consecutive uppercase letters (acronym-like). — Eru·tuon 19:41, 19 March 2019 (UTC)
Nice lists. What's the longest alphagram word, by the way? Interesting to know according to both ASSes (Alphagram Sluttiness Systems) - the DASI (TheDaveRoss Alphagram Sluttiness Index) and the EASI (Erutuon Alphagram Sluttiness Index). --I learned some phrases (talk) 13:33, 20 March 2019 (UTC)
For DASI; aegilops (which is what Wikipedia lists as the longest) and affinors are the only 8 letter options. Aegilops has the advantage of not having any repeated letters, so it exists in EASI as well. It is also not plural, so it just feels good as the winner. If capital letters are allowed you can add DDMMYYYY to the 8-letter list, but that isn't a word. - TheDaveRoss 14:15, 20 March 2019 (UTC)
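The two kinds of list compared in this thread differ only in whether repeated letters are allowed. A minimal sketch of that check follows, assuming plain code-point comparison of lowercased letters; the function name `is_alphagram` and the `allow_repeats` flag are made up for illustration and are not taken from Module:en-headword or from either list's actual code.

```python
def is_alphagram(word: str, allow_repeats: bool = False) -> bool:
    """Return True if the letters of `word` are in alphabetical order.

    allow_repeats=False requires strictly increasing letters (no doubles);
    allow_repeats=True also accepts repeated letters, as in "affinors".
    Letters are compared by code point after lowercasing, which is only
    a rough stand-in for alphabetical order outside basic Latin.
    """
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return False
    pairs = list(zip(letters, letters[1:]))
    if allow_repeats:
        return all(a <= b for a, b in pairs)
    return all(a < b for a, b in pairs)


if __name__ == "__main__":
    for w in ["biopsy", "almost", "chintz", "aegilops", "affinors"]:
        print(w, is_alphagram(w), is_alphagram(w, allow_repeats=True))
```

Excluding acronym-like words, as suggested above, would be a separate filter applied before this check.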

{{lb|neologism}}

Wiktionary:Neologisms doesn't give any guidance on when a neologism stops being one. This came up when I was looking at our entry for Latin@, which has just been added to the OED with citations going back 19 years. Is that long enough to be considered no longer a neologism, or where should we designate the cut-off? Ƿidsiþ 13:11, 21 March 2019 (UTC)

I agree it'd help to have at least a soft cutoff. I recall we listed thon as a neologism for a long time even though we quoted uses from the 1880s to the 1980s(!) (I see someone fixed that in 2012). I'm having a hard time finding a book that gives a clear definition / cutoff. Simple English WP says "15-20 years" (and cites sources, but I'm not sure they're sources for the cutoff per se), and poking around google books:neologism years, I see many books talking about neologisms from the last 10, 15 or 20 years, but it's not clear if they mean that's the cutoff for when something stops being a neologism, or just the cutoff for what they looked at. Still, 15 years seems reasonable to me (or 20 if we wanna be more conservative). - -sche (discuss) 19:52, 21 March 2019 (UTC)
Pick a number, take a poll, make a vote. 20 years seems good enough. Actually using {{defdate}}, based on attestation or use on line would be nice. DCDuring (talk) 20:39, 21 March 2019 (UTC)
I believe the OED has hard cutoffs like "no uses in the past 100 years = obsolete". This of course assumes that someone has actually made a good effort to find historical examples and that the researcher actually has access to a representative corpus to search in. Both of which may be questionable for us. Not to mention that these labels need to be reviewed periodically. DTLHS (talk) 04:45, 22 March 2019 (UTC)
So yeah I am opposed to this, since it will become another label like {{defdate}} that gets slapped indiscriminately on entries without any supporting evidence. At least now when we say something is obsolete we're not making any more than a vague statement (which is all we're equipped to make). DTLHS (talk) 05:21, 22 March 2019 (UTC)
Ah, good point; I hadn't interpreted Widsith's question as being about when to add the tag to entries, but only when it's OK to remove it. Perhaps we should, as you seem to be suggesting(?), remove it from all entries and only focus on things like whether a new word is rare. - -sche (discuss) 07:03, 22 March 2019 (UTC)
Yeah, that's what I meant. But I suppose the broader question is – what's the point of this label? I mean, I can see the point of "protologism" because it basically means "this may not have the citations a word would normally require". But "neologism" doesn't really mean anything except "it's kind of new in an undefined way, but it can still be cited normally", in which case it seems rather pointless. Ƿidsiþ 13:43, 22 March 2019 (UTC)
I looked over Category:English neologisms and removed ~24 entries that jumped out at me as being from the 1990s or earlier. Removal of the tag from all entries might require more discussion. If the label is kept, we should at least treat its category as a 'check back on' category like the hot word category. - -sche (discuss) 01:41, 23 March 2019 (UTC)
Recently added to blurb, now removed. That word is older than me, I reckon. DonnanZ (talk) 01:09, 24 March 2019 (UTC)
Remove all. Not useful information, especially since it is applied so randomly, as when somebody rarely thinks about it. A word being a “neologism” does not say anything to anyone about whether or where he should use it or could encounter it. Fay Freak (talk) 23:34, 23 March 2019 (UTC)
I agree. It's also used as a context label here, while it says nothing about usage context. It's an etymological detail. —Rua (mew) 18:21, 24 March 2019 (UTC)
I'm going to tweak [[neologism]]'s usage notes to reflect this, btw, emphasizing the 15-20 year cutoff over "being felt to have always been valid", which isn't even the case for non-neologisms (like "ain't"). - -sche (discuss) 04:27, 26 March 2019 (UTC)
I think that this label has some of the same pros and cons as {{lb|nonstandard}}. There are elements of subjectivity and authority shopping and the question of when something becomes acceptable, albeit informal or colloquial. But there is also the fact that it is the kind of useful information that people expect from a dictionary. Perhaps we should let volumes like Garner's Modern American Usage go without competition from us, since we won't have our hearts into doing a good job of it anyway. DCDuring (talk) 12:02, 26 March 2019 (UTC)
I'm not sure it is really useful information. "Nonstandard" tells you a bit about how you can use a word or can't, about as much as one word can. Many entirely standard words are neologisms, and a lot of slang, no matter how long it's been around, is and always will be slang. The word fanac is mid-20th century in origin, and yet it's still slang for a specific audience.--Prosfilaes (talk) 06:50, 27 March 2019 (UTC)

Categorize Japanese verbs by their classical conjugations?

-- Huhu9001 (talk) 10:44, 22 March 2019 (UTC)

Language explicitly stated in quotation of use

Since recently, some Czech quotations now show "(in Czech)", which I find annoying and unnecessary. Of course Czech attesting quotations are in Czech; what else could they be. An example entry is být na dvě věci. Thoughts? --Dan Polansky (talk) 20:28, 22 March 2019 (UTC)

I completely agree. —Μετάknowledgediscuss/deeds 20:30, 22 March 2019 (UTC)
Maybe it's semantically useful to store this information but it's idiotic to display it when the entry is the same language as the citation. Equinox 20:34, 22 March 2019 (UTC)
Agreed. - TheDaveRoss 21:50, 22 March 2019 (UTC)
I agree; display should be suppressed by default. Is it intended to represent the language of the work as potentially distinct from the language of the quoted snippet? (E.g., for a mostly-English anthology with one Spanish paper in it?) Then it should be suppressed unless those two languages are different. Otherwise, visible display should be suppressed by default (enable-able by some other parameter?), if not in all cases. - -sche (discuss) 01:33, 23 March 2019 (UTC)
The template documentation for {{quote-book}} includes |lang= in the “most basic” parameters for both English and non-English quotations, thus encouraging its pointless use. The parameter |worklang= still makes sense in case the book is not in the same language as the quotation.  --Lambiam 19:39, 23 March 2019 (UTC)
Yes. The worst example, as it appears to me, is “Qur'an (in Arabic).” I don’t even see a single reason to name the language by default. Fay Freak (talk) 23:29, 23 March 2019 (UTC)
OK, I can change this. The display of (in Foo) has always been there but the difference is I added language codes to the various quotes so they get categorized and formatted properly. Benwing2 (talk) 04:38, 27 March 2019 (UTC)
I changed this so the annotation is only displayed in two cases: (1) |worklang= is given (in which case |worklang= is displayed); (2) |termlang= is given and is different from |lang= (in which case |lang= is displayed). The former case is intended to handle the situation where the language of the work as a whole is different from the language of the quote, and the latter case handles the situation where the language of the term is different from the language of the quote. Benwing2 (talk) 04:52, 27 March 2019 (UTC)
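The two display cases described above can be written out as a small decision function. This is only a sketch of the rule as stated in the comment, not the actual quotation module code (which is written in Lua); the function name `language_annotation` is hypothetical, and language names are passed directly instead of language codes to keep the example short.

```python
from typing import Optional


def language_annotation(lang: str,
                        worklang: Optional[str] = None,
                        termlang: Optional[str] = None) -> Optional[str]:
    """Return the "(in Foo)" text to display, or None to display nothing.

    Case 1: worklang is given, so the language of the work is shown.
    Case 2: termlang is given and differs from lang, so the language
            of the quote (lang) is shown.
    Otherwise the annotation is suppressed.
    """
    if worklang:
        return f"(in {worklang})"
    if termlang and termlang != lang:
        return f"(in {lang})"
    return None


if __name__ == "__main__":
    # A Czech quotation in a Czech entry: nothing is displayed.
    print(language_annotation("Czech", termlang="Czech"))
    # A Spanish paper quoted from a mostly English anthology.
    print(language_annotation("Spanish", worklang="English"))
```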

Can we now get rid of all these Webster's 1913 requests for quotes?

Many years ago - so long that I can no longer find the discussion - I proposed that we get rid of thousands of templates in entries asking for quotes from specific historical figures. An example would be absolution, which includes the following entries:

  1. The form of words by which a penitent is absolved. [First attested around 1350 to 1470.]
    (Can we find and add a quotation of Shipley to this entry?)

...

  1. (obsolete) Delivery, in speech.
    (Can we find and add a quotation of Ben Jonson to this entry?)

When I first proposed getting rid of these, the determination of the community was to keep them on the grounds that these requests hinted at sources of citations for the definitions, and might soon be fulfilled. Years later, I see no sign of that happening, certainly not on any kind of scale sufficient to suggest that the nearly ten-thousand requests will ever be addressed. In the meantime, they are just pollution in the entry, an eyesore, a permanent signifier of incompleteness that falsely suggests to the reader that a specific quotation is required to have the complete definition.

Furthermore, our entries are in no way contingent on having quotes used by another dictionary. For most of the words for which such a template exists, there are thousands of sources to which we can turn to cite the word in general. There is nothing magical about Shipley or Johnson that makes their quotes particularly significant to the meaning of the words, nor do we have any guidance for which specific quote by these subjects the authors of Webster's 1913 may have been referring to in their inclusion of these names.

I therefore again propose that we get rid of these requests for quotes. I would propose that a reasonable alternative to having them in the entries would be to have a bot move all of them to a project page, so that those who are really interested in hunting down these citations can look there to see which entries they are associated with. bd2412 T 20:53, 22 March 2019 (UTC)

@Aabull2016, you're someone who I've seen actually try to fulfill these requests. DTLHS (talk) 21:06, 22 March 2019 (UTC)
I would be in favor of, at least, having the templates not display in the entry but simply categorize. Moving them to a project page is even better. - TheDaveRoss 21:48, 22 March 2019 (UTC)
I very occasionally fill a request when I'm working on a page for some other reason, most recently (yesterday) at [[disadvise]]. DCDuring (talk) 22:39, 22 March 2019 (UTC)
We might get rid of them from the display (if we can store them elsewhere, if only in HTML comments!) but I strongly disagree with removing that data altogether: otherwise, these senses will be RFVed with no evidence available and probably deleted, whereas they might survive with these hints at how to find them. Legitimate definitions are more important than prettiness of a Web page. Equinox 00:18, 23 March 2019 (UTC)
@Equinox: Are HTML comments visible in the dumps? Specifically: Articles, templates, media/file descriptions, and primary meta-pages: enwiktionary-20190320-pages-articles.xml.bz2? DCDuring (talk) 04:02, 23 March 2019 (UTC)
Though I wasn't the one you asked, yes, they are. The pages-articles and pages-meta-current dumps (which have the same format) contain the wikitext just as it appears when editing the source of a page, except that < and > are encoded as &lt; and &gt;. — Eru·tuon 23:04, 23 March 2019 (UTC)
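To illustrate the point about recoverability: when dump text is read as raw text, the entities have to be decoded before HTML comments can be matched. The sketch below assumes the page text has already been pulled out of the XML as an escaped string; the helper name `comments_in_dump_text` and the sample comment are invented for the example. A proper XML parser would decode the entities by itself.

```python
import html
import re

# Non-greedy match so that several comments on one page stay separate.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def comments_in_dump_text(escaped_wikitext: str) -> list:
    """Decode &lt; and &gt; and return the body of every HTML comment."""
    wikitext = html.unescape(escaped_wikitext)
    return [body.strip() for body in COMMENT_RE.findall(wikitext)]


if __name__ == "__main__":
    sample = "# Delivery, in speech. &lt;!-- quote request: Ben Jonson --&gt;"
    print(comments_in_dump_text(sample))
```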
Thanks. I was just interested in whether the data would be recoverable if Equinox's suggestion was implemented. I have other reasons stated below for wanting these kept as is. DCDuring (talk) 02:31, 24 March 2019 (UTC)
By the way, you are criticising some sources because they are other dictionaries: that's fine (usage vs. mention rule) but note that a lot (I think a very huge majority) of these requests are not actually references to dictionaries, but rather to real writers using a term. Equinox 00:19, 23 March 2019 (UTC)
I am not criticizing them because of the source of the quote, but because all of these "hints" are derived from other dictionaries. In other words, another dictionary somewhere is saying "use this as a source", and we are acting as if we are constrained by the decision of that other dictionary to use their preferred source. bd2412 T 15:22, 31 March 2019 (UTC)
The suggestion above of having the templates display nothing (but be present in the wikitext, and add categories) might be a compromise. However, I'm not really opposed to removing them altogether: the main benefit is that they're a starting point for trying to cite an obscure sense you're not sure is real, but in that case just turn to a free online copy of Webster's old dictionary and look up the entry (and its quotations) there, the way I check old public domain copies of Century to see if they have citations if I'm trying to cite some obsolete word/sense. - -sche (discuss) 01:23, 23 March 2019 (UTC)

I think the notion of making Wiktionary look superficially like a finished product when, 1., it was founded on the principle of user participation in its construction and improvement, 2., it remains far from being a finished product, and, 3., it is likely to remain unfinished for quite some time is positively wrong-headed. We need to offer more ordinary-user-facing evidence of incompleteness to help lure potential contributors into the process. Those patient enough to cite entries would be particularly desirable. DCDuring (talk) 03:54, 23 March 2019 (UTC)

Thanks for tagging me in this discussion, @DTLHS. I can appreciate the arguments for and against having these requests visible; however, I believe it would be a terrible loss to remove them altogether. In my own work, they very often provide an entry point into work that becomes much more extensive. The idea of checking *every* obscure sense in a free online copy of Webster's 1913 dictionary is not at all appealing as it would involve a lot of wasted time and chasing down dead-ends. Having worked with a very large number of these requests, I can definitely confirm @Equinox's contention that the vast majority of these are not references to dictionaries. The few that do refer to dictionaries inevitably involve specialized dictionaries (e.g. nautical, technical, agricultural) and can also provide useful information to clarify senses and periods of use. Aabull2016 (talk) 16:34, 23 March 2019 (UTC)
I have seen it happening. Sometimes even IPs add quotes “as requested”, so it does lure in editors. I have also solved a few of these, working on pages for other reasons. Also this template has been used for other languages already where there aren’t corpora available whence you could easily get other quotes (“thousands of sources to which we can turn”). Fay Freak (talk) 23:25, 23 March 2019 (UTC)
  • Delete all these. I've even seen requests from Shakespeare for words that didn't exist in his time. Requesting a quotation is fine but requesting one from a particular dictionary or Samuel Johnson is too much. —Justin (koavf)TCM 23:31, 23 March 2019 (UTC)
    • This abuse must be pretty rare and does not say anything about the template in general. It is like arguing Gothic translations should not be allowed because people add Gothic translations for things that did not exist with the Goths. Samuel Johnson is not too much either, one can search the term plus his name or databases of texts of him with the term. But even if Samuel Johnson is not so relevant, there are more interesting use cases, I think about historical uses of plant names difficult to identify. Fay Freak (talk) 23:39, 23 March 2019 (UTC)
      • One thing that could be done is to remove the requests that are obviously for dictionaries. We could compile a hit list of dictionary-only sources with links to their request categories, and those who are bothered by them could remove them. That would also have the benefit of improving the focus on the most worthy types of requests. Chuck Entz (talk) 00:39, 24 March 2019 (UTC)

Hey look at me! I'm in bold at the bottom. If consensus says these must go then I would at least like a list of what they were, so I can try to add those senses, which will otherwise be stupidly thrown away. Can someone help me with a bit of botting? Please consider this before doing a blanket deletion. Equinox 04:48, 26 March 2019 (UTC)

Theoretically all of the senses should exist, right? These requests were added in conjunction with the Webster's 1913 definition. If there are instances where a request exists but no definition that should probably stay, and have an {{rfdef}} added to boot. - TheDaveRoss 12:26, 26 March 2019 (UTC)
Doesn't appear to me as if there's anything like a consensus. Of course all the *senses* exist; it's the citations that will need to be added to illustrate / confirm them. As mentioned earlier, where those senses are obsolete and/or rare, they are liable to be removed after failing verification requests. It's a large amount of useful information to flush down the toilet. Aabull2016 (talk) 15:04, 26 March 2019 (UTC)
@Aabull2016 Do we actually know that all the senses exist? I don't know what the criteria for inclusion in Webster's 1913 was, but it is possible that they included senses that actually would not meet our CFI. bd2412 T 15:26, 31 March 2019 (UTC)
We know that the professional lexicographers thought that readers might need the definitions, possibly just to understand the way the author named used it. They had the notable-work attestation criterion for inclusion. (What were they thinking!?!?!?) In some cases they were just a century or more closer than we are to the use of the definition. I think we need their help to understand some of the older meanings of words. We could use dated citations to help us know in what time period to look for citations, especially for uncommon, dated (etc) definitions of polysemic words, which can be very hard to cite. Or we could just risk COPYVIO and use the OED's citations. DCDuring (talk) 17:55, 31 March 2019 (UTC)
The First Edition of the OED up to N should be in the public domain worldwide, and up to Th in the US.--Prosfilaes (talk) 02:43, 1 April 2019 (UTC)
@BD2412 “it is possible that they included senses that actually would not meet our CFI?” Did you have any specific senses in mind? Of the many instances I’ve worked on, I have not run into any cases of this. As DCDuring rightly points out, in many cases it would be extremely time-consuming and difficult to chase down citations without clues such as those provided by Webster’s 1913. Aabull2016 (talk) 03:36, 1 April 2019 (UTC)
So far as I have seen, Webster's gives a clue for a single citation for each word, whereas our CFI requires three citations. If the senses with these templates were taken to RfV, then that clue might be useful for finding one cite, but we are still on the hook for the other two. bd2412 T 04:33, 1 April 2019 (UTC)
...while, for senses added randomly by some anon Internet drive-by, we have no proof whatsoever. At least Webster gives us a clue to ONE of them. Equinox 05:33, 1 April 2019 (UTC)
I have recently encountered some interesting use cases in Latin entries, so in quodsī and sonīvius. And for obnūbilus Ennius is the only author, hence it can’t be too much to request a quote of a particular author. And see for plant names which sometimes had particular uses in Arab Spain Category:Requests for quotation/كتاب عمدة الطبيب في معرفة النبات لكل لبيب. Those occurrences in Andalusi authors mentioned by sigles in this Andalusi plants glossary should ideally be all quoted, hence requests. Fay Freak (talk) 15:26, 26 March 2019 (UTC)
To revise my earlier stance since we have multiple people who are working on this, "leave them but maybe make them invisible" seems like a decent compromise if that's possible. Perhaps even make them only visible for people who opt in. That way people who want to add them can. Or we could of course just try and fulfil them all... - -sche (discuss) 17:00, 26 March 2019 (UTC)
Is there really a consensus that we should conceal this and other evidence of the incompleteness of Wiktionary? Why aren't we advertising all the incompleteness to try to lure more contributors?
Are we really going to a hidden category for each author? Why bother to hide these categories anyway? DCDuring (talk) 20:13, 26 March 2019 (UTC)
We already have a category for each author...? - -sche (discuss) 20:21, 26 March 2019 (UTC)
As I understand it, categories and static lists are good if many members are involved. Dynamic lists with many members are slow and resource-intensive because the search is repeated fairly often as one works through the list. I believe that the list is not updated after every relevant change or, if it is, the lag can be a matter of minutes. DCDuring (talk) 11:56, 27 March 2019 (UTC)
@Equinox: If it comes to that, I have a program that can grab all instances of {{rfquotek}} from the dump. Something like this, though you'd need more of the text of the page; at the very least the definition above the template. — Eru·tuon 20:44, 26 March 2019 (UTC)
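A rough sketch of the kind of extraction described above, pairing each {{rfquotek}} request with the definition line it sits on or under. This is not Erutuon's program. It assumes the author is the last positional parameter of the template, that definition lines start with "#" (but not "#:" or "#*"), and that the wikitext of one page has already been read from the dump. The name `find_quote_requests` is made up for the example.

```python
import re

# Assumption: the template has no nested templates in its parameters.
RFQUOTEK_RE = re.compile(r"\{\{rfquotek\|([^{}]*)\}\}")
# A definition line: one or more "#" not followed by ":", "*" or "#".
DEFINITION_RE = re.compile(r"#+(?![#:*])\s*(.*)")


def find_quote_requests(wikitext: str) -> list:
    """Return (definition text, requested author) pairs for one page."""
    results = []
    current_definition = None
    for line in wikitext.splitlines():
        stripped = line.strip()
        match = DEFINITION_RE.match(stripped)
        if match:
            # Remember the definition, minus any request template on it.
            current_definition = RFQUOTEK_RE.sub("", match.group(1)).strip()
        for request in RFQUOTEK_RE.finditer(stripped):
            positional = [p for p in request.group(1).split("|") if "=" not in p]
            if positional:
                results.append((current_definition, positional[-1].strip()))
    return results


if __name__ == "__main__":
    page = (
        "# The form of words by which a penitent is absolved.\n"
        "#: {{rfquotek|en|Shipley}}\n"
        "# {{lb|en|obsolete}} Delivery, in speech. {{rfquotek|en|Ben Jonson}}"
    )
    for definition, author in find_quote_requests(page):
        print(author, "->", definition)
```

The pairs could then be written to a project page or a plain list, so the hints survive whatever is decided about the display.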

Eliminating the difference in formatting between no-etymology, single-etymology and multiple-etymology entries

Right now, the formatting of pages with regard to etymologies is an inconsistent mess. To me, the real problem is how etymologies aren't nested under the term they describe, but I'll not go into that any further. As of right now, there are roughly three different formats when it comes to etymologies:

  • Entry with no etymology at all. POS sections are at level 3.
  • Entry with a single etymology. POS sections are also at level 3, so at the same level as the etymology section and not nested within it.
  • Entry with multiple etymologies. POS sections are now at level 4, nested under the etymology section they belong to, and everything else has to be bumped up a level too.

This is rather inconsistent. We nest POS sections under the etymology section when there are multiple etymology sections, but not when there is only one. This has been a nice breeding ground for headings with incorrect levels, and it's pretty bad that you have to re-nest the entire entry whenever you go from one to two etymologies. This is frustrating and pointless and there needs to be a better and more consistent way of doing this.

A first possibility is to change single-etymology pages to use the same format as multiple-etymology pages. This means that the POS sections are at level 4 nested under the etymology section, regardless of how many etymology sections are in the entry:

==English==

===Etymology===

====Noun====

=====Derived terms=====

However, this really just shifts the inconsistency around rather than eliminating it, because there are still the pages without an etymology to account for. The POS sections can't be at level 4, because there is no level 3 heading to nest them under. Having a section more than one level below its immediate parent is undesirable. Moreover, people have rightly complained in the past that the difference in heading size between levels 4 and 5 is not easily visible (or not visible at all). The conclusion I draw from this is that it is necessary to eliminate heading level 5 altogether.

That brings me to the second possibility: POS sections always appear on the same heading level as the etymology section. This is like we currently do for single-etymology and no-etymology entries, but now we extend that format to entries with multiple etymologies as well.

==English==

===Etymology 1===

===Noun===

====Derived terms====

===Etymology 2===

===Adjective===

====Derived terms====

This has the advantage of not only eliminating level 5 headings, but it also means that every heading has exactly one level that it's allowed to appear as. POS sections will always be level 3. Inflection, Derived terms, Descendants etc will always be level 4. There will no longer be a need to re-nest the sections; once a section is added, it can always stay at that level no matter how the entry is changed in the future. We could also decide to remove the numbers from the Etymology sections. They don't really serve a purpose after all, and we don't number any other section that way.
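For what it's worth, converting existing multiple-etymology entries to this second layout looks mechanical. The sketch below is one possible way a bot could do it, under the assumption that every heading nested below a numbered Etymology heading simply moves up one level; the function name `flatten_etymology_nesting` is hypothetical and real entries would need more care (for instance with malformed headings).

```python
import re

HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
NUMBERED_ETYMOLOGY_RE = re.compile(r"^Etymology \d+$")


def flatten_etymology_nesting(wikitext: str) -> str:
    """Raise headings nested under "Etymology N" by one level.

    Level 2 (language) and level 3 headings are left alone; headings of
    level 4 and deeper are promoted only while the most recent level 3
    heading was a numbered Etymology heading.
    """
    output = []
    under_numbered_etymology = False
    for line in wikitext.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            if level == 2:
                under_numbered_etymology = False
            elif level == 3:
                under_numbered_etymology = bool(NUMBERED_ETYMOLOGY_RE.match(title))
            elif under_numbered_etymology:
                level -= 1
                line = "=" * level + title + "=" * level
        output.append(line)
    return "\n".join(output)


if __name__ == "__main__":
    current = "\n".join([
        "==English==", "===Etymology 1===", "====Noun====",
        "=====Derived terms=====", "===Etymology 2===",
        "====Adjective====", "=====Derived terms=====",
    ])
    print(flatten_etymology_nesting(current))
```

Single-etymology and no-etymology entries pass through unchanged, since they never contain a numbered Etymology heading.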

While the second possibility definitely has my preference over the current format, it's not without its downsides too. It eliminates the use of nesting as an indicator of what goes with what, instead substituting it with a rule that says "the first etymology section above a POS is the one belonging to that POS". Since we already do exactly that for Pronunciation sections as well as single-etymology entries, it's not a huge problem, but it's not the nicest layout either. Moreover, it encourages editors to add new POS sections that automatically subsume the existing etymology section, whether that is correct or not. A lot of editors, especially inexperienced ones, don't pay particular attention to such tricky details, they just want to add content, which we should applaud them for. It's the same situation that causes Synonyms sections to become incorrect when new senses are added but no {{sense}} template has been added to the synonyms.

For that reason, I still think that the best solution is to have etymology and pronunciation at level 4, nested under the POS they apply to, rather than the current situation. But if people won't agree to that, I think either of these proposals is still better than what we have now, proposal 2 in particular. —Rua (mew) 17:47, 24 March 2019 (UTC)

I guess I could support this. I would like to see some CSS work done to make sections more visually distinct somehow, if it's possible. Such as automatically indenting everything "under" a single level 3 Etymology header. DTLHS (talk) 18:14, 24 March 2019 (UTC)
That's not possible, because our HTML output doesn't actually contain any sections. In the HTML, it's simply a series of a header followed (without nesting) by paragraphs of text. So there is no notion of the sections "containing" the subsections. —Rua (mew) 18:18, 24 March 2019 (UTC)
You're right, it would have to be a JS thing. DTLHS (talk) 18:28, 24 March 2019 (UTC)
I do use level 5 where necessary, but it has the same font size as level 4, which makes it difficult to distinguish. DonnanZ (talk) 18:23, 24 March 2019 (UTC)
I like this (specifically I like etymology subordinate to the POS sections), and further I like the hinted-at notion of etymology being subordinate to the definition lines (and probably subst'ed in there so that a single etymology could be applied across many definitions without having to keep it in sync). - TheDaveRoss 23:14, 24 March 2019 (UTC)
All of these are improvements. I have a strong preference for proposals in which the presence or absence of an etymology section does not affect the level of the other headings. Conceptually the simplest is to include the etymology sections in the headings after the definitions. That leaves an issue, however: how to deal with the frequent cases where different parts of speech share an etymology, like for abrupt or wash.  --Lambiam 06:33, 25 March 2019 (UTC)
I would argue that those terms don't actually share an etymology, we're just being sloppy and pretending they do. Imagine that the adjective abrupt and the verb abrupt were spelled differently, and thus put on separate pages. Would the etymology section for both of them be exactly the same? I doubt it; the verb would have an etymology section indicating it is derived from the adjective in some way. The same should be done even if the spelling happens to be the same. So really, they don't have the same etymology, the verb is actually missing its etymology. The same for the noun of course. See Wiktionary:Beer_parlour/2018/November#Per-lemma_etymologies for a previous discussion that never went anywhere. —Rua (mew) 18:34, 25 March 2019 (UTC)
I agree that some terms for which we show just one etymology, like nouns and verbs in English especially, really should show two different etymologies (pinging Equinox because I recall he's a fan of merging these types of etymologies), but I am not so sure about Ancient Greek words that happen to be both an adverb and a preposition, or a pronoun and a determiner. I suppose we could try to figure out which part of speech was earlier and have the full etymology there, with the other etymology saying "derived from the part of speech word". I seem to recall that prepositions are thought to usually derive from adverbs in Ancient Greek for instance. That would involve some work. It is simpler to show just one etymology. — Eru·tuon 20:33, 25 March 2019 (UTC)
@Rua, Erutuon About that, see Wiktionary:Etymology scriptorium/2017/July § legitimate. I wrote then that "The etymologies aren't exactly the same, since the verb comes from the adjective by conversion. It probably doesn't warrant two headers though". I have changed my mind, and actually agree with Rua that there being two - however slightly - different etymologies does warrant two headers. ChignonПучок 20:52, 25 March 2019 (UTC)

I realised I forgot to discuss the Pronunciation section with regard to this, the placement of which is even less consistent. When there is one etymology, it goes below the etymology section, but when there are multiple etymology sections it goes above the first etymology. Again, messy and inconsistent, and thus prone to errors (I had to fix one such error just now, diff). Moreover, sometimes we nest pronunciation sections under etymology sections (at level 4) if each etymology has a different pronunciation. As the relationship between pronunciations, etymologies and POS sections becomes more complex, the entry structure itself becomes more complex. What about an entry with multiple etymologies, where one etymology contains multiple POS sections that each have their own pronunciation? Or where there are 3 etymologies but only 2 pronunciations? The second proposal above would somewhat solve this, in that there is no complicated nesting but the POS simply subsumes the nearest pronunciation above it. But it can lead to this weird structure:

==English==

===Etymology===

===Pronunciation===

===Noun===

===Etymology===

===Adjective===

===Pronunciation===

===Verb===

Is it clear at all which POS has which pronunciation? Following the rule of "nearest section above", the noun goes with the first pronunciation and the first etymology, the adjective goes with the first pronunciation and the second etymology, and the verb goes with the second pronunciation and the second etymology. Not very clear if you ask me. This, again, is why I prefer the structure in which only POS is at level 3. Then it's immediately clear what belongs to what:

==English==

===Noun===

====Etymology====

====Pronunciation====

===Adjective===

====Etymology====

====Pronunciation====

===Verb===

====Etymology====

====Pronunciation====

With this structure there is no complicated nesting, everything is clear and every section has the same level everywhere. The downside of this method is of course potential duplication of the pronunciations, but it seems that's unavoidable without going back to a weird nesting structure again. Duplication of etymology is not really an issue, because as I mentioned above and in past discussions, no two terms actually have the same etymology. —Rua (mew) 20:03, 25 March 2019 (UTC)
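For comparison, the "nearest section above" rule described for the flat layout can be written down as a short Python sketch. It assumes a single language section, and `POS_NAMES` and `resolve_flat_entry` are hypothetical names chosen for illustration, not part of any existing tool.

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")
# Hypothetical, incomplete set of part-of-speech headings for this example.
POS_NAMES = {"Noun", "Verb", "Adjective", "Adverb"}


def resolve_flat_entry(wikitext):
    """Pair each POS heading with the nearest Etymology and Pronunciation
    heading above it, counted 1-based in order of appearance (None if absent)."""
    etymology = pronunciation = None
    etym_count = pron_count = 0
    result = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if not match:
            continue
        title = match.group(2)
        if title.startswith("Etymology"):
            etym_count += 1
            etymology = etym_count
        elif title == "Pronunciation":
            pron_count += 1
            pronunciation = pron_count
        elif title in POS_NAMES:
            result.append((title, etymology, pronunciation))
    return result
```

Applied to the "weird structure" example, this pairs the noun with etymology 1 and pronunciation 1, the adjective with etymology 2 and pronunciation 1, and the verb with etymology 2 and pronunciation 2. Nothing in the markup itself shows that pairing, which is the readability concern raised above.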

I have always assumed that Etymology was meant to be the word's etymology and not the sense etymology so whether the noun mutated into an adjective or the adjective into the verb is of little concern. Trying to document it seems like splitting hairs. However, if you are making this change, how would you handle multiple etymologies for the same POS? Would you have, say, "Noun 1" and "Noun 2" or put both etymologies under one "Noun" heading? The latter would muddy the waters further if there were also adjective or verb POSs for those multiple etymologies so you end up with something like the 2nd adjective etymology is derived from the 3rd noun etymology. -Mike (talk) 18:34, 26 March 2019 (UTC)

We would have separate noun headings for each noun. We would not number them, as the numbering isn't actually useful and is subject to change anyway. Instead, we'd clarify which one we mean using glosses and {{senseid}}, like we already do currently.
As for "splitting hairs", I think I'd prefer being correct over being accused of doing that. From a linguistic point of view, the conversion of one POS into another without any change in the lemma form is no less notable than the conversion of one POS to another with a change in lemma form. Moreover, it's actually possible to choose the lemma form in such a way that identical terms are no longer identical, or the reverse. Imagine that we chose the imperative form for Dutch verbs instead of the infinitive. Then, verbs would have the same lemma form as the noun or adjective they might derive from. Would that in itself make a difference in how much etymological detail we give to the user? Of course not. Likewise, if history had chosen to declare the past tense of English verbs as the lemma form instead of the infinitive like now, then we'd be forced to put them on different pages and give them separate etymologies. If we then simply left the etymologies of the derived terms blank, it would be a disservice to our users and nobody would stand for it. In the same way, it does a disservice to the user to not include such information merely because the lemma forms happen to be identical. I therefore stand by the position that the question of whether we provide a separate etymology for a term should be independent of whether the lemma form of the term is identical to another. —Rua (mew) 21:39, 26 March 2019 (UTC)

I also sympathize with this proposal, “in which the presence or absence of an etymology section does not affect the level of the other headings”. Less strain on the eyes, less time wasted on re-sorting, easier botting. But what will we do with language sections containing multiple etymologies where one etymology is not written out because it is self-evident from the gloss that it can be found at a further link? For example, we may have a complicated etymology 1 and an etymology 2 that is merely a lame alternative form of some other term, on which one would not necessarily waste words in the etymology section, because that would duplicate what is in the gloss or in the main form where all alternative forms are explained. If we just reassign levels, there will be empty etymology sections, so apparently we need additional text in some places. For form etymologies we use {{nonlemma}}, which refers us to a “main entry”; if the same is to be used in alternative-form entries, {{nonlemma}} has to be renamed, because the template would then be used independently of whether something is a non-lemma, depending instead on whether one refers elsewhere. Fay Freak (talk) 14:37, 28 March 2019 (UTC)

This is an advantage of nesting etymology under POS. In the current setup, as well as the first two possibilities above, etymology still drives the whole structure of the entry, so we must include it for the sake of structure regardless of whether we want to. We've had to create silly templates like {{nonlemma}} to have something to put into those etymology sections that we'd rather not put anything in. If we nest etymology under POS, then we can just omit the section, and nothing in the page structure will break as a result. Instead, POS will drive the entry structure, which makes a lot more sense to me. —Rua (mew) 22:36, 28 March 2019 (UTC)
I’d rather have the setup where neither is nested under the other. Etymology level 3, POS level 3, Etymology level 3, POS level 3. For such cases like poucave where you can’t know whether noun or verb came first, plus the etymologies are often written in a fashion independent of the part of speech (including the frequent “from the root”, without there being a need to tell anything further about the derivation type of a certain pattern or POS: if form II has a causative meaning you don’t need to write in an extra section that it is a causative, it generally is, and the Appendix:Arabic verbs is linked by the “II” in the headword). Without nesting by the mere order it should be clear to the averagely observant reader what belongs to what. According to the current layout POS comes under the etymology, so the reader holds it the same if it is sequentially under the etymology, but not nested under it. Nesting etymology under POS would be an unnecessarily great change, given this easier way to achieve a state “in which the presence or absence of an etymology section does not affect the level of the other headings”. Just this minor issue with then it being more needed to write something under the etymology headers like with {{nonlemma}}, but maybe not even since people will learn that the etymology sections themselves do not need to contain anything but are there without anything to signify that “here starts a different etymology”. PLUS amn’t I right to observe that this second variant with etymology under POS needs manual care while the conservative variant only needs to have etymology numbers removed and the level promoted which would be done by bot? Fay Freak (talk) 13:03, 29 March 2019 (UTC)
If it needs manual care, then that implies it's not machine readable and that's a problem in itself. Nesting etymology under POS has the least potential for mistakes, the most potential to spot mistakes, and it is immediately understood what goes with what. Your idea of empty etymology sections, just as a signal that the next POS does not belong to the previous etymology, is exactly what is broken about the current etymology-driven approach, and sadly neither of the proposals above solve that issue. At least with POS at level 4 you can still speak of the etymology section "containing" its subsections in some way, which does not apply when both etymology and POS are at level 3. Then you just have an empty etymology section that's prone to be removed by other editors because it appears to be useless. Moreover, not all editors will be aware that a new POS they insert already has an etymology section that will automatically apply to it, leading to errors. With nesting etymology under POS, that kind of mistake becomes impossible. The goal is to be more explicit while also reducing the number of different formats an entry can have. Both proposals above reduce the messiness of entries, but only etymology under POS can properly solve the problem. —Rua (mew) 17:17, 29 March 2019 (UTC)
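To make the bot-only conversion mentioned in this exchange concrete (removing etymology numbers and promoting the headings nested under them by one level), here is a minimal Python sketch. It assumes the input is one language section in the current nested layout. The function name and the `strip_numbers` switch are hypothetical, and a real bot would need to handle many more edge cases.

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")
NUMBERED_ETYMOLOGY_RE = re.compile(r"^Etymology \d+$")


def flatten_language_section(wikitext, strip_numbers=True):
    """Promote headings nested under 'Etymology N' by one level and,
    optionally, drop the number from the etymology heading itself."""
    out = []
    under_numbered_etymology = False
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if not match or len(match.group(1)) != len(match.group(3)):
            out.append(line)  # ordinary text, kept unchanged
            continue
        level, title = len(match.group(1)), match.group(2)
        if level == 3:
            # A level-3 heading opens or closes a numbered-etymology block.
            under_numbered_etymology = bool(NUMBERED_ETYMOLOGY_RE.match(title))
            if under_numbered_etymology and strip_numbers:
                title = "Etymology"
        elif level > 3 and under_numbered_etymology:
            level -= 1  # e.g. ====Noun==== becomes ===Noun===
        elif level < 3:
            under_numbered_etymology = False  # a new language section starts
        out.append("=" * level + title + "=" * level)
    return "\n".join(out)
```

Single-etymology entries pass through unchanged, since their POS headings are already at level 3. What this sketch cannot do is decide which etymology a POS really belongs to, which is the part that would still need manual care.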

Redundant messages on new talk pages

Why do we have two almost equally large and visible warnings at the top of the page when you create a new talk page? Here they are, for comparison:

  1. NOTE! Wiktionary's talk pages are usually not regularly followed by other editors. If you want to discuss this entry, please go to Wiktionary:Tea room instead, where more people will see your message. For general questions, please leave a message at Wiktionary:Information desk.
  2. Talk pages of individual entries are not usually monitored by editors, and messages posted here may not be noticed or responded to. You may want to post your message to the Tea Room or Information desk instead.

Can we please get rid of one of these? The more warnings there are to read, the more I imagine new users choose to ignore them. Ultimateria (talk) 17:44, 25 March 2019 (UTC)

I removed the more garish message from MediaWiki:Newarticletext. - TheDaveRoss 21:15, 25 March 2019 (UTC)
It looks so much better! Thank you. Ultimateria (talk) 15:21, 26 March 2019 (UTC)

Remove lemmas from Korean hanja and Vietnamese Hán tự entries

I suggest stripping the Korean and Vietnamese Chinese character entries in Category:Korean Han characters and Category:Vietnamese Han tu of their lemma status (and also of any PoS and topical categories). The same would apply to Zhuang and some other languages where the main form is not written in Chinese characters. It may take longer to remove the topical categories but it's the right thing to do.

A simple example: 한자 (hanja), written in hangeul is a lemma and a noun and 漢字, written in hanja is its hanja form and should only belong to Category:Korean Han characters. --Anatoli T. (обсудить/вклад) 23:48, 25 March 2019 (UTC)

If they are not lemmas, then what are they a form of? Non-lemmas are always a form of something else. —Rua (mew) 21:49, 26 March 2019 (UTC)
@Rua: It's a special case for these languages. They are not the main spelling form; consider them soft redirects, like transliterations, e.g. Category:Mandarin pinyin. We also agreed with User:Benwing2 to strip Category:Russian spellings with е instead of ё of the lemma status. It's OK to keep adding them to the language-name non-lemma forms categories. --Anatoli T. (обсудить/вклад) 22:05, 26 March 2019 (UTC)
@Atitarev I agree with you. These are alternative spellings, similar to transliterations, and not lemmas. Benwing2 (talk) 00:57, 27 March 2019 (UTC)
How are they like transliterations if they have been used by principle historically? Why not decategorize Russian pre-1918 spellings then? And why not decategorize Serbo-Croatian Cyrillic spellings if you are that far, since these basically double the category entries? Plus they aren’t non-lemma forms even if you remove the lemma category. Those Russian spellings like актер (akter) are categorized neither as lemma nor as non-lemma. These Russian entries are sorted as Category:Russian spellings with е instead of ё, hence it would make sense to have entries for “Vietnamese lemmas in Chinese characters” resp. “Korean lemmas in Chinese characters” as distinguished from the “normal lemmas”, or similar. Lemma—non-lemma is a false dichotomy, this is a tertium, as has already been conceded by categorizing the Russian е-instead-of-ё-entries as neither. Fay Freak (talk) 14:24, 27 March 2019 (UTC)
Yes, they are neither lemma nor non-lemma forms. Benwing2 (talk) 15:16, 27 March 2019 (UTC)
I suggest having these: (1) Category:Korean terms in Han script and (2) Category:Vietnamese terms in Han script. KevinUp (talk) 08:55, 28 March 2019 (UTC)
  Support Μετάknowledgediscuss/deeds 01:58, 27 March 2019 (UTC)
  Support KevinUp (talk) 12:18, 27 March 2019 (UTC)
The following PoS categories are available for Korean Han characters:
  1. Category:Korean nouns in Han script
  2. Category:Korean proper nouns in Han script
  3. Category:Korean adverbs in Han script
  4. Category:Korean pronouns in Han script
The following PoS categories are available for Vietnamese Han characters:
  1. Category:Vietnamese nouns in Han script
  2. Category:Vietnamese proper nouns in Han script
  3. Category:Vietnamese adjectives in Han script
  4. Category:Vietnamese verbs in Han script
  5. Category:Vietnamese adverbs in Han script
  6. Category:Vietnamese idioms in Han script
  7. Category:Vietnamese proverbs in Han script
For Vietnamese, there's also (1) Category:Vietnamese Han tu, linked by either {{vi-readings|hanviet=}} or {{han tu form of}}, and (2) Category:Vietnamese Nom, linked by either {{vi-readings|nom=}} or {{Nom form of}}.
However, I would prefer to have a separate category for single character entries:
  1. Category:Korean Han characters for single character hanja.
(This category currently contains both single character hanja and hanja compounds)
  1. Category:Vietnamese Han characters for single character Hán Nôm (both chữ Hán and chữ Nôm).
(This category currently contains only single character entries provided by deprecated {{vi-hantu}} that will be deleted soon)
What does the community think of having something like (1) Category:Korean terms in Han script and (2) Category:Vietnamese terms in Han script for compound word entries, after stripping them of their lemma status?
The reason for this is that I currently use incategory:"Korean lemmas" intitle:中 [1] and incategory:"Vietnamese lemmas" intitle:中 [2] to search for derived terms of Korean 中 and Vietnamese 中. KevinUp (talk) 12:18, 27 March 2019 (UTC)
I don't think we need to make a distinction between single-word and compound-word entries, it's enough just to have a single category for all such terms with Han characters. Benwing2 (talk) 15:16, 27 March 2019 (UTC)
Single character terms in Sino-Xenic languages are notoriously difficult, and extra effort is always required to provide disambiguation for the endless homophones or homographs (in the main script), e.g. the Korean syllable/word/component 나 (na) and a list (incomplete) of hanja with the same reading: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , etc. These ARE required for disambiguation but they are not the main form or lemma. --Anatoli T. (обсудить/вклад) 03:26, 28 March 2019 (UTC)
And as for Vietnamese Han characters, we need to further distinguish between chữ Hán (literary Chinese characters) and chữ Nôm (characters used for native Vietnamese). Also, a distinct category would help to monitor edits done on Korean/Vietnamese single Han character entries. KevinUp (talk) 08:55, 28 March 2019 (UTC)
Take a look here: Special:Contributions/27.3.73.64. The edits are correct (incorrect readings have been removed), but most Nôm readings have also been removed. I'll clean it up later but the point is, we need a separate category for single character entries, which are prone to all kinds of discreet vandalism due to it being rarely used. KevinUp (talk) 10:48, 28 March 2019 (UTC)
I agree we need a separate category for chữ Nôm characters. It doesn't need to have lemmata either. --Anatoli T. (обсудить/вклад) 00:12, 29 March 2019 (UTC)

Making etymological derivations more specific, retiring {{der}}

When {{der}} was first created, it was just a generic replacement for {{etyl}}. But since then, we've created {{inh}}, {{bor}} and {{calque}}, all of which categorize more specifically. This has relegated {{der}} to the role of indicating "other" derivations. From what I can tell, though, it really only indicates three things:

  1. Indirect derivations, i.e. cases where a term was taken into an intermediate language, from which the term was then directly inherited, borrowed or calqued.
  2. Morphological derivations, especially from roots. These are really, in theory, instances of the other three; *hundaz is inherited from a term in some ancestral Pre-Germanic Post-PIE language we may call X, which itself was derived within that language from a descendant of *ḱwṓ. The actual sequence of events is then really PG *hundaz <inh (intermediate term in X) <affix (intermediate term in X) <inh PIE *ḱwṓ. The reason we use {{der}} here is that we don't know the intermediate term, the language in which it existed, nor the derivational morphology by which one was derived from the other within X.
  3. Cases that really should be labelled with one of the more specific templates, but where the editor just converted {{etyl}} to {{der}} without further thought.

I propose that we close the gap further by creating new templates specifically for the first two of these roles. I believe that this would then make the derivation-from-another-language templates/categories exhaustive, and then we may be able to retire {{der}} altogether once the erroneous uses of the third type are cleaned up. The category for "terms derived from" would remain, but only as a parent category for the more specific ones to be categorised in.

We may further distinguish between indirect borrowings and indirect inheritance, depending on what the relationship in the intermediate language is. While our current practice for {{bor}} is to use it only for borrowings into the current language, I have begun to think that this may be a mistake. English is inherited from Middle English, and so in a sense, it inherits borrowings from it as well. Distinguishing terms borrowed into English from terms borrowed into Middle English and then inherited doesn't make that much sense in practice; they are both terms that were borrowed sometime during the long unbroken chain of inheritance stretching from modern English back to pre-Indo-European times and beyond. So perhaps we change one of two things here:

  1. Label modern terms that are inherited from ancestral borrowings as both "English terms derived from (ancestor)" and "English terms borrowed from X", ignoring the fact that the borrowing language was not modern English but its ancestor.
  2. Refine this concept in the form of "English terms borrowed from X inherited from Middle English". Then we can still distinguish English terms borrowed within English from terms borrowed into Middle English, Old English, Proto-Germanic and Proto-Indo-European as appropriate.

The first solution is easier, but the second conveys more information to our users. If we do either of these things, then what remains of indirect derivations consists only of indirect borrowings, so we can label them as such.
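The difference between the two labelling options can be illustrated with a small Python sketch. The function names are hypothetical, the category strings simply follow the wording quoted in the two options, and "Latin" in the usage note stands in for the unspecified source language X.

```python
def option_one_labels(language, source, ancestor):
    """Option 1: label the term both as derived from the ancestor and as
    borrowed from the source, ignoring which stage did the borrowing."""
    return [
        f"{language} terms derived from {ancestor}",
        f"{language} terms borrowed from {source}",
    ]


def option_two_label(language, source, via=None):
    """Option 2: record the intermediate ancestor in the category name.
    With via=None the borrowing happened within the language itself."""
    label = f"{language} terms borrowed from {source}"
    if via is not None:
        label += f" inherited from {via}"
    return label
```

For instance, `option_two_label("English", "Latin", via="Middle English")` gives "English terms borrowed from Latin inherited from Middle English", while leaving out `via` gives the plain "English terms borrowed from Latin". Option 1 is simpler, but it collapses the two cases into the same pair of categories.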

Wiktionary is often praised for its etymological content, so this could be a way to reinforce an existing strength. —Rua (mew) 02:11, 28 March 2019 (UTC)

If possible, can we use a bot to convert existing {{etyl}} to {{der}} before ultimately retiring {{der}} itself? KevinUp (talk) 09:18, 28 March 2019 (UTC)
That would just increase the number of erroneous uses of {{der}}. —Rua (mew) 12:06, 28 March 2019 (UTC)

1:
So {{inhbor|entry language|source language}} for inherited borrowings?
2:
So {{inhaf|language of the lemma|language that affixed|language obtaining the root}}, e.g.: {{inhaf|de|gem-pro|ine-pro|*hundaz|*ḱwṓ|af=whatever affix we have identified}}?
3:
> pretending that with further thought, even with ideal literature acquaintance, one would always know the nature of the derivation
Maybe one needs a special template signifying “I specifically used this template because I don’t know the exact derivation”, which could have been {{der}}, but it is not, as {{der}} has by now been used with an ineradicably different signification. There are also cases of a word claimed to be X, and claimed to be Y, which are incompatible but all categorize fully. Fay Freak (talk) 14:19, 28 March 2019 (UTC)

Oppose retiring {{der}}. An interesting idea to become more specific, but the end goal is not one that I see as desirable. Besides unknowns, as Fay Freak mentioned, there are other issues like words in creoles derived from their lexifiers or Yiddish words from Hebrew — you would probably subsume those as a borrowing, but it is not how those linguistic traditions consider these words, and we use {{der}} as a result. —Μετάknowledgediscuss/deeds 14:41, 28 March 2019 (UTC)
Being able to retire {{der}} is not the end goal, it's only the logical consequence if we have replaced it in every case by something more specific. All I did here is lay down some of those specifics, where they lead is another matter. It shouldn't be a reason for objecting to the first sensible step merely because you don't like where it might lead. —Rua (mew) 22:33, 28 March 2019 (UTC)
And I'm not objecting to the first step of creating specialised templates. But I pointed out multiple examples of {{der}} being used in ways you seemingly hadn't considered. (They might need more specialised templates, in fact.) —Μετάknowledgediscuss/deeds 23:31, 28 March 2019 (UTC)
Yes, I figured there would be cases I hadn't considered. I did say "from what I can tell" after all. —Rua (mew) 23:35, 28 March 2019 (UTC)
  • Keep der, to avoid forcing contributors to be more specific than their knowledge affords. Furthermore, I consider this whole inherited vs. borrowed business in our etymologies to be rather unimportant and a nuisance, and I would be happy to see it removed, which is not going to happen but anyway. --Dan Polansky (talk) 19:26, 29 March 2019 (UTC)
  • Keep: I agree that people have been a bit lazy using {{der}} in cases where they could be using {{af}}, etc., but I see no reason to get rid of {{der}}. --{{victar|talk}} 05:09, 30 March 2019 (UTC)
  • I can see the etymology templates heading in the direction of the old category-boiler templates, where only a few specialists will know the right template to use. Sure, a few obvious cases like {{inh}} and {{bor}} save some typing, but if we try to make a template for every possibility, you start running into etymological spaghetti like pikake and creme anglaise. If you want to start getting complex, add parameters to {{der}} so all that complexity is explained on one template's documentation page. As for your other idea: I don't think both mister and magister or script and shrift belong in Category:English terms borrowed from Latin. Chuck Entz (talk) 06:01, 30 March 2019 (UTC)
    Yes, I seem to recall that one reason we decided back when {{bor}} was adopted to only use it for borrowing by the L2 language is that it would feel odd to say e.g. English iron was "borrowed from Proto-Celtic" as if the two languages were coeval or modern English speakers intentionally adopted a dead language's word like they did with ceorl or ghrelin. I grant that that's subjective and other people feel otherwise; I also grant that happenstance does lead to 'inconsistency' when a word that Hebrew borrowed 3,000 years ago would be considered to be "borrowed" into Hebrew just as much as a word borrowed today, because we don't split that language up by age.
    I'm really unconvinced that "English terms borrowed from X inherited from Middle English" would be a good idea. - -sche (discuss) 05:10, 31 March 2019 (UTC)
  • Keep per Chuck. DCDuring (talk) 15:55, 30 March 2019 (UTC)
  • I must keep using der. A word may be directly borrowed from language A but indirectly derived from another language B, so we cannot use either bor or inh to indicate B. Sometimes we do not know whether a word is directly borrowed from language A or not, so we cannot use bor in that case either. --Octahedron80 (talk) 05:28, 31 March 2019 (UTC)

Unicode images

An image of each Unicode character should show up together with the actual character in its entry, so that it can be seen regardless of browser, fonts etc. For example, no image of U+2053 appears either in swung dash or in ⁓. --Backinstadiums (talk) 19:26, 28 March 2019 (UTC)

Not replace, but possibly supplement. Equinox 19:30, 28 March 2019 (UTC)
@Equinox: I've edited OP. --Backinstadiums (talk) 19:34, 28 March 2019 (UTC)
Then do it. There doesn't seem to need to be policy here, just add images where you feel they are helpful.--Prosfilaes (talk) 00:34, 29 March 2019 (UTC)

I'm leaning against this: how do we decide which characters to show as images? All of them? —Justin (koavf)TCM 19:36, 28 March 2019 (UTC)

Welcome, foreigners

Many Wiktionaries have a Wiktionary:Welcome, newcomers page. But for Swedish Wiktionary, I also added a Welcome page in English for those who don't speak much Swedish (yet). I think they can still make useful contributions. Why don't you try it? I'd welcome feedback. Feel free to copy the idea. --LA2 (talk) 22:37, 28 March 2019 (UTC)

What language would we write it in on the English Wiktionary, though? Swedish? —Rua (mew) 22:38, 28 March 2019 (UTC)
I'll leave that to each volunteer to figure out. I assume that many users here, just like you and me, in addition to English, speak one lesser known language whose Wiktionary could need some more contributors. --LA2 (talk) 22:54, 28 March 2019 (UTC)
I don't know how I feel about encouraging people to create barebones entries. You probably don't have much in the way of infrastructure at sv.wikt like we do here for the likes of Swahili, but I wouldn't want people to create entries lacking basic information like noun class (compare our sukari). —Μετάknowledgediscuss/deeds 23:29, 28 March 2019 (UTC)
A major difference is that en.wiktionary has 4000 entries in Swahili and sv.wiktionary so far only has 200. Swahili is not my main concern, but it was a good neutral example of a language for which we currently did not even have a translation of sugar. We'd be happy if someone comes by and adds the names of the months. The other example was Ukrainian, for which sv.wiktionary has 1200 entries and a fully developed system of inflection templates for nouns, adjectives and verbs, e.g. sv:писати. --LA2 (talk) 00:52, 29 March 2019 (UTC)
Maybe Esperanto would be a good choice? – Jberkel 21:04, 1 April 2019 (UTC)

a distrust of politicians

distrust says it's uncountable but one can find "a distrust of politicians". Is the situation here similar to "a fear of heights"? The Oxford Genie dictionary includes a in the entries of both terms, but I do not know exactly why --Backinstadiums (talk) 17:49, 29 March 2019 (UTC)

Similar to disbelief, it should say: (usually uncountable, plural distrusts): [3], [4], [5].  --Lambiam 19:13, 29 March 2019 (UTC)
I updated that, because I agree. - TheDaveRoss 14:47, 1 April 2019 (UTC)
Is this the new term for a group of politicians, like a flock of birds? :D —Rua (mew) 21:19, 31 March 2019 (UTC)
This has entered my vocabulary. - TheDaveRoss 14:47, 1 April 2019 (UTC)
It's a construction used to describe characteristics, along the same lines as "she has a lively personality", "the bird has a large beak", "he has a peculiar laugh", etc. This particular variation isn't really countable, though: you would say "they all have a distaste for formality", and never "both have fears of heights". Chuck Entz (talk) 00:02, 1 April 2019 (UTC)
I have added the noun plural, which was missing. It is almost never heard. I did find a couple of examples in old religious writing: "So often did they provoke God by their distrusts and murmurings..."; "The titles here given them, were enough to shame them out of their distrusts." Equinox 00:09, 1 April 2019 (UTC)
The three links I put above are to examples of (in these cases non-religious) use.  --Lambiam 14:19, 1 April 2019 (UTC)

CFI-amendment: excluding typos and scans

In light of this comment, I've decided to draft a new proposal: Wiktionary:Votes/2019-03/Excluding typos and scannos. Comments and improvements are welcome. ChignonПучок 18:25, 29 March 2019 (UTC)

@Chignon: It seems good to me. You should set a start (and end) date, because there's no point in waiting much longer; you won't get more feedback once this is buried. I'd recommend starting a week from when you posted this, following usual custom. —Μετάknowledgediscuss/deeds 04:33, 31 March 2019 (UTC)

Should an optional orthographic convention in Pashto to distinguish final /ə/ and /a/ be followed?

The existing declension templates for Pashto [[Category:Pashto declension-table templates]] use a final he-hamza ۀ to denote final /ə/, a convention that Anne Boyle David's grammar (Descriptive Grammar of Pashto and its Dialects) describes as a "suggestion" (p. 29). This contrasts with plain final he ه, which would denote final /a/. I can barely find the character (U+06C0) on an OS X Afghan Pashto keyboard. Note that non-final /ə/ is not marked with any diacritic in the Pashto Perso-Arabic script. Should newer entries follow this suggestion or not, and should the existing templates be left as they are or modified to show both variants? Bringing in Vahagn Petrosyan, Qehath, and Adjutor101. Sinonquoi (talk) 12:01, 30 March 2019 (UTC)

I don't know anything about this subject. --Vahag (talk) 12:21, 30 March 2019 (UTC)
Similarly, my Pashto isn't good enough that I can make any useful comment. — [ זכריה קהת ] Zack. — 02:06, 31 March 2019 (UTC)
@Sinonquoi, could you give an example? --{{victar|talk}} 02:14, 31 March 2019 (UTC)
The word ویښتۀ for instance which is often written simply as ویښته. Sinonquoi (talk) 04:48, 31 March 2019 (UTC)
@Sinonquoi: Pashto French [Dr. M. Akbar Wardag - Qamosona.com] dictionary just uses the spelling وېښته(weӽtǝ́) with the normal final ه‎ and transliterates with an "-ǝ". Pashto Wiktionary also has وېښته . If this spelling with an ۀ(ë) is stricter and more reflective of the pronunciation, I guess we can use it as the main dictionary form and the one with an ه(a) - a redirect or an alternative form.
(I am just using Qamosona's transcription "weӽtǝ́", perhaps we should transliterate ویښتۀ‎ as "weẍtë", as per WT:PS TR.) --Anatoli T. (обсудить/вклад) 05:43, 31 March 2019 (UTC)
What about including the /ǝ/ spelling in links and headwords, but stripping it from page titles, like Russian or Latin diacritics? —Suzukaze-c 05:57, 31 March 2019 (UTC)
(edit conflict) @Suzukaze-c It's an option, especially considering that in Persian, the same letter ۀ‎ is not part of the headword, probably not considered a separate letter when used in [[ezafe]], e.g. ایالات متحده آمریکا(the United States of America), transliterated as "eyâlât-e mottahede-ye âmrikâ" uses "ایالات متحدۀ آمریکا" in the headword. It's a different usage in Persian, though. I'm stretching my knowledge here. All depends on how it's perceived. The Arabic ة‎ used to be perceived as a letter ه‎ with diacritics but now it's a letter. We also write out hamza over and under alif. That's why I asked if it's considered a stricter spelling (or a diacritic). --Anatoli T. (обсудить/вклад) 06:12, 31 March 2019 (UTC)
@Sinonquoi, I would support moving ویښتۀ‎ to وېښته‎ and placing |head=ویښتۀ in the {{head}}. We can also strip ۀ from links, effectively redirecting them to ه pages. --{{victar|talk}} 06:06, 31 March 2019 (UTC)
I will support depending on what the letter means to Pashto speakers, a diacritic or stricter spelling. We use "tāʾ marbūṭa" and hamza over and under alif in headwords but write out diacritics, which only serve as a pronunciation guide. --Anatoli T. (обсудить/вклад) 06:12, 31 March 2019 (UTC)
@Sinonquoi: You haven't expressed your own opinion. This can go either way. Meanwhile, I have made the entry وېښته‎ using "ویښتۀ" in the header. --Anatoli T. (обсудить/вклад) 00:40, 1 April 2019 (UTC)
I think it's best to use just ه since it's prevalent. ۀ is used very little. That only leaves us with the problem of the current templates which all use it. Sinonquoi (talk) 08:32, 14 April 2019 (UTC)
@Sinonquoi: I have swapped the entries around. --Anatoli T. (обсудить/вклад) 08:50, 14 April 2019 (UTC)
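
The option raised earlier in this thread (show ۀ in headwords and links, but store the entry under the spelling with plain ه) can be sketched as below. This is only an illustration, not how any existing Wiktionary module works: the function names are hypothetical, and the assumption that plain final he is U+0647 is mine (only U+06C0 is named in the thread).

```python
# Hypothetical sketch: derive a page title from a Pashto display form by
# replacing he-hamza with plain he. Not taken from any real module.
HE_HAMZA = "\u06C0"  # the optional marker for final schwa, named in the thread
PLAIN_HE = "\u0647"  # assumed code point for the ordinary he

def page_title(display_form: str) -> str:
    """Return the title under which the entry would be stored."""
    return display_form.replace(HE_HAMZA, PLAIN_HE)

def link_target_and_text(display_form: str) -> tuple[str, str]:
    """Return (target page, visible text) for a link to display_form."""
    return page_title(display_form), display_form
```

With this approach the stricter spelling stays visible to readers, while both spellings resolve to a single page.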

Audio files for example sentence

Hi, everyone. I am an intermediate English learner. Recently I collected some audio files of English speeches, uploaded them to Wikimedia Commons, and linked to them from the English and Korean Wiktionaries. The purpose is educational: English learners like me can listen to sentences in which a word is used. I think this is useful and meaningful work, but some people disagree. Please see this. [6] I wonder what other people think about it. HappyMidnight (talk) 00:58, 31 March 2019 (UTC)

The audio recording in question sounds bizarre. We don't normally have audio for quotations, but if we do, it should be good audio, not that dreck. —Μετάknowledgediscuss/deeds 04:30, 31 March 2019 (UTC)
I did not find those audio files bizarre or dreck myself, but now I understand. Thank you. HappyMidnight (talk) 05:29, 31 March 2019 (UTC)
From the VOA website: “Learning English [broadcasts] use a limited vocabulary and are read at a slower pace than VOA's other English broadcasts. Previously known as Special English.”  --Lambiam 10:02, 31 March 2019 (UTC)

April 2019

Including or excluding ethnic slurs under synonyms for ethnicity

Recently, User:Jimbo2020 removed the ethnic slurs/derogatory terms under the synonyms of Somali. On their user talk page, they argue that the precedent is to "not have ethnic slurs as synonyms unless they are historically significant". The examples for entries with synonyms including ethnic slurs were Chinese, German and African-American, while those that did not include them were Italian, Finn, and Oromo (the last two of which do not have any synonyms). I don't think there are any specific guidelines on this, so it would be a good idea to come up with at least something. — surjection?〉 18:42, 1 April 2019 (UTC)

I don't know what "historically significant" means. All words are "historically significant". DTLHS (talk) 18:44, 1 April 2019 (UTC)
Wiktionary is not censored. If they are or were at one time used as synonyms, they belong in the list of synonyms. —Rua (mew) 18:54, 1 April 2019 (UTC)
It's not about censoring. Kraut would be a "historically significant" entry under German. Listing a marginally used neologism like muzrat (which should be deleted btw) under Muslim would not be. Almost all ethnonym pages do not list any ethnic slurs unless the word has a storied history in the English language or is particularly relevant to English speakers. BTW, the American page currently has Ameritard listed as a hyponym. Does that look right to you? Jimbo2020 (talk) 18:57, 1 April 2019 (UTC)
Based on the definition as "stupid or ignorant American", I would say yes, since it describes a subset of Americans. — surjection?〉 18:58, 1 April 2019 (UTC)
Again, are they synonyms? Then we list them as synonyms. Have you ever looked at some of our Thesaurus pages? They're full of offensive terms. —Rua (mew) 18:59, 1 April 2019 (UTC)
Once again, this is about a general style precedent. Check the pages for Arab, Pakistani and Italian: where are the slurs listed as synonyms? Jimbo2020 (talk) 19:01, 1 April 2019 (UTC)
You've pointed out that those entries are missing some synonyms, so someone will hopefully get around to adding the missing ones. —Rua (mew) 19:04, 1 April 2019 (UTC)
This would seem to undermine our mission as a descriptive reference work. It would not undermine it if we RfVed allegedly offensive terms, though the RfV process itself would advertise them. DCDuring (talk) 19:09, 1 April 2019 (UTC)
I don't understand. How does including all synonyms go counter to our mission? The opposite seems true to me. —Rua (mew) 19:42, 1 April 2019 (UTC)
Sorry about the misleading indentation and the ambiguous deixis of this. I was referring to the deletion of content on apparent grounds of offensiveness. DCDuring (talk) 22:34, 1 April 2019 (UTC)
Ah, ok. Thanks for clarifying. —Rua (mew) 22:46, 1 April 2019 (UTC)
Mehhh. I do sympathize with the desire to not (in effect) "promote" obscure derogatory terms by putting them as synonyms of common terms (like the example of muzrat on Muslim, above), but precedent certainly seems to be that they would be included, with appropriate tags of course (e.g. "derogatory, rare"), along with any alternative spellings (including e.g. rare and obsolete ones which someone might also complain about the oddness of "promoting"). A possible compromise would be to put them in a collapsed box like related and derived terms are put in, or to offload them to a Thesaurus page and have the synonyms section direct people to it. But what I would regard as the usual approach, of just listing them as synonyms with appropriate tags, seems OK. - -sche (discuss) 00:05, 2 April 2019 (UTC)
Terms that meet our CFI should be included, also if they are offensive – until we decide to change the CFI. At the same time we should be careful to mark offensive terms as offensive. Under German (the noun) the synonym “Kraut” is labelled offensive. I think “skinnie” is at least as offensive – and not only to the person being derogated with the slur, but to anyone with sensibility.  --Lambiam 20:31, 2 April 2019 (UTC)
Note that the change did not remove the entries for skinnie/Skinnie, just their appearance as synonyms for Somali. It's not entirely unreasonable to take the position that a slur is not an exact synonym for the corresponding more neutral demonyms. But a slur does have a semantic relationship to the corresponding demonym. Is it an antonym, a coordinate term? Should it appear under 'See also'? For consistency should we make sure that mutt and mongrel do not appear as synonyms of mixed-breed/mixed breed (Lemmings have it.) because they are pejorative?
The likelihood that anyone with pejorative intent will come to Wiktionary to find some good ones is negligible. It is much more likely that someone will come here looking to object to our inclusion of the pejoratives. So this seems to be a matter of w:virtue signalling rather than something likely to have a bad effect outside of the potential controversy. It is a question of our ascription of virtue to descriptivism vs. the proscription against any purported encouragement or even license of the use of ethnic slurs. DCDuring (talk) 21:30, 2 April 2019 (UTC)
If the issue is that words like "muzrat" are absurdly rare (which may be true), it seems this is a problem with listing synonyms of anything, not just of ethnicities. Equinox 21:39, 2 April 2019 (UTC)

Fun game again

Hi all. As last year we had an excellent time playing a multilingual board game, I'd like to repeat this year. I set up Wiktionary:Random Competition 2019. We'll start sometime soon provided there's someone to play with me. --I learned some phrases (talk) 10:26, 2 April 2019 (UTC)

URL shortener for the Wikimedia projects will be available on April 11th

Hello all,

Having a service providing short links exclusively for the Wikimedia projects is a community request that came up regularly on Phabricator or in community discussions.

Thanks to joint work by developers from the Wikimedia Foundation and Wikimedia Germany, we are now able to provide such a feature. It will be enabled on April 11th on Meta.

What does the URL Shortener do?

The Wikimedia URL Shortener is a feature that allows you to create short URLs for any page on projects hosted by the Wikimedia Foundation, in order to reuse them elsewhere, for example on social networks or on wikis.

The feature can be accessed from Meta wiki on the special page m:Special:URLShortener (which will be enabled on April 11th). On this page, you will be able to enter any web address from a service hosted by the Wikimedia Foundation, generate a short URL, and copy it for reuse anywhere.

The format of the URL is w.wiki/ followed by a string of letters and numbers. You can already test an example: w.wiki/3 redirects to wikimedia.org.

What are the limitations and security measures?

In order to assure the security of the links, and to avoid shortlinks pointing to external or dangerous websites, the URL shortener is restricted to services hosted by the Wikimedia Foundation. This includes for example: all Wikimedia projects, Meta, Mediawiki, the Wikidata Query Service, Phabricator. (see the full list here)
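
As a rough illustration of the restriction described above, a host allowlist check might look like the following sketch. The domain list is an assumption made for the example; the announcement only names the kinds of services covered and refers to a full list that is not reproduced here, and `is_allowed` is a hypothetical helper, not part of the actual extension.

```python
from urllib.parse import urlsplit

# Assumed example domains only; the real list is the one linked in the
# announcement and may differ.
ALLOWED_SUFFIXES = (
    "wikipedia.org",
    "wiktionary.org",
    "wikimedia.org",
    "mediawiki.org",
    "wikidata.org",
)

def is_allowed(url: str) -> bool:
    """Accept only http(s) URLs whose host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)
```

Matching on the full host or a dot-prefixed suffix keeps lookalike hosts such as `evilwiktionary.org` out.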

In order to avoid abuse of the tool, there is a rate limit: logged-in users can create up to 50 links every 2 minutes, and the IPs are limited to 10 creations per 2 minutes.
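
To make the numbers concrete, here is a minimal sketch of a limiter with those two quotas. It assumes a sliding two-minute window; the announcement does not say how the window is actually measured, and the class and its names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120               # "every 2 minutes"
LIMITS = {"user": 50, "ip": 10}    # quotas quoted in the announcement

class RateLimiter:
    """Sliding-window limiter (an assumed strategy, not the real one)."""

    def __init__(self):
        # (kind, key) -> timestamps of creations still inside the window
        self._events = defaultdict(deque)

    def allow(self, kind, key, now):
        """Record a creation at time `now` (seconds) if the quota permits it."""
        events = self._events[(kind, key)]
        # Drop creations that have aged out of the window.
        while events and now - events[0] >= WINDOW_SECONDS:
            events.popleft()
        if len(events) >= LIMITS[kind]:
            return False
        events.append(now)
        return True
```

An anonymous client would be checked with `allow("ip", address, now)` and a logged-in one with `allow("user", username, now)`.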

Where will this feature be available?

In order to enforce the rate limit described above, the page Special:URLShortener will only be enabled on Meta. You can of course create links or redirects to this page from your home wiki.

The next step we’re working on is to integrate the feature directly in the interface of the Wikidata Query Service, where bit.ly is currently used to generate short links for the results of the queries. For now, you will have to copy and paste the link of your query in the Meta page.

Documentation and requests

Thanks a lot to all the developers and volunteers who helped move this feature forward and make it available today for everyone in the Wikimedia projects! Lea Lacroix (WMDE) (talk) 11:57, 3 April 2019 (UTC)

The relationships between lemmas and forms

Why is colours the plural of colour and not of color? The obvious answer to this question would be that the spellings are different. But I ask you to look a little deeper at this question. All our definitions, etymology and translations are currently on the page color, so that is clearly the lemma. Yet, if you look up colours, then you don't get sent to the lemma, but instead to colour, which doesn't actually have any information and just redirects you a second time. A lot of our entries have this idea that there is some kind of "main" term, a lemma of sorts, which has inflections. But as you saw here, the lemma isn't always the actual lemma (the page that defines the term). Instead, we've created a kind of intermediate tier that is not a lemma, yet it has inflections as if it were a lemma. The result is this double indirection.

Having to hunt for links just to get to the definitions of a term is really bad for users. Someone who looks up colours is not interested at all in colour, which has no useful information. They are looking for color, where the definitions, etymology, translations and everything else useful are. And it begs the question: why is colours not defined as an alternative form of colors? It's equally valid, after all. Moreover, forcing this kind of "sublemma" structure gets really confusing in cases where it doesn't work so neatly. A single form could belong to multiple possible sublemmas (alternative forms). better is the comparative of good, but it is equally the comparative of the alternative goode. In highly inflected languages, you can have quite complicated situations, where there are multiple possible lemma forms, yet all the other inflections are shared. Inflections can sometimes have their own inflections; participles are well-known examples. All this increases the mental burden on the editor who somehow has to figure out how to translate the situation into Wiktionary's conventions, and also on the user who has to jump through multiple hoops to get to the real lemma.

I would like to re-examine the relationship we have between lemmas and forms. There is really only one true lemma here, because only one of the entries has a definition. It's the relationships between the different forms that is throwing us off, because we introduce concepts like "alternative forms": lemmas that aren't lemmas. The way I would analyse the situation above is that there is one lemma (which itself has no inherent written representation) with multiple possible representations of both the singular and plural. color and colour are singular forms of this lemma, and colors and colours are plural forms of this lemma. Each of the forms is used by some subset of English speakers, but they all belong to one lemma, not two. We are hamstrung by the need to place definitions, etymology and translations on the page of one of those forms, and by convention that is the singular, so we picked one of the possible singular forms and placed everything there. But it would be beneficial if we could let go of the idea that the singular is therefore "special", that it has its own inflections and cannot be an inflection itself. There is really no need for alternative forms, and the complications they bring, if we can accept that color is simply the lemma of four forms: color, colour, colors and colours. —Rua (mew) 20:36, 3 April 2019 (UTC)

I don't like your way of doing it because it suggests to me that someone took the plural colours and decided to respell the plural specifically. I think the real solution here is to come up with a system that can show the full entry regardless of which spelling is visited, with appropriate modifications (I realise this won't be easy due to accuracy of citations etc.). Equinox 20:45, 3 April 2019 (UTC)
That's only because we have to choose one of the possible spellings/forms to place the definition at. If we didn't have to do that, if the lemma could be entirely detached from the way it's spelled, then that would no longer be a problem. They'd simply all be lemmas of entry 19515, or something like that. Unfortunately, as I said, we're hamstrung by having that requirement. However, I don't think that should be an excuse to convolute the relationships on purpose, by introducing multiple "fake" lemmas as intermediates when there is really only one. —Rua (mew) 20:50, 3 April 2019 (UTC)
Also note, I'm not directly proposing anything to be supported or opposed. Rather, I'd like people to challenge the assumptions we've always made on Wiktionary, and consider other options. Some of what I said is inspired by Wikidata's data model. Wikidata strictly separates lexemes from forms, where lexemes contain one or more forms, but always at least one. Forms have grammatical properties such as "singular" or "plural", they have a written representation, and they have a pronunciation, all of which lexemes do not have. The representation of the lexeme (the lemma in their terminology) is not strictly tied to how it's written. The lexeme for our color is titled colour/color for example. It seems that most of the problems I described above arise from tying lexemes too closely to one particular written form. If we could treat the lemma form as simply the place where everything is gathered, and not as a word, then things might be easier for us. —Rua (mew) 21:01, 3 April 2019 (UTC)
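
The lexeme/form separation described in the comment above can be sketched roughly as follows. This is an illustrative model only, not Wikidata's actual schema or API: the class and field names are made up for the example, and pronunciation is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Form:
    representation: str   # the written form, e.g. "colours"
    features: frozenset   # grammatical properties, e.g. {"plural"}

@dataclass
class Lexeme:
    lemma: str                                 # display label only, e.g. "colour/color"
    forms: list = field(default_factory=list)  # one or more Form objects

    def __post_init__(self):
        # A lexeme always contains at least one form.
        if not self.forms:
            raise ValueError("a lexeme needs at least one form")

def build_index(lexemes):
    """Map every written representation directly to its (lexeme, form) pairs."""
    index = {}
    for lexeme in lexemes:
        for form in lexeme.forms:
            index.setdefault(form.representation, []).append((lexeme, form))
    return index
```

In such a model a lookup of "colours" reaches the lexeme in one step, without passing through "colour" first.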
I can supply a slightly more extreme example: Medises, an inflected form of Medise, a (variant capitalization of medise, which is itself a) variant spelling of medize. I wouldn't want to define Medises as being an alt form of medizes and link to that non-lemma, but I think we could simply pipe the link to the lemma, i.e. define Medises as: Third-person singular simple present indicative form of [[medize|Medise]]. "Colours" could likewise be: plural of [[color|colour]]. The only downside is that that might be what Wikipedia calls an "Easter egg", a link that doesn't go where a reader would necessarily expect, if they expect it to go to the display form and not the place where the content is. However, that doesn't seem much different from how e.g. Mēdōrum goes to Medorum, not Mēdōrum, and since medize mentions Medise as an alternative spelling, a reader should not be confused for long. Would that be a simple solution? (Is this what you were already thinking of, or...?) - -sche (discuss) 23:28, 3 April 2019 (UTC)
How about something like this: at colours have "plural of colour (see [[color]])" giving "plural of colour (see color)". That way they see both forms, but they're linked to the lemma. Chuck Entz (talk) 03:10, 4 April 2019 (UTC)
What should be done for cases where the inflections belong to multiple alternative forms of the same lemma? Or the extreme case where the lemma form is the only form that differs between them? —Rua (mew) 11:39, 4 April 2019 (UTC)

@Rua: Sorry for chiming in. I think the problem is that the term “form” in “lemma form” and “alternative form” on Wiktionary is used to refer to two different concepts: spellings and inflected forms. In the post above, color and colour are different spellings of the same word form ˈkʌl.ə(ɹ), while ˈkʌl.ə(ɹ) (color, colour) and ˈkʌl.əz (colors, colours) are different forms of the same lexeme color. So there are really two levels of hierarchy, but current Wiktionary terminology flattens them into one. If I'm not mistaken, the linguistic definition of a word (word form, to be precise) is its sound shape plus its meaning, and the spelling is largely irrelevant. I touched the point below, where I distinguished two kinds of categories, one dealing with word (form)s (e.g. CAT:Japanese proper nouns) and the other dealing with spellings (e.g. CAT:Japanese terms written with two Han script characters). --Dine2016 (talk) 05:37, 12 April 2019 (UTC)

Proposed change to zh-der

zh-der currently automatically provides the Mandarin pinyin for entries that have Mandarin pinyin in zh-pron. But for those entries which don't have Mandarin pinyin in zh-pron, no romanization is given. I propose including the non-pinyin romanizations like the Yueyu Pinyin and Min Nan POJ. It does not have to be well thought out or well planned at this stage, it just needs to happen and then be refined over time. --Geographyinitiative (talk) 22:51, 4 April 2019 (UTC)

It would be very confusing to mix different romanisations. Also, I think this topic is for Chinese editors only, so it can be discussed at Wiktionary talk:About Chinese rather than here. --Anatoli T. (обсудить/вклад) 23:14, 4 April 2019 (UTC)
moved to Wiktionary talk:About Chinese per suggestion --Geographyinitiative (talk) 23:30, 4 April 2019 (UTC)

IPA-to-speech

Hi!

Are there any IPA-to-speech projects here?

I see there are a few FOS engines out there. How would/could they be incorporated?

Thanks. Saintrain (talk) 18:37, 6 April 2019 (UTC)

No such projects here, and also no plans. We’d rather have no audio representation than an inaccurate one. Even in narrow transcription IPA cannot reflect all nuances of human speech.  --Lambiam 07:06, 7 April 2019 (UTC)

Vote on excluding typos and scannos is live

A heads up: the vote on a proposed change to CFI that would exclude typos and scannos is now open. (See also the thread above titled CFI-amendment: excluding typos and scans.)  --Lambiam 07:13, 7 April 2019 (UTC)

Read-only mode for up to 30 minutes on 11 April

10:56, 8 April 2019 (UTC)

Fortunately for us, English Wiktionary isn't on the list at phab:T220080. — Eru·tuon 10:59, 8 April 2019 (UTC)

Category:English coordinated pairs

I came across this category and I'm trying to figure out what a coordinated pair is. We don't have a coordinated pair entry, and the description at the top of the category is not very helpful either. Could someone write a better description so me and future mes know what it's for? Thank you! —Rua (mew) 18:36, 8 April 2019 (UTC)

The membership in the category is an ostensive definition of the category. The meaning is SoP. I'll review the membership to see if any mes have erroneously included any terms. DCDuring (talk) 19:05, 8 April 2019 (UTC)
But what's "coordinated" about the pairs then? I really don't get it. It seems to be a category for just any pair of words that happen to appear together in an entry name. —Rua (mew) 19:39, 8 April 2019 (UTC)
I said I'd take a look and I have.
The easy cases are terms linked by coordinating conjunctions, principally and, or and their word-like equivalents ' n ', &, et. In these, each term in the pair is at the same grammatical and (usually) semantic level as the other. slowly but surely seems similar. The harder cases are the pairs linked by commas or hyphens/dashes. In ding-dong, willy-nilly (and others) the elements may or may not have distinct lexical existence and are, in any event, in Category:English reduplications. I'd be inclined to remove these from the category and refer to the reduplication category on the Category:English coordinated pairs page, either as a "see also" or by making it a subcategory. In another day, another dollar, finders, keepers, finders, keepers; losers, weepers, first come, first served and others, there is no coordinating conjunction. The semantic link seems to be not coordination but implication. I'd be inclined to remove these only if there is another plausible short category name that would describe them. I haven't thought of such a name.
I'd like other opinions. DCDuring (talk) 19:43, 8 April 2019 (UTC)
There is Coordination (linguistics) in Wikipedia. I don't pay much attention to the categories, but it would be nice if it had a description or a link to a Wikipedia article which would describe it. -Mike (talk) 21:36, 8 April 2019 (UTC)
If the description is updated, consider also updating Category:English coordinated triples. - -sche (discuss) 22:02, 8 April 2019 (UTC)

Japanese entry layout revisited

Hi. I'd like to propose the following long-term changes to the Japanese entry layout, and would like to have some of them incorporated into WT:AJA

  • A new citation format of Japanese terms: 日本 (にほん, Nihon, にっぽん, Nippon) or やまと (大和, , Yamato).
    • Currently many Japanese words are either cited with {{m|ja|...}} or {{ja-r}}. The disadvantage of the former is that there is no way to show both kanji and kana, or support multiple readings. The disadvantages of the latter are that (1) it takes up too much vertical space, discouraging editors from adding more synonyms, derived terms, etc., and (2) the font size of the kanji is too big compared to normal citations of Japanese terms, causing disharmony, and the size of the kana is too small on some computers, as Eirikr reports. I would like to employ the new format to cite all Japanese words and reduce the use of ruby to examples, and I think the best way is to modify {{ja-r}} to use the new format by default. This way we don't need to create new templates or mass-update mainspace entries. Please see User talk:Suzukaze-c#CSS for more.
    • I would also like to propose the new syntax {{ja-r|KANJI:KANA}} in addition to {{ja-r|KANJI|KANA}}. Editors can still use the second format, but other templates relying on {{ja-r}} can take advantage of the former. The reason is as follows: For most languages, one parameter is enough to enter a word (e.g. {{m|en|English}}), and the format of templates is pretty predictable (e.g. {{compound|en|place|holder}}). For Japanese, however, two parameters are often needed (e.g. {{ja-r|日本語|^にほんご}}), leaving different ways to place these parameters (e.g. {{ja-compound|日本|^にほん||}} versus {{ja-vp|終える|終わる|おえる|おわる}}). If we build the new syntax KANJI:KANA, then templates relying on it will have more consistent and more predictable syntaxes (e.g. {{ja-compound|日本:^にほん|語:ご}} and {{ja-vp|終える:おえる|終わる:おわる}}), which are also more interchangeable with kanji/kana only versions (e.g. {{ja-vp|終える|終わる}}).
    • What about automatic fetching of the reading from the mainspace entry? For example, {{ja-r|日本料理}} should produce 日本料理 (にほんりょうり, Nihon ryōri) while {{ja-r|日本}} could produce 日本 [Term?] because there are many readings possible.
  • Eliminate sortkeys. Once the use of soft-redirection ({{ja-see}}) is established, there will be no need to categorize the kanji terms under kana. This is because {{ja-see}} copies categories from the lemma spelling to the non-lemma spellings, so all spellings of the term will appear in the same category. If we eliminate sortkeys, the kana part and the kanji part of a category will contain the same set of vocabulary, once in kana and once in kanji, so there is nothing to lose. More importantly, editors are liberated from the constant need to watch for categorizing templates (such as {{lb|ja|...}}) and add sortkeys.
  • Is there consensus on whether to lemmatize the wago vocabulary at kana spellings? I prefer to lemmatize terms at the most common spelling as a general rule, but make the core wago vocabulary an exception to it. First, wago terms have a greater degree of independence from and variety in combination with kanji. The most common kanji spelling does not necessarily match the meaning intended in a given use (e.g. 帰る返り点), but kana is acceptable everywhere. Second, the etymology of non-transparent-compound wago terms is best illustrated by the kana form. In etymology sections, “くら (, kura) + (, wi)” looks better than “ (kura) + (wi)”. (By the way, when the focus is on the meaning, such as in synonym sections or entries from other languages, I think the kanji should still be put before kana.) On the other hand, I'm not sure about whether to do the same for transparent compounds like 繰り返す, which have less justification. This means that the border between “terms lemmatized at kana” and “terms lemmatized at the most common spelling (usually a kanji spelling)” can be very vague and arbitrary.
  • What about a custom reference template? {{ja-ref|DJR}} is much easier to type than <ref name="DJR">{{R:Daijirin}}</ref>. For common references, we can also make the template link to Wiktionary:About Japanese/references, rather than generating a <ref>, because ===References=== <references/> is also tedious to type :)
  • Simplify the interface of inflection templates. The current syntax is unnecessarily complex. I think only two formats are needed: {{ja-infl|type=1}} (for わらう) and {{ja-infl|つれて いく|type=iku}} (for 連れて行く; the space is merely for the purpose of romanization). Everything else, from slight irregularities (e.g. 行く, ある) to separating the stem and the ending (e.g. {{ja-go-u|わら}}) as well as detecting |sik= should be built into the module. This should make it easy for {{ja-see}} to copy inflection tables around. With the current templates, {{ja-see}} would need to recognize both Category:Japanese inflection-table templates and {{ja-conj-bungo}} as well as learn their quirks (such as remembering to add |sik= when copying from もうでく to 詣で来), which is too tiring and error-prone.
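
For the KANJI:KANA syntax proposed in the list above, the argument handling could look something like this sketch. It is written in Python purely for illustration (the real templates would be backed by modules on the wiki), and `split_term` and `parse_terms` are hypothetical names.

```python
def split_term(arg):
    """Split 'KANJI:KANA' into (kanji, kana); kana is None if no colon is present."""
    kanji, sep, kana = arg.partition(":")
    return kanji, (kana if sep else None)

def parse_terms(args):
    """Parse every positional argument of a template call the same way."""
    return [split_term(a) for a in args]

# The two calls from the proposal would parse with identical logic:
compound = parse_terms(["日本:^にほん", "語:ご"])
verb_pair = parse_terms(["終える", "終わる"])
```

Because each term occupies exactly one parameter, the kanji-only and kanji-plus-kana variants need no separate parameter layout.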

(To be continued.) (Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 10:04, 9 April 2019 (UTC)
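As a rough illustration of the simplified inflection interface proposed above, here is a minimal Python sketch (the real module would be written in Lua, and this is not its code). It shows that a single kana argument plus a type can be enough for the code to separate stem and ending itself. The function name `split_stem` and its error handling are assumptions made for illustration; only the two type values quoted in the proposal are covered.

```python
def split_stem(kana: str, verb_type: str) -> tuple[str, str]:
    """Split a kana verb into (stem, ending).

    Hypothetical helper: `verb_type` mirrors the |type= values quoted in the
    proposal ("1" as used for わらう, "iku" as used for つれて いく).
    """
    # The space only guides romanization, so drop it before splitting.
    word = kana.replace(" ", "")
    if not word:
        raise ValueError("empty kana string")
    if verb_type == "iku" and not word.endswith("いく"):
        raise ValueError("type=iku expects a verb ending in いく")
    if verb_type in ("1", "iku"):
        # These verbs inflect on the final kana only.
        return word[:-1], word[-1]
    raise ValueError(f"unsupported type: {verb_type}")
```

With this shape, an editor would no longer pass the stem by hand (as in {{ja-go-u|わら}}); irregularities such as those of 行く would live in the branch for that type rather than in the template call.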

  • Generally in favor.
What is sik?
‑‑ Eiríkr Útlendi │Tala við mig 21:37, 9 April 2019 (UTC)
It is short for suffix_in_kanji, one of the parameters of {{ja-conj-bungo}}, used for example in the conjugation of .  --Lambiam 15:11, 10 April 2019 (UTC)
  • I have no real objections (though I don't think I understand all of the technical specifics). I do support the principle of using the most common form of a lemma rather than having a language-wide rule. In other words, treat e.g. 日本, 学校, する, and きれい each as 'main' entries, rather than having a general preference for kanji or kana. Cnilep (talk) 03:54, 10 April 2019 (UTC)
  • I think these are improvements that I expect to be uncontroversial. Some of these proposals are easy to implement, but I feel a plan is needed on how to roll out the more involved changes. As to lemmatization – apart from the fact that we need to strike a balance between what is most useful to the users and what is a reasonable effort to ask from the editors – which form to prefer is an issue for all languages offering alternatives that in the end needs to be addressed on a case-by-case basis, and if for any specific case two choices are more or less equally good (or bad), there is no point in losing sleep over which one to choose. It will be helpful to offer advice on such issues in Wiktionary:About Japanese.  --Lambiam 15:11, 10 April 2019 (UTC)
  • Support.
    • As for the new syntax, perhaps it can be implemented in the major link templates, so that we can use {{compound|ja|FOO:ほげ|...}} or {{compound|ko|방:房|...}}. (Or perhaps make it a new parameter in the style of {{{ts}}}, if there is larger objection to :.) Personally, I am worried about : taking on too much responsibility in the linking templates.
    • Sortkeys: no particular comment.
    • Wago: +1 for kana.
    • However, I don't really like {{zh-ref}}, TBH. (Well, I have =3+hr+ expand to ====== + References + <references/> in my IME; maybe that's why I'm not terribly bothered.)
    • inflection templates: Absolutely.
  • Suzukaze-c 19:58, 11 April 2019 (UTC)
    • Support everything but Oppose lemmatising wago terms on kana entries. Like before, we should lemmatise on the actual most frequent Japanese spelling, so  () (yomu) is the lemma, IMO, not よむ (yomu).
    • I don't understand what is going to happen with eliminating sort keys. Will  () (ほん) (Nihon) still be sorted by "に"? Also, how are terms with multiple readings going to be sorted?
    • Welcome to all new Japanese specific templates, they are overdue.
    • I think we also need to add categories for Sino-Japanese terms, similar to the Korean and Vietnamese ones but possibly split into smaller categories, considering the complexity of etymologies, and reduce info in kyūjitai entries. Care should be taken when using Middle Chinese templates for sources, but this should be encouraged. --Anatoli T. (обсудить/вклад) 01:22, 12 April 2019 (UTC)
  • Curious as to your opposition to lemmatizing yamato kotoba at the kana spelling? The kanji spellings are irrelevant to the etymologies of yamato kotoba, only being applied later when Chinese characters were borrowed, and lemmatizing at kanji spellings actively obscures cognacy and relationships.
Take the verb tsuku, for instance. By kanji, this could be spelled 付く・着く・就く・即く・憑く・突く・衝く・撞く・搗く・舂く・築く・吐く・漬く・浸く・尽く・歇く・竭く. Most of these 17 spellings are etymologically related, sometimes very closely indeed. Lemmatizing by kanji spelling hides this interrelationship and adds confusion, and necessitates a lot of data duplication across entries. ‑‑ Eiríkr Útlendi │Tala við mig 03:38, 12 April 2019 (UTC)
Agreed re: the failure of sortkeys. The current approach was based on the assumption that the back-end capability would eventually support multiple sortkeys for a given lemma string. We reported the MediaWiki shortcoming years ago, and received zero response from the devs -- 黙殺された. It's clear they don't give two shits, so we clearly need to change our approach if we want something workable. ‑‑ Eiríkr Útlendi │Tala við mig 04:40, 12 April 2019 (UTC)
@Dine2016, Eirikr: OK, agreed and Support on both points and sorry for doing this again to you. I completely forgot about the convincing つく-argument :) --Anatoli T. (обсудить/вклад) 05:24, 12 April 2019 (UTC)
@Eirikr, Atitarev: Honestly speaking, I'm not sure if making an exception for wago terms is really a good idea. One problem with kanji spellings is that the most common spelling does not necessarily cover all meanings of the term (while kana does). For example, the 帰る spelling of かえる does not cover the sense “to turn over”, so that the etymology of 裏返る has to be written as “ (ura, ) + 返る (kaeru, alternative spelling of 帰る in the sense ‘to turn over’)”. If we take a kana-centric approach to wago terms, then the etymology of うらがえる is simply “うら (ura, , …) + かえる (kaeru, 返る, 反る, ‘to turn over’)”. Another problem is that wago terms may appear as the reading/furigana to entirely irrelevant kanji, such as in personal names. However, such problems only concern a small percentage of the wago vocabulary, so I doubt whether it's really worthwhile to employ the kana spelling for all wago terms, especially transparent compounds such as 追い払う(注). I think an alternative approach is either (1) to just lemmatize at the most common kanji spelling, but still list the whole range of kanji with {{ja-spellings}}, and sense division with {{ja-def}}, or (2) to break the word into different sense groups (e.g. かえる(帰る・還る) and かえる(返る・反る)), and lemmatize each of them as if they were different words, but use soft redirection for the etymology and pronunciation sections to avoid data duplication (cf. Daijirin's treatment of 帰る as 〔「かえる(返)」と同源〕). This way every word is lemmatized at the most common spelling, and everyone is happy. --Dine2016 (talk) 06:09, 12 April 2019 (UTC)
Um, maybe we can justify the wago exception on the basis that the JA WT is also making it. Or this argument: “if the ‘lemmatize at the most common spelling rule’ were applied for Chinese, then each Chinese word would be lemmatized/mentioned in Simplified Chinese or Traditional Chinese based on whether it is used more frequently in {Mainland China and Singapore} or {Taiwan, Hong Kong and Macau}, which would be too absurd.” --Dine2016 (talk) 06:50, 12 April 2019 (UTC)
Using simplified over traditional as the main entry is a legitimate request, which has been discussed but discarded for very important reasons, etymological and technical. By the way, your link is not working: ja:Wiktionary:項目名の付け方. --Anatoli T. (обсудить/вклад) 07:12, 12 April 2019 (UTC)
  • I don't work on Japanese entries but wanted to make a general remark about something which came up while working on parsing code to find wanted entries (replacement for Template:redlink_category): if we can avoid specialized templates like {{ja-compound}} it will really help to make these sorts of automated tasks much simpler. Otherwise we need to have additional logic to cover the language-specific linking templates. The general idea would be to push the responsibility into the core linking code (which could internally still delegate to other modules). This would keep the template "surface area" small. Another thing to avoid is nesting inside linking templates: I've seen some instances of {{bor|en|{{ja-r|....}}}} which is tricky to parse and produces invalid output. {{bor}} should be able to figure out what to do when used with Japanese entries. – Jberkel 00:16, 16 April 2019 (UTC)
    Support language-specific logic that is incorporated into the "main" templates. —Suzukaze-c 18:19, 18 April 2019 (UTC)
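To make the parsing concern above concrete, here is a minimal Python sketch that finds template calls containing another template in their arguments by tracking `{{` / `}}` depth. It is an illustration only, not the parsing code mentioned in the comment: the function name is hypothetical, and the scanner deliberately ignores complications such as `{{{parameter}}}` syntax, nowiki sections and comments.

```python
def find_nested_templates(wikitext: str) -> list[str]:
    """Return each outermost template call that contains another template."""
    results = []
    depth = 0        # how many "{{" are currently open
    start = 0        # offset of the outermost open "{{"
    nested = False   # did the current outermost call contain another "{{"?
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start, nested = i, False
            else:
                nested = True
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0 and nested:
                results.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return results
```

Even this simplified scanner needs explicit state for depth and nesting, which is the extra logic that flat calls to the core linking templates would make unnecessary.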

Allographic variants

@Eirikr, Suzukaze-c, Lambiam, Atitarev While I was editing まま, it occurred to me that there are allographic variants among kanji forms which are fully exchangeable in writing without regard to reading. For example, and are essentially the same kanji whether read as まま or まんま, and 間々 and 間間 are essentially the same kanji form whether read as あいあい, あいだあいだ, ひまひま or まま. Therefore I would like to create a variant of {{ja-see}} with a recursion depth of two instead of one:

{{ja-see|儘|v}}
For pronunciation and definitions of – see the following entries.
まま【儘】 ⇒まま
[particle]as it is; remaining in a certain state; while; still
まんま【儘】 ⇒まんま
[particle](uncommon) Alternative form of まま (mama, as it is; remaining in a certain state; while still)
(This term, , is an alternative spelling of , which in turn is a kanji spelling of several terms.)

Under this approach, only "canonical kanji forms" will contain a list of readings (e.g. soft redirects to まま and まんま), while other kanji forms will simply redirect to the canonical kanji form (e.g. soft redirects to rather than duplicates its content) and have the template fix "double redirects".
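The two-step lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the template's implementation: the two tables and the function name `resolve` are hypothetical, and the data is limited to the 間々 / 間間 example given earlier.

```python
# Hypothetical data: a canonical kanji form lists its kana readings,
# while a variant form only points at its canonical kanji form.
READINGS = {"間々": ["あいあい", "あいだあいだ", "ひまひま", "まま"]}
VARIANT_OF = {"間間": "間々"}


def resolve(title: str, max_depth: int = 2) -> list[str]:
    """Follow soft redirects from `title` to the kana entries it stands for."""
    current = title
    for _ in range(max_depth):
        if current in READINGS:
            return READINGS[current]
        if current not in VARIANT_OF:
            break
        current = VARIANT_OF[current]  # hop: variant -> canonical form
    return []
```

With a depth of one, only canonical kanji forms resolve; a depth of two also covers their variants, while a longer chain of variants returns nothing, which keeps "double redirects" bounded.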

For this we need to define what the "canonical kanji form" is. For example:

  • Should we allow extended shinjitai and lemmatize tōrō “lantern” at 灯篭, or should we stick to the official shinjitai list and lemmatize it at 灯籠? I think we need to have a standard if we want to build jitai conversion modules.
  • Should we allow the 踊り字 in canonical titles? I prefer to do so, because it's an essential part of modern orthography just as shinjitai and modern kana spelling are.

Also, how should we list the variants of a canonical kanji form, such as kyūjitai? It seems that there are two ways to present kyūjitai: either we limit ourselves to JIS X 0208/0213 to comply with Japanese computing, or we utilize Unicode as much as possible to adhere to the Kangxi dictionary printing forms. If the latter, we might want to list as the kyūjitai of , of , or even 𥳑 of , the last of which seems to lack font support.

(Well, sometimes the orthographic variants are not fully exchangeable. For example, needs to fetch a subset of content from , and so does けふきょう, which complicates matters.) --Dine2016 (talk) 15:47, 12 April 2019 (UTC)

As to “canonical kanji”, inasmuch as lemmatization is automated (it may be prudent to allow overrides), following the 2010 jōyō kanji list has the advantage of being a clear standard and avoiding potentially endless debates over which character is to be preferred on a case-by-case basis – but a disadvantage is that this may (at least in some cases) not be what most users would expect. But, as I wrote above, there is no ideal solution to lemmatization, and occasionally having to follow a soft redirect is (IMO) not a big deal. Whatever is decided, the decisions should be encoded in tables used by the software modules, so that future revisions of the list can easily be incorporated. As to the internal representation of other kanji, I am somewhat partial to Unicode as being the more portable approach across platforms and probably the future also of Japanese Industrial Standards. Disclaimer: I have no experience whatsoever editing Japanese entries, so my opinions should not be assigned as much weight as those of experienced editors.  --Lambiam 16:53, 12 April 2019 (UTC)
I compiled a list of 398 official shinjitai at Template talk:ja-spellings#kyūjitai, of which 67 kyūjitai were found to be encoded using CJK Compatibility Ideographs. Since modern computing systems now have better font support for Japanese glyphs, I would prefer to comply with Japanese computing for better searchability. We can still list older forms such as and 𥳑 which are not in JIS X 0208/0213 as "historical kanji" rather than "kyūjitai" and nonstandard simplified forms such as as "extended shinjitai" rather than "shinjitai". KevinUp (talk) 02:51, 13 April 2019 (UTC)
It seems reasonable to me. Perhaps we should enforce the use of "official" Japanese kanji as main spellings, including 籠, for the sake of consistency. And I prefer 々. —Suzukaze-c 22:04, 13 April 2019 (UTC)
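The suggestion above that such decisions be encoded in tables used by the software modules could look roughly like this. It is a minimal Python sketch, not an existing module: the table holds only the one pair discussed in this thread (篭 as the extended-shinjitai counterpart of official 籠), and the names `TO_OFFICIAL` and `canonical_title` are assumptions.

```python
# Illustrative table only; a real module would load the full agreed list.
# Maps a nonstandard ("extended shinjitai") character to the official form.
TO_OFFICIAL = {"篭": "籠"}


def canonical_title(title: str, table: dict[str, str] = TO_OFFICIAL) -> str:
    """Replace every character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in title)
```

Because the policy lives in the table rather than in the function, a later revision of the list would only require editing the data.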

Proposal to look to Wikisource for citations.

I think that perhaps we should establish a practice of making Wikisource the first place that we look for citations for words, particularly older words. There are now thousands of books transcribed there. Cheers! bd2412 T 01:26, 10 April 2019 (UTC)

Why? —Μετάknowledgediscuss/deeds 01:29, 10 April 2019 (UTC)
It's hard to tell how accurate a cite is with just a sentence of context sometimes, and even if that editor can see the full context in Google Books, other editors, depending on their location and sometimes dumb luck, may not be able to. Wikisource will show the whole context to all users.--Prosfilaes (talk) 04:08, 10 April 2019 (UTC)
Wikisource is hardly the only site with full texts. If this is about providing more links that could be discussed. I see no reason to favor Wikisource over other sites such as archive.org. DTLHS (talk) 04:39, 10 April 2019 (UTC)
Archive.org doesn't generally provide full transcribed text, and the scans on Archive.org can often be quite slow to flip through. Wikisource offers both transcribed text and usually a link to the original scan.
Besides which, Wiktionary is hardly the only site with definitions. Should Wikisource work with Wiktionary, or should we link to other dictionary sites?--Prosfilaes (talk) 05:48, 10 April 2019 (UTC)
Archive.org is a rich but messy resource, some works have dozens of scans in varying quality, taking up precious editor time. Wikisource is definitely preferable here. There have been a few (community wishlist) proposals around to build tools to automatically extract and format quotations for the use in Wiktionaries but as far as I know nothing has materialized. – Jberkel 07:20, 10 April 2019 (UTC)
For what it's worth, {{Q}} (Module:Quotations) links to Wikisource quite a bit, for instance when you add a reference to the Iliad or Odyssey in an Ancient Greek entry: {{Q|grc|Il.|1|477|form=inline}} → Homer, Iliad 1.477. — Eru·tuon 05:00, 10 April 2019 (UTC)
It might be particularly useful for all the requests for quotes from particular authors, the templates for which some find annoying.
Any bias toward Wikisource is also a bias toward out-of-copyright sources and therefore old sources. I don't think we need that at all, even for terms that have been around for a while. DCDuring (talk) 12:21, 10 April 2019 (UTC)
A bias toward Wikisource over other similar collections of out-of-copyright sources doesn't change the overall issues. I'd like to have more quotes from the birth of our language. My problem is more about the dead period, from 1924 through ~1995 where we have the same problem basically anywhere we look. The works just aren't publicly available for copyright reasons anywhere.--Prosfilaes (talk) 03:28, 16 April 2019 (UTC)
  • I use Wikisource all the time for quotes. And there are the awesome lists on User:DTLHS/eswikisource. We should have User:DTLHS/enwikisource too, of course. I believe I asked D to make me one but the reply was something along the lines of that it was "full of crap" - yes, they were the exact words D used. --I learned some phrases (talk) 12:24, 11 April 2019 (UTC)

To expand on my original post:

  1. Wikisource is a sister project of ours, and as a Wiki any of us can edit there, meaning that we have some measure of control over what gets put there.
  2. Due to its joint status as a Wikimedia project, Wikisource is about as stable as Wiktionary. Other websites may disappear out from under our noses, but it is likely that Wikisource will exist as long as Wiktionary exists.
  3. To DCDuring's point, yes, Wikisource does have a lot of old sources but:
    1. We have a lot of old words, and there's nothing wrong with old citations if they define the word accurately.
    2. Wikisource actually does also have a lot of recent material, particularly public domain government documents including reports from various areas of specialization, and some case law; it can permissibly host much more of that.
    3. Didn't we just have this discussion last month about all these Webster's 1913 requests for quotes? Guess which Wikimedia project would be the one to host all the works from which those quotes could be found.
  4. Further to Jberkel's point, we could develop a tool to find and extract sentences containing sample words from Wikisource. It seems reasonable that somebody should be able to make a concordance of Wikisource, or of a particular subset of Wikisource texts.

Cheers! bd2412 T 22:11, 12 April 2019 (UTC)
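As a rough idea of what such an extraction tool might do at its core, here is a minimal Python sketch that pulls out the sentences containing a given word from a text that has already been obtained. Fetching pages from Wikisource is out of scope here, the function name is hypothetical, and the sentence splitting is deliberately naive (abbreviations such as "Mr." would be split incorrectly).

```python
import re


def sentences_with(word: str, text: str) -> list[str]:
    """Return the sentences of `text` that contain `word` as a whole word."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]
```

Running this over every transcribed text and grouping the results by word would give a simple concordance; formatting the hits as quotation templates would be a separate step.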

Does Wikisource have Congressional committee testimony, especially Q&A? That's linguistically valuable and sometimes fun. Bureaucratic reports, not so much fun. DCDuring (talk) 02:05, 13 April 2019 (UTC)
That certainly falls within the remit of Wikisource, although I don't know how much of it there actually is at this time. bd2412 T 15:40, 13 April 2019 (UTC)

6 million entries

According to Equinox, Finnish konelypsy (automatic milking) is our six millionth entry, created by User:Surjection. —Μετάknowledgediscuss/deeds 14:34, 10 April 2019 (UTC)

That sounds like enough, job done. Time for us to find some new, worthy project; I wonder if Wikipedia still needs help generating lists of Pokemon... - TheDaveRoss 14:43, 10 April 2019 (UTC)
There are still one or two words in Wiktionary:Wanted entries so we shouldn't give up just yet. SemperBlotto (talk) 14:45, 10 April 2019 (UTC)
Onward to the six million and two!  --Lambiam 15:15, 10 April 2019 (UTC)
Did McDonald's stop at 6 million burgers? I think not. -Mike (talk) 20:40, 10 April 2019 (UTC)
Surjection and his milk again. *sigh* --I learned some phrases (talk) 12:21, 11 April 2019 (UTC)
  • Next question: who is the most prolific entry creator? DonnanZ (talk) 23:37, 18 April 2019 (UTC)
Equinox, followed by SemperBlotto. If you count machines then SemperBlottoBot, then WingerBot, then Equinox, then NadandoBot, then SemperBlotto. - TheDaveRoss 00:04, 19 April 2019 (UTC)
Oh, and if you only count euphemisms by sockpuppets, Wonderfool. - TheDaveRoss 00:05, 19 April 2019 (UTC)
Thanks, a predictable answer, I suppose, but I didn't think of pages created by bots. Do you base your figures on pages created by each editor? I do this for my own paltry figure. DonnanZ (talk) 08:29, 19 April 2019 (UTC)
This stats site gives a rundown of the top 76. If they had just one account over the years instead of 200, Wonderfool would be in 4th place, actually. --I learned some phrases (talk) 12:34, 19 April 2019 (UTC)
OK, so that doesn't have page creations. Where do you get your results, Dave? --I learned some phrases (talk) 12:36, 19 April 2019 (UTC)
It does give creates, that is the right set of columns. It is also no longer being updated with 2019 data and beyond, it has been replaced by Wikistats 2 which is garbage for things like user stats. X's tools is still current, but doesn't show lists of users. Not sure if there is a better view of users by contribution count available currently. - TheDaveRoss 12:44, 19 April 2019 (UTC)
Also WF is including bot edits in his count, but not in anyone else's, so [Citation needed]. - TheDaveRoss 12:46, 19 April 2019 (UTC)
It looks as though I rank 12th for edits, and 4th for creates (which is quite astonishing). If I look on my watchlist at "pages watched not counting talk pages" that gives me the current figure (56,708) as all pages created are automatically watched (and I don't watch any other pages). The 53,106 figure for creates in those stats is out of date of course, but seems to be accurate. DonnanZ (talk) 17:53, 19 April 2019 (UTC)

How should gerunds be handled?

In English, gerunds seem to be entirely ignored, I guess because they are always identical to the present participle. However, that doesn't apply for other languages. There are a few specific cases where this is relevant.

The first is Dutch. Dutch has a gerund, but it's identical in form to the infinitive, which is also the lemma form. We usually don't make form-of entries for forms that are the same as the lemma, so we have no entries for Dutch gerunds at all. It is mentioned in the inflection table, though, see roepen. As shown in the table, the gerund has neuter gender. Should every Dutch verb have a separate entry for the gerund?

The second case concerns German and West Frisian. In both of these languages, the gerund is also neuter, but it's not identical in form to the lemma. In German, there is a difference in capitalization, which also shows that gerunds are treated as nouns. In West Frisian, it's identical to the long infinitive, which is something the other languages don't have (but Old English had it). There seem to be a bunch of entries created for German gerunds already, in Category:German gerunds, and they are given a Noun header with its own inflection table. West Frisian barely has any entries for verb forms yet, so there is no precedent to go by.

The implication I take from the German treatment is that we should really be treating the English, Dutch and West Frisian gerund as nouns in their own right too. After all, why would we have entries for German gerunds but not for English, Dutch and West Frisian ones? In German, the gerund is unique in its orthographic representation, so it can't just "piggyback" on another verb form, and must have its own entry. But gerunds aren't just verb forms in other respects. They can have genders, like nouns, and even case forms depending on the language. They can also take both definite and indefinite articles, as well as possessive and other determiners, in English too. We already treat participles specially in many languages, giving them their own Participle header to show that they aren't just verb forms, but are more like adjectives. The same could be argued for gerunds, but we don't currently have Gerund headers anywhere. Should we? Or should we call them Noun? The fact that gerunds have genders and case forms tells me that we shouldn't just be labelling them as Verb. A sticky point is that Dutch gerunds can have a direct object before them (Wiktionary bewerken is leuk!) and English ones can have it after them (Editing Wiktionary is fun!), which is something specific to gerunds and not shared with regular nouns. That speaks in favour of a separate Gerund header. —Rua (mew) 12:58, 11 April 2019 (UTC)

Why should we draw any implications whatsoever for English PoS from Dutch, German, and West Frisian inflection? Uniformitarianism is not the official religion of Wiktionary. DCDuring (talk) 17:55, 11 April 2019 (UTC)
I’m of 2.718 minds about this. On the one hand, it seems eminently reasonable. These gerunds are syntactically nouns, and therefore a heading “Verb” is misplaced. On the other hand, giving separate entries for all gerunds whose form is indistinguishable from a verb form will mean a lot of extra work. (In some cases the gerund has become a noun with a slightly different sense, like eten meaning “food”, not the act of eating, and such nouns definitely need a separate entry; here we consider the true gerunds whose meaning follows directly from the meaning of the underlying verb.) In Turkish, next to the infinitive (Sigara içmek yasaktır, “Smoking is forbidden”), also the third-person present simple and future can assume the role of a noun (çıkmaza girmiş, “He has entered a dead end”, literally a “does-not-exit”; gelecek bilinmezdir, “the future is unknowable”, literally the “will-come”); moreover, they can also serve as adjectives. (Normally these are called participles by grammarians, not gerunds, but I see no argument why the same reasoning would not apply here.)  --Lambiam 19:07, 11 April 2019 (UTC)
Considering that editing has a noun entry, are you just arguing that the header should be changed to "Gerund"? Could it not just be handled in an etymology section or as text at the beginning of the sense definition? -Mike (talk) 20:19, 11 April 2019 (UTC)
There is a noun entry for this particular verb, but every verb has a gerund. I'm saying that we should be making this a regular thing. —Rua (mew) 20:49, 11 April 2019 (UTC)
Is "editing" really a gerund in "Editing Wiktionary is fun"? Equinox 20:21, 11 April 2019 (UTC)
What else can it be? It's not a participle, unless you somehow read it as meaning that Wiktionary is doing the editing. —Rua (mew) 20:49, 11 April 2019 (UTC)

Wikimedia Foundation Medium-Term Plan feedback request

Please help translate to your language

The Wikimedia Foundation has published a Medium-Term Plan proposal covering the next 3–5 years. We want your feedback! Please leave all comments and questions, in any language, on the talk page, by April 20. Thank you! Quiddity (WMF) (talk) 17:35, 12 April 2019 (UTC)

Classical compounds in Category:English words by prefix and Category:English words by suffix

These categories are a complete mess right now, because we categorise all elements of Greek and Latin origin as affixes. As a result, the actual proper affixes of English are all but unfindable among all the noise. I think the problem here is our treatment of Greek/Latin elements. The combinations that are created when putting them together are called classical compounds, which makes their nature as compounds rather than affixed words very clear. While they are used productively in English and other languages, they follow their own rules, very different from true affixes:

  • They can be attached to each other, with no apparent root word, like anthropo- + -centric. You can't do this with real affixes: be- + -ness cannot make *beness.
  • They have a strong tendency to occur together. Often they can only be attached to each other, not to any other random word.
  • They originate in their parent language from root words, not affixes. Thus, combinations of them are not affixed words, but rather compounds. This is reflected in the English term for them, too.
  • One and the same term might be a prefix or suffix, with a difference in form. But what's really going on is that the shape depends on the position within the compound, final vs nonfinal. In informal use, words are adapted to this pattern by adding an o at the end of a nonfinal element.

Because of this, I don't think it does to call these "prefix" or "suffix", they're really their own kind of thing. I think in the interest of making the two above categories usable again, we should split the elements of classical compounds into their own kind of derivational category. There should at the very least be a Category:English classical compounds. We could have further subcategories based on the elements used, but I'm not sure if that's really fitting, given that these are compounds and we already tried and failed to categorise compounds by their elements before. I'm not sure about all the details of the solution yet, but I hope it's clear to everyone that something is wrong here. —Rua (mew) 20:55, 14 April 2019 (UTC)

Let’s define an English prefix or suffix to be something that is affixed (with possible morphological adjustments) to the stem of English words so as to form new English words, whose meanings for a given pre-/suffix are more or less derivable from the meanings of the words it is affixed to. Then indeed many of the entries currently advertised as English pre-/suffixes are miscategorized. The distinction with components with a classical pedigree is not always clear-cut, though, as seen in neologisms like user-centric ([9][10][11]) and Britain-centric ([12][13][14]). I think in these words -centric is a (productive) suffix. As another example, -ize is on the one hand a French suffix (-iser) that was lifted along with words like angliciser when they were anglicized – in these words it is not an English suffix but an anglicized French suffix; on the other hand, it is responsible for forming new words like dandyize, bowdlerize and mongrelize. While I agree with the drift of this gripe, I think “English classical compounds” is a misnomer. Whatever xeno- and -phobia are, they are not compounds, but components found in classical compounds (and sometimes used in making new compounds with a classy appearance). Perhaps Category:English classicistic components?  --Lambiam 17:27, 15 April 2019 (UTC)
I think you misunderstood a little. I'm not saying that the elements of the compounds should be called classical compounds, but rather the combinations formed from them. In other words, anthropocentric should not be categorised as Category:English words prefixed with anthropo-, nor as Category:English words suffixed with -centric, because it's neither. I do see your point about terms like user-centric, and in that case we might be able to consider them suffixes, but I'm not completely convinced if -centric is a suffix in that case either. And since it's not a classical compound, that's separate from the matter I'm describing here anyway. —Rua (mew) 17:34, 15 April 2019 (UTC)
Sorry, I indeed misunderstood. I agree we should remove anthropocentric from Category:English words prefixed with anthropo- and Category:English words suffixed with -centric; in fact, I just did by changing {{confix}} to {{compound}}. I am not convinced there is a need for a new category Category:English classical compounds. (If the need exists, we will presumably also want Category:French classical compounds and Category:German classical compounds; and what about Category:Ancient Greek compounds and Category:Latin compounds?)  --Lambiam 17:54, 15 April 2019 (UTC)
Distinguishing Greek and Latin won't really be practical, because some of these combine both, even if there are some purists out there that hate it. :) —Rua (mew) 10:59, 16 April 2019 (UTC)
Yes, the word television is an abomination that flies in the face of etymological decency. The horror! The horror! Perskyi pereat!  --Lambiam 21:42, 16 April 2019 (UTC)

Wiktionary:Random Competition 2019

Hello all, I decided it's time to kick-start the 2019 Wiktionary word game, which, for copyright reasons, is not like any other board game in the world. Ever. Any such resemblance is purely a fluke. User:Metaknowledge has won the last two years, let's try to knock them off the top. --I learned some phrases (talk) 00:27, 16 April 2019 (UTC)

Splitting Aramaic

It seems to me like we need to split the various stages of Aramaic into actual separate language codes, chief in my mind, Ancient Aramaic and Imperial Aramaic from Middle Aramaic, i.e. Jewish Babylonian Aramaic. I'm thinking [arc] should be reserved for the family code. @Fay Freak, Profes.I., Wikitiki89, -sche, Metaknowledge, thoughts? --{{victar|talk}} 03:57, 16 April 2019 (UTC)

No. Don’t know why Jewish Babylonian Aramaic would be Middle Aramaic, while Galilean Aramaic not? And Biblical Aramaic is still not so distinct from Jewish Babylonian Aramaic. And Imperial Aramaic is not that far. And what would even be Babylonian? If some people wrote Aramaic in Spain I would not know if it is “Jewish Babylonian Aramaic”. And from which Aramaic would one derive all the Arabic, Armenian and whatever other terms that are said to be from Aramaic? Working on the premise that the Aramaic form is the same or same enough, sometimes only a more modern form given (as for example when one gives the now leading German form when there have been lots of forms before but a language derives from earlier German, not clear exactly which form), we have customarily given “Babylonian” forms from which other language terms are derived. All much constructed, and useless distinctions, and not resembling the actual language situations. Other dictionaries do not distinguish either necessarily, though some restrict a dictionary to a certain “dialect”. The various terms are more distinctions of genres of texts, classifications of corpora, that is for literary studies, than useful for linguistics, or specifically lexicography. What would you gain except pain from splitting?
General rule: If the set of grammar is essentially the same, it is the same language. One should recognize that some languages move slower than others. So “Aramaic” spans two thousand years or more before deserving split language codes, and Arabic has also only one over one and a half thousand years and rightly though some dialects coexist with this Dachsprache, whereas over this time span French has four (Latin, Old French, Middle French, New French), but most other Romance languages only three (Latin, Old Spanish, Spanish), and even that being under the suspicion of being too much as the difference is not so great (“Old Italian” has hardly been used here).
Why would the situation for Aramaic be different from what is now seen in Arabic? They all wrote a Dachsprache even if dialectally the differences might have been greater amounting to “different languages” (which isn’t a clear concept either with the modern Arabic dialects). Only after being conquered by Arabs the unity dissolved. What you want to do is like to remove Arabic as a language and only treat it as a group because one sees some “stages” and unintelligibility between the “actually spoken” languages – but there is continuity too. The situation with Greek seems even be similar, for is it any different from splitting Ancient Greek in “Attic”, “Aeolic”, “Ionic” etc. and “Koine Greek” and “Byzantine Greek” (“Middle Aramaic”)? If something is only from a certain period one can state it in labels, but splitting the alleged stages is de trop. Fay Freak (talk) 13:05, 16 April 2019 (UTC)
@Fay Freak: Sooo... Scots and English are separate languages but not Jewish Babylonian Aramaic and Imperial Aramaic? Look, I'm a huge advocate for merging dialects and do so on the regular, but these are two distinct languages, with their own pronunciations, morphologies, written in two different scripts, and separated by hundreds of years. The delineation seems pretty clear to me, far more than, say, Old French and Middle French. --{{victar|talk}} 15:23, 16 April 2019 (UTC)
But also, what is this entry? It lacks any and all labels. What are its sources? Is it ancient, and if so, are these vowel points true to the attested word, or are they hypothetical? --{{victar|talk}} 16:57, 16 April 2019 (UTC)
See, you force people to write things that they don’t know.
What kind of question is that even “is it ancient?”? People see words as Aramaic, they add them as Aramaic. Who would split all the Aramaic entries? You wouldn’t. Nobody is there who would. You are proposing a thing that is impossible to accomplish, going against existing desires: One could have separated already the lects by labels, but the desire to separate has not been there, and you won’t create it against editors who have hereunto been reluctant to separate.
And again, you ignore the principle of unity while claiming they have been “separated by hundreds of years”. Cicero is separated from us more than two thousand years, and yet Stephanus Berard writes the same language. There is back-coupling and cross-coupling and it is as important and sometimes more important than evolution. And yeah, there is no reason why there wouldn’t be Classical Nahuatl from 2019 if an author subscribes to the old rules. Years and scripts are not even an argument at all, and pronunciation only with caution. Sounds merging the distinction of which is not even expressed in script, like also begedkefet, is rather an argument against splitting for lexicographical purposes because the differences are not relevant on the token-level graphically. The fact that we distinguish “Imperial Aramaic script“ and “Hebrew script” is delusive: It is two scripts but it is also the same script, the like as Cyrillic and Latin Serbo-Croatian but diachronically, or even closer. Morphology: Dubious, I stressed the differences must be essential: The fact that some or many Classical Arabic constructions and derivation types are now not used does not mean Modern Standard Arabic is not the same language. MSA is a subset of Classical Arabic, JBA is a subset of Imperial Aramaic. Distinguish decline from split. Romance was bad Latin before it become modern languages. Fay Freak (talk) 22:15, 16 April 2019 (UTC)
@Fay Freak: How is it unreasonable to expect the user to know which form of Aramaic it is? That's like saying how can we expect people to add πατήρ under the correct header when it just reads Greek. It should be the contributor's responsibility to understand the material, especially when it comes to ancient texts. In virtually all of my Aramaic sources, it either specifies the form of Aramaic or cites the work that does. So going back to my example of פתגמא‎, without sources and a proper label, how do we know the original text was even in Hebrew, let alone had vowel points? It seems to me, specifying the form of Aramaic is essential to the entry's quality and the comparison to Serbo-Croatian is a comparison between unequals.
I'm trying to follow your comparison to Arabic and Latin. Yes, Imperial Aramaic was a standardized liturgical language, much like Classical Arabic and Middle Latin, but how does the Jewish Babylonian Aramaic of the Middle to Late Aramaic periods fit into that using your argument? Are you trying to say JBA is the liturgical successor to Imperial Aramaic, like Classical Arabic is to Modern Standard Arabic? --{{victar|talk}} 02:48, 17 April 2019 (UTC)
It isn’t some source or the provenance that should tell you whether it is in a language but the text itself. The comparison with Serbian, Bosnian, Montenegrin, Croatian lies here. If I see a text on the internet it often takes long to find out in which of these it is and I often do not know it at all at the end. (And this is not even since the internet but similarly with printed texts in Austria-Hungary.) Hence it is sane to treat all as Serbo-Croatian, because the difference is too minute. The fact that it is often treated separately is no indication that it shouldn’t be done otherwise. And the reason to restrict treatments of Aramaic to certain lects is similar to treating only a regional variant of Serbo-Croatian. A Serbo-Croatian historical dictionary is more work than a dictionary of standard Croatian of the 21st century. Similarly, there is no man who could compile a “Comprehensive Aramaic Lexicon” though there are many who wish they could. If people compile works restricted to periods and provenance it is because of the pile of material is large and scattered. Hence we also have Latin dictionaries restricted to Latin of antiquity because a work including Medieval or even Modern Age Latin would be yuge (so Karl Ernst Georges could enter all Latin from when Latin lived into a dictionary, but not more, and it took his life). You also see that for Ancient Greek the limits have been pushed more to the present apparently with media access improving. The recent The Brill Dictionary of Ancient Greek covers all up to the 6th century CE, other Ancient Greek dictionaries stop at 200 or wherever, just because one is a homo oeconomicus and has to end somewhere or publish somewhere. The definition of a language is pliable dependent on what one wants to accomplish. 
This is to show you that “what is a language” is an economic decision, and when one writes about “languages” one writes about the literary history of the grammars and dictionaries created: the picture will be slanted by this fact. (Encyclopedias, Wikipedia, often fall for this fallacy, because they can’t know all the material either.) That hence treatments of what might seem as “languages” do not entail that there are indeed separate languages that they should be like that in a community dictionary. A lot of absurd distinctions have been entered this way into Wiktionary already, so we have “German Low German”, “Dutch Low German” (Dutch Low Saxon) and “Mennonite Low German” (Plautdietsch), which is obviously caused by researchers not accessing all three areas, though these lects form a unity. Hence editors who dealt with texts from the three languages somewhat extensively concluded that the separation was wrong (@Korn, as I remember). As a result the editors get disenchanted because of the arbitrary distinction and cease to treat the language on Wiktionary.
What you say “without sources and a proper label, how do we know the original text was” etc. is a general problem of Wiktionary and lexicography, but has little to do with language distinction. We would ideally have quotes to make all clear, which regions and periods used it and what the semantic range was or probably was, but splitting the language distinctions anew is no way to achieve editors to do it more than they already do, but I expect it will cause treatment of the language to die off. Fay Freak (talk) 19:01, 17 April 2019 (UTC)
Confirm, Low German would likely greatly benefit from eschewing categorisation based on non-linguistic tradition (orthography and political borders), but nobody wants to do the work of actually etching out a working solution for such a radical change or risk letting someone else do it alone. From this experience I warn that once a split is decided, some people will start implementing it, maybe botting it, potentially frustrating some editors. But if after five years everyone agrees it isn't an optimal state, it's likely that the decision will never again be undone, because that would require enough editors in Aramaic to band together, declare consensus, and then elbow-grease away five years of random edits to implement it, so you'll have a terrible mixed mess. Korn [kʰũːɘ̃n] (talk) 22:02, 17 April 2019 (UTC)
I'm wary of language splitting in general, but how hard would it really be to split it into Old and Middle or something equivalent?
Most of Wiktionary is written by people with a passionate interest in given languages, rather than passersby who enter a word or two, especially for ancient or obscure languages. I think that asking editors to ascertain the variety roughly enough is not placing too much burden. I certainly feel that using the actual script in which the form was attested should be obligatory.
That said, it comes down to convenience, could we hear some more people who work on Aramaic? Crom daba (talk) 23:17, 17 April 2019 (UTC)

I think I need to circle back on my original complaint. To call Jewish Babylonian Aramaic the same language as Imperial Aramaic is completely unprecedented according to current scholarly conventions, which is why we have one ISO language code for Old Aramaic [oar] and another for Jewish Babylonian Aramaic [tmr]. Equally troubling is calling Classical Syriac [syc] lemmas "Alternative forms" to JBA, as seen in this entry, which by no stretch of the imagination should be considered one and the same language. Lumping JBA into Old Aramaic also creates the problematic situation where we find ourselves labeling inherited and borrowed descendants of Imperial Aramaic as derivatives of JBA in the descendants section, which is grossly inaccurate.

Here are three possible options:

Option 1: Keep Old Aramaic and JBA merged
  1. Label all identified entries, ex. {{lb|arc|Imperial Aramaic}}
  2. Move all unidentified entries to Latin script entries

Option 2: Split Old Aramaic and JBA apart
  1. Split Aramaic [arc] into a) Old Aramaic [oar] (also with appropriate labels, i.e. Ancient, Imperial, Biblical, etc.), and b) Jewish Babylonian Aramaic [tmr] (beside other Middle Aramaic lects, i.e. Mandaic [myz], Samaritan [sam], etc.)
  2. Move all current Aramaic entries in Hebrew script (which is all of them) to Jewish Babylonian Aramaic

Option 3: Split into Old Aramaic and Judeo-Aramaic
  1. Split Aramaic [arc] into a) Old Aramaic [oar] (Ancient, Imperial), and b) Judeo-Aramaic [arc-jud] (Biblical, Jewish Babylonian, Jewish Palestinian)
  2. Move all current Aramaic entries in Hebrew script to Judeo-Aramaic

Incidentally, I found this very old (2012) discussion also advocating splitting Jewish Babylonian Aramaic away from Aramaic, but unfortunately nothing came of it. It should be noted that the 7th most recent Aramaic entry is from 2008/2010, so activity within Aramaic has been virtually dead for a long time. I wrote the transliteration module for Imperial Aramaic just yesterday. --{{victar|talk}} 05:16, 18 April 2019 (UTC)

In “this entry” the Syriac/CPA isn’t put as an “alternative form of JBA”, this is exactly what isn’t done because the entry is just “Aramaic”, and also the JBA form is rather the here alternative form אֲרִישָׁא‎, not what I made main form אֲרִיסָא‎. I didn’t make the JBA form “the” form. Nothing at all troubles me with the current language headings. Also as you can see on the CAL the form is way earlier, “Palestinian Targumic” and the like. On the other hand “Old Aramaic” allegedly gives way to “Middle Aramaic” by the 3rd century which is quite arbitrary. Haven’t dealt with the Imperial inscriptions much but at least Biblical Aramaic is not very different in grammar from so-called Jewish Babylonian Aramaic, from what I have seen in grammars and samples. The “options” show again how arbitrary the distinction is, apart from being variously complicated, given all the alleged dialects of back in the day that have a name (why would you even want to do the opposite from what CAL does? Their preference is also to lump and label, to avoid complicated structure at such a fundamental level). But if you can distinguish, you could already do it with labels, and you could auto-categorize Imperial-Aramaic script Aramaic as Imperial Aramaic.
Ironically, in “this very old (2012) discussion” @334a argued (nobody else argued on this topic) that Aramaic should be split “like Mandarin, Cantonese, Wu”, which is a split that has been reversed since because of having been experienced as tedious.
And again you ignore the principle of unity. You say “To call Jewish Babylonian Aramaic the same language as Imperial Aramaic is completely unprecedented according to current scholarly conventions, which is why we have one ISO language code for Old Aramaic [oar] and another for Jewish Babylonian Aramaic [tmr].” One can likewise say that they are all the same language, hence the language code [arc] and hence the name Aramaic – it shows that the Verkehrsauffassung prefers a unity, even more than with Serbo-Croatian. As I said: The definition of a language is pliable depending on what one wants to accomplish. The truth of what is a separate Aramaic language does not need to be pursued here howsoever. The fact of being entered in Imperial Aramaic script is alone a feature that allows every reasonable reader and editor separation. Adding “Imperial” in every L2 header of every such entry does not add any value, because nobody cares at that point, but splitting has the potential of disenchantment, as sufficiently outlined with Low German and what not. Fay Freak (talk) 16:52, 18 April 2019 (UTC)
@Fay Freak:
  1. Delineating Aramaic into Old and Middle is a well-founded principle of thought within Aramaic scholarship (see Fitzmyer and Siegal). To call a time period delineation "arbitrary" is to call all such delineations so. A line in the sand needs to be made somewhere and all Middle Aramaic languages have their own language codes on en.Wikt with the unjustified exception of JBA.
  2. No one is suggesting any radical segmentation of Aramaic -- we're just talking about splitting JBA away, as per common convention.
  3. [arc] is intended to exist as a family code, which is why we have codes for Old Aramaic [oar] and JBA [tmr]. Arguing that is like arguing "why do we even have [gem]?"
  4. Who has suggested adding "Imperial Aramaic" to any headers? Not I. I am calling for labels within Old Aramaic. I recommend you carefully read through my three proposals again.
--{{victar|talk}} 17:17, 18 April 2019 (UTC)
Does not make a difference. If you write “Jewish Babylonian” everywhere it does not make the entries any truer. The entries should be already true, and it is true if it stands that they are “Aramaic”, adding “Jewish Babylonian” in L2 does not add to it. Labelling “Jewish Babylonian” (with {{lb}} and {{tlb}}) is what one can already do – provided a form is really “only Jewish Babylonian Aramaic”. With “Aramaic” one is on the safe side. I dispute that a line in the sand must be drawn. Not delineating is also founded. What we must do only is to call entries by a name to give the reader the information they want about what it means and where it comes from, and “Aramaic” does the job no less than any split distinctions. Whether it belongs to a certain region or period is something readers expect in labels and not on the language distinction level. Hence Chinese is not even split because it is not necessary for giving the information. Fay Freak (talk) 17:29, 18 April 2019 (UTC)
@Fay Freak: And that viewpoint would best fit the first of the three options I put forth, but that should have as a prerequisite moving unidentified Aramaic entries to Latin script entries, because rendering them in Hebrew is inappropriate. You may disagree with that statement, citing your incomparable Serbo-Croatian comparison, but I think you'll find that most editors disagree with you on that point, and think that historical lemmas should either be rendered in their original script, or in Latin. --{{victar|talk}} 17:41, 18 April 2019 (UTC)
Aramaic should never-ever be in Latin script. The so-called Hebrew script (which is really the Aramaic script at least no less than the Hebrew script) is the appropriate fall-back script, particularly when a script is not yet encoded (there have been a lot, and they flowed into each other that one often does not even know if it is “already an own script”). The scholarly convention has always been to give Aramaic in Hebrew script when it has been attested in a script that is not available for printing or easier to enter, for example Nabataean Aramaic has often been cited as Hebrew. Again you miss what is the actual cause of scholars doing a thing. They don’t use Latin script because it is somehow standard or conventional, but because they are bad at doing better. Like Nostraticists cite all in Latin script. So Iranists rub their hands as long they can excuse writing books about Middle Persian and making recensions of Middle Persian works in Latin script with its not being encoded or no keyboard layouts being available etc. and thereafter they revel in being able to continue being lazy by referring to an alleged convention consisting of practices that have however always been wrong, but they have missed to tell it just in case it becomes opportune to rest on the Latin script. But the philological science of a remote culture only starts when you leave the schemes of the Latin script: As long as something is in Latin script it is pseudo-science, an imitation, if it wasn’t inclined for the Latin script, a surrogate for science. Now I see how the cat jumps: This proposal is another plot of Rome to impose its script upon all peoples, to the detriment of science. Fay Freak (talk) 20:40, 18 April 2019 (UTC)
If I had a class of philologists at my disposal, I'd employ them to look up n random sentences from the corpus of Old and Middle Aramaic and see word by word how many words would result in effectively doubled data (sans the script) when entered together with its respective cognate (if it exists) into Old and Middle Aramaic.
Barring that, I don't think this discussion will be fruitful. Crom daba (talk) 20:57, 18 April 2019 (UTC)
(edit conflict) @Fay Freak: I would say the vast majority of Aramaic terms in sources not in their original attested script are rendered in Latin. The only exceptions you might find to that are in the context of Jewish Aramaic or Hebrew research. Just as we render Tocharian and Book Pahlavi in Latin, instead of, say, Chinese or Arabic, the default on en.Wikt is to use Latin script. Regardless, that would not be my preferred option of the three I present.
I'd really rather hear more from others, because evidently you and I cannot alone come to any agreement. --{{victar|talk}} 21:07, 18 April 2019 (UTC)
Just chiming in here. I think Aramaic should definitely be split. In particular I find Fay's comparison of Aramaic to Latin to be ill-conceived. While it is the case that Latin is still produced in academic and ecclesiastic contexts, it has no native speaker population. That class of native speakers did stick around in the form of Romance language speakers. While modern Aramaic may have a lot of learned borrowings from earlier stages, this is nowhere near saying that the spectrum of modern Aramaic lects is equivalent to the very much dead Latin.
Furthermore, if a user has trouble adding an Aramaic term because they don't know its chronology or script, this is as it should be. We aren't a dumping ground for random words. Even comparison to Ancient Greek show the highly articulated system for handling Ancient Greek dialectology (if "Ancient Greek" existed in any real sense until the Hellenistic period). Users should be expected to know about the words they are adding. —*i̯óh₁n̥C[5] 00:10, 19 April 2019 (UTC)
Script, I always stood on the position that it should be in the original script as far as possible. Nor did I say anything about modern Aramaic dialects. But chronology? I emphasized that it must become clear from the language itself that it is a different language. If a user does not see a difference then there probably isn’t any. Like one finds texts and does not see whether they are Bosnian or Serbian, or even if one knows the difference it does not look enough to necessitate a separation. The date and region is not the language. It’s not about “trouble” but about artificially seeking language distinctions when it is unneeded. The idea that after a certain point we have “Jewish Babylonian Aramaic” because “the line must be drawn somewhere” is, however tempting, still arbitrary, because only the purpose determines whether something is a language. It still has not been shown that only if languages are separate in that scholarly sense that is determined howsoever they should be separated in this lexicon. We treat Ukrainian and Russian and all Slavic languages separately because the standards and tables are different and the words in details too often, though it might appear that they are only one language with umpteen dialects. It does not matter what “is a language”, no! Who inculcated you that when something is considered a separate language by scholars it must be split off on Wiktionary? Baseless presumption. Whether something “is a language” or whether it is “correctly seen so by scholars” does not matter if we still can handle all with the same tables. And so we can handle Biblical Aramaic like Jewish Babylonian Aramaic in the same sections with the same tables. After all we have even Chinese in one heading. According to most purposes German and English are separate languages but a split of Jewish Babylonian Aramaic serves according to my expectation no purpose. Fay Freak (talk) 00:52, 19 April 2019 (UTC)
This proposal is another plot of Rome to impose its script upon all peoples, to the detriment of science. Really? Aramaic is normally written in a script of 22 letters; one can replace them with an arbitrary set of any other 22 symbols and preserve all the information. Transcription into computerized or typeset Aramaic throws away a bunch of information that may or may not be relevant, and that transcription may be more or less accurate. Once that's done, an exact transliteration doesn't change anything, yet makes it easier for people familiar with Latin script to understand the text and makes it possible to compare across languages that use different scripts. Demanding that all the people working in Gothic and Egyptian hieroglyphics use the exact form of the original has nothing to do with science, it's just exclusionary.--Prosfilaes (talk) 04:25, 19 April 2019 (UTC)
So many dissonances: On one hand one shall not be “exclusionary” and one not demand the exact form of the original, on the other hand one shall and it “is as it should be”.
What about acknowledging that the current exclusionarity is exactly right? No proposal for change here is advantageous. It’s all about adding qualifiers like “Old”, “Judeo”, “Jewish Babylonian” or Latin script where it does not belong to. As they are, the entries are correct and miss only the details like most entries on Wiktionary (period, quotes). Pigeonholing further but base all on one’s own “Latin“ standard is just the classical American hybris and doublethink and, in this case, casual anti-Semitism. “Just disregard all cultural ties and do it like we do. The American way is best! Our way is the basis and father of all things!” Fay Freak (talk) 13:45, 19 April 2019 (UTC)
My stance is thus: Ideally, they should be split, but it's a lot of work and maybe not worth doing. A note regarding Biblical Aramaic: It is very different from Jewish Babylonian Aramaic. For one thing, BA is a Western dialect, while JBA is an Eastern dialect. There are numerous grammatical, phonological, and morphological differences. --WikiTiki89 19:11, 19 April 2019 (UTC)
Indeed. There are also numerous grammatical, phonological, and morphological differences between the comedies of Plautus and the sermons of Augustinus. So much one could write about the unlike syntactical constructions, the sound changes meanwhile, and the different endings used. And yet what matters lexicographically would need to justify different language headers. And yes, I also oppose the concept of a language “Old Latin”. Its names aren’t even correct. Fay Freak (talk) 19:47, 19 April 2019 (UTC)

Misspelling Alternative

One of the biggest objections that seems to be raised around removing misspellings, or banning them, is that the entry for a misspelling points users who search for it to a correctly spelled entry. One possible solution which would enable search to consistently find misspellings we deem important enough to include would be to put them on the correctly spelled entry, but not display them. The search function finds them easily enough, so searches for misspellings still present the correctly spelled entry to the searcher, but without the intermediate step of landing on an incorrectly spelled page.

I created a demonstration template {{misspelling}} which provides the simplest possible version of this idea, and applied it to the recently deleted page urothelical (urothelial). One simply adds {{misspelling|urothelical}} to the end of the language section (or entry?), and then when a user searches for the misspelling they see the correct page (in this case as the first entry suggested). Additionally the template labels the term as a misspelling, so it is somewhat clear what is going on. You can see what it looks like on these example search results. This could obviously be fancied up with language and categorization and all kinds of things if desirable.

Ideally the Mediawiki search would be smart enough that the user always had the correct entry suggested at the top of the search page, but with the multilingual nature of the project that is an extremely difficult goal. This sort of structure may actually strengthen the search's ability to suggest pages, or to become more advanced down the road. Thoughts? - TheDaveRoss 18:37, 17 April 2019 (UTC)

Cool. For this example, isn’t “urothelical” a misconstruction though? Fay Freak (talk) 19:07, 17 April 2019 (UTC)
Probably, I just copied the intent of the original page. No reason this same mechanism couldn't be applied to misconstructions and typos. - TheDaveRoss 19:09, 17 April 2019 (UTC)
If we define a misconstruction to be a misunderstanding or misinterpretation resulting from the use of the wrong meaning of a word that has multiple meanings, then this is not one. Although uro- has two meanings, this is not the result of interpreting it incorrectly as meaning “tail”. The issue is solely with the spelling thelical, which has zero meanings. Note that we also list the miscreation epithelical.  --Lambiam 18:13, 18 April 2019 (UTC)
Support. It's a pretty elegant idea that lets these words be findable without having an entry. On the other hand, how do you distinguish a misspelling from an alternative spelling? —Rua (mew) 15:52, 18 April 2019 (UTC)
Alternative forms are when there are reasons why an informed person uses or used the forms (conscious spellings), while misspellings are when such reasons are absent. If a misspelling is also a legit spelling then one would of course use the gloss templates we are used to, {{misspelling of}}. Fay Freak (talk) 17:05, 18 April 2019 (UTC)
If "urothelical" happened to be a word in another language this wouldn't work at all. DTLHS (talk) 18:15, 18 April 2019 (UTC)
Not at all is a bit strong, but if there is no acceptable way to make that situation work we can leave the status quo for entries which would otherwise exist, and use this method for entries which would not. Other solutions include listing common misspellings in the "also" section at the top, perhaps distinctly in a "did you mean" format. - TheDaveRoss 18:31, 18 April 2019 (UTC)
Does anyone actually make use of our misspellings as data for some purpose?
Even if there were, the proposal, with modifications and limitations as suggested above, sounds good. DCDuring (talk) 18:46, 18 April 2019 (UTC)
At this point I'd just like to kill all "misspellings" until they become acceptable spellings (not sure exactly how we make that judgement!). Giving them first-class status encourages all the fungus of categories, alt forms, etc. to grow on them and legitimises them beyond what they deserve. But this idea might be an improvement, sure. Equinox 19:28, 18 April 2019 (UTC)

Proposal to unify the size and style of CJKV text

I have a proposal for a number of changes regarding CJKV text:

  1. Unify font size: 120%.
  2. Set line-height to be 1, to prevent CJKV text from affecting the line-height of Latn text.
  3. Re-enable bold font weight for Japanese.
  4. Do not enlarge CJKV bold text.
  5. Do not use bold font weight for all Vietnamese Hani text.
  6. (Other cleanup.)

Rough preview of before and after.

Secondary to CSS:

  1. (Use Kore for all Korean text, instead of using Hani for hanja and Kore / Hang for hangul.)
  2. (Repair certain Japanese furigana templates to fix certain oddities regarding font size.)

If there are no objections, I will ask for implementation.

Suzukaze-c 19:56, 18 April 2019 (UTC)

Semi-automatic correction of Cyrillic text with Latin characters

As editors who watch Recent Changes probably have noticed, I've been correcting Cyrillic text that contains Latin characters. I created a list of the links in {{m}}, {{l}}, {{t}}, and {{t+}} that will be processed, covering languages that have only Cyrillic script listed in their data table. Russian, Ukrainian, Belarusian, Bulgarian, and Macedonian have already been processed. I'm using the list of similar-looking Latin and Cyrillic letters at w:User:Trey314159/homoglyphHunter.js, with some additions. An example edit can be seen here. I review each edit and don't change some of the links because they are clearly in the Latin script. — Eru·tuon 20:33, 18 April 2019 (UTC)
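
For readers curious what such a check might look like, here is a minimal sketch in Python. It is not the script actually used for these edits: the look-alike table is a small illustrative subset (the fuller list is the one at w:User:Trey314159/homoglyphHunter.js), and the names `LATIN_TO_CYRILLIC` and `suggest_fix` are made up for this example. It proposes a replacement only when a term mixes Cyrillic letters with Latin look-alikes, and returns nothing otherwise so that a human can review the link.

```python
import unicodedata

# Illustrative subset of Latin -> Cyrillic look-alike pairs (an assumption,
# not the full list used for the actual cleanup).
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
}


def is_cyrillic(ch):
    """True if the character's Unicode name marks it as Cyrillic."""
    return "CYRILLIC" in unicodedata.name(ch, "")


def is_latin(ch):
    """True if the character's Unicode name marks it as Latin."""
    return "LATIN" in unicodedata.name(ch, "")


def suggest_fix(term):
    """Return a corrected term, or None if no automatic fix is proposed.

    A fix is proposed only when the term contains at least one Cyrillic
    letter and every Latin letter in it has a Cyrillic look-alike.
    """
    has_cyrillic = any(is_cyrillic(c) for c in term)
    latin_letters = [c for c in term if is_latin(c)]
    if not has_cyrillic or not latin_letters:
        return None  # purely Latin or purely Cyrillic: nothing to fix
    if any(c not in LATIN_TO_CYRILLIC for c in latin_letters):
        return None  # Latin letter with no look-alike: leave for review
    return "".join(LATIN_TO_CYRILLIC.get(c, c) for c in term)
```

Returning `None` for doubtful cases mirrors the manual review step described above: links that are clearly in the Latin script are simply left alone.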

Finished. There still remain other linking templates that might need cleanup. [Edit: Finished the most common etymology templates.] — Eru·tuon 21:28, 18 April 2019 (UTC)

@Erutuon Great work, thank you! You may want to do the same with Arabic (partial) homoglyphs with Arabic, Persian, Urdu, etc. If it's still required. --Anatoli T. (обсудить/вклад) 04:25, 19 April 2019 (UTC)
@Atitarev: Yes, I think I should do that. I'm familiar with Arabic, but not very familiar with Persian or Urdu. What characters should I be looking for in each language and replacing? In Persian, it looks like ك‎ (Arabic letter kaf) and ي‎ (Arabic letter yeh) would be incorrect, since Persian uses ک‎ (Arabic letter keheh) and ی‎ (Arabic letter Farsi yeh) instead. If that's right, I can look for the non-Persian character in Persian linking templates and replace it with the Persian one. (Here's the working list.) — Eru·tuon 05:01, 19 April 2019 (UTC)
(edit conflict) @Erutuon: ك‎ (Arabic letter kāf) and ي‎ (Arabic letter yāʾ), ى‎ (Arabic letter ʾalif maqṣūra), and ک‎ (Persian letter kâf) and ی‎ (Persian letter ye) are exactly the letters to look for, they are partial homoglyphs because they look identical only in certain positions, copypasta and wrong keyboards cause the common misspellings. Urdu uses the Persian ک‎ and ی‎. The Arabic ي‎ is also used in Pashto but Pashto uses the Persian ک‎. These are the most common errors, which can be checked without any deeper knowledge of these languages and the spelling rules. Things to look for is to check if letters specific to one language are used in another, e.g. the Arabic ة‎ (tāʾ marbūṭa) is normally not used in other languages or it would be an extremely rare case, like specific Persian letters can occasionally be used in standard Arabic or dialects. --Anatoli T. (обсудить/вклад) 06:02, 19 April 2019 (UTC)
Okay, thanks! I've added instances of alif maqsūra to the Persian list, and tāʾ marbūta as well, though the latter I will just let others deal with. — Eru·tuon 06:46, 19 April 2019 (UTC)
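As a rough illustration of the Persian check discussed above, here is a hedged Python sketch. It replaces the two letters named in the thread (Arabic kaf and yeh) with their Persian counterparts and only flags alif maqṣūra and tāʾ marbūṭa for human review instead of changing them. The names `ARABIC_TO_PERSIAN`, `REVIEW_ONLY`, and `normalize_persian` are hypothetical, and treating alif maqṣūra as review-only is an assumption of this sketch.

```python
# Letters to replace in Persian text (code points as named in the thread).
ARABIC_TO_PERSIAN = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}

# Letters that are suspicious in Persian but are only reported, not changed
# (an assumption of this sketch).
REVIEW_ONLY = {
    "\u0649": "ARABIC LETTER ALEF MAKSURA",
    "\u0629": "ARABIC LETTER TEH MARBUTA",
}

_TABLE = str.maketrans(ARABIC_TO_PERSIAN)


def normalize_persian(text):
    """Return (fixed_text, names_of_letters_needing_review)."""
    fixed = text.translate(_TABLE)
    flagged = sorted({REVIEW_ONLY[ch] for ch in fixed if ch in REVIEW_ONLY})
    return fixed, flagged
```

The same shape would need a different table per language: going by the posts above, Urdu would share the Persian mapping, while a Pashto table would replace only kaf, since Pashto keeps the Arabic ي‎.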

Finding multiword terms when searching for one of the words?

I had an interesting situation just now, where a friend used the term Gish gallop. I had no idea what that meant. I tried looking up the unfamiliar word gish, but the definition there made no sense and didn't help me understand what was being said at all. Of course, I didn't realise that this was a multiword term, and doing what is most natural in the situation (looking up the one word I didn't know) gave me nothing. Eventually she explained it to me and then I realised that it's a combination of two words I needed to look up, which then led me to the right entry. But in itself, there was nothing to hint that this was an idiomatic combination and Wiktionary wasn't helpful in getting me where I needed to be. I'm guessing I'm not the only one to have this problem. Is there anything we can do to improve it? —Rua (mew) 23:53, 18 April 2019 (UTC)

Solution: you consider that the element "gish" might be capitalised, go to Gish, and see the link to the derived term. —Μετάknowledgediscuss/deeds 03:18, 19 April 2019 (UTC)
Because I frequently search for unlinked taxonomic names (including one-part names), I have the habit of searching for entries that merely 'contain' my search term. That kind of search yields Gish gallop as the third item on the search results. DCDuring (talk) 10:03, 19 April 2019 (UTC)

"What means X?"

Hi Wiktionary! I know a German guy and when he doesn't understand a word in English he asks "what means X?". I was trying to explain to him that you have to say "what does X mean?", because "what means X?" sounds like you are asking for a word whose definition is X (although, admittedly, people will probably understand it because the other interpretation is too weird). I don't speak any German and I found it totally impossible to explain to him what the difference is, and why "what means X?" is wrong. (I believe the grammatical term for English is "do-support", but this isn't a guy who will go reading a lot of grammar.) Could anyone help me explain this to him? A few short sentences of German that explain the difference would be absolutely fantastic. Equinox 03:12, 19 April 2019 (UTC)

I'm not great at saying it in German, but What means X? has what as subject (like the nominative case) and X as direct object (like the accusative case), and What does X mean? is the other way around, with X as subject and what as direct object. — Eru·tuon 04:14, 19 April 2019 (UTC)
I understand the issue grammatically, but I would like to explain it to a German who doesn't know or care about grammar, but thinks "what means X?" and "what does X mean?" are identical. Maybe it would be good to have those two sentences translated very literally into German. Equinox 04:48, 19 April 2019 (UTC)
Hmm, well, the German word for what (was) doesn't have distinct nominative and accusative forms, but if you replace X with you and translate, you get Was bedeutest du? ("What do you mean?") for What does X mean? but Was bedeutet dich? ("What means you?") for What means X?, which seems just as weird as the English. — Eru·tuon 05:10, 19 April 2019 (UTC)
Tell the German speaker that this is a present simple question, for which the auxiliary verb do is required. The German translation, of course, would be "was meint X?" so it would be easy to translate that as "what means X?" --I learned some phrases (talk) 09:50, 19 April 2019 (UTC)
In German the phrase is "was bedeutet X?" (not "was meint X?"). In English you need to add "do" to interrogative sentences, unless the question is about the subject, e.g. "what makes this sound?" - "was macht diesen Laut?" or "who is speaking?"/"who speaks?" - "wer spricht?". --Anatoli T. (обсудить/вклад) 11:56, 19 April 2019 (UTC)
It sounds like he won't care why. You could just tell him that when using means (or meant) in a question the known thing should always be first and the unknown is last. Hence, "This means what?" Now how to put that in German, I have no idea. -Mike (talk) 07:47, 20 April 2019 (UTC)
This web page in German gives a very concise but readable summary (by way of examples) of the main rules of English grammar, including the word order in questions. The example that matches your friend’s problem the most closely is, “What does she watch everyday?”. The page only states what the rules are, not why they are as they are.  --Lambiam 10:41, 20 April 2019 (UTC)

Eye dialect (again)

I know that there have been discussions previously about the use of the label "eye dialect" within Wiktionary, and, especially, whether it is correct to use the term to refer to "pronunciation spellings" that are intended to mimic a nonstandard pronunciation. Since the last time I looked, I think, the following additional definition has been added at eye dialect:

2. (more broadly) Nonstandard spelling which indicates nonstandard pronunciation.

Is everyone happy with this definition, and happy that it should be applied within Wiktionary, such that, just to give one random example, geddit should be labelled "eye dialect"? Mihia (talk) 03:35, 22 April 2019 (UTC)

If /ˈɡɛɾɪt/ is (locally) the standard pronunciation of “get it”, it is not unreasonable to consider ‘geddit’ eye-dialect spelling (compare e.g. the uses of the spelling “compuder”); that does not require the contested additional sense. I for one am unhappy with such dilutions that make these terms less functional; given the etymology of eye dialect it also does not make sense.  --Lambiam 07:35, 22 April 2019 (UTC)
I'm not sure how far down we can or should appeal to "local" standard pronunciations. In some communities and/or registers people might routinely say /ˈɡɛɾɪt/, just as they might routinely say -in' for -ing, or fink for think, or whatever else. These may be "normal" for some people, yet I believe it would be misleading for us to refer to them as "standard pronunciations". Mihia (talk) 17:16, 22 April 2019 (UTC)
I thought realizing intervocalic /t/ or /d/ as an alveolar flap is fairly standard for most speakers of American English, which is a bit wider than “some communities”. Compare liddle.  --Lambiam 23:44, 22 April 2019 (UTC)