
Wiktionary:Beer parlour




Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



December 2018

References for Vietnamese readings listed under Template:vi-readings

I would like to add superscript references for readings of Vietnamese Han characters using the following code as a suggestion:

| hanviet = giả - tdcn
| nom = giả - tdcn;gdhn, giã - tdcn, rả - tdcn, trả - gdhn;btcn, dã - gdhn

The abbreviations used are: tdcn = {{vi-ref|Nguyen (2014).}} gdhn = {{vi-ref|Trần (2004).}} btcn = {{vi-ref|Hồ (1976).}}

The desired output, using 者 as an example, is as follows:

Han character

: Hán Việt readings: giả[1]
: Nôm readings: giả[1][2], giã[1], giở[1], rả[1], trả[2][3], dã[2]


Currently, this is also achievable using the bulkier code below:

| hanviet = [[giả#Vietnamese|giả]]<ref name="tdcn">{{vi-ref|Nguyen (2014).}}</ref>
| nom = [[giả#Vietnamese|giả]]<ref name="tdcn"/><ref name="gdhn">{{vi-ref|Trần (2004).}}</ref>, [[giã#Vietnamese|giã]]<ref name="tdcn"/>, [[giở#Vietnamese|giở]]<ref name="tdcn"/>, [[rả#Vietnamese|rả]]<ref name="tdcn"/>, [[trả#Vietnamese|trả]]<ref name="gdhn"/><ref name="btcn">{{vi-ref|Hồ (1976).}}</ref>, [[dã#Vietnamese|dã]]<ref name="gdhn"/>

If possible, could someone edit Module:vi so that the suggested code in the first paragraph would give the desired output? KevinUp (talk) 15:49, 1 December 2018 (UTC)
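
The shorthand requested above can be illustrated with a small sketch. It is written in Python purely for illustration (Module:vi itself is a Lua module, and this is not its implementation); the function name expand_readings and the REFS table are hypothetical. It assumes that entries are separated by commas, a reading and its references by " - ", and multiple references by semicolons, as in the suggested code, and it emits the bulkier wikitext shown above.

```python
# Hypothetical lookup table: abbreviation -> full reference wikitext.
REFS = {
    "tdcn": "{{vi-ref|Nguyen (2014).}}",
    "gdhn": "{{vi-ref|Trần (2004).}}",
    "btcn": "{{vi-ref|Hồ (1976).}}",
}


def expand_readings(param: str, seen: set) -> str:
    """Expand 'reading - ref;ref, reading - ref' into linked, referenced wikitext.

    `seen` records reference names that were already defined, so later uses
    become the short form <ref name="..."/>.
    """
    items = []
    for entry in param.split(","):
        reading, _, abbrevs = entry.partition(" - ")
        reading = reading.strip()
        out = f"[[{reading}#Vietnamese|{reading}]]"
        for abbr in (a.strip() for a in abbrevs.split(";")):
            if not abbr:
                continue  # reading given without any reference
            if abbr in seen:
                out += f'<ref name="{abbr}"/>'
            else:
                seen.add(abbr)
                out += f'<ref name="{abbr}">{REFS[abbr]}</ref>'
        items.append(out)
    return ", ".join(items)


seen = set()
hanviet = expand_readings("giả - tdcn", seen)
nom = expand_readings("giả - tdcn;gdhn, trả - gdhn;btcn", seen)
print("| hanviet = " + hanviet)
print("| nom = " + nom)
```

Sharing one seen set between the hanviet and nom parameters is what lets the second parameter reuse <ref name="tdcn"/> instead of defining the reference again.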

@Suzukaze-c Hi. If you have the time, would you mind comparing the desired output above with 者#Vietnamese? I can't figure out how to implement this within the module. KevinUp (talk) 06:50, 7 December 2018 (UTC)
  Done, I think. —Suzukaze-c 07:01, 13 December 2018 (UTC)
Thank you very much! Also, I'd like to mention here that Template:vi-hantu is now officially deprecated and will be replaced by Template:vi-readings (The older template contains readings imported from the Unihan database, which fails to distinguish between Hán Việt and Nôm readings. Previous discussion can also be found here).
Also, I found that the Nom Foundation database contains some mistakes/unverified readings, such as hoả for [1], which is why I wanted to list out readings based on what is found in the original reference source. All readings will eventually be given superscript references, but it will take some time for this to be done. KevinUp (talk) 14:34, 13 December 2018 (UTC)
@Mxn Hey, just want to update you on the latest developments. I am currently sorting through readings of Hán Nôm characters and adding reference links to Wiktionary:About Vietnamese/references for each reading. Readings with a higher number of references are arranged first. See for example, which has up to 16 readings.
This is work in progress and completed characters with verified readings can be found at Special:WhatLinksHere/Wiktionary:About_Vietnamese/references. KevinUp (talk) 18:49, 26 December 2018 (UTC)

unchanged plural

What exactly does "unchanged plural" mean in the Usage note for craft? That's not the general terminology used in Wkt, is it? --Backinstadiums (talk) 16:38, 2 December 2018 (UTC)

I've changed it to be explicit: "The plural craft is used to refer to vehicles. All other senses use the plural crafts." Ultimateria (talk) 19:12, 2 December 2018 (UTC)

Inevitable discussion about reference works from non-Latin cultures

Given the situation where issue |lang= in {{quote-web}} in the Grease-pit page of this month insinuates opening all reference templates it has become opportune to uniformize their content. It has caught my eye that there are lurking multiple fashions of displaying references for cases of a work published in a script that is not the one of the Romans, namely, the author name was written in a certain script and the title of course too, but to my great surprise and contrary to Wiktionary’s usual laudable Unicode- and internet-standard compliance I encountered that there were reference templates here already created that did not even include the original title but wrapped it in {{xlit}} so that only a transliteration of it remained, and the same has also been done with titles of their authors, so that I could not recognize any of the books and almost did not find already created templates by Wiktionary’s search function, being already prepared – in vain – to create the templates.

So I reasoned that, since we are in late 2018 and our letter case is unlimited in what concerns languages that have in the Modern Age been used for pursuing science, the templates must all be uniformized so that the original title is displayed, opining also that transliterations are to be discarded for scripts that are unambiguous since they are no gain for anyone (if you don’t know the language you don’t know the transcription either, short of negligible cases when one is literate in Latin script only but not the actual script for a non-Latin-script-written language one knows) and “En.Wiktionary entries already have too much wasted space”, as @-sche acutely observed on the Grease pit page of this month and also has been voiced as a cause of displeasure.

There might be little experience in reference sections in any works containing non-Latin references, but naively and naturally and looking at how my computer does it, I always ordered references by the Latin names first and then the Cyrillic ones, and so I have come to the belief that the original-script author names can be had easily. People might however be more appealed to by Latin-transliterated names, but even then I am apprehensive of those being less iconic, but this is of limited importance for names. It can very well be grave in logographic writing systems some of which are still in use, for particularly people’s names can have the most arbitrary characters and it would be utterly impossible to reconstruct the original name without browsing the web again only to find a name which a Wiktionary editor has needlessly left out. Currently the Japanese reference templates have all formats.

So how does Wiktionary look upon all these factoids? What should references have, perhaps with distinctions by writing systems? I’d like to see completely removed the transcriptions of the titles of alphabetic and syllabaric scripts because they have no non-theoretical uses and would sort references by Unicode (I don’t actually know how Chinese sort their Chinese reference sections, and perhaps one feels that Japanese titles transliterated could somehow help, I avoid talking about those scripts). Plus why have people even thought that |title= and the author parameters would be the correct place to put transliterations or transcriptions? This would easily be different parameters |tr-title=, |tr-author= and so on that can be expanded for those who need it (whose existence I deny), and this would make reference templates use more expected parameters. Which of course entails as a minimum that we have original-script titles – come on, are readers supposed to reverse-transliterate titles? Author titles perhaps in both since one might not know the script but the author from other publications in other, Latin-written languages? But this is not generally true, though there are often adapted author names around. Avicenna is quite iconic, no need for اِبْن سِينَا (ibn sīnā), but that’s more often for classics and applicable to quotation templates. What does iconicity tell us here? And I have not even mentioned how often title-translations should be done, which have a parameter already. There is still this issue around of quotation templates containing bare long titles, and there are a few “click to expand” solutions for these as I remember. Pinging some people I find interesting to hear or interested: @Sarri.greek, Eirikr, Sgconlaw, Dan Polansky. Fay Freak (talk) 00:29, 5 December 2018 (UTC)

I’m sorry, could you summarize all that? I’m having trouble understanding what your concerns are. — SGconlaw (talk) 01:51, 5 December 2018 (UTC)
@Sgconlaw I wanted to uniformize references of books written in a non-Latin alphabet a bit, pointing out the questions whether the original script of a) the author name b) the title should be shown, and c) whether transliterations of the author names should be shown d) whether transliterations of the titles should be shown. I was just formulating much pros and contras. My result has been to vehemently affirm b), deny d) (hardly valuable clutter), lean to a), I am rather open to c), but it would need to look good enough (like on the Chinese reference page KevinUp has linked it is great but we need |tr-author= for this I think). Fay Freak (talk) 19:35, 5 December 2018 (UTC)
I say show both the original and the transliteration, in the future we will be able to customize this to everyone's satisfaction with css magic. Crom daba (talk) 03:42, 5 December 2018 (UTC)
Yes, and at the least transliteration does not belong to |title=, otherwise there won’t be CSS magic. There need to be separate fields for original titles and author names and their transliterations, I don’t think I can be wrong here, @Sgconlaw. Now supra there are the arguments for displaying. The decision about display should not be influenced by limited forms of saving the information. Fay Freak (talk) 19:35, 5 December 2018 (UTC)
Here are the formats used for Chinese references: Wiktionary:About Chinese/references, Korean references: Wiktionary:About Korean/references and Vietnamese references: Wiktionary:About Vietnamese/references. Also, all Chinese quotations and usage examples (whether it is cited from a book, song, video or the web) are provided using Template:zh-x. A list of abbreviations for well known references used by this template can also be found at Module:zh-usex/data. KevinUp (talk) 04:08, 5 December 2018 (UTC)
@KevinUp The Chinese reference page is great. Until the point where I find: “Starostin, Sergei (1989). Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy (A Reconstruction of the Phonological System of Old Chinese)”. Why is the Russian title not given in Russian script but the Chinese titles are given in Chinese script only (and not in Pinyin)? No logics.
There is also the issue of some titles being translated and some not, but that’s minor. Fay Freak (talk) 19:35, 5 December 2018 (UTC)
@Fay Freak: I'm not sure why the work by Sergei Starostin was not written in the Cyrillic script. I tried to trace the source of that work, and this is what I managed to find: [2]. Unfortunately I was unable to trace the original source. Perhaps someone else could help by looking up the bibliography of Sergei Starostin.

Phonological reconstructions for Early Zhou, Classical, and Middle Chinese are based on Sergei Starostin's version as originally published in: [Starostin, Sergei. Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy [Reconstruction of the Phonological System of Old Chinese]. Moscow, 1989.] Particular reconstructions are transliterated into the UTS from S. Starostin's etymological database of Chinese characters (bigchina.dbf), available online at

As to why Chinese titles are given in Chinese script only and not in Pinyin, this may have been done to prevent a cluttered appearance of the reference works. Also, it seems that pinyin tone marks are omitted for Chinese reference works in Yale University Library's Quick Guide on Citation Style for Chinese, Japanese and Korean Sources: APA Examples. KevinUp (talk) 16:32, 6 December 2018 (UTC)
Transliteration of author and title can be useful. There are several scripts where I can read faster (and more accurately) names and titles in transliteration.
I am not sure if it was being suggested, but omitting transliteration in general would be a bad idea; Sanskrit look-up is already hampered by the ban on Sanskrit in the Roman script. --RichardW57 (talk) 23:01, 29 December 2018 (UTC)

Adding pinyin for numbers in Chinese (Mandarin?) example sentences

@Dokurrat, KevinUp, Justinrleung, Suzukaze-c, Tooironic, Wyang & co. (alphabetically organized) I added Pinyin for the numbers in a Mandarin Chinese example sentence, and that pinyin was removed- see [3]. I think we should give the pinyin for the numbers (maybe?). I'm okay either way- in fact I don't think we need to do all sentences one way (no pinyin for numbers in example sentences) or all the other way (pinyin for all numbers in example sentences). But I'm not sure. idk. I'm just putting it out there for y'all to discuss. Any which way is fine to me. --Geographyinitiative (talk) 04:30, 5 December 2018 (UTC)

No, I don't think we should add pinyin for Arabic numerals. Dokurrat (talk) 04:41, 5 December 2018 (UTC)
I like the idea. I usually do it for Japanese. —Suzukaze-c 04:42, 5 December 2018 (UTC)
I'd like to see the numbers as pinyin, because they are read according to their Mandarin pronunciation. Also, depending on context, they can be read as cardinal numbers or standalone digits:
365天  ―  sānbǎiliùshíwǔ tiān  ―  Three hundred and sixty-five days.
員工365失踪了 / 员工365失踪了  ―  Yuángōng sānliùwǔ shīzōng le.  ―  Employee no. 365 is missing.
KevinUp (talk) 05:04, 5 December 2018 (UTC)
^ this. —Suzukaze-c 05:18, 5 December 2018 (UTC)
Agreed that we should add pinyin conversion for Arabic numerals. ---> Tooironic (talk) 06:09, 8 December 2018 (UTC)
It has to be added manually, of course, otherwise we are asking for possible future errors in conversion. Perhaps re-transliterated numbers need to be displayed differently, so that e.g. sānbǎiliùshíwǔ for "365" is known to stand for 三百六十五 (sānbǎiliùshíwǔ, “three hundred sixty five”) or 三六五 (sānliùwǔ, “three six five”). A different colour or underlined? Also, maybe a trick is needed to use a hidden "三百六十五"/"三六五" but display "365", so that a manual pinyin is not required? BTW, @KevinUp: I have suppressed the display of "365" in your example with @. --Anatoli T. (обсудить/вклад) 07:15, 8 December 2018 (UTC)
@Atitarev: Automatic pinyin transliteration of Arabic numerals can be done by adding pronunciation data of 0-9 to data.polysyllable_pron_correction in Module:zh-usex/data. However, this would render "365" as 三六五 (sānliùwǔ, “three six five”). Manual input would still be needed if "365" is intended to be read as 三百六十五 (sānbǎiliùshíwǔ, “three hundred sixty five”). KevinUp (talk) 14:45, 8 December 2018 (UTC)
@KevinUp: I understand. As I said, what we need is a new method in the module to use the transliteration of hidden characters, in this case "三百六十五", for transliteration purposes only ("sānbǎiliùshíwǔ"), but display an unlinked "365" in the Chinese text. --Anatoli T. (обсудить/вклад) 04:16, 9 December 2018 (UTC)
This seems to be slightly complex, so we may have to add this to Wiktionary:About Chinese/tasks. KevinUp (talk) 04:25, 9 December 2018 (UTC)
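
The two readings discussed in this thread can be sketched as follows. This is a Python illustration only, not the code of Module:zh-usex; the function names are hypothetical, the cardinal reading is limited to 1 to 999, and pinyin is written with citation tones (no tone sandhi for yī). It shows why a digit table alone yields sānliùwǔ for "365", and why the cardinal reading needs either manual input or extra logic.

```python
# Pinyin for the digits 0-9, indexed by digit value.
DIGITS = ["líng", "yī", "èr", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"]


def digit_reading(number: str) -> str:
    """Read each Arabic digit on its own, as for an ID or room number."""
    return "".join(DIGITS[int(ch)] for ch in number)


def cardinal_reading(number: str) -> str:
    """Read a number from 1 to 999 as a cardinal number."""
    n = int(number)
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1 to 999")
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    parts = []
    if hundreds:
        parts.append(DIGITS[hundreds] + "bǎi")
    if tens:
        if tens == 1 and not hundreds:
            parts.append("shí")  # 10-19: bare shí, e.g. 15 -> shíwǔ
        else:
            parts.append(DIGITS[tens] + "shí")
    elif hundreds and ones:
        parts.append("líng")  # empty tens place, e.g. 105 -> yībǎilíngwǔ
    if ones:
        parts.append(DIGITS[ones])
    return "".join(parts)


print(digit_reading("365"))     # digit-by-digit reading
print(cardinal_reading("365"))  # cardinal reading
```

Which of the two functions applies cannot be decided from the digits alone; it depends on the sentence, which is the reason a manual override (or hidden characters, as suggested above) would still be needed.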

Wiktionary lemmas written in a nonnative script

As Wiktionary grows, I noticed some unusual entries written in a nonnative script such as 0.5#Chinese, の#Chinese that qualify for Wiktionary:Criteria for inclusion and may have also passed Wiktionary:Requests_for_verification due to their widespread use in a particular language or region. However, I think that it might be better to list such entries (that have passed RFV) in an appendix or separate namespace or to put a banner right below the language header to inform our readers that this lemma is written in a nonnative script along with categorization. KevinUp (talk) 15:14, 5 December 2018 (UTC)

Out of curiosity, do we have Arabic, Greek, Hebrew, Hindi, Russian lemmas that are written in the Latin script, for example? I've also found Category:Terms written in foreign scripts by language, but only Chinese, Japanese and Korean are listed in this category. KevinUp (talk) 15:24, 5 December 2018 (UTC)
Category:Chinese terms written in foreign scripts DTLHS (talk) 15:26, 5 December 2018 (UTC)
These entries are rather interesting: fighting#Chinese, friend#Chinese, part-time#Chinese. Yes, I've heard these terms used in real life, such as in TVB dramas, but I am surprised to see these entries included in Wiktionary. I would like to propose for such terms to be listed in an appendix or separate namespace, because such entries are more likely to be found in an informal dictionary such as an A-Z pocket slang dictionary, rather than a formal dictionary. KevinUp (talk) 15:55, 5 December 2018 (UTC)
The issue has come up before, with marketing being used (in Latin script) in Greek texts. Wiktionary:Beer parlour/2017/September § Modern Greek terms spelt with Latin characters. See also this revision history for a recent disagreement. I'm not comfortable at all with including that sort of thing. Per utramque cavernam 16:15, 5 December 2018 (UTC)
Foreign script is a strong argument for code-switching. Even when it is used constantly in Greek it can be the case that it never passes into Greek, and it is no loss not to add it either because the English entry suffices (you read a Greek text, look up a word here but find it as English, that’s enough, you don’t expect anyway that all that you read is in the dictionary as Greek). Fay Freak (talk) 19:39, 5 December 2018 (UTC)
Script is secondary to the actual spoken language, and usage of words should be analyzed for codeswitching, and for what-language lexicon a word belongs to. French has fr:American way of life#Français and fr:web design#Français, and Japanese has サード (sādo, third) and ホエールウォッチング (hoēru wotchingu, whale watching); are these "acceptable"? —Suzukaze-c 19:42, 5 December 2018 (UTC)
Maybe we need to find a way to represent code-switching? It would seem like a common pattern for a foreign word to have a code-switched variant (with foreign pronunciation, in a foreign script) and a nativized one (being closer to the native language's phonology, spelled in the language's native script) with the first one being extremely common and the second at the edge of attestability, but due to our policies we only include the second one and create a distorted picture of actual usage patterns.
I remember @Vahagn Petrosyan having something to say about this. Crom daba (talk) 20:06, 5 December 2018 (UTC)
I create a Usage note, as in վարագույր (varaguyr). --Vahag (talk) 12:18, 6 December 2018 (UTC)
Yes, I think that we need to find a way to represent code-switching. Rather than using foreign script as an argument for code-switching it might be better to decide based on the pronunciation of the entry.
I would like to suggest for entries such as (1) part-time#Chinese, (2) PK#Chinese, (3) SUS#Japanese that have been nativized to become closer to the phonology of the language it was borrowed into (despite retaining its nonnative script) to be accepted as legit entries whereas entries such as (1) fighting#Chinese, (2) fr:American way of life#Français, (3) の#Chinese that are found mostly in written form but rarely in spoken conversations are to be put under some sort of banner to inform our readers that such entries are of unconventional usage and are mostly written for stylistic effect. KevinUp (talk) 16:32, 6 December 2018 (UTC)
Alternatively, we should set up some sort of guideline to decide whether or not an entry is considered code-switching. KevinUp (talk) 06:50, 7 December 2018 (UTC)
Yes, language-specific CFI are needed. --Anatoli T. (обсудить/вклад) 07:17, 8 December 2018 (UTC)
I think that the issue of the script is a bit of a red herring. Take the originally English word online, which has become commonplace in many languages, including Serbian. Now when Danas, a major newspaper, uses the word, they write for example “Srbi sve više kupuju online”. The Politika newspaper is also written in Serbian but uses Cyrillic script; when they use the word, they write for example “Политика Online”, as they in fact do on every page of their website. It would be strange to consider the use by Danas a loan word but the use by Politika a case of code switching, merely because one happens to use Roman script and the other Cyrillic for what is the same language.  --Lambiam 17:30, 8 December 2018 (UTC)
In this particular case, the spelling is a strong indicator of code-switching, as Serbian orthography is phonemic and (unlike Croatian) strongly prefers transcribing foreign names and terms. You could consider onlajn (abundantly attested) a nativized variant, although arguably the choice between these spellings is a matter of personal style. Crom daba (talk) 18:00, 10 December 2018 (UTC)

For an example in English, Москва is citeable (Citations:Москва) but was deleted (Talk:Москва), and Citations:ἄρχων is also citeable (as are, I expect, Arabic-script forms of Allah and PBUH, etc). An older Chinese example is Talk:Thames河, deleted in 2011. - -sche (discuss) 17:54, 8 December 2018 (UTC)

When I read, “With absolute confidence I can boast that my Frittelle di Fiori di Zucca are the best in the world”, I don’t think, “Oh, perhaps we should consider including an entry for the English term frittella di fiori di zucca.” No, I think this is an instance of code switching, and in this case one of a very common type. I think we should not have an English entry oliebol either. Although the term can be found in English texts, it is obviously a Dutch word. There is a need for a test or criterion for when the use of a foreign term is simply code switching, and when the term becomes part of the lexicon of a borrowing language. As I’ve tried to argue above, being written in a different script is not a litmus test. Being included in quotation marks is a strong indicator of not being seen as part of the lexicon, but not all authors will use these when code switching. When the imported term becomes subject to local inflection, or can serve as a component to form new compound words, this is a strong indicator of having become lexicalized, but as a test this does not work for analytic languages like Mandarin.  --Lambiam 12:33, 9 December 2018 (UTC)
In personal experience, code switched fragments can very easily be inflected and are likely to be joined in compounds to attach them to native sentence structure. Also, lexicalized loans are likely to have defective inflection.
Pronunciation is also no good, since it is extremely speaker- and context-dependent, and lexicalized loans can themselves have a special phonology. Crom daba (talk) 18:07, 10 December 2018 (UTC)
I don't think we can have a coherent policy or test across different languages. Speakers of different languages will absolutely differ in their criteria for what counts as a native word. This is even more difficult with global languages like English where different communities are in contact with a huge variety of other languages from which to borrow. DTLHS (talk) 18:23, 10 December 2018 (UTC)
  Good words, @Crom daba. I want to point out how language is really written on the internet: In printed works or works inspired by print practices there are many things that don’t happen but are unproblematic elsewhere, in unrestrained speech where people can develop their own standards or own morals, unspooked by societal expectation, so to speak in Stirnerian language: Remarkably, nowadays in Russian chats, and I mean those where discussions take place and people try to write correctly, one just writes some foreign words in foreign script and then immediately joins Russian endings in Cyrillic script to them. It’s also the way I think and do it: Writing Russian in Germany, referring to things in Germany without having a notion of a Russian equivalent, I just write German words or English words in Latin script and decline them Russian and in Cyrillic script (without space I mean, you understand; most iconic, I think), and this does not make them Russian. I often can think “Is this word Russian already”? There are some obvious ones that do exist, like everyone uses the word терми́н (termín) in reference to appointments in Germany, a word that does not exist in Russia, and I long did not even know that it doesn’t, it seems so indispensable. This middle ground of dubiosa (is this English or Latin, huh? Not English because of lacking spread) is only left out by me and other dictionary editors often because these words have limited relevance to a greater world and one would look up these words in German dictionaries anyway (as I said earlier, an entry in one language suffices, a Greek entry marketing (marketing) is otiose), plus they are CFI-problematic (best one can do is quote them from fora and commentaries under articles, perhaps with archive links, but that’s it, these Soviets here don’t produce a corpus that would help to quote Russian as spoken in Germany). 
Separating the words is even more difficult if you look at inter-Slavic conversations: Like is Russian менто́вка (mentóvka, mint liquor as popular in Bulgaria) Russian? It is used in Russian texts here and there, and obviously with Russian endings then, but is it perceived as Russian? (With a German legalese term with no equivalent in English, how does the Verkehrsanschauung or Verkehrsansicht see it?) I have also read quite a lot strange words from Russian expats in Serbia and things like that, you could make large lists of such words if you wanted to; theoretically this could lead to having words in Russian written with Cyrillic characters we thought do not exist in Russian – I make here the strange observation that Latin words with foreign diacritics pass easier into texts of other languages but the Cyrillic languages tend more to transcribe all, i. e. having a Russian text with ђ is way more weird than Vietnamese diacritics, Semitic transcriptions and what you can imagine in English texts. And that’s only in Europe, elsewhere things become crazier, which others can describe better.
For the phonetical point, see that legit French words contain pharyngeal fricatives, like hebs (prison, can), hnouch (popo, bacon). Here we have also an issue arising if we know that a word has passed into French, English, and you can attest it from songs (like they have been printed on CD or are buyable as downloads or else unlikely to vanish, so durable). The flip side of words written in a non-native script are words which have passed but cannot or only with uncertainty be written in the native script. English example: gwop (moolah).
Normal dictionaries to a large part avoid such problems because they leave out exotisms, i. e. words for things that do not exist in an area where there is a community of the language documented. With this I lean towards an exclusion ground that is that if a word in English is for a foreign thing and the Verkehrsanschauung does not see the word as English then it is not English. Confer mesdemet! This is “not really English”. What does apply for abstracta then, what is Greek marketing then? This criterion I have just stated becomes difficult for foreign “ways of life”. Maybe Greek marketing is not actually Greek because he who uses such a word ceases to think like a Greek, regardless of the script it is written in. There are many gross things written and said in Arabic or Hindi texts that I would for this reason see as not-Arabic and not-Hindi. And the same criterion can apply to determine if a word has passed from German into Russian.
The issue gets complicated however because there is not only code-switching for Wiktionary but there is also Translingual: You could make a case for “marketing” being Translingual and not only English. I have argued already (User talk:Fay Freak § Translingual) for grammatical terms like genitivus absolutus, status constructus and the like being Translingual in the first place. Maybe “marketing” is translingual because teachers of business and marketing have made it so ex cathedra, which is why it is used in Greek, never able to become Greek. Fay Freak (talk) 20:02, 10 December 2018 (UTC)
Agree. Crom daba (talk) 23:18, 10 December 2018 (UTC)
To me, the better treatment is to put them in the main name space, and pile the obloquy on them there. They need to be found. Not everything in another script is foreign - there was a time when manuscripts of the Pentateuch contained the divine tetragrammaton in what is now called the Phoenician script while the rest of the text was in what is now called the Hebrew script (as a boy I learned to call it the Aramaic script). And if Western Arabic digits are not part of the Thai script, the native Thai interjection 555 (555) is another example. Indeed, a subtler issue is whether the '555' in Thailand-focused English language forums dominated by Europeans is now an English word borrowed from Thai.
One thing that can easily go missing is the pronunciation. I have to say that the Thai pronunciation of words taken from English is frequently far from obvious. Translingual terms seem not to have pronunciations. --RichardW57 (talk) 22:53, 29 December 2018 (UTC)

Linking elements of a term in {{en-noun}}

At l'esprit de l'escalier, should the individual elements of the phrase in {{en-noun}} be linked to French words, like this: {{en-noun|head=[[l'#French|l']][[esprit#French|esprit]] [[de#French|de]] [[l'#French|l']][[escalier#French|escalier]]}}? (Pinging @Per utramque cavernam as we discussed this on the entry talk page.) — SGconlaw (talk) 17:11, 5 December 2018 (UTC)

No. They should be linked in the etymology. DTLHS (talk) 17:13, 5 December 2018 (UTC)
In the case at hand, the link is to an entire French term, esprit de l'escalier. Where should the individual elements be linked, or do we just not link them in this case? I was thinking that since the elements of a term in {{en-noun}} are usually linked by the template anyway, it makes sense to include the links to the French words manually. — SGconlaw (talk) 17:20, 5 December 2018 (UTC)
Those links are one click away. Theoretically it can be different if a French phrase exists only in English or another language, the French not being CFI-compliant as French. Fay Freak (talk) 19:43, 5 December 2018 (UTC)
I usually link to component multi-word terms of a term if they reflect the sense of that term, eg, black sugar maple would link to black and sugar maple. And, as Fay Freak says, the individual words are just one more click away. It seems unhelpful to make a user guess at whether there are multiword components and which grouping leads to a possible entry. DCDuring (talk) 23:08, 5 December 2018 (UTC)
The {{en-noun}} template links to English terms, though. In this case, the terms are French, so it's not appropriate to link. —Rua (mew) 10:45, 6 December 2018 (UTC)
Generally, yes, but arguably not exclusively. For example, sometimes when an element is not present in the Wiktionary (for example, a person's name), I've seen a link to an English Wikipedia article. I see no reason why links can't be to other languages where appropriate. — SGconlaw (talk) 12:01, 6 December 2018 (UTC)
Because, again, {{en-noun}} creates English links. If you put a French word in there, it will still be an English link. A dead link, moreover. —Rua (mew) 18:43, 10 December 2018 (UTC)
No, it works fine. Try pasting {{en-noun|head=[[l'#French|l']][[esprit#French|esprit]] [[de#French|de]] [[l'#French|l']][[escalier#French|escalier]]}} at Wiktionary:Sandbox. — SGconlaw (talk) 07:07, 13 December 2018 (UTC)

New sinograph QIOU "poor and ugly"

How should this situation be dealt with in terms of lexicography?

poor and ugly
--Backinstadiums (talk) 00:26, 6 December 2018 (UTC)
The same way we deal with any other word or sinograph — add it if it is attested in durably archived media, spanning over a year, etc. (It doesn't look like this is.) —Μετάknowledgediscuss/deeds 02:01, 6 December 2018 (UTC)


quadrumanus appears in the Cambridge Grammar of the English Language, page 1663; is it a typo or a variant of quadrumanous? --Backinstadiums (talk) 15:58, 6 December 2018 (UTC)

(This sounds like a Wiktionary:Tea room question. — SGconlaw (talk) 16:02, 6 December 2018 (UTC))
It is a taxonomic designation (as in Chiropsalmus quadrumanus). Highly unlikely to be an English adjective because of the spelling. Equinox 16:40, 6 December 2018 (UTC)
The authors were probably looking for a word that began with quadru and was not formed in Latin, as they are talking about "marginal vowels" as English morphological elements, which in the case of 'quadr' can be i, a, or u. Why they didn't choose quadrumane or quadrumanous for the purpose is beyond me. We could ask them. Maybe it was a typo. DCDuring (talk) 18:16, 6 December 2018 (UTC)

New Wikimedia password policy and requirements

CKoerner (WMF) (talk) 20:02, 6 December 2018 (UTC)

Programming languages

Since Wiktionary includes all languages, does it also include programming languages? --2A01:112F:742:C00:14B9:E7A5:D1B3:F0B3 09:23, 8 December 2018 (UTC)

No, as they aren't human language (though a few words may rarely get borrowed into English grammar). Equinox 10:23, 8 December 2018 (UTC)
Is tlhIngan Hol a human language?  --Lambiam 19:26, 8 December 2018 (UTC)
Eh, it's clearly a totally different kind of thing from a programming language. The only programming language I've ever seen that even inflects verbs is Inform 7. Equinox 19:34, 8 December 2018 (UTC)
Programming languages are determined by a language specification, not by usage. That falls under "documentation", not lexicography. DTLHS (talk) 17:31, 8 December 2018 (UTC)
But the reference manuals for a programming language use terms from that language as if they were English, French etc - so we really ought to have them somehow. SemperBlotto (talk) 14:23, 10 December 2018 (UTC)
We've had this discussion before. Early programming languages only had a few keywords, but now there are hundreds of frameworks with thousands of named classes (e.g. ExecutionEngineException, HttpMessageInvoker) and each class may have hundreds of named properties, methods and fields. These, too, are listed in manuals and guides. Equinox 14:30, 10 December 2018 (UTC)
See Wiktionary:Requests_for_verification/English#caddr as well. - TheDaveRoss 14:31, 10 December 2018 (UTC)
Take this sentence from a book on conversational French: “Bonjour is usually used until around six p.m., whereas bonsoir is used after six p.m.” In a book on French you can expect to find French words used as nouns in English sentences. Only, they are not used with their French meaning. They stand for themselves. So these sentences mention the word in the sense of the use–mention distinction. Likewise, the English sentence “esac is case spelled backward, rather like fi is if spelled backward” only mentions these keywords. To understand the sentence you don’t have to know the meaning of any of these words. On the other hand, grep, originally just another computer command, can be used as a verb (“I grep, he greps, we grepped”), so it clearly has become lexicalized and merits inclusion.  --Lambiam 18:12, 10 December 2018 (UTC)

Appendix:Reference detail

According to the description: "This appendix provides detail to sources linked by Wiktionary. It is to be linked from reference templates." It contains three items, all created by User:Dan Polansky. Is this a new policy? The only reason I've noticed it is that Dan changed one of the Hungarian reference templates. I'd prefer to link directly from the template to its corresponding website and not to an appendix. Was there a Beer parlour discussion or vote on this? Panda10 (talk) 15:25, 9 December 2018 (UTC)

It is not a new policy, not anything mandatory and rigid. If you don't like my change in Template:R:TotfalusiEty 2005, please revert it. The point of the appendix is to provide more information than comfortably fits in the mainspace, e.g. English rendering of the title. Some reference templates link to Wikipedia, which is similar in that it does not lead to the main website of the reference. --Dan Polansky (talk) 15:31, 9 December 2018 (UTC)
Dan, thanks for your prompt reply. I do see your point, but for now, if you don't mind, I will revert the changes until it is decided by the community how to standardize reference templates. Panda10 (talk) 16:46, 9 December 2018 (UTC)
Thank you. I realized we could link to the appendix via "→Detail", without losing the immediate link to the dictionary website. I added the link as a proposal. --Dan Polansky (talk) 08:13, 15 December 2018 (UTC)
@Dan Polansky: I'm not sure. The only extra information the Appendix provides is the English translation of the book title. There have to be some other benefits of such an Appendix because it has to be maintained. Who will do it? I appreciate that you care about this but maybe in the future you could demonstrate the proposals using the Czech reference templates? I don't see any of them in the Appendix. :) Thanks! Panda10 (talk) 14:51, 15 December 2018 (UTC)
@Panda10: The appendix needs to be maintained no less than the templates themselves. Furthermore, once correct information is entered, I do not see much need for further updates. As for Czech reference templates, I have now added {{R:PSJC}} and {{R:SSJC}} to the appendix, and I am glad all the detail I added is not in the template display for the mainspace. In the appendix, I have stated how many entries there are in the dictionaries. --Dan Polansky (talk) 09:25, 16 December 2018 (UTC)
@Dan Polansky: I'm still not convinced. But if you find this system useful, it's fine to add all the Czech reference templates to the Appendix. As for the Hungarian template, I will revert the change. Panda10 (talk) 18:01, 16 December 2018 (UTC)
I'm letting it be now, I guess, but let me note that I don't understand it. I think it pretty obvious that the reader was better off having a link to a page with more detail, including English rendering of the title and the number of entries in the dictionary. --Dan Polansky (talk) 18:05, 16 December 2018 (UTC)
@Dan Polansky: I see too what you want. Actually instead of listing references in Appendices a technical solution that I consider agreeable is to have the transliterations, transcriptions and translations present in the templates but not shown without clicking to collapse – no? @Panda10: On this page, section 3 is actually about standardizing the information given by reference templates but the community does nothing, you could weigh in too. Fay Freak (talk) 20:21, 10 December 2018 (UTC)

Interesting BBC articles

An interesting BBC article on an analysis of Twitter that traces the geographic rise and spread of neologisms in American English: Feeling litt? The five hotspots driving English forward (4 May 2018). -Stelio (talk) 08:26, 12 December 2018 (UTC)

Another one on anti-languages: The secret “anti-languages” you’re not supposed to know (12 Feb 2016).

- Stelio (talk) 13:28, 12 December 2018 (UTC)

Thanks for sharing. The inaugural lecture video has some more details, and the data and scripts are available as well. It would be interesting to apply this to other languages. – Jberkel 22:11, 14 December 2018 (UTC)

English words with contraction-'s, etc

A recent RFD got me thinking: by vote, we don't allow entries for words with possessive-'s, with only a few exceptions. Do we have any policy on which contractions are allowed? I've created some more interesting ones myself (double and triple contractions), seems like contraction-'s can be added to as many words as possessive-'s. Just googling the first few words from various parts of speech that pop into my head, I can find citations of all of them: not just nouns like cat's but difficult's (see google books:"difficult's an", etc), write's (google books:"If any line I write's a nobbler", etc), wow's (google books:"wow's an"), see also google books:dogs're. I presume we don't want entries for all of these! (The small set of ones attached to pronouns (he's, y'all'd've, etc) are worth keeping, IMO.) - -sche (discuss) 18:00, 12 December 2018 (UTC)

Oh yeah!

Guess who has cracked 600,000 edits. [11] That works out at about 150 a day on average, though in practice I have some days when I don't come to Wiktionary at all and some days when I hammer away at it like a lunatic for eight hours. Equinox 00:06, 14 December 2018 (UTC)

I just love how the xtools edit counter has a big red banner at the top saying, "User has made too many edits!" Tsk, tsk. Andrew Sheedy (talk) 02:07, 14 December 2018 (UTC)
Impressive. And I haven't got to the half million mark yet. SemperBlotto (talk) 07:08, 14 December 2018 (UTC)
Impressive. That's over 1% of all the edits made on this site. - -sche (discuss) 22:35, 14 December 2018 (UTC)
Not so impressive. You wouldn't even get into the top 15 in Wikipedia. Also, perhaps you should get help for your wiki-addiction. What is impressive, however, is Wonderfool's hitting 300,000 despite around 130 blockings. --Mustliza (talk) 10:57, 15 December 2018 (UTC)
How about that top 15 on Wikipedia, though? bd2412 T 01:43, 16 December 2018 (UTC)
How on earth am I in sixth place? I feel like a lot of editors are way more active than I am. —Rua (mew) 23:24, 15 December 2018 (UTC)
You deserve some kind of medal, made of some kind of metal, for showing some kind of mettle. bd2412 T 01:44, 16 December 2018 (UTC)

Any use for a "rare character" index?

Hello! There was recently a discussion at Extension:CirrusSearch about creating a new search index for "rare" characters that are currently not indexed by the on-wiki search engine. The three examples of difficult-to-find characters given were ☥ (Ankh), 〃 (ditto mark), and 〆 (ideographic closing mark). (Note that you can currently do an insource regex search like insource:/☥/, but on large wikis this is guaranteed to time out and not give complete results, and it is extremely inefficient on the search cluster.)

We can't index everything: indexing every instance of e or . would be very expensive and less useful than indexing ☥, for example. So, in English, we would ignore A-Z, a-z, 0-9, space, and most regular punctuation (exact list TBD) and index pretty much everything else.

The most plausibly efficient way to implement such an index would only track individual characters at the document level, so you could search for documents containing both ☥ and 〆, but you could not specify a phrase like "☥ 〆" or "〆 ☥", or a single "word" like ☥☥ or 〆☥.

I've opened a Phabricator ticket T211824 to more carefully investigate such a rare character index, to get a sense of how big it would be and what resources it would take to support it. If you have any ideas about specific use cases and how this would or would not help with them, or any other thoughts, please reply here or on the Phab ticket. (Increased interest increases the likelihood of this moving forward, albeit slowly, over the next year.)

Thank you! TJones (WMF) (talk) 16:27, 14 December 2018 (UTC)
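To make the document-level idea above concrete, here is a minimal sketch in Python of what such an index might look like. It is only an illustration under stated assumptions: the ignored-character list is a guess (the post says the exact list is TBD), and the names build_char_index and docs_with_all are made up for this example rather than taken from CirrusSearch.

```python
import string
from collections import defaultdict

# Assumption: the "regular" characters the index skips. The real list is TBD.
IGNORED = set(string.ascii_letters + string.digits + " " + ".,;:!?'\"()-")


def build_char_index(docs):
    """Map each non-ignored character to the set of page titles containing it."""
    index = defaultdict(set)
    for title, text in docs.items():
        # set(text) drops position and frequency: only presence is recorded.
        for ch in set(text) - IGNORED:
            index[ch].add(title)
    return index


def docs_with_all(index, chars):
    """Pages containing every character in `chars`, in any order or position."""
    sets = [index.get(ch, set()) for ch in chars]
    return set.intersection(*sets) if sets else set()


docs = {
    "page1": "An ankh ☥ and a closing mark 〆.",
    "page2": "Only an ankh ☥ here.",
    "page3": "Plain ASCII text.",
}
index = build_char_index(docs)
print(sorted(docs_with_all(index, "☥〆")))
```

Because only presence per page is stored, a query for both characters cannot distinguish "☥ 〆" from "〆 ☥", which matches the limitation described in the post.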

One thing that comes to mind immediately is searching for control characters, private use block characters and unusual whitespace characters. It would be even more useful if such characters could be grouped together in a single search. DTLHS (talk) 16:35, 14 December 2018 (UTC)
We haven't thought too much yet about how the keyword for this would work. Parsing the query carefully so you can search for whitespace characters is always tricky. So, suppose the keyword is char:, then searching for documents with both ☥ and 〆 could be char:☥ char:〆, while searching for either would be char:☥ OR char:〆. We could have a special syntax like char:☥〆, which is more efficient, but would that be an implicit AND or an implicit OR? Either could be confusing; for example, searching for char:Иван would only incidentally actually find the name Иван.
For control or whitespace characters, being able to specify them by number would probably be useful, so \u2002 or U+2002 for an 'en space'. For all three use cases, it sounds like you'd want OR, not AND as your combining operation, so you'd have to spell them all out, like char:\u2002 OR char:\u2003 OR char:\u2004 OR char:\u2005 ... for whitespace characters. I can see how something like char:\u2002-\u200D would be useful, but on the back end that would balloon into a fairly expensive search, and something like char:\uE000-\uF8FF for the whole Private Use Area or char:\uF0000-\uFFFFF for the whole Supplementary Private Use Area-A would explode into ~6,400 or ~65,000 search terms on the back end, which we could not support. I could see maybe allowing specifying a range, but it would have to throw an error for more than some limit of characters in the range. (10? 20? 50?)
Were you hoping to search for an entire private use area at once, or just a limited range of characters? Thanks for the interesting use cases! TJones (WMF) (talk) 18:40, 14 December 2018 (UTC)
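A rough sketch of the range idea floated above, in Python: expand a code-point range into the OR'd terms it stands for, and refuse ranges over a cap. The char: keyword is only the supposition from the post, the cap of 20 is one of the numbers floated there, and expand_char_range and parse_codepoint are hypothetical names for this illustration.

```python
import re

MAX_RANGE = 20  # hypothetical cap; the post floats 10, 20 or 50


def parse_codepoint(token):
    """Parse a token such as '\\u2002' or 'U+2002' into an integer code point."""
    m = re.fullmatch(r"(?:\\u|U\+)([0-9A-Fa-f]{4,6})", token)
    if not m:
        raise ValueError(f"not a code point: {token!r}")
    return int(m.group(1), 16)


def expand_char_range(spec, limit=MAX_RANGE):
    """Turn 'START-END' (or a single code point) into 'char:X OR char:Y ...'."""
    start_tok, sep, end_tok = spec.partition("-")
    start = parse_codepoint(start_tok)
    end = parse_codepoint(end_tok) if sep else start
    if end < start:
        raise ValueError("range end precedes range start")
    size = end - start + 1
    if size > limit:
        raise ValueError(f"range covers {size} characters; limit is {limit}")
    return " OR ".join(f"char:\\u{cp:04X}" for cp in range(start, end + 1))


print(expand_char_range(r"\u2002-\u2005"))
```

With this cap, \u2002-\u200D (12 characters) expands normally, while \uE000-\uF8FF (6,400 characters) raises an error instead of producing thousands of search terms.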
Yes, the whole private use area. Maybe that's not such a good fit for this request since I'm more interested in the boolean value "does this page have a private use area character in it or not", and not specifically which character it is. DTLHS (talk) 18:50, 14 December 2018 (UTC)
It might be possible to also index by Unicode block, so if I dig into this, I'll try to get a sense of what that looks like, too. Though I wouldn't expect it to be in the first version if we get that far. TJones (WMF) (talk) 19:30, 14 December 2018 (UTC)
For our purposes, Wiktionary entry names and links to entry names are far more important in searching for special characters: it helps to know when they include zero-width non-joiners, left-to-right markers, punctuation/whitespace outside of the Basic Latin Block, combining diacritics, or anything else that might produce a visual duplicate with different encoding. A different issue is mixing of scripts: Latin-script English paca and Cyrillic-script Russian раса (rasa) are fine, but we want to know when there's something like pаcа that has both Latin and Cyrillic, for instance. You might think of it as a multilingual version of antispoofing. Chuck Entz (talk) 20:08, 14 December 2018 (UTC)
Happily left-to-right marks and some other bidirectional control characters (see these testcases) are automatically removed from titles (as mentioned in Manual:Title.php § Canonical forms), so they would only need to be indexed when they appear in article text. For instance, ab%E2%80%8Ec (with a percent-encoded left-to-right mark) links to abc. — Eru·tuon 22:40, 17 December 2018 (UTC)
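The percent-encoded example above can be reproduced offline with a small Python sketch. The set of stripped characters here (U+200E, U+200F and U+202A to U+202E) is an assumption made for illustration; the authoritative list is whatever Manual:Title.php documents, and canonical_title is a made-up name.

```python
import re
from urllib.parse import unquote

# Assumption: the bidi marks removed from titles; check Manual:Title.php for the real list.
BIDI_MARKS = re.compile("[\u200E\u200F\u202A-\u202E]")


def canonical_title(link_target):
    """Percent-decode a link target, then drop bidirectional control marks."""
    return BIDI_MARKS.sub("", unquote(link_target))


print(canonical_title("ab%E2%80%8Ec"))
```

Here %E2%80%8E is the UTF-8 encoding of U+200E (left-to-right mark), so the decoded target collapses to abc.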
We already have that capability with the standardChars field in Module:languages. Searching inside entries for specific characters is more challenging. DTLHS (talk) 20:11, 14 December 2018 (UTC)
Latin/Cyrillic homoglyph detection and correction is a sometime hobby of mine on my volunteer account—so I know what a pain that can be. Did you know that intitle: now supports regex searches? This search finds titles (or redirects) that have a Cyrillic and Latin character adjacent to each other: intitle:/([Ѐ-ԯ][A-Za-zÀ-ɏɐ-ʯ]|[A-Za-zÀ-ɏɐ-ʯ][Ѐ-ԯ])/ (no link, because it's an expensive query, so you have to want it enough to copy-n-paste). There are some false positives with redirects that have been fixed, and with Kabardian and a few other languages that do seem to actually mix scripts, so къуэкIыпIэ is probably right, but ларпурлартизaм looks like the final a is Latin. Anyway, intitle: searches on regexes still time out (it's just too expensive to scan for everything), but they probably get closer to completion than insource: queries, which have more text to scan.
Anyway, it sounds like a second rare-character index for titles would be helpful for finding zero-width joiners, LTR/RTL markers, etc. in titles. Finding them specifically in links would be harder. They do get stripped from search terms, which is what I usually pay attention to. TJones (WMF) (talk) 20:47, 14 December 2018 (UTC)
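For anyone who would rather run this check against a list of titles from a dump than on the search cluster, here is a sketch of the same adjacency test in Python. The code-point ranges are a transcription of the character classes in the intitle: regex above (Ѐ-ԯ is U+0400 to U+052F; À-ɏ and ɐ-ʯ are U+00C0 to U+02AF); the function name is hypothetical.

```python
import re

CYRILLIC = r"\u0400-\u052F"                  # Ѐ-ԯ
LATIN = r"A-Za-z\u00C0-\u024F\u0250-\u02AF"  # A-Za-zÀ-ɏɐ-ʯ

# A Cyrillic character directly followed by a Latin one, or the reverse.
MIXED = re.compile(f"[{CYRILLIC}][{LATIN}]|[{LATIN}][{CYRILLIC}]")


def has_adjacent_mixed_script(title):
    """True if a Latin and a Cyrillic character sit next to each other."""
    return MIXED.search(title) is not None


titles = [
    "paca",                                # all Latin
    "\u0440\u0430\u0441\u0430",            # раса, all Cyrillic
    "p\u0430c\u0430",                      # Latin p and c, Cyrillic а
    "b-\u043a\u0432\u0430\u0440\u043a",    # b-кварк, scripts separated by a hyphen
]
for t in titles:
    print(t, has_adjacent_mixed_script(t))
```

Since the test only looks at adjacent characters, a title like b-кварк is not flagged, while Kabardian titles with a Latin I standing in for a palochka are.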
Actually, that Kabardian word should have a palochka (here on Wiktionary, the lowercase one, ӏ; elsewhere often the uppercase, Ӏ) instead of a capital Latin letter I (see Kabardian orthography on Wikipedia). But about mixed scripts, on Wikipedia someone posted some words from Halkomelem, which adds the Greek letter theta into an otherwise Latin alphabet. (I was surprised that there wasn't a Latin theta character, because theta is regularly used in the IPA.) — Eru·tuon 00:28, 15 December 2018 (UTC)
@Chuck Entz: Here is a list of titles with both Latin and Cyrillic characters from the December 1st dump. Looks like there are a few quark words (like b-кварк) for which this isn't an error. [Edit: See also User:Keith the Koala/Mixed character sets, though it is not up-to-date.] — Eru·tuon 01:16, 15 December 2018 (UTC)
@Erutuon: I would go through that list and fix them, but right now there are too many redirects and valid uses of the palochka. If you could exclude those, the list would have much fewer false positives. —Μετάknowledgediscuss/deeds 02:45, 15 December 2018 (UTC)
@Metaknowledge: The palochka belongs to the Cyrillic script, so anything in the list with a palochka lookalike (like the aforementioned къуэкIыпIэ) needs fixing. — Eru·tuon 04:27, 15 December 2018 (UTC)
I've removed all the redirects. — Eru·tuon 04:40, 15 December 2018 (UTC)
@Erutuon: Thanks. I honestly can't remember the outcome of the old discussions about what to do with different ways to encode the palochka in Caucasian languages. @Atitarev? —Μετάknowledgediscuss/deeds 05:43, 15 December 2018 (UTC)
@Metaknowledge, Erutuon: I don't remember the exact outcome either BUT when Roman letters, numbers or "|" substitute for palochka (upper or lower case), they are definitely wrong but could be used as redirects, since the use of palochka proper is still uncommon. The correct/normalised spelling for Kabardian къуэкIыпIэ (q̇°ăkIəpIă) is къуэкӏыпӏэ (q̇°ăč̣̍əṗă), using the lower case palochka ӏ but some people think we should use the upper case palochka Ӏ: къуэкӀыпӀэ (q̇°ăč̣̍əṗă). It's the form used when palochka was first introduced and there was no upper case/lower case distinction. Both forms look alike and the lower case palochka was added much later by the Unicode. In my opinion, we should use upper case Ӏ and lower case palochka ӏ following the capitalisation rules of the corresponding languages as intended. Lookalikes: !, 1, |, I, l should be all replaced with Ӏ/ӏ. --Anatoli T. (обсудить/вклад) 06:20, 15 December 2018 (UTC)
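As an illustration of the lookalike replacement described above, here is a Python sketch that swaps !, 1, |, I and l for the lower-case palochka (U+04CF) when they directly follow a Cyrillic letter. The "follows a Cyrillic letter" rule is an assumption chosen for this example; it is meant for entry titles, would mangle running text (a Cyrillic word followed by "!" for instance), and leaves the upper-case versus lower-case question open. fix_palochka is a hypothetical name.

```python
import re

LOWER_PALOCHKA = "\u04CF"  # ӏ

# Assumption: a lookalike counts as a palochka only right after a Cyrillic letter.
LOOKALIKE_AFTER_CYRILLIC = re.compile(r"(?<=[\u0400-\u052F])[!1|Il]")


def fix_palochka(title):
    """Replace palochka lookalikes that directly follow a Cyrillic letter."""
    return LOOKALIKE_AFTER_CYRILLIC.sub(LOWER_PALOCHKA, title)


wrong = "къуэк" + "I" + "ып" + "I" + "э"  # Latin capital I in both places
print(fix_palochka(wrong))
```

Any output of such a script would still need a human check before moving pages, given the legitimate mixed-script titles mentioned elsewhere in this thread.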
@Atitarev, Erutuon: I have now fixed everything on the list except for legitimate/unclear uses and palochkas. Anatoli, would you be willing to move the palochka entries as you see fit, leaving redirects behind? —Μετάknowledgediscuss/deeds 06:24, 15 December 2018 (UTC)
@Erutuon, Metaknowledge: I think it's mostly done. I'm not sure what to do about the Akhvakh term жиᴴво (žĩwo, cow), which has the letter ᴴ. It's a very poorly documented language. Letter "ᴴ", according to the Russian Wikipedia, is only used to display the pronunciation and in real life vowels are written without it. The information is based on the newspaper «Ахвахцы — Ашвадо». I don't know if there is a Cyrillic equivalent for this superscript letter. --Anatoli T. (обсудить/вклад) 00:28, 18 December 2018 (UTC)
@Atitarev It looks like there is a "MODIFIER LETTER CYRILLIC EN" (U+1D78), but it also looks like font support for it is kind of weak (but not much worse than the Latin version). That would give "жиᵸво", for example. It has the benefit that it normalizes to regular Cyrillic н for search, too! TJones (WMF) (talk) 19:36, 18 December 2018 (UTC)
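The normalization point can be checked with Python's unicodedata module. This sketch assumes that the folding in question behaves like Unicode compatibility normalization (NFKC); what the search analysis chain actually does may differ, and fold_for_search is a made-up name.

```python
import unicodedata


def fold_for_search(text):
    """Apply NFKC compatibility normalization (assumed stand-in for search folding)."""
    return unicodedata.normalize("NFKC", text)


cyrillic_form = "жи\u1D78во"  # with MODIFIER LETTER CYRILLIC EN
latin_form = "жи\u1D34во"     # with the Latin MODIFIER LETTER CAPITAL H

print(unicodedata.name("\u1D78"))
print(fold_for_search(cyrillic_form))
print(fold_for_search(latin_form))
```

Under NFKC the Cyrillic modifier folds to the ordinary Cyrillic н, so the result stays in one script, whereas the Latin modifier folds to a Latin H in the middle of a Cyrillic word.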
@TJones (WMF): Sorry for taking time to respond. I have moved the entry to живо, since Cyrillic ᵸ or Roman ᴴ are considered non-mandatory diacritics. I used "жиᵸво" in the display (жиᵸво) and converted both жиᴴво and жиᵸво to hard redirects to живо. --Anatoli T. (обсудить/вклад) 01:37, 29 December 2018 (UTC)
@Atitarev: No worries! With the end of the year and the holidays everything has been a little slow to happen. That looks like a good compromise. Thanks for worrying about the details! TJones (WMF) (talk) 19:25, 2 January 2019 (UTC)
To the invisible characters Chuck has mentioned (namely ZWNJs and LTR and RTL marks) as being undesirable in pagenames, and thus desirable to find, I would add: soft hyphens. Currently, all of these are caught by periodic checks of database dumps, as mentioned in Wiktionary:Todo#Semi-regular_tasks; being able to find the characters in a way that didn't require downloading database dumps would make it easier for more people to check for them more often. (This gives me an idea about MediaWiki:Titleblacklist which I will raise in a new section!) - -sche (discuss) 06:35, 15 December 2018 (UTC)
ZWNJs are quite desirable in pagenames for certain languages, e.g. Persian. You'd have to sort by language just to filter out all the good examples of ZWNJs being used. —Μετάknowledgediscuss/deeds 06:37, 15 December 2018 (UTC)
Good point. I suppose one might do a search like char:[ZWNJ] insource:-Persian. - -sche (discuss) 06:47, 15 December 2018 (UTC)
We could probably make a regex that matches ZWNJ in a position where it actually has a visible effect, for instance between a left- or dual-joining Arabic character, zero or more characters transparent to joining, and a right- or dual-joining character. (I imagine it would be long.) But it would have to be applied in the following manner: if the title contains ZWNJ, forbid it unless it matches this regex. Not sure if that's possible. I did notice there is MediaWiki:Titlewhitelist though. Maybe ZWNJ can be unequivocally blacklisted in MediaWiki:Titleblacklist, but then whitelisted under limited circumstances in MediaWiki:Titlewhitelist. — Eru·tuon 07:16, 15 December 2018 (UTC)
I mean, we don't have to blacklist ZWNJs if it would be problematic/complicated, we could just keep making periodic database-dump checks for them (excluding Persian), and only blacklist things that are indeed always unwanted. - -sche (discuss) 16:38, 15 December 2018 (UTC)
The problem with any regex approach on English Wiktionary or other large wikis is that unless there is a clear trigram that the regex acceleration can latch onto, the search will have to do a text scan, which will time out, and you are going to get incomplete results, which is a bummer. A rare character title index would actually be great for regexes built around specific characters. char:[ZWNJ] insource:/<complex regex with ZWNJ>/ would actually be likely to finish because the pool of docs with a ZWNJ somewhere in them would be relatively small. TJones (WMF) (talk) 21:06, 17 December 2018 (UTC)
I can’t a priori rule out that bidirectional control characters might appear legitimately in pagenames, and I can only urge caution, since they do have a purpose and invectives against them frequently lead to gold-plating. They are unlikely to be needed, just as multiple scripts are unlikely to be needed in page names. I could imagine some mixed chat slang using Latin and Arabic or Hebrew script needing bidi characters, however far off the creation of such pages may be. Still, any bidirectional control character should definitely throw a warning. The direction-overriding U+202D and U+202E can be blacklisted, though. Fay Freak (talk) 14:21, 15 December 2018 (UTC)
A bit off-topic, but I wonder if we might want to use such control characters more often in the {{DISPLAYTITLE}} magic word. I note that on my computer, the headword line at بند «پ» looks as it should, but the pagetitle does not. —Μετάknowledgediscuss/deeds 05:43, 17 December 2018 (UTC)
@Metaknowledge: My scriptTitles script fixes the top header of بند «پ» by adding a script class. Control characters would work, but I don't think they are the preferred method in HTML. — Eru·tuon 06:20, 17 December 2018 (UTC)
Soft hyphens are with the others on my list of commonly-encountered invisible characters. I monitor them when I make language analysis changes, and I've lobbied Elasticsearch and Lucene to strip them in their default language analysis chains. Anyway, a rare character index on titles would make finding them easier, so I'll add it to my list. Thanks! TJones (WMF) (talk) 21:06, 17 December 2018 (UTC)
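Until something like a rare-character title index exists, the dump-based check mentioned in this thread can be approximated with a few lines of Python. The character set below is just the invisible characters named in the discussion (soft hyphen, ZWNJ, ZWJ, LTR/RTL marks and the two overrides), not a complete list, and find_invisible is a hypothetical name. As noted above, ZWNJ hits in Persian titles are legitimate and would need filtering by language.

```python
import unicodedata

# Invisible characters named in this thread; not an exhaustive list.
INVISIBLE = {
    "\u00AD",  # soft hyphen
    "\u200C",  # zero width non-joiner
    "\u200D",  # zero width joiner
    "\u200E",  # left-to-right mark
    "\u200F",  # right-to-left mark
    "\u202D",  # left-to-right override
    "\u202E",  # right-to-left override
}


def find_invisible(title):
    """Return (position, Unicode name) for each listed invisible character."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(title) if ch in INVISIBLE]


print(find_invisible("dic\u00ADtionary"))
print(find_invisible("dictionary"))
```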

Use MediaWiki:Titleblacklist to block titles with undesirable invisible characters

It occurs to me that we could use this to prevent pagenames from containing various undesirable invisible characters which persistently creep up, like soft-hyphens (a recurring problem when people copy-paste words from certain other sites), couldn't we? My understanding is that pages containing those characters would thereafter be impossible to create, but presumably our existing entries on the characters themselves would be unaffected(?)—or if not, we could move them to Unsupported_titles/. - -sche (discuss) 06:37, 15 December 2018 (UTC)

An abuse filter would also work. Which one is more user friendly? DTLHS (talk) 06:42, 15 December 2018 (UTC)
Good point, and an abuse filter could also warn against and block or tag these in article bodies. OTOH, we can only have abuse filters do so many things before they run out of resources. As for user-friendly: it seems to be possible to display a customized message to anyone adding a blacklisted title (like w:MediaWiki:Titleblacklist-custom-imagename), which might be more friendly(?) than the messages abuse filters theoretically display, since those usually don't display for me (I see only the "short descriptions" like "ref-no-references") and apparently other users, based on confused feedback we've gotten from users wondering why their edits were blocked. - -sche (discuss) 07:08, 15 December 2018 (UTC)

Christmas competition

Hey all. I made a new Christmas competition. You have until Nanakusa-no-sekku to submit an entry. --Mustliza (talk) 10:52, 15 December 2018 (UTC)


Another important announcement...another entry has hit 10 years. The one in question is Dakasian, which has been sitting in WT for 10 whole years without being corrected. It was made by some prat called Jackofclubs (talkcontribs). I wonder what came of him... --Mustliza (talk) 11:07, 15 December 2018 (UTC)

I just touched the 10-year-old. Equinox 13:25, 15 December 2018 (UTC)
Why don't you have a seat right over there? Dixtosa (talk) 17:01, 15 December 2018 (UTC)
@Mustliza: Wonderfool, how do you know it is of English origin? Looks like an Anglicization of Armenian Դաքեսյան (Dakʿesyan). --Vahag (talk) 13:46, 15 December 2018 (UTC)
Are the Dakasians something I should be keeping up with? Equinox 15:08, 15 December 2018 (UTC)
IIRC, VP thinks that all words are of Armenian origin. He may be right, though - this website shows Dakasians with first names Hayz, Vahan, Hagop and Vesta. --Mustliza (talk) 20:23, 15 December 2018 (UTC)
I wish I could see the mugs of these people. I can identify an Armenian face with a 99% accuracy. --Vahag (talk) 12:15, 16 December 2018 (UTC)
On Wikipedia, we had a project a while back to identify the oldest and longest untouched pages, touched by the fewest editors. We had a bot assign points based on the age of the page, age of the last edit, and number of people who had edited it. We did come up with a lot of problematic pages that way. bd2412 T 01:47, 16 December 2018 (UTC)
Who should we talk to about running that bot here? - -sche (discuss) 23:07, 16 December 2018 (UTC)
You don't need a bot, they're listed on Special:AncientPages. DTLHS (talk) 00:13, 17 December 2018 (UTC)
Another special page with false and annoying message: "Updates for this page are currently disabled. Data here will not presently be refreshed." DCDuring (talk) 03:33, 17 December 2018 (UTC)
So change the message! I think this is sth admins can do. --Mustliza (talk) 06:54, 17 December 2018 (UTC)
Maybe not in this case; there isn't a single editable page with "Updates for this page are currently disabled" in it, besides this one. It's probably built in. — Eru·tuon 07:01, 17 December 2018 (UTC)
So build it out! I think this is sth admins can do. --Mustliza (talk) 07:08, 17 December 2018 (UTC)
Tell me how to do it and I will do it. I also don't know who to ask or what to ask for. DCDuring (talk) 08:17, 17 December 2018 (UTC)
MediaWiki:Querypage-no-updates is the "no longer updated" message, and MediaWiki:Perfcachedts is the message about caching. Neither of those messages are specific to the page itself, so modifying them will possibly alter other pages unintentionally. MediaWiki:Ancientpages-summary can add text to the top of the page, but can't remove the existing text (it seems). We could suppress the message with javascript or CSS (class is mw-querypage-no-updates). - TheDaveRoss 19:51, 27 December 2018 (UTC)
@TheDaveRoss: Huh, so I was wrong when I thought that page didn't exist. How about just using a parser function, which I've just done? — Eru·tuon 20:38, 27 December 2018 (UTC)
Just to be clear, are we editing this text because the page is in fact regularly updated? - -sche (discuss) 20:48, 27 December 2018 (UTC)
It's been updated since I last visited it. The most recent update was on December 21, and I visited it before that time while the original discussion was still going on. Since that's about six days ago now, apparently it's only infrequently updated. — Eru·tuon 21:54, 27 December 2018 (UTC)
There are a few pages that are in fact updated twice a month or so that have the warning. There are many more that are updated on a similar schedule that don't have the notice. If it requires a rectal tonsillectomy to change the notice, I'm sorry that I asked. DCDuring (talk) 22:08, 27 December 2018 (UTC)
Also, some wiki-magic which may be of use to people in these situations: if you append ?uselang=qqx to a URL, it will show the names of all messages in parentheses, which makes them much easier to find. - TheDaveRoss 19:59, 27 December 2018 (UTC)
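If the URL already has a query string, the parameter has to be joined with & rather than ?. A small Python sketch that handles both cases is below; with_qqx is a made-up helper name and the URLs are only examples.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_qqx(url):
    """Return `url` with uselang=qqx set, replacing any existing uselang value."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "uselang"
    ]
    query.append(("uselang", "qqx"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_qqx("https://en.wiktionary.org/wiki/Special:AncientPages"))
```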

Selection of the Tremendous Wiktionary User Group representative to the Wikimedia Summit 2019

Dear all,

Sorry for posting this message in English and for the last-minute notification. The Tremendous Wiktionary User Group could send one representative to the Wikimedia Summit 2019 (formerly "Wikimedia Conference"). The Wikimedia Summit is a yearly conference of all organizations affiliated with the Wikimedia Movement (including our Tremendous Wiktionary User Group). It is a great place to talk about Wiktionary needs to the chapters and other user groups that compose the Wikimedia movement.

For context, there is a short report on what happened last year. The deadline is very close, about 24 hours away: the last date for registration is 17 December 2018. As a last-minute effort, a page has been created on Meta to decide who will represent the user group at the Wikimedia Summit.

Please feel free to ask any question on the wiktionary-l mailing list or on the talk page.

For the Tremendous Wiktionary User Group, -- Balajijagadesh 05:56, 16 December 2018 (UTC)

Who wants to go to Berlin? Does anyone know whether there is any money for travel? Otherwise, it will probably be dewikt that sends someone. DCDuring (talk) 22:15, 16 December 2018 (UTC)
@Psychoslave I missed the 2018 report when it was published. Lots of interesting points in there, but I doubt many people have read/noticed it, the talk page mentions a translation which never happened. – Jberkel 06:34, 17 December 2018 (UTC)
I'd love to go to Berlin! I'll be your representative. So send me there. I think this is sth admins can do. --Mustliza (talk) 06:52, 17 December 2018 (UTC)

Hello everybody. I'm pleased to see there was a vote for sending someone to the summit; I hope this will pass the whole selection process. Sorry for the report not being available in any language other than French; I haven't found time to translate it so far, and I don't see any free time in my near schedule. Feel free to help translate it if you can. Also, apologies for the lack of efficient communication regarding its publication. Please feel free to ask me anything if you have specific points you would like to have information about and to ping me for anything related to the TWUG.

This was an extremely unsatisfactory process. The selected representative is someone I've never heard of, but who seems to have been backed by a well-coordinated, well-informed group in what seems more like a coup than an election. DCDuring (talk) 16:23, 20 December 2018 (UTC)

Zulu vowel length marking

Zulu orthography does not mark vowel length; however, Wiktionary (and Wikipedia) have taken up the convention of marking vowel length in Zulu with a macron. I have not seen this convention outside of Wiktionary and Wikipedia, and I think it's a bit problematic because it could be mistaken for a tone diacritic. Also, it looks messy when tone diacritics get stacked on top of the macron. I would like to use the character ː to mark length because it leaves room for tone diacritics and is clearer in its meaning. Another possibility is to double the vowel character to show length, but in my opinion that doesn't look as good. Smashhoof2 (talk) 02:25, 17 December 2018 (UTC)

@Rua, Metaknowledge Chuck Entz (talk) 03:59, 17 December 2018 (UTC)
Well, I've not studied Zulu, but my understanding is that vowel length is not a very important phenomenon in it. You've got it in most (but not all) penultimate syllables like most Bantu languages, and that's allophonic. Then you've got it in some ideophones, but I think those are generally written with double or even triple vowels (correct me if I'm wrong). And then there are the contractions, which I suppose are actually phonemic but don't come up too frequently. I would say that we don't really need to be marking this on the headword at all — just stick the length mark in the IPA, and leave it at that. —Μετάknowledgediscuss/deeds 05:36, 17 December 2018 (UTC)
@Rua, Metaknowledge Vowel length is very important in the realization of tone. Some verbal prefixes have long vowels, and the short form of the perfect is a long vowel. And Rua mentions that the noun class prefixes contain long vowels. Along with that, there is the allophonic penultimate vowel lengthening. However, this allophonic lengthening has large effects on the tonal realization. For example, inja is written in the headword as înjá. However, this is misleading as it has two tonal realizations. With penultimate lengthening you get îːnjá, and without it, you get ínja. (The underlying tone is /ínjá/.) I plan on adding tone to the Zulu verb inflection tables, but to do so I will need to have a short form and a long form for every verb form, as the tone is different between the two forms. I just think we should adopt a different marking of vowel length because I don't like stacking diacritics, and I haven't seen the macron used elsewhere. Authors on Zulu tone (James Khumalo for example) tend to use doubled vowels in autosegmental derivations, but use ː in surface forms. Smashhoof2 (talk) 19:30, 17 December 2018 (UTC)
It's not a tone diacritic because only ´ and ^ are indicators of tone. ^ already implies length, so ¯ is only needed where there is a ´ tone or no tone at all. I don't agree with removing length indications though, because they can be distinctive (class 5 ī is distinguished from class 9 i, Xhosa distinguishes these orthographically). In the case of a long final syllable, the stress also shifts to the final syllable. —Rua (mew) 11:30, 17 December 2018 (UTC)
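
For anyone who wants to see what swapping the macron for ː would involve mechanically, here is a minimal Python sketch. It only illustrates the character-level conversion and takes no side on which convention should be used; the function name macron_to_length_mark is an assumption made for this example, and it does nothing about length that is only implied by a circumflex.

```python
import unicodedata

MACRON = "\u0304"       # combining macron
LENGTH_MARK = "\u02D0"  # IPA length mark (ː)


def macron_to_length_mark(text: str) -> str:
    """Replace a macron on a vowel with a following length mark.

    Other diacritics on the same vowel (such as tone marks) are kept.
    Hypothetical helper written for this illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    pending = False  # True while a length mark is owed to the current vowel
    for ch in decomposed:
        if unicodedata.combining(ch):
            if ch == MACRON:
                pending = True      # drop the macron, remember the length
            else:
                out.append(ch)      # keep tone diacritics on the vowel
        else:
            if pending:
                out.append(LENGTH_MARK)  # close the previous vowel cluster
                pending = False
            out.append(ch)
    if pending:
        out.append(LENGTH_MARK)
    return unicodedata.normalize("NFC", "".join(out))
```

The length mark is placed after the whole vowel cluster, so an acute accent stays on the vowel and nothing is stacked on top of it.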


I've never seen as much clickbait on Wiktionary as on the page toe-tapper - does it really help the reader to have a link to see pictures of a particular bathroom? I'd like to delete the link, but figure that this is sth admins can do. --Mustliza (talk) 07:32, 17 December 2018 (UTC)

  • Removed (it did seem a bit over-the-top) SemperBlotto (talk) 07:45, 17 December 2018 (UTC)

Word of the year

Lots of the many online dictionaries on the web have got a supercute "Word of the Year" feature. There's nothing on Wiktionary:What Wiktionary is not that says we're not supercute either. Hence, as a result it is undeniably clear that we also are supercute too. QED. Maybe we could choose something from the bleeding edge of lexicography (this year's Word of the day), tweet about it and when hobnobbing with other lexicographers they will kiss our rings. I suppose it would have to be decided before the dernier day of the year - that would be tickety-boo. This is sth admins can do as well as regular wordsters. --Mustliza (talk) 08:04, 17 December 2018 (UTC)

  • Just as long as it's not bloody Brexit. SemperBlotto (talk) 08:05, 17 December 2018 (UTC)
    That will probably be next year's. — SGconlaw (talk) 08:30, 17 December 2018 (UTC)
    • Not a bad suggestion, but the question is who is going to draw up the shortlist and arrange for a vote. (Not me, please.) — SGconlaw (talk) 08:29, 17 December 2018 (UTC)
      • Shortlists are often driven by analysing search / request traffic, something we don't really do. I think we should also focus on non-English words, or maybe English borrowings from other languages, just to differentiate us a bit (we're multilingual, after all).– Jberkel 10:43, 17 December 2018 (UTC)
        @Jberkel, we do actually keep track of traffic: link. - TheDaveRoss 13:12, 17 December 2018 (UTC)
        @TheDaveRoss: We keep track, but don't really analyze in depth. And the top viewed pages don't look very interesting / feature-worthy. Do you know if there is a way to get hold of the search query data? – Jberkel 14:02, 17 December 2018 (UTC)
        @Jberkel: not that I know of, but I haven't looked into the stats API thoroughly. It doesn't seem to be on the hadoop cluster, and it certainly isn't available in the wiki database, so I am not sure where it would live if the data exist. - TheDaveRoss 14:37, 17 December 2018 (UTC)
        Yeah, looks like the top search term in 2018 was "BF". No idea why that would be the case. — SGconlaw (talk) 15:00, 17 December 2018 (UTC)
        I don't trust this data – if you enable "Show mobile percentages" you'll get some top entries in the 0.x% / 99.x% ranges, which can't be right. – Jberkel 16:00, 17 December 2018 (UTC)
        It has a lot to do with the fact that outside of the Western world the predominant means of accessing the internet is via mobile (e.g. India is 90% mobile views), and the vast majority of views on certain pages are from specific regions. - TheDaveRoss 17:58, 17 December 2018 (UTC)
        Granted, but 99.6% mobile access (for BF) seems improbably high. – Jberkel 20:48, 17 December 2018 (UTC)
        • Well, since we have both WOTD and FWOTD, we could have both a Word of the Year (English) and Foreign Word of the Year. Maybe one way to avoid having to compile shortlists would just be to ask editors to propose and vote on entries by a certain date. — SGconlaw (talk) 11:00, 17 December 2018 (UTC)
My proposal is to pick an entry entirely at random. DTLHS (talk) 16:06, 17 December 2018 (UTC)
When I just did a Random entry, I got broutâtes. Perhaps we should keep it at lemmas.  --Lambiam 20:11, 17 December 2018 (UTC)
I think the idea was to pick a former (F)WOTD randomly. Or we could try to find the most zeitgeisty ones. My picks: Anthropocene / Dutch wereldbrand. Grim. – Jberkel 20:48, 17 December 2018 (UTC)
I see no point in designating an arbitrary random word as "word of the day/month/year" because we already have a random-entry search feature anyone can use. The point of these features is usually to illustrate the zeitgeist, i.e. words in the news. (This tends to be abused by using words that have been mentioned without much real usage, but I suppose some of them do survive.) Equinox 20:13, 17 December 2018 (UTC)
I rather like Jberkel's suggestions. By the way, not much time till the end of the year to do this, unless the Word of the Year is intended to be announced in the following year. Also, someone has to design a template that will fit on the Main Page. — SGconlaw (talk) 07:08, 20 December 2018 (UTC)

Should set-type categories also contain their namesake?

I noticed a lot of cases where a set category, such as Category:en:Hares or Category:en:Dogs, contains entries for synonyms of its namesake (hare and dog in this case). I tend to place these in their parent category instead, Category:en:Lagomorphs and Category:en:Canids, because a hare and a dog are a specific kind of lagomorph and canid respectively, and not a specific kind of hare or dog. Other people seem to have thought differently in the past, and there isn't a specific consensus about it. To avoid people moving entries back and forth to match the way they think, I would like to form a consensus and perhaps even a formal rule. Personally, I think a set category should only be used for kinds of its namesake, and not for the namesake itself. —Rua (mew) 13:32, 20 December 2018 (UTC)

I see where you are coming from, but think it would be quite hard to enforce such a rule as it is not obvious. Perhaps just put the word in both categories. — SGconlaw (talk) 17:04, 20 December 2018 (UTC)
Seems logical to me. It'd be a good idea to have a set of guidelines at least somewhere, maybe on a Wiktionary namespace page (perhaps a policy page, but I don't think that much is even necessary) explaining how the set/topic cat system works and what it's intended to do. Related to this is the point that the descriptions of the categories don't unambiguously point to what you're suggesting; e.g. the description of Category:en:Dogs ("terms for dogs") doesn't necessarily suggest that it is only for hyponyms of "dog" and that its namesake should be placed elsewhere. — Mnemosientje (t · c) 14:17, 21 December 2018 (UTC)
Nobody knows or agrees on what it's intended to do. DTLHS (talk) 18:26, 21 December 2018 (UTC)
All the more reason to make sense of it in writing somewhere for future reference? — Mnemosientje (t · c) 12:08, 22 December 2018 (UTC)

Vietnamese xxx class nouns => Vietnamese nouns classified by xxx

I am going to rename all categories "Category:Vietnamese xxx class nouns" to "Category:Vietnamese nouns classified by xxx", in the same way as for Chinese, Thai, Lao, Lü, etc. Likewise, "Category:Vietnamese nouns by class" would become "Category:Vietnamese nouns by classifier". I must ask here if you agree. --Octahedron80 (talk)

Seems fine to me. I’m all in favour of consistency. — SGconlaw (talk) 14:59, 21 December 2018 (UTC)
All renamed. However, I did not check if any word has correct classifier or any classifier exists. --Octahedron80 (talk) 03:31, 23 December 2018 (UTC)

Pinyin, Zhuyin Fuhao and Erhua


现代汉语词典7 p1348 "wányìr"; 现代汉语规范词典3 p1350 "wányìr"; "ㄨㄢˊ ㄧˋㄦ (ㄧㄜˋㄦ)"; “ㄨㄢˊ ㄧˋㄦ (變)ㄨㄢˊ ㄧㄜˋㄦ”

现代汉语词典7 凡例 p4 talks about the changes to pronunciation of the preceding syllable caused by erhua

I think that the time has come to include ㄨㄢˊ ㄧㄜˋㄦ somewhere on the 玩意兒 page- but in what way? --Geographyinitiative (talk) 16:06, 21 December 2018 (UTC)

WT:VOTE after four years?

As in most democracies, could votes be repeated after four years if the community decides so? --Backinstadiums (talk) 19:20, 21 December 2018 (UTC)

No. In a democracy it is not the demos that decides that the demos has to vote, but a different organ; here the community would decide that the community has to vote. And Wiktionary is not even a democracy, just as shareholder meetings or student committees are unrelated to democracy. Fay Freak (talk) 19:48, 21 December 2018 (UTC)
You can make any vote you want. Nothing is going to automatically repeat itself. DTLHS (talk) 19:52, 21 December 2018 (UTC)
Yes, you can make a new vote (or RFD) if you think the situation has significantly changed since the last one. Making a new vote/RFD/etc shortly after an old one, when it isn't likely that anything has changed, would usually be disruptive, but revisiting an issue four years on is usually OK (after several years, the community itself will have changed a bit as users come and go). If this is about reversing the current lemmatization of Chinese entries, though, it would be best to start a discussion first and try to get major editors of Chinese onboard... - -sche (discuss) 22:28, 21 December 2018 (UTC)
  • Yes, you can repeat a vote. There is no specified number of years that need to elapse, but it would be a waste of everyone's time to repeat a vote very often. --Dan Polansky (talk) 08:32, 27 December 2018 (UTC)


(If I may make another w:WP:BEANS post,) I noticed that M. lists the two Latin names it's an initial of. This is probably sensible/tolerable for (inscriptional) Latin, where there are a limited number of regular abbreviations of names. In English and other languages, of course, any first, middle or last name that starts with M can be reduced to the initial M. (with or without a dot), and likewise for every other letter. Presumably we do not want thousands of {{abbreviation of}} senses to be added to M. and other letters for every abbreviated name. Do we have any guideline or policy that would prevent this, besides common sense, resorting to RFD, and blocking disruptive editors? I noticed this because I was trying to find out if "m." was an abbreviation of "mid" or "middle" in general or only in "m.Yks." for "mid-Yorkshire". - -sche (discuss) 22:41, 21 December 2018 (UTC)

Maintenance cost. Prognosis about how much it can get out of hand. How limited the set is. Cost-benefit ratio: if it is a waste of lifetime, whether of other editors' or readers' attention, or if it seems like an unhealthy obsession (to which I count bad-faith trolling, like WF with the double surnames, just as in Walden Two criminality was considered the result of an illness; it is shameful that for all the speechlessness it took some days before Equinox set an example, even though the argument is simple: one does not seek these entries, they are arbitrary, they violate limits of attention and abuse resources …), the line should be drawn. On the readers' side: one could keep in mind more often how many people would look it up and find it useful. A list with all possible values for M. at M. has no value; one can use the forename category or whatever for the same purposes (Category:English given names; Latin has Category:Latin praenomina, Category:Latin nomina gentilia, Category:Latin cognomina). Also note the scope of the dictionary: there are uncommon abbreviations that can be cited enough but are to be found in the list of abbreviations of works. For example, legal commentaries abbreviate all kinds of common words. If one does not recognize a word while reading such a work, one looks into the list of abbreviations; then maybe one can look up the resolved word in a dictionary, this one or some other kind of reference work. If someone starts to add these abbreviations, it needs to be stopped for his own sanity: there are enough places to do useless things, and too many things that should be done instead. In short, think about what consumers get and what editors get from having the entered content; this is a very universal principle. Fay Freak (talk) 19:33, 26 December 2018 (UTC)
I tend to agree. What are you referring to though, by "for all the speechlessness it took some days before Equinox set an example"? Per utramque cavernam 19:42, 26 December 2018 (UTC)
The fact that there was a long thread (it cringed me off) before Equinox just deleted them with the reasoning that Wonderfool is trolling (which is an umbrella term really that does not name the actual dangers), that it needed a Machtwort, like no admin was able to justify deletion expressis verbis, so he did it because he could do it without anyone able to cast doubts upon him. Of course there were good reasons, but admins shunned for perhaps knowing what is right but not being able to formulate what can be discerned without this being as such a formally voted rule. That’s how it appeared to me. Fay Freak (talk) 20:04, 26 December 2018 (UTC)

"Friendliness" versus RealityEdit

I have just made a change to Wiktionary:Example sentences (see also: Wiktionary talk:Example sentences). If you have to revert it, just know that you are 'the man'; it also doesn't affect my willingness to edit Wiktionary. Let me know what you think, but I think I just struck a blow against the empire. --Geographyinitiative (talk) 13:25, 23 December 2018 (UTC)

I don't agree with this change: I think the old friendliness rule was a good one. Equinox 18:15, 27 December 2018 (UTC)
I agree with Equinox, I think the old phrasing allowed for non-friendly examples when necessary, but preferred friendly ones when reasonable. The new rule gives no preference, and my view is that, all things being equal, we should include neutral or positive examples over negative or offensive ones. - TheDaveRoss 18:31, 27 December 2018 (UTC)
Hmm. Yes, the new text seems to tilt too heavily in the other direction, of writing offensive usexes. If we went back to the old text except we dropped "some", i.e. "Although some offensive or explicit words will require a sentence that demonstrates those qualities", that would seem to resolve the issue of suggesting that some offensive words should have friendly usexes. - -sche (discuss) 18:36, 27 December 2018 (UTC)
I tried that, also qualifying "You should generally write it so that it is unlikely to offend or embarrass". Better? - -sche (discuss) 18:40, 27 December 2018 (UTC)

Presentatives part of speech

Which part of speech should be used for presentatives? I would like to add Zulu presentatives, but I'm not sure how to classify them. In the literature, Zulu presentatives are called copulative demonstratives. They are forms such as nansi "here it is (class 9)", nabo "there they are (class 2)", and nankaya "there they are over there (class 6)". They are also used with nouns to form expressions such as "Here is the dog", "There are the children", and "There are the girls over there." Other languages have presentatives as well, but I'm not familiar with any that do, so I'm not sure if/how they are labelled in Wiktionary. --Smashhoof2 (talk) 20:17, 26 December 2018 (UTC)

In at least some cases (English voilà, Latin ecce, Latvian lūk, Spanish vualá, Turkish işte) they have been classified as interjections.  --Lambiam 21:25, 26 December 2018 (UTC)
Interjection is a junkyard category for standalone expressions in English. I would hope that we would use terms that grammarians of Zulu use, rather than the Latin parts of speech into which we cram English expressions. DCDuring (talk) 22:11, 26 December 2018 (UTC)
Yeah, interjection isn't a good fit because presentatives are used as part of larger phrases. The closest category is verb, since they can form predicates, but really it's not a good label because they definitely aren't verbs. It seems that Wiktionary doesn't have any fitting category for them. --Smashhoof2 (talk) 03:54, 27 December 2018 (UTC)
Most of the presentatives I listed above as currently categorized as interjections can also be used as part of larger phrases (ecce cor meum; lūk mana sirds; işte kalbim). The set of allowed POS headers is severely limited, so we have to resort to using some as catch-alls.  --Lambiam 10:43, 27 December 2018 (UTC)
They're essentially verbal pronouns. I'd put them under the pronoun header, and instead of making independent entries, have the entries point to the independent pronouns with a link to an explanatory appendix. —Μετάknowledgediscuss/deeds 18:00, 27 December 2018 (UTC)
How are these named in popular books on Zulu grammar or, failing that, in a plurality of not-so-popular books on the subject? DCDuring (talk) 21:50, 27 December 2018 (UTC)

Vote: Lemming principle into CFI

FYI, I created Wiktionary:Votes/pl-2018-12/Lemming principle into CFI, based on Wiktionary:Beer parlour/2014/January#Proposal: Use Lemming principle to speed RfDs. --Dan Polansky (talk) 08:26, 27 December 2018 (UTC)

Thanks for creating this, I hadn't realized it had never been voted on. - TheDaveRoss 13:18, 27 December 2018 (UTC)

Vote: Phrasebook CFI

FYI, I created Wiktionary:Votes/pl-2018-12/Phrasebook CFI. From what I recall, some editors seemed to support similar criteria in RFD discussions. --Dan Polansky (talk) 09:06, 27 December 2018 (UTC)

Invitation from Wiki Loves Love 2019

Please help translate to your language

Love is an important subject for humanity, and it is expressed in different cultures and regions in different ways across the world through different gestures, ceremonies, and festivals. To document expressions of this rich and beautiful emotion, we need your help, so that we can share and spread the depth of culture that each region has and the best of how people of that region celebrate love.

Wiki Loves Love (WLL) is an international photography competition of Wikimedia Commons with the subject of love testimonials, happening in the month of February.

The primary goal of the competition is to document love testimonials through human cultural diversity, such as monuments, ceremonies, snapshots of tender gestures, and miscellaneous objects used as symbols of love, in order to illustrate articles in the worldwide free encyclopedia Wikipedia and other Wikimedia Foundation (WMF) projects.

The theme of the 2019 iteration is Celebrations, Festivals, Ceremonies and rituals of love.

Sign up your affiliate or individually at the Participants page.

To learn more about the contest, check out our Commons Page and FAQs.

There are several prizes to grab. Hope to see you spreading love this February with Wiki Loves Love!

Kind regards,

Wiki Loves Love Team

Imagine... the sum of all love!

--MediaWiki message delivery (talk) 10:12, 27 December 2018 (UTC)


Hello, I just saw this edit, which I wanted to undo because it didn't make sense - a word derived from itself in the same language. In cases like this, where I don't speak the language, what's the best thing to do? Ignore? Undo? Ask here? Tag the page? Speak to the editor? --Pious Eterino (talk) 17:02, 27 December 2018 (UTC)

Don't ask here. Obviously bad edits should be undone. If you don't know what to do, leave it to somebody else. —Μετάknowledgediscuss/deeds 17:57, 27 December 2018 (UTC)
You can bring it to people's attention in the WT:ES (if the change is to the etymology) or WT:TR (if the change is to definitions, etc) if it seems fishy but is not obviously bad. - -sche (discuss) 18:32, 27 December 2018 (UTC)
Ignoring is always an option, but with millions of entries that no one has time to visit, bad edits can go unnoticed for a very long time. There are a number of better options. In this case, the details of the language are irrelevant: you don't list the word itself as its own source in an etymology, so you would have been entirely justified in undoing the edits with no knowledge of the language whatsoever. It never hurts to explain your reason in the edit summary, because a lot of bad edits are made in good faith (this edit looks like a matter of being solely focused on adding the reference and getting the technical details right without thinking about whether it made sense). Or you can include the same kind of message in a post on the editor's talk page, though there's always the possibility the editor may not respond satisfactorily. You can also use {{attention|wo}}, which puts the entry in a category that people who know the language may check ... eventually. If it does hinge on matters you're not qualified to address yourself but you want to challenge the etymology, you can use {{rfv-etymology|wo}} and start a discussion at the Etymology scriptorium. You can also start a discussion at the Tea room if it's something about the entry other than the etymology. Chuck Entz (talk) 18:36, 27 December 2018 (UTC)
Great advice, Chuck Entz! --Pious Eterino (talk) 00:40, 28 December 2018 (UTC)

CFI and English editing guidelines

CFI was changed in diff to no longer point to Wiktionary:About English, pointing to Wiktionary:English editing guidelines (redlink) instead. I cannot see the point of the change; can someone undo the change to CFI? --Dan Polansky (talk) 19:00, 27 December 2018 (UTC)

Wiktionary talk:English entry guidelines § RFM discussion: November 2015–August 2018 Per utramque cavernam 19:05, 27 December 2018 (UTC)
The consensus for moving to English entry guidelines was pretty weak; if the page has to stay there, CFI needs to be updated accordingly ("entry" guidelines, not "editing" guidelines) or reverted back to Wiktionary:About English, which is still a redirect (and should be). --Dan Polansky (talk) 19:09, 27 December 2018 (UTC)
Why would only the English page be renamed anyway, when all of our infrastructure expects the page to be named the same in all languages? Is there a particular reason why English should have a different page name from other languages? —Rua (mew) 20:20, 27 December 2018 (UTC)

Defect in WT:THUB

@BD2412, SemperBlotto — On another page Dan Polansky pointed out a shortcoming in the definition of qualification in the section Translation hubs of our CFI. It can be illustrated with the Chinese translation 失身 of English lose one's virginity. The first character, 失, means to lose. So far, so good. The second character, 身, can mean many things, from body to social status to moral character. It cannot mean virginity. There is no way to construct this idiomatic meaning of the whole from the meanings of its parts. In all reason, this should qualify as a translation that supports inclusion of the English term. However, it does not, because the rules, as formulated (which are said to be “tentative”), specifically exclude any translations in a language that does not use spaces to separate words.
Obviously, we do not want, say, Chinese 超文本系統 to qualify as a supporting translation of hypertext system; it is simply a compound that is a word-for-word translation of the English term: 超文本 (hypertext) + 系統 (system). But this is already excluded in the present formulation by the same rule that excludes German Autoschlüssel as a qualifying translation of car key. The remedy is simple: just strike the clause excluding languages that do not separate words by spaces. While we are at it, I propose to combine the first two exclusion rules to a single one:

  • a closed compound or multi-word phrase that is a word-for-word translation of the English term: German Autoschlüssel does not qualify to support the English term “car key”; or

 --Lambiam 08:12, 28 December 2018 (UTC)

My first impression is that the proposed removal of "a phrase in a language that does not use spaces to separate words" from WT:THUB would be fine, and the proposed merger of the two items would be fine as well. However, I feel I do not have enough energy to consider the impact more thoroughly. --Dan Polansky (talk) 19:31, 28 December 2018 (UTC)

soft redirection template for Japanese (revisited)

Hiragana (modern): まっとう
Hiragana (historical): まつたう
Kanji: 全う
Notes: 真っ当 – ateji, adjective only; 完う – literary
Hello everyone! I have redesigned {{ja-kanji spellings}} after Modèle:ja-trans on the French Wiktionary. Here are my ideas of a soft redirection system for Japanese.

  1. The new version of the template can display kana spellings (modern and historical) in addition to kanji spellings. As such, the name is no longer accurate. As the template is intended for wide use, the name {{ja-spellings}} is a little long. I would like to rename it to {{ja-forms}}, but that name is already taken. So I hope someone with a bot can rename that template and update pages using it to make room. (I have made the request here.)
  2. The original version of the template supported embedding of kanjitabs in the kanji spellings, but this function is now removed. The idea is that {{ja-kanjitab}} would be used independently of the {{ja-ks}}/{{ja-see}} system. Therefore, if for the word aikotoba, n. the kanji spelling 合い言葉 is chosen as the lemma entry, it would have both a {{ja-ks}} and a {{ja-kanjitab}}, but if the kana spelling あいことば is chosen as the lemma entry, it would have only {{ja-ks}}.
  3. The soft redirection system requires that among the different spellings of a given word, the lemma entry uses {{ja-ks}} to link to the other spellings, and the other spellings use {{ja-see}} to link back. Now that {{ja-ks}} is created, {{ja-see}} remains to be discussed:
    1. Do you think {{ja-see}} needs to copy definitions from the lemma entry? One difficulty is that Japanese definitions on the wiki tend to be elaborate. For example, the definition of 酔う is “to get drunk, become intoxicated or inebriated, fall under the influence of alcohol; to become drunk or intoxicated by something; etc.” This is fine for the lemma entry, but a more concise definition such as “to get drunk; to become intoxicated; to feel sick; etc.” would be better on the non-lemma spellings. Chinese editors solve the problem by writing definitions in a way like # translation {{gloss|longer description}}, and {{zh-see}} would only copy the translation. Following this principle, the Japanese definition of 万葉集 should read “Man'yōshū (Japan's oldest anthology of poems, completed in 759)” so that {{ja-see}} would only copy the “Man'yōshū”. [UPDATE: the ja-see template now uses a different format, one which looks better when definitions are copied in full.]
    2. What do you think about categories? I think the categories concerning the word (e.g. Category:Japanese nouns for aikotoba, n.) should be copied among all the spellings, while categories concerning spellings (e.g. Category:Japanese terms read with kun'yomi for 合い言葉) should remain spelling-specific. In addition, I think sort keys can be removed and each page can be sorted under its first character (for example, あいことば under あ, and 合い言葉 under 口, the radical of 合) for the following reasons: First, the current practice of sorting pages under their readings does not work well for pages with more than one reading. For example, 避ける must be sorted under both さ and よ, which is not possible in the MediaWiki software. Second, under my proposal, categories concerning words would hold all the spellings of a given word. Since Category:Japanese nouns would contain both あいことば and 合い言葉, there is no need to categorize both under あ. Categories concerning spellings could be sorted by spellings instead of readings.
    3. What do you think about soft redirects within written forms that are independent of reading? For example, is a variant of whether it represents tai, n., affix or karada, n., and 1日/1日/壱日 is a variant of 一日 whether it represents ichinichi, n. or tsuitachi, n. Therefore it might be a good idea to have two stages of soft redirects, one within written forms (1日/1日/壱日 → 一日) and one from written forms to words (一日 → ichinichi, tsuitachi). The first stage could be handled by a template like {{ja-seex}} which copies all the categories from the lemma written form, and the second stage could be handled by the regular {{ja-see}} which copies only the relevant categories from the lemma word. (For example, would copy categories of both shin and ma- from , but would only copy ma- “right, true” and not for example ma “time, pause, space” from ).
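
The two-stage redirect idea in point 3.3 can be sketched in Python as two lookups. This is only a toy model under stated assumptions: the dictionaries WRITTEN_VARIANTS and WRITTEN_TO_WORDS and the function resolve are hypothetical names with hand-entered data taken from the 一日 example, and they say nothing about how {{ja-see}} or {{ja-seex}} are or would be implemented.

```python
# Stage 1: variant written form -> lemma written form (hypothetical data).
WRITTEN_VARIANTS = {
    "壱日": "一日",
}

# Stage 2: lemma written form -> the words (romanized) it can represent.
WRITTEN_TO_WORDS = {
    "一日": ["ichinichi", "tsuitachi"],
}


def resolve(spelling: str) -> list[str]:
    """Follow both redirect stages and return the words for a spelling."""
    # A spelling that is not a known variant is treated as its own lemma form.
    lemma_form = WRITTEN_VARIANTS.get(spelling, spelling)
    # Unknown written forms resolve to no words at all.
    return WRITTEN_TO_WORDS.get(lemma_form, [])


print(resolve("壱日"))
```

Keeping the stages separate means a variant form is recorded once, however many readings its lemma written form has.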

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Fumiko Take, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4): --Dine2016 (talk) 15:36, 28 December 2018 (UTC)

@Dine2016: How could it be used to improve the chinese entries as well? --Backinstadiums (talk) 19:04, 28 December 2018 (UTC)
Re Q3.1: The “Chinese” approach for avoiding long definitions seems fine to me. re Q3.2: Spelling-independent categories should indeed apply to all regular spellings, while spelling-dependent categories should (obviously) only be applied to the spellings to which they are applicable. I have no opinion on the best approach to sorting. As long as we cannot sort on multiple keys, any approach will suck. How is sorting on the actual entry (e.g., 合い言葉 under the character 合 rather than its radical) worse than sorting on the radical?  --Lambiam 22:34, 28 December 2018 (UTC)
Sorry for the late reply.
@Backinstadiums: There is already soft redirection for Chinese entries so we do not need another set of templates. The Japanese case is different from Chinese. In Chinese, there is usually a one-to-one correspondence between Traditional Chinese and Simplified Chinese. In Japanese, however, kana spellings and kanji spellings of a word are usually one-to-many, and one page (e.g. かえる) can correspond to several words. So we should exercise more caution in synchronizing glosses and categories.
@Lambiam: I agree that MediaWiki categories are weak compared with the complexity of the Japanese writing system. Sorting on the radical is an idea taken from Chinese categories. I'm open to the idea of sorting on the whole character rather than its radical, though. --Dine2016 (talk) 03:11, 31 December 2018 (UTC)
Sorting on the radical may be helpful to people who are sufficiently familiar with Kanji to know not only the radicals, but also which of potentially several candidates is the radical traditionally assigned to a given character. An advantage of sorting on the whole character sequence is its conceptual simplicity. The collation order would be Wikimedia’s system, based on the CJK Unicode values, which I think are already presorted on the radicals. But I never search for a specific term through the categories, so I cannot really speak for users who do.  --Lambiam 07:27, 31 December 2018 (UTC)
Each tranche of CJK characters is sorted independently, so sorting by codepoint only really works for the initial batch of 20,000 or so. ICU includes better sorting data, and it might be possible to use an ICU sort instead. --RichardW57 (talk) 13:13, 1 January 2019 (UTC)
Since Japanese does not have an alphabetical order in the traditional sense—i.e. having an unequivocal ordering from the first written character to the last—Japanese kanji dictionaries tend to sort kanji using the radicals. If one tries ordering them individually after the Unicode consortium's table, the Japanese Industrial Standards (JIS) classification, or some other system, a problem quickly arises once we get into lesser known ones, such as the Hyōgaiji categories kokuji, extended shinjitai, or Asahi moji (characters), many of which might not even be included. Whatever system is chosen to sort them after, it needs to be applicable to any kanji that exists, no matter how obscure it is. --AstroVulpes (talk) 14:42, 1 January 2019 (UTC)
Interesting. How does Wiktionary handle unencoded characters? Ideographic description sequences? I had assumed that Wiktionary only allowed entries that could be written in non-PUA Unicode. The lag from Unicode standard to ICU is to be measured in weeks. The bigger issues are then the adoption of new issues of ICU, corrections to the Unihan database, and disagreements as to the radical or to the stroke count. --RichardW57 (talk) 05:05, 2 January 2019 (UTC)
As far as sortkeys go, ideographic description sequences are identified by a function in Module:zh-sortkey, and then looked up in Module:zh-sortkey/data/unsupported. If the module encounters an IDS without a sortkey, the page is tracked. At least some titles with IDSes are categorized in Category:Terms containing unencoded characters. — Eru·tuon 05:24, 2 January 2019 (UTC)
How about radical + number of strokes + whole character? That way you have the customary cross-linguistic dictionary order, plus a tie-breaker. It may seem like a lot of work, but I believe we already have everything we need in the data modules- so it's just a matter of coding. Chuck Entz (talk) 01:16, 2 January 2019 (UTC)
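A minimal sketch of the "radical + number of strokes + whole character" ordering suggested above, written in Python rather than as a wiki module. The table `RADICAL_DATA` and the function `sort_key` are hypothetical names invented for this illustration, and the radical and stroke numbers are illustrative values that would need checking against the real data modules or the Unihan database.

```python
# Hypothetical stand-in for the existing data modules: maps a character to
# (radical number, additional stroke count). Values are illustrative only.
RADICAL_DATA = {
    "合": (30, 3),    # radical 口, 3 additional strokes
    "避": (162, 13),  # radical 辵/辶, 13 additional strokes
    "酔": (164, 4),   # radical 酉, 4 additional strokes
}


def sort_key(title):
    """Return a tuple usable as a sort key for a page title."""
    # Look up the first character; kana and unknown characters fall back
    # to (0, 0), so they sort before all kanji, in code point order.
    radical, strokes = RADICAL_DATA.get(title[:1], (0, 0))
    # The whole title is the final tie-breaker.
    return (radical, strokes, title)


titles = ["避ける", "酔う", "あいことば", "合い言葉"]
ordered = sorted(titles, key=sort_key)
print(ordered)
```

Using a tuple keeps the three criteria separate, so changing the tie-breaker (for example to a reading instead of the raw title) would only affect the last element.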
  • (Catching up after the holidays -- very behind in many different dimensions of life...)
Addressing just the sorting issue, I'd like to point out that, for dictionaries of whole words (i.e. not just single kanji listed as kanji, such as in a specific kanji-lookup dictionary), no other dictionary that I'm aware of sorts by radical or kanji. When every other dictionary sorts by reading, there's a strong rationale for doing so at Wiktionary too. I understand that the MediaWiki software backend is deficient, and entries like 避ける (sakeru, yokeru) can (currently) only be categorized under one sort index at a time. However, sorting under either reading is preferable to sorting under neither (such as by sorting on the kanji codepoint or the radical). ‑‑ Eiríkr Útlendi │Tala við mig 00:11, 8 January 2019 (UTC)
@Eirikr: Thanks for your (long-awaited) reply. Actually, the format I proposed would sort words under both reading and kanji. For example, CAT:Japanese verbs would list さける under さ, よける under よ, and 避ける under 避 or 辶. The only deficiency is that spellings are sorted under kanji only. For example, CAT:Japanese terms spelled with 避 read as さ would have only 避ける under 避 or 辶, not さける under さ, because I considered さける and 避ける different spellings of the same word sakeru, v. As the former is not spelled with 避, it is not categorized under CAT:Japanese terms spelled with 避 read as さ. --Dine2016 (talk) 05:06, 8 January 2019 (UTC)

Should languages be grouped in translations?

The translation editor currently treats the various Sami languages as entirely separate languages, and sorts them accordingly. But there are also some entries like these, where there is a single heading "Sami" with all the languages listed under it. Before I go and "correct" all of these, I'd like to know what everyone thinks the format should be? Should they be treated entirely separately and scattered across the translation table, or grouped together? Also, shouldn't there be two ** instead of *:? I thought *: was reserved for different orthographies of the same language, like Serbo-Croatian. —Rua (mew) 12:36, 29 December 2018 (UTC)

No grouped translations use "**". There is no such distinction with "*:". DTLHS (talk) 16:56, 29 December 2018 (UTC)
The distinction is made in descendants lists, at least. —Rua (mew) 22:35, 29 December 2018 (UTC)
Well other than using *: and not **, translation language grouping is inconsistent and completely undocumented (and therefore left up to the whims of individual editors), so if you feel it's best to group Sami together I support it. DTLHS (talk) 22:49, 29 December 2018 (UTC)
Perhaps this is a good opportunity to codify something then? —Rua (mew) 23:42, 29 December 2018 (UTC)
That would be good. Personally, I think it makes sense for every L2-header-having language to be sorted under its own name (I think someone will look for languages which they know we call 'Bavarian' and 'Ancient Greek' under 'B' and 'A', respectively, not under 'G'), and to only nest things like 'Min Nan under Chinese' since we also do that on the L2 level. But things are indeed very inconsistent, because the translation-adder has been revised to start or stop nesting various things without existing entries being changed, and people do things manually. Maybe we could straw poll "should X be grouped?" for each set of L2-having languages that is currently grouped, with an option somewhere for "don't group/nest anything [that has its own L2 header]"? - -sche (discuss) 01:43, 30 December 2018 (UTC)
I would like the rules to be in Module:languages/data instead of the translation adder script, for one thing. DTLHS (talk) 03:23, 30 December 2018 (UTC)

January 2019

WARNING! The title you are using may be wrong.

Very annoying and shouty. Do we still need it? If we do, it would be nice to have some option to stop showing it, or maybe disable it for non-anons. – Jberkel 13:47, 1 January 2019 (UTC)

Where is that message configured? DTLHS (talk) 16:11, 1 January 2019 (UTC)
Yes, is there any list of where all these messages are located? — SGconlaw (talk) 17:56, 1 January 2019 (UTC)
I did a search for mediawiki: insource:"the title you are using may be wrong" and found MediaWiki:Newarticletext. — Eru·tuon 19:13, 1 January 2019 (UTC)
To remove the "new article text" on any page, you can use CSS: .mw-newarticletext { display: none; }. But that will remove other messages besides the shouty case-insensitivity message. I could also add another class around the shouty message so people can selectively remove it. — Eru·tuon 05:50, 2 January 2019 (UTC)
It's still needed. The alternative is even more entries that need privileged access to remove. --RichardW57 (talk) 05:20, 2 January 2019 (UTC)
To make it less shouty, we could replace the text “WARNING! The title you are using may be wrong.” by the friendlier “Are you sure this is the right title?” Also, in the second next sentence, instead of “You probably want to edit the lowercase version of your word”, it is probably better not to presume the editor is “probably” mistaken; we may replace that by “You may want to edit the lowercase version of your term”.  --Lambiam 09:31, 2 January 2019 (UTC)
+1 to replacements proposed by Lambiam. No icons please. I can see the discussed text in Amadabacra, e.g., when I try to create it. --Dan Polansky (talk) 11:17, 5 January 2019 (UTC)
I can't seem to see this warning: I probably ad-blocked it if it was annoying. "Warning" seems good for Richard's reason above. Perhaps we could use a warning icon (yellow triangle etc.) instead of the word WARNING in caps? Equinox 20:07, 3 January 2019 (UTC)
There's already a warning triangle in the notification UI (above the notice), but it's blue instead of yellow and not very prominent. Changing the wording to be less aggressive/patronising as suggested above would be a first step to a friendlier interface. – Jberkel 21:07, 3 January 2019 (UTC)
This is not a popular opinion but I think that being aggressive is a good thing. We have certain rules and standards and we are not going to help ourselves by easing newbies into creating unusable entries. ("Patronising" is another matter...) Equinox 15:50, 6 January 2019 (UTC)
This is what I get in Amadabacra:
WARNING! The title you are using may be wrong.
Remember that Wiktionary is case sensitive. You probably want to edit the lowercase version of your word: amadabacra.
And this is what would be an improvement without reducing the warning effect much:
WARNING: Did you intend to edit amadabacra, starting in lowercase? Wiktionary is case sensitive.
--Dan Polansky (talk) 16:08, 6 January 2019 (UTC)

When do we include capitalized versions of words

I came across Zumeendar and zumeendar, and it made me wonder what the policy is around including both letter-case versions of nouns. With older nouns it is often trivial to find alternative letter-case examples since it used to be quite fashionable to capitalize lots of words (perhaps to Impress your Friends), but that may be the equivalent of YELLING in modern internet communications. What is the right thing to do? - TheDaveRoss 16:47, 3 January 2019 (UTC)

  • I suppose that the normal rules apply - do they actually occur in print. Of course, not everybody agrees with that (I tried to add The some time ago, but it got deleted). SemperBlotto (talk) 16:52, 3 January 2019 (UTC)
Yeah, I'd disagree that the only criterion should be whether it is verifiable by our rules. As mentioned by Dave, in the past it used to be the norm to capitalize nouns, so I wouldn't be surprised if many common-or-garden nouns would be found to be verifiable in a capitalized form. Something more is needed – perhaps it is an eponym and so occurs in both capitalized and uncapitalized form. — SGconlaw (talk) 16:58, 3 January 2019 (UTC)
Isn't this the same word as zamindar? Basically any function that bestows some authority will also be found in capitalized form: “the Protector”, “the Taxman”, “the Constable”. I think we need more than the occasional occurrence of a capitalized form before we decide it is an alternative spelling and not some instances where the custom of spelling honorifics and titles showing rank or prestige, usually only used when referring to a specific person, spilled over (possibly by ignorance of the customary rules) to a generic use.  --Lambiam 21:46, 3 January 2019 (UTC)
This is a bit of a grey area. My sense is that most of us agree that not just any capitalization should be included, since any common word is sometimes capitalized (hence Semper's The got deleted). I think that when a capitalized form does have a distinct sense, then it should be included: e.g. Native vs native (even though older works probably have instances of "Belonging to one by birth" capitalized as Native, and there are some uses of "Indian" written in lowercase as native), and likewise aboriginal/Aboriginal, black/Black, white/White, etc, where even if we host the definitions all in the lowercase entry, we have an {{altcaps}} soft redirect at the uppercase form. I think I'd rather exclude capitalizations of titles (unless they're almost always capitalized), but the current situation is haphazard; we have King, but not Secretary (it's blue only because it's a town), President but not Viceroy. I suppose you should RFD it. (I would also have deleted e.g. He, She, Me, Who, You, etc, but other people wanted to keep them.) - -sche (discuss) 22:30, 3 January 2019 (UTC)

Users Luxipa and BigDom

Hello. I want to bring your attention to User:Luxipa, whose username clearly indicates that the primary function of that account is adding phonemic transcriptions of Luxembourgish words to Wiktionary. This is confirmed on his user page.

To me, this seems to be User:BigDom that could be using that account in a (somewhat) deceptive manner. This user has put hundreds if not thousands of incorrect Luxembourgish transcriptions on Wiktionary. Incorrect, because he put phonetic transcriptions such as */bəˈdʀekən/ ([ə] and [e] belong to one phoneme) to bedrécken (see [12]).

I corrected a few hundred of them (see the last ~600 edits of Mr KEBAB, my previous account), which then already got him a bit mad - notice that this edit itself contains an error, because the correct phonemic transcription of bedrécken contains either /e/ or /ə/ for all three vowels (depending on which symbol you choose to represent the phonological mid front vowel in phonemic transcription). But notice that on my user talk page, he was nothing but nice to me.

A year later, an anon appears and writes on my old user talk page (notice that was over 5 months after I switched accounts). The tone of that message is similar to the tone of the edit summary I've linked above. Similar style of interacting can be seen in this reply to my message (I also thought it unfortunate that you had already made so many edits when you did. Particularly in a language that you don't know the first thing about. I can offer your to simply revert all of our edits and go back to system we had.) Notice the last sentence: he never brings up BigDom's mistakes and blames the whole thing on me, offering to go back to the system we had - the system that had so many mistakes. Not only that, in the next sentence he writes Your system, as I said, is "halfbaked" and therefore not only unnecessary but confusing - but that's the thing - it's not my system. 99% of it is based on the JIPA article on Luxembourgish. In that reply, he also just dismisses the whole article we have about Luxembourgish phonology, most of which is based on said JIPA article.

Also, compare the latest 100 edits of Luxipa with those of BigDom. There's also this anon who (presumably) is Luxipa as well.

I also want to bring your attention to the fact that many of BigDom's phonetic transcriptions of Icelandic words are also wrong - see e.g. brúsi - vowel length isn't phonemic in Icelandic (see [13]), and so the transcription should read either [ˈpruːsɪ] or /ˈprusɪ/, [ˈpruːsɪ] but not */ˈpruːsɪ/. Kbb2 (talk) 06:16, 4 January 2019 (UTC)

@Kbb2 Sorry to disappoint you, but I am not the new user you are saying I am, and I hope a CheckUser confirms that for you. I simply haven't been editing over the holidays as I have been travelling - I was only alerted to this as I got an email saying you had commented on my talk page (which I will reply to separately).
I admit that in the past I didn't react well in the edit summary, but as you've pointed out on talk pages I always endeavour to be nothing but polite and courteous. Having seen the way Luxipa was responding to you on his/her talk page, I am genuinely dismayed that you even considered that I would converse with you or any other user in that way.
Having looked at the edits of this new user, I too have some ideas about who it may be but without proof I don't wish to speculate on this public forum. In any case, I hope that these allegations won't stop us working together going forward - I realise we might not always agree in our methods but I do think we both want what's best for the dictionary. BigDom 07:19, 4 January 2019 (UTC)
For what it is worth, there isn't much evidence to suggest that BigDom and Luxipa are related from a checkuser perspective. Even so, it is not against policy to have multiple accounts, only to abuse them. None of that takes away from your concerns about whether or not there are patterns of incorrect edits being made, but at least that aspect of the discussion can be put to rest. - TheDaveRoss 14:55, 9 January 2019 (UTC)

Phrases comprised of words with multiple meanings should be kept

Since that's an argument regularly invoked in RFD, we might as well include it in the CFI, right? That will allow us to keep hollow victory of course, but also hollow vessel, hollow quest, hollow city; anything that can be hollow, in fact. Per utramque cavernam 18:15, 5 January 2019 (UTC)

And dear old brown leaf, which may indeed be a burned piece of paper as well as autumn foliage. Equinox 18:17, 5 January 2019 (UTC)
Ambiguity is part of the language, and there's no way we can make a dictionary ambiguity-proof without making it too wordy to use. Most words have multiple senses- wouldn't that make any random phrase containing those words includable, if it happens to be used often enough? Should we have an entry for "go to the bank" because it could refer to either a financial institution or the shore of a river? How about "go to the dry cleaner's to pick up a suit"? After all, "dry cleaner" could refer to a cleaner that's dry (not allowing alcohol?), "pick up" could refer to physically lifting something or to getting someone to go on a date, and "suit" could refer to a person in management, to cards or to a legal action. Think about the old joke where the Buddhist says to the hot dog vendor "make me one with everything", or just about any play on words- dictionary material? Chuck Entz (talk) 22:04, 5 January 2019 (UTC)
Sometimes people can't realise what a bad idea something is until you actually let them try it. I took Per's remark in that satirical vein. I might be wrong (in which case gawd help us all). Equinox 22:12, 5 January 2019 (UTC)
Yes, I'm being sarcastic. SemperBlotto isn't, though, so here we are. Per utramque cavernam 22:22, 5 January 2019 (UTC)
My interpretation was that PUC was stating the assertion positively for rhetorical purposes so we could discuss it and come to an explicit consensus. I knew he doesn't actually agree with it. Either way, I felt that the fact that people have been seriously using that argument meant that we should take the opportunity to point out its flaws. Chuck Entz (talk) 23:46, 5 January 2019 (UTC)
It would be my chance to sneak the German translator in.  --Lambiam 22:19, 5 January 2019 (UTC)
All entries of the form "[X] [Y]", where [X] and [Y] are entries (note the possible recursion), should be kept, but with the sole definition {{&lit|[X]|[Y]}} => Used other than with a figurative or idiomatic meaning: see X, Y. to allow for all possible combinations of the polysemic Xs and Ys. It wouldn't be fair to compel the advocates of such entries to insert all the possible combinations, even just the attestable ones. To make implementation more gradual we should restrict this initially to combinations of single words. Further I would recommend automating the process and imposing some grammatical restrictions to eliminate automated entries like the of. Automating attestation would be a help. DCDuring (talk) 23:26, 5 January 2019 (UTC)
Wouldn't it be better if the software just made reasonable suggestions and we didn't create trillions of pages which provided no actual information? - TheDaveRoss 13:23, 7 January 2019 (UTC)
I think the real utility of a dictionary comes from collecting terms with meanings that are unexpected for its users. If we could define somehow which senses are rare, we should imo add amply attested compounds containing them, such as hollow victory, which might not be idiomatic, but is useful in a way the headlessness isn't. Crom daba (talk) 15:01, 6 January 2019 (UTC)
None of the lexicographers who control the references at OneLook seem to share your opinion. Perhaps the OED? DCDuring (talk) 22:02, 6 January 2019 (UTC)
Yeah, maybe no one does, thinking in expectations is not very common but I believe it is very useful. My suggestion might not be practically implementable or attractive for anything, but it describes well what I intuitively feel is a useful entry as opposed to clutter. Crom daba (talk) 03:17, 7 January 2019 (UTC)
These unexpected combinations belong to {{uxi}}, in this case in hollow, where it is already since yore. The SOP rule has to be understood as implying that an entry should not be created in cases when even though the meanings of the parts used need additional mental strain for recognition one rather expects a comparatively infrequent meaning than an idiomatic use of the whole, even if this expectation is slanted by the experience of dictionaries restricting themselves in their coverage of composed expressions, since why assume the fulfillment of the inclusion criteria by idiomaticity if even an averagely astute learner is expected to look up the parts. Fay Freak (talk) 03:48, 7 January 2019 (UTC)
Conversational understanding is mostly derived from context and metaphor. You don't have to "know" that hollow means "empty" (also metaphorical) to understand hollow victory. The dictionary mostly helps some learners move something from the category of understood to that of able to be used by confirming the meaning, which can be done by looking up all the terms, though usually it is clear which term has the less certain meaning. DCDuring (talk) 14:21, 7 January 2019 (UTC)

PageNotice extension

I've finally posted on the Phabricator ticket titled "Review the PageNotice extension for deployment" that we would find it useful because it would allow {{reconstruction}} to be automatically transcluded at the top of pages in the Reconstruction namespace.

See also the previous discussions at Wiktionary:Grease pit/2018/September § %7B%7Breconstruction%7D%7D, Wiktionary:Beer parlour/2017/September § Proposal: install mw:Extension:PageNotice, and at Wiktionary:Grease pit/2017/June § Citations at citations. — Eru·tuon 00:07, 6 January 2019 (UTC)

Thesaurus:sexually frustrated

The entry for sexually frustrated was deleted as SOP per consensus at RfD. That leaves behind Thesaurus:sexually frustrated, which could get the same treatment, or could be moved to one of its provided synonyms. bd2412 T 21:24, 6 January 2019 (UTC)

The names of Thesaurus entries don't have to be valid dictionary terms. They're supposed to be descriptive and unambiguous, which often means SOP. Chuck Entz (talk) 21:56, 6 January 2019 (UTC)
I have adjusted the thesaurus header to eliminate the red link. Further tweaking may be needed. Cheers! bd2412 T 05:13, 7 January 2019 (UTC)

Competition finished

So, the Christmas competition has finished. It was a resounding success with one and a half entries. Apparently, now the winner is to be decided democratically. I'm expecting a massive turnout for voters too. --Wonderfool Dec 2018 (talk) 10:58, 8 January 2019 (UTC)

Straw polls on criteria for including chemical formulas

Previous discussions: Talk:AsH₃, Talk:CO₂, Talk:LiBr, WT:RFDN#SiGe (will become Talk:SiGe)

To gauge what criteria for including or excluding chemical formulas/formulae might have consensus, probably as a precursor to a vote, let's straw poll some possibilities. This also allows for problems with proposals to be pointed out.
For example, some people previously suggested including only formulas which would be read by letter, like "aitch two oh", but AFAICT all formulas can be read as letters and unfamiliar ones are necessarily read as letters. Other people proposed including only formulas that have unformulaic common names, but e.g. AlF₆Na₃ would meet that criterion as cryolite while CO₂ would fail as carbon dioxide, which seems opposite to what most people would expect. (As a result, I didn't list those ideas below.)
- -sche (discuss) 02:27, 9 January 2019 (UTC)

Include all attested chemical formulas

Please indicate if you support or oppose including all chemical formulas, such as BaCO₃, H₂O, Al(NO₃)₃, HArF, and CH₃(CH₂)₂₄-COOH, if they are attested.

  •   Support - treat them just like any other "word". SemperBlotto (talk) 07:34, 9 January 2019 (UTC)
  •   Oppose Per utramque cavernam 10:43, 9 January 2019 (UTC)
  •   Oppose - TheDaveRoss 14:31, 9 January 2019 (UTC)
  •   Oppose  --Lambiam 16:08, 9 January 2019 (UTC)
  •   Oppose Equinox 16:12, 9 January 2019 (UTC)
  •   Oppose Rua (mew) 16:41, 9 January 2019 (UTC)
  •   Support if it is pronounced as a noun in a sentence. There are many attested sentences where CO₂ is pronounced as cee-o-two and functions as a noun. If CH₃(CH₂)₂₄-COOH appears only in formulae, in tables, or in lists, it is just a symbol and we cannot be sure whether it is a part of a natural language. — TAKASUGI Shinji (talk) 12:31, 12 January 2019 (UTC)
  •   Abstain. I don't really know; probably exclude some chemical formulas. The opening of this poll does not provide any relevant facts, or links to where to find them, such as a rough estimate of the number of attested chemical formulas. --Dan Polansky (talk) 15:38, 13 January 2019 (UTC)

Exclude all chemical formulas

Alternatively, indicate if you support or oppose excluding all chemical formulas. Indicate if you would prefer to exclude them all without exception, or just exclude them by default but with the possibility for individual formulas (such as perhaps H₂O, which passes LEMMING) to be included on a case-by-case basis via consensus (presumably at WT:RFD, which is where requests for un-deletion are normally handled, and where consensus has occasionally been reached to keep other unidiomatic, non-translation-hub entries).

  •   Oppose Some formulas intrude in material intended for broader than technical audiences, such as consumer protection, worker safety, and environmental literature. I don't see why we would limit STEM content. DCDuring (talk) 16:10, 9 January 2019 (UTC)
  •   Oppose. I think there's obvious value in including formulas like H₂O and CO₂, which have a lot of currency, and I see no reason not to include attestable formulas that are used outside of chemistry-related subjects. In scientific contexts, we can probably expect readers to understand them and not need to look them up, but in non-scientific contexts, many people probably wouldn't know what they mean. Andrew Sheedy (talk) 22:47, 10 January 2019 (UTC)
  •   Oppose. Some are so basic that excluding them would put us on the wrong side of being a dictionary. bd2412 T 14:13, 11 January 2019 (UTC)

Without exception

  •   Support excluding them all without exception. DTLHS (talk) 02:29, 9 January 2019 (UTC)
  •   Support excluding all without exception. There are formats better suited as chemical formula databases than Wiktionary for ways of display and interaction. That is to say it seems to me that MediaWiki is an inefficient software for them: but even if one wants them on MediaWiki and with Wikimedia, the user is still better off if they are contained on Wikipedia or other projects. Fay Freak (talk) 07:41, 9 January 2019 (UTC)
  •   Oppose excluding all chemical formulas without exception. --Dan Polansky (talk) 15:21, 13 January 2019 (UTC)
  • I tend to   Support this. Per utramque cavernam 18:08, 19 January 2019 (UTC)

By default, with exceptions

  •   Support - with the ability to override with compelling justification (e.g. lemming or common usage outside of scientific works). - TheDaveRoss 14:36, 9 January 2019 (UTC)
  •   Support - exclude by default; I agree with TheDaveRoss above about possible exceptions (which should be very rare). Equinox 16:37, 9 January 2019 (UTC)
  •   Support Rua (mew) 16:42, 9 January 2019 (UTC)
    • @Rua I've split the section in two. Is your vote still at the right place, or should it be moved above? Per utramque cavernam 21:33, 10 January 2019 (UTC)
      • I'm not a fan of exceptions, but things like "CO2" are so widespread and part of normal vocabulary, it would be a disservice not to include them. —Rua (mew) 22:59, 10 January 2019 (UTC)
  •   Support, would allow numerous previously raised exceptions for inclusion (attestation in a non-scientific text; chemical with a trivial name in use; element and non-IUPAC constituent abbreviations such as Me). As an aside though, we already have WikiSpecies, maybe WikiChemicals should exist as well? --Tropylium (talk) 18:47, 9 January 2019 (UTC)
  •   Support – exclude by default but allow exceptions for terms used freely in, e.g., MSM news reports (such as “CO2”).  --Lambiam 20:31, 10 January 2019 (UTC)
    • This is just moving the goalposts. What exceptions? What does "used freely" mean? This section seems useless since it includes "the possibility for individual formulas". It seems this is what most people support but we probably need to be more granular. DTLHS (talk) 20:48, 10 January 2019 (UTC)
Arguing here whether there shall be exceptions allowed “on a case-by-case basis via consensus” is nonsensical because the option always remains and posterior consent switch cannot be excluded by consent (a contradiction like “consensual non-consent”, “voluntary slavery” etc.), or it would not be allowed by rules we cannot decide. So, as long as I see an outline of an exception I desire exclusion without exception since I do not know any exception. Indeed “freely” does not mean anything so far and won’t probably. You can only argue for certain exceptions, not if there shall be exceptions or for exceptions of indeterminable meaning. Fay Freak (talk) 21:22, 10 January 2019 (UTC)
Indeed, I must say I don't see much difference with the "Include only formulas that are attested in non-scientific contexts" option below. Per utramque cavernam 21:33, 10 January 2019 (UTC)
I assume that if we adopt the rule exclude but, the exceptions will be stated as part of the rule, like we have done for WT:BRAND and WT:FICTION. My current preferred rule for exceptions is stated below at #Include only formulas that are attested in non-scientific contexts – which should not be a surprise, considering that this is essentially the rule I have proposed myself. I can imagine, though, that we can live with other versions. By “used freely”, I meant, “used without further explanation”. If a news report mentions that vats were labelled with C3H8NO5P, but then goes on to explain that this is the chemical formula of glyphosate, it would not count as free use. This is similar to what we have at WT:BRAND. The commonality is whether the author assumes the reader is familiar with the term.  --Lambiam 22:57, 10 January 2019 (UTC)
  •   Support – because (as Rua says) there are definitely some formulas like CO₂ and H₂O that are so commonly used that it would be bizarre to exclude them. — Eru·tuon 23:20, 10 January 2019 (UTC)
  •   Support for contextual uses aimed at non-experts. bd2412 T 14:14, 11 January 2019 (UTC)
  •   Support: Only formulas used in layman's speech, ex. CO2, H2O, etc. --{{victar|talk}} 22:02, 11 January 2019 (UTC)
  •   Oppose I support inclusion of some chemical formulas and exclusion of others, but I do not support any defaulting as proposed. That is to say, I do not think inclusion of a chemical formula should pass the bar of 2/3 majority absent agreed-on criteria. --Dan Polansky (talk) 15:34, 13 January 2019 (UTC)

Exclude formulas with more than a certain number of symbols (how many?)

Please indicate if there is a cutoff beyond which you think formulas should be excluded; for example, if you would exclude any formulas with more than eight element-symbols (like CH₃CH₂OCH₂CH₃). Perhaps we will be able to agree (and then vote on) an upper bound.
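To make the cutoff concrete, here is a minimal sketch of how element-symbols might be counted. The names `symbol_count` and `exceeds_cutoff` are hypothetical, and the regular expression only recognises the shape of an element symbol (a capital letter optionally followed by a lowercase letter); it does not check symbols against the periodic table.

```python
import re

# An element symbol is assumed to be one capital letter, optionally
# followed by one lowercase letter (e.g. "C", "Al"). This is a shape
# check only, not a lookup in the periodic table.
ELEMENT_SYMBOL = re.compile(r"[A-Z][a-z]?")


def symbol_count(formula: str) -> int:
    """Count element-symbol occurrences, ignoring numbers and brackets."""
    return len(ELEMENT_SYMBOL.findall(formula))


def exceeds_cutoff(formula: str, cutoff: int = 8) -> bool:
    """Return True if the formula has more element-symbols than the cutoff."""
    return symbol_count(formula) > cutoff
```

Under this way of counting, CH₃CH₂OCH₂CH₃ has nine element-symbols and would fall above a cutoff of eight, while CO₂ has two.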

Exclude formulas with parentheses

Please indicate if you would support or oppose excluding chemical formulas which have parentheses in them, like Al(NO₃)₃. A rationale is that these are more clearly formulas of which the component parts should be looked up separately.

Include only formulas that are attested in non-scientific contexts

For example, a scientific paper or popular-science magazine article on the synthesis of carbon compounds would not attest CO₂, but a murder mystery saying "the air in the scuba tank had been replaced with CO2" could.

Comment: deciding whether some works are "scientific" or not will be a bit fuzzy, but we have other fuzzy policies, most notably deciding whether or not something is WT:SOP (and to some extent WT:BRAND, in deciding exactly how much can be said about a product, e.g. that someone drank it, before the product counts as having been "identified" within the text). - -sche (discuss) 02:39, 9 January 2019 (UTC)
I think that we should exclude usages in textbooks (where such formulae will be used more than normal). SemperBlotto (talk) 07:37, 9 January 2019 (UTC)
  •   Support, we can niggle over the details of what counts as we go, but I like the spirit of this option. If the formula is so common that it is being used without explanation in fiction or general news stories then it makes sense to define it. - TheDaveRoss 14:35, 9 January 2019 (UTC)
  •   Support. This is similar to other exceptions to general exclusion rules, like for brand names or entities from fictional universes.  --Lambiam 16:11, 9 January 2019 (UTC)
  •   Oppose We should certainly have formulas that are attested in popular science books (eg, Napoleon's Buttons) and journals (eg, Scientific American, Popular Science). DCDuring (talk) 16:34, 9 January 2019 (UTC)
  •   Support. Andrew Sheedy (talk) 06:55, 10 January 2019 (UTC)
    To elaborate, I'll repeat what I said above: "In scientific contexts, we can probably expect readers to understand them and not need to look them up, but in non-scientific contexts, many people probably wouldn't know what they mean." Andrew Sheedy (talk) 22:47, 10 January 2019 (UTC)
  •   Support. A formula worth including would likely be one that would occur outside of a technical context. bd2412 T 21:18, 10 January 2019 (UTC)
  •   Oppose We shouldn’t introduce inclusion criteria by literary genres. Even more so then we shouldn’t include by a scientificity criterion which is an epistemic and not a formal criterion and of dubious identification, this being aggravated by teleological interpretation giving the concept of science another twist and thus adding even more confusion. Fay Freak (talk) 21:36, 10 January 2019 (UTC)
    We allow names from fictional universes, but do not accept the fiction in which they occur for attesting citations. Instead, we require citations that are independent of reference to that universe. This is not meant to discriminate against fiction as a literary genre (although it does). It does ensure that the author of the citation assumes that the term in question has entered the lexicon. The exception proposed here serves the same purpose.  --Lambiam 11:40, 12 January 2019 (UTC)
  •   Oppose. Any time a chemical formula is used there's a scientific context. DTLHS (talk) 16:21, 11 January 2019 (UTC)
    If someone says "Drink lots of H20", that's not a scientific context, but it is a chemical formula. Andrew Sheedy (talk) 17:12, 11 January 2019 (UTC)
  • Weak   Support: Looks not too bad. I have posted other candidate criteria to "General discussion" section below. I think this discussion would have better started with exploration of candidate criteria. --Dan Polansky (talk) 15:30, 13 January 2019 (UTC)

Soft-redirect (Template:no entry) any excluded formulas to Wikipedia

Rationale: this way, for any formula which we exclude, people can still type the formula into the search bar and find content.

  •   Oppose, this should be handled by the software automatically if it is to be handled at all, otherwise we could be creating a very large number of "soft redirect" entries with no content. DTLHS (talk) 18:27, 9 January 2019 (UTC)
  •   Oppose Per utramque cavernam 18:57, 9 January 2019 (UTC)

While we're on this topic: should we lemmatize regular or subscript numbers?

For any chemical formula with numbers that we do include, please indicate if you'd rather lemmatize the form with regular numbers (H2O) or the form with special Unicode subscript numbers (H₂O). (We can create hard or soft redirects from the other form.)

I think we should lemmatize the forms with subscript numbers and create hard redirects from the other form (unless it's citable, in which case it should be a soft redirect). Andrew Sheedy (talk) 04:36, 9 January 2019 (UTC)
As has been enough noted on other occasions, citations or usage are a bad guide in finer Unicode matters. In this case it is easily an editorial decision to have entries only in one form and always hard-redirect to the other. Even if chemical formulae are included – hopefully not –, then I doubt anyone wants to pursue attesting such typographic details. Then we would also want to display structural formulae in quotation templates and many other nasty things just to quote materials/books as they display content. Fay Freak (talk) 07:41, 9 January 2019 (UTC)
That reminds me: Wikipedia seems to mostly use regular numbers with <sub> tags, and it might often be impossible to tell whether a book was typeset using a mechanism like that, or using Unicode's special subscript numbers. That said, using Unicode subscript numbers to represent subscript numbers in books would seem(?) to be technically valid/sound, unlike using ʳ in Mʳ, so it's just a question of whether we want to do it or not. - -sche (discuss) 11:14, 9 January 2019 (UTC)
I would prefer regular numbers (I believe we can change the actual displayed headword with some kind of template). Wiktionary supports all kinds of formatting (bold, superscript, etc.) so we can rely on those capabilities and not on the rather hacky variant and legacy forms that Unicode is full of. Equinox 16:14, 9 January 2019 (UTC)
It raises the question, though, are there any chemical formulae that differ only in whether a number is subscripted? Equinox 16:14, 9 January 2019 (UTC)
Numbers occurring in chemical formulas are always subscripted or superscripted, but superscripts are used for specific purposes. For example, the formula for the phosphate ion containing radioactive phosphorus-32 is [32PO4]3−. Formulas with superscripts are unlikely to pass muster for lexical purposes. Disregarding superscripts, moving subscripts to the baseline is a lossless transformation. If superscripts have to be taken into consideration, regularizing all to the baseline might create an ambiguity, although it will be difficult to construct an example, and almost certainly impossible to find a realistic example.  --Lambiam 16:51, 9 January 2019 (UTC)
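As an illustration of the point that moving subscripts to the baseline loses nothing when superscripts are disregarded, here is a minimal sketch of the two-way mapping between Unicode subscript digits and regular digits. The function names are hypothetical, and the sketch assumes the formula contains no superscripts and no baseline digits that are meant to stay on the baseline.

```python
# Translation tables between Unicode subscript digits and ASCII digits.
SUB_TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
ASCII_TO_SUB = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def to_baseline(formula: str) -> str:
    """Replace Unicode subscript digits with regular digits (H₂O -> H2O)."""
    return formula.translate(SUB_TO_ASCII)


def to_subscript(formula: str) -> str:
    """Replace regular digits with Unicode subscript digits (H2O -> H₂O)."""
    return formula.translate(ASCII_TO_SUB)
```

Whichever form is lemmatized, a redirect from the other form could be derived mechanically in this way. The round trip only holds under the stated assumption: once a formula such as [32PO4]3− has been flattened to the baseline, this mapping cannot tell which digits were originally superscripts.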

Exclude all attested chemical formulas except for H2O and CO2

Rationale: this is what people keep giving as examples of things that we "should" include. DTLHS (talk) 17:15, 11 January 2019 (UTC)

Ha. But a good policy incorporates reasons for what it is doing, and isn't just a sort of black box. Equinox 17:20, 11 January 2019 (UTC)
What about excluding chemical formulae that aren’t also attestable from poetry (including raps)? There could be also the exclusion ground “this poem has mainly been invented to promote chemistry”, for cases like rapping professors, but else it sets a natural limit to used formulae by meter and consonance limits. Fay Freak (talk) 20:10, 11 January 2019 (UTC)
Blackalicious would like a word with you. They use Ca(OH)2 and NO2 at the very least. - TheDaveRoss 20:21, 11 January 2019 (UTC)
Not bad. Though myself I, more radically inclined, am against these as SOP my suggestion seems practicable, like I wouldn’t care to add them but it also cuts the sharp edges. Fay Freak (talk) 20:35, 11 January 2019 (UTC)

Exclude chemical formulae except those (attested in running text) that people may reasonably mistake for acronyms or for other non-chemical-formula words

This would allow KCN if attested in running text but not H₂O. Rationale: It's reasonable to look up KCN in a dictionary if it's found in running text. It's unreasonable to look up in a dictionary something that's obviously a chemical formula.​—msh210 (talk) 21:26, 14 January 2019 (UTC)

  •   Support.​—msh210 (talk) 21:26, 14 January 2019 (UTC)
  •   Support with reservation. I like this one as an inclusion criterion, not an exclusion criterion. Thus, a chemical formula would be excluded unless it has one of multiple redeeming qualities, and the proposed criterion would be one of those redeeming qualities. --Dan Polansky (talk) 17:59, 19 January 2019 (UTC)
  • Tentative   Support. An interesting option. Per utramque cavernam 18:07, 19 January 2019 (UTC)

General discussion

Make comments here, or add additional proposals above this section. :) - -sche (discuss) 02:27, 9 January 2019 (UTC)

  • I note that most relatively popular works that have chemical formulas have them in or with structure diagrams. I believe we should favor entries for attestable formulas for which we can provide a graphical ostensive definition and for which there is a name found in running text, however technical the source. I suppose this could be considered a "value-added" criterion. If we can add sufficient value to a potential entry, it should become an actual entry. DCDuring (talk) 16:45, 9 January 2019 (UTC)
    Isn’t this more of an encyclopedic than a lexicographic task? A name found in running text, would that include “DOTA-E{E[c(RGDfK)]2}2”, as in the sentence “The structural formula of DOTA-E{E[c(RGDfK)]2}2 is shown in Fig. 1c.”?  --Lambiam 22:32, 9 January 2019 (UTC)
    It's a matter of providing useful definitions. Definitions need to break out of the cycle of words to establish contact with the physical world from time to time. That's why we have pictures and diagrams in entries and should have more. If we are going to have some chemical formulas or even tedious chemical names, we might want to make sure that we are adding value by having them. Image availability is a consideration. It is also a form of attestation. DCDuring (talk) 23:35, 9 January 2019 (UTC)
  • I hate to throw in a new option while there are so many votes already but this is a perfect use of the Appendix: space. We shouldn't delete valid information--we should store it appropriately. —Justin (koavf)TCM 18:50, 9 January 2019 (UTC)
As you say, we should store it appropriately. Even though trigonometric formulas are valid information, we don't host them here, even in the appendix. Per utramque cavernam 18:54, 9 January 2019 (UTC)
A formula isn't a word, term, or name. Dictionaries record the latter but not the former. —Justin (koavf)TCM 19:28, 9 January 2019 (UTC)
H2SO4 (formula) is to O (oxygen) as 6+3=9 (equation) is to 3 (digit) or + (operator). We can cover the components but IMO should not attempt to include the virtually limitless "sentences" spelled out with them. Equinox 19:30, 9 January 2019 (UTC)
Correct, also imho H₂O, CO₂, NaCl should be deleted because of being SOP. (Do you think I joke? Why?) This would also solve the there being both Translingual and English entries. Remarkably enough for the constituent parts there aren’t English entries.
I wanted to pun about “sum formulae” but I see that they are called so in languages other than English and English calls them molecular formula. But look at German Wikipedia “Summenformel” which has a nice table of projections. Should we include all these projections? Sure we can’t include the structural projections because of technical reasons, but the linear projections aren’t of different nature. Wiktionary does not mean “include everything that is linear”. If Wiktionary were to grow to include chemical formulae in a remarkable extent I am surprised if there isn’t any policy by which some Wikimedia bureaucrat is obliged to bust up this project because of this luxury. In Germany it would be 1) for Wikimedia trustees embezzlement by omission to tolerate Wiktionary adding chemical formulae 2) lead to liability for additional expenses caused by these measures perhaps for anyone who voted for a policy allowing it. Fay Freak (talk) 20:45, 10 January 2019 (UTC)
"H20" means or refers to or is equivalent to "water" or "dihydrogen monoxide". "3 + (5/8)" doesn't stand for anything. Also, there can be an infinite number of mathematical statements but not an infinite amount of chemical compounds, so it would be inherently foolish to start making pages like Appendix:15+8+87+9. —Justin (koavf)TCM 19:39, 9 January 2019 (UTC)
There is in fact literally an infinite amount of potential chemical compounds. DTLHS (talk) 22:28, 9 January 2019 (UTC)
I may have to defer to your expertise here (I don't see how that's possible with ~118 elements, many of which are ephemeral) but even so, there is no H48580, H48590,H48600... but there is 1+1, 1+2, 1+3, ... —Justin (koavf)TCM 22:33, 9 January 2019 (UTC)
There is CH4, C2H6, C3H8, C4H10, C5H12, C6H14, C7H16, C8H18, C9H20, C10H22, and so on, the simplest representants of which are the linear alkanes. If the universe is infinite, there may even be an actual infinity of extant chemical compounds.  --Lambiam 22:45, 9 January 2019 (UTC)
Good to know. But the other good thing is that the citable chemicals are finite and by definition documented. So we don't need to guess if somewhere there is silicon-based life that makes 5H785L somehow, whereas I can "document" all kinds of new and perfectly valid mathematical statements all the time that have never existed before (e.g. "−194850329328230932238239238*(1893349834710138103823/10935038430583503498)"). I think the difference is frankly obvious and germane. —Justin (koavf)TCM 23:09, 9 January 2019 (UTC)
I think this discussion should have better started with exploration of putative criteria. In another discussion, I mentioned the following ones:
1) Keep a chemical formula only if it involves no more than 3 chemical elements and no more than 10 atoms.
2) Keep a chemical formula only if the chemical it denotes has a CFI-meeting name: e.g. H₂SO₄ has sulfuric acid or AsH₃ has arsine. This criterion ensures that the inclusion of chemical formulas no more than doubles the number of items in the dictionary.
I think especially 2) is worth considering. --Dan Polansky (talk) 15:28, 13 January 2019 (UTC)
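Criterion 1) is mechanical enough to check automatically. The following is a minimal sketch, not a proposed implementation: the names `atom_counts` and `meets_criterion_1` are hypothetical, formulas with parentheses, brackets or charges are rejected rather than parsed, and element symbols are recognised by shape only. Criterion 2) depends on CFI attestation and cannot be checked this way.

```python
import re
from collections import Counter

# Unicode subscript digits are first normalised to regular digits.
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# One token: an element symbol followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)([0-9]*)")
# A whole formula: one or more tokens and nothing else.
_FLAT_FORMULA = re.compile(r"(?:[A-Z][a-z]?[0-9]*)+")


def atom_counts(formula: str) -> Counter:
    """Count atoms per element in a flat formula such as H₂SO₄ or C3H8."""
    plain = formula.translate(_SUBSCRIPTS)
    if not _FLAT_FORMULA.fullmatch(plain):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in _TOKEN.findall(plain):
        counts[symbol] += int(number) if number else 1
    return counts


def meets_criterion_1(formula: str, max_elements: int = 3, max_atoms: int = 10) -> bool:
    """True if the formula has at most 3 elements and at most 10 atoms."""
    counts = atom_counts(formula)
    return len(counts) <= max_elements and sum(counts.values()) <= max_atoms
```

With these limits, H₂SO₄ (three elements, seven atoms) and AsH₃ would be kept, while C10H22 (32 atoms) and C3H8NO5P (five elements) would not.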

Rename non-lemma categories to match the format approved in vote

This vote changed the naming scheme of categories for comparatives and superlatives, but in a way that does not match any existing non-lemma categories. The standard name for such categories was "POS xxx forms" (such as Category:English noun plural forms, Category:Northern Sami noun possessive forms, Category:Armenian verb passive forms, Category:English verb simple past forms, Category:Arabic adjective plural forms, Category:Bulgarian adjective feminine forms and so on). Meanwhile, the naming scheme for subcategorised lemmas was "xxx POSs" (such as Category:Dutch diminutive nouns, Category:English uncountable nouns, Category:German reflexive verbs, Category:Armenian diminutive adjectives etc.). In some cases, an entirely different POS term is used for non-lemmas, such as "participles" and "infinitives"; these are implicitly non-lemmas by virtue of the part of speech, i.e. a participle and an infinitive are always a non-lemma and cannot be a lemma.

Now that the vote has passed, however, there are two categories which do not fit this naming scheme anymore. Category:English comparative adjectives has the name of a lemma subcategorisation, suggesting that a comparative adjective is a kind of adjective lemma like "diminutive noun" is a kind of noun lemma, but it is now categorised as a non-lemma. The same for Category:English superlative adjectives. Since this suggests that there are no longer separate naming schemes for lemma and non-lemma categories, I propose to realign all other existing non-lemma categories with these two new names. Thus:

This will make the naming consistent with the vote. The only categories that will retain the word "forms" are the base-level categories without a qualifier, e.g. Category:English noun forms and Category:English adjective forms. —Rua (mew) 13:15, 11 January 2019 (UTC)

I'm going to link a thread on RFM and another one on RFC to see the context. For the record, I do not support this change, as I consider comparatives and superlatives to be similar to participles yet different enough from other forms for them to be exactly considered comparable. — surjection?〉 13:21, 11 January 2019 (UTC)
The vote aligned "comparative adjectives" and "superlative adjectives" with "participles", by not including the word "form" anymore. This proposal aligns all the other categories with "participles" as well, by not including the word "forms" anymore. —Rua (mew) 13:30, 11 January 2019 (UTC)

Proposal: Japanese Classical (文語体) conjugation/inflection table for Japanese entries

This is how it is supposed to look:

{{#invoke:User:Huhu9001/000|japanese_classical_conjugation|kanji=過|stem=す|ctype=2u-g}} {{#invoke:User:Huhu9001/000|japanese_classical_conjugation|kanji=得|ctype=2d-a|suffix_in_kanji=}} {{#invoke:User:Huhu9001/000|japanese_classical_conjugation|lemma=ぬ|kana_adv=ず<br>ん|kana_ter=ぬ<br>ん|kana_adn=ぬ<br>ん|kana_rea=ね}}

-- Huhu9001 (talk) 15:02, 11 January 2019 (UTC)

I like this, thank you. One concern: our readership consists of English-language readers, so listing conjugational info only in Japanese, such as ガ行上二段活用, seems inappropriate. Even more so when that potentially-illegible string isn't even included on the Appendix:Japanese_verbs page, leaving users unable to search easily. What about something like, g- stem, upper bigrade? Appendix:Japanese_verbs would also need updating to describe the situation for classical verbs. ‑‑ Eiríkr Útlendi │Tala við mig 21:25, 11 January 2019 (UTC)
To Eiríkr Útlendi: Appendix:Japanese_verbs#Classical_Japanese -- Huhu9001 (talk) 04:57, 12 January 2019 (UTC)
To Eiríkr Útlendi: How about 上二段活用 or ガ(ga)行上二段活用? -- Huhu9001 (talk) 04:59, 12 January 2019 (UTC)

It's done. {{ja-conj-bungo}} -- Huhu9001 (talk) 13:51, 13 January 2019 (UTC)

  • There are still usability issues here, presenting avoidable barriers to our English-reading users. I feel somewhat strongly that we cannot provide the conjugational type only in Japanese.
Although the Appendix:Japanese_verbs#Classical_Japanese section does describe classical conjugations, as previously noted, the strings ガ行上二段活用 and even 上二段活用 are nowhere to be found. Linking to is unfortunately of no apparent utility for explaining ガ行 in this context. While linking through to the JA entry for 上二段活用 is slightly better than plain text, it still hides the English rendering from the user, forcing them to click through. As currently (2019-01-14) implemented at {{ja-conj-bungo}}, ガ行上二段活用 links through to 上二段活用, leaving the ガ行 portion unexplained.
Could we not present this information in English instead? ‑‑ Eiríkr Útlendi │Tala við mig 20:16, 14 January 2019 (UTC)
To Eiríkr Útlendi: I suggest you give a list of the inflection names you want to apply to this template. -- Huhu9001 (talk) 13:31, 15 January 2019 (UTC)
To POKéTalker: Fixed. -- Huhu9001 (talk) 08:52, 20 January 2019 (UTC)

Unnoticed request for unblock

Hello! I write here because I haven't found a local equivalent of Wikipedia:Arbitration Committee. If it is my mistake, please direct me to the right place.

Today my semi-static IP was unblocked by timer after a one-month block. Soon after I found that I was blocked, on 23 December, I wrote an unblock request, but until now nobody has commented on it: neither a confirmation that the block was unjust and should be removed nor, on the contrary, that it was a proper punishment for my deeds and should remain as is. No comment, no action. Is indifference to such a request normal here? -- 17:41, 13 January 2019 (UTC)

Can we get a Russian speaker to look into this please? Equinox 18:06, 13 January 2019 (UTC)
@Equinox What was written on Atitarev's talk page? It's hard to judge without that. But the block seems harsh. Per utramque cavernam 18:20, 13 January 2019 (UTC)
@Atitarev added a Russian translation to kick-ass a few years ago. lately disagreed with the translation and he/she had the usual options of changing the translation, adding another translation, and/or leaving a note on Anatoli's talk page explaining his/her POV about the translation (using civil, nonconfrontational language). chose not to add a different translation, but instead left this offensive message on Anatoli's talk page:
Your translation. Were you able to add the non-obscene term instead, which exists at least in the Russian Wiktionary? For example: наглый, задиристый, крутой etc. Or the Russian obscene lexicon is your primary dialect?
Anatoli initially ignored the attack and deleted it., refusing to be ignored, came back with this comment:
That rollback is not an error, it's your moral position. Feel free to revert this post as well. Good luck in translations. -- 11:49, 13 December 2018 (UTC)
I would have blocked as well. There is no excuse for this unprovoked attack. In my opinion, the block was appropriate. —Stephen (Talk) 20:32, 13 January 2019 (UTC)
I don't think that's particularly confrontational; it might just represent annoyance. Equinox 20:51, 13 January 2019 (UTC)
Anatoli's Russian translation was correct. disagreed with the register, but there is no Russian term that means the same thing and also matches the register of the English word. Accusing Anatoli of only being able to speak in obscenities was a deliberate and disingenuous insult. There was no reason to be annoyed. If the Anon disagreed with the translation or register, he or she could have suggested another translation. Because of his or her aggressive and combative comment, he deserved to be blocked. —Stephen (Talk) 21:03, 13 January 2019 (UTC)
Banning instead of stating the main point that Atitarev wasn’t obliged to add any translations, and indeed an obscene word is a possible translation and particulary if the translated English term contains a mildly vulgar word, and that morality claimed by the IP does not make sense since adding a vulgar translation is better than adding none? Better teach the IPs what they miss instead of blocking them. Having strange morality is not a ban reason. Worsening the dictionary is, but it does not seem to happen if an IP asks for better translations, be it with strange arguments and an insolent rhetorical question or be it without. Assume good faith. Some people are just bad at being flattering. That “insult” is a conjecture. Why do you think he wanted to insult? What is “deliberate”? Anything written here is deliberate since we try to think before posting, but it does not get good completely anyway even if we try. I don’t see the “accusation”, it is a rhetorical question the answer of which was implied as “no”, plus even if it wasn’t an insult an insult is not a ban reason: We have defined insult as being insensitive, but people just are insensitive. Maybe he hasn’t learned well to be sensitive but he still is concerned about a good dictionary and actually moved towards this goal, so what? No bannable “annoyance” (which could harm the lexicon by siphoning off attention) is there if simultaneously honest questions are asked (apparently he wasn’t so smart to see that his argument was faulty and thus asked honestly), since no plan suitable for harming is there. People let themselves be insulted too much by concluding insults: Really, if the IP was there to insult he could have written the insult and it would have been an insult, otherwise it was just maladroit. Else everything that is written anywhere is annoyance, there isn’t anyone around here that doesn’t annoy me, myself included. 
One could wish people wouldn’t post on talk pages and make best edits without them but somehow people need to talk on talk pages with rare avail. If we banned everyone who said objectively wrong or immoral or useless things … it’s about prognosis guys. Mojshahmiri (talkcontribs) has been banned for promising to add crank theories (his prognosis has been that it is not worth to groom him), this IP would have done what without a ban? Would it have learned to be sensitive? Man, Wiktionary is a minefield if sensitivities count. Feelings on tight reins please. Reason must prevail! Fay Freak (talk) 21:44, 13 January 2019 (UTC)
Blocking somebody and then deleting the history of what they did is damnatio memoriae at least, and Orwellianism at worst. Is one rude comment on a talk page worth this? I think that's disturbing and wrong. Equinox 21:48, 13 January 2019 (UTC)
Any blocks for actions other than blatant vandalism deserve explanation, especially if requested by the blocked party. Based on Stephen's translation I don't think the block was warranted, and certainly not without some form of communication. - TheDaveRoss 02:09, 14 January 2019 (UTC)
Anatoli DID communicate the reason for the block: Intimidating behavior/harassment. And many of us block anons all the time for the same or similar reasons, and with the same or less communication. Hardly a day goes by that I don't see some IP complaining about some admin abuse without any communication. As far as deleting the history, every one of us admins could still see it and could have looked at it just as easily as I did. This was a common action. If you want to take Anatoli to task, then let's pillory all the other admins who have done the same or worse. It's a tempest in a teapot. —Stephen (Talk) 05:17, 14 January 2019 (UTC)
I meant communication with the user prior to the block, i.e. about the translation issue and perhaps about their manner of raising their concerns, but I can see I wasn't clear. For the rest of it, the root problem is a lack of assumption of good faith in borderline cases. Even if the tone of the communication was poor, the content is perfectly reasonable and on topic, no reason to delete it and block the person for asking. I don't think Anatoli deserves to be punished for this or anything, but when issues such as this come up I think it is worth sharing how we each would handle it so that we can be more evenhanded going forward. - TheDaveRoss 13:47, 14 January 2019 (UTC)

FileExporter beta feature

Johanna Strodt (WMDE) 09:41, 14 January 2019 (UTC)

Banning Altaic reconstructions

At the present there are no non-controversial reconstructions of Proto-Altaic, in fact the Altaic theory itself is a controversial hypothesis. In practice, allowing reconstructed Proto-Altaic entries means copying from the Etymological Dictionary of the Altaic Language (EDAL) by Starostin, Dybo and Mudrak.

EDAL reconstructions are based on ad-hoc soundlaws justified by semantically dubious comparisons, lack of strictness in lower-level languages, faulty philology and generally too many researcher degrees of freedom; it is not merely a controversial representation of an Altaist tradition, it is not homotopic to an earlier body of knowledge regarding sound correspondences within the proposed language family (such a thing exists only in fragments), rather it creates a completely new reconstruction using very tenuous soundlaws, with no prior precedent, to fit cognate sets which are also not traditionally accepted and which by themselves can only be called 'doubtful' at best.

I would like us to ban Altaic as a language family completely, since its only function seems to be smuggling lousy comparisons along with promising ones (usually Turkic-Mongolic, Mongolic-Tungusic or Korean-Japanese) and as a shorthand for "it appears in Mongolic and Turkic language and I can't be bothered to investigate the etymology further", but I would be fine with reducing it to an etymology-only language to stop further proliferation of garbage copy-pasted entries. Crom daba (talk) 22:59, 15 January 2019 (UTC)

  Support --{{victar|talk}} 23:25, 15 January 2019 (UTC)
  SupportTom 144 (𒄩𒇻𒅗𒀸) 11:58, 16 January 2019 (UTC)
The earlier vote for this was Wiktionary:Votes/2013-11/Proto-Altaic. Personally, I feel that the fact that the reconstruction template itself needs to display a notice about how controversial the Altaic hypothesis actually is serves as good evidence that it isn't exactly our most useful content. — surjection?〉 08:28, 16 January 2019 (UTC)
Of course it was Ivan who pushed for that. @Crom daba, if you want that vote reversed, I think you'll need to create a new vote. --{{victar|talk}} 17:53, 16 January 2019 (UTC)
I really know nothing about this so I will not vote on it, but it seems to me that even if Altaic is so controversial, the content should still be archived somewhere in an appendix - it seems like a waste to just erase it altogether. If we can have an Appendix:A Clockwork Orange, surely we can have an appendix for controversial reconstructions. (Perhaps that appendix should not be linked to from mainspace, however.) — Mnemosientje (t · c) 18:20, 16 January 2019 (UTC)
It's already archived by people who promulgate these reconstructions (StarLing). We are not and do not need to be a repository of all knowledge that is tangentially related to linguistics. DTLHS (talk) 18:23, 16 January 2019 (UTC)
  Support, but I too think that we should rightly have a vote in order to overturn a previous vote. —Μετάknowledgediscuss/deeds 20:15, 16 January 2019 (UTC)

Okay, here's the vote. Not sure if I set it up correctly though. Crom daba (talk) 23:03, 16 January 2019 (UTC)

@Crom daba: I think it would be better to keep option 1 only for now. If it doesn't pass, we can put option 2 to the vote later. Per utramque cavernam 23:34, 16 January 2019 (UTC)
Okay, sounds good. Crom daba (talk) 23:45, 16 January 2019 (UTC)

Project proposal: Enrichment of multilingual STM terms

Hello all,
I would like to propose a research project aimed at enriching Wiktionary in the STM (scientific, technical, medical) domain.
The starting point is the observation that many terms (typically named entities) appear in the scientific literature but still have no entry even on the English Wiktionary, which has the best coverage. The situation is even worse for some "new" terms, which are certainly of interest, and for non-English Wiktionaries.
On the other hand, some of the information that is not available on Wiktionary can be extracted from Wikipedia. Hence the project objectives are:
a) Wiktionary will be extended with thousands of STM-relevant terms in English and in Italian.
b) The whole process will be validated for two languages (English and Italian) whose coverage and characteristics differ between Wiktionary and Wikipedia.
The result would be very useful for those who work in research.

1) I will identify terms from the English STM literature in the sampled areas, including hot topics (e.g. Artificial Intelligence) and some new terms, which are not present in the English Wiktionary;
2) Then I will create these new English Wiktionary entries with a semi-automatic supervised process that includes as much as possible of what can be inferred from Wikipedia (e.g. term disambiguation, different translations, etc.).
3) Then I will validate this entry process for Italian as well, which is my native language: in this case, I will enrich the entries manually whenever the algorithm identifies names that cannot be inferred from Wikipedia.
4) Then I would document this (multi-language) process in detailed pseudo-code, resulting in an open-access paper as a further project. I think this result is preferable to delivering a language-specific implementation, since creating and maintaining software should be further tasks.

To support the project proposal please leave a comment at the bottom of the project page.
Thank you,
--Marco Ciaramella (talk) 16:30, 17 January 2019 (UTC)

How will you ensure that these terms meet WT:CFI? Wikipedia tends to invent words in order to translate concepts between languages. DTLHS (talk) 16:41, 17 January 2019 (UTC)
@DTLHS Good point. Briefly, since any translation into Italian of an existing word from the English Wiktionary could be problematic, such entries must be validated manually (as stated in point 3). --Marco Ciaramella (talk) 19:46, 17 January 2019 (UTC)
I think this sounds great! My only concern is about the nature of the semi-automatic process. What part of entry creation do you see as being automatic? Andrew Sheedy (talk) 04:24, 18 January 2019 (UTC)
@Andrew Sheedy Thank you for the feedback! :-) The semi-automatic part would be the enrichment process for English terms referenced in point 2 (the analysis of the input sources and of the generated names is, however, itself part of the project). --Marco Ciaramella (talk) 08:31, 18 January 2019 (UTC)
I'm not too keen on using Wikipedia as a prime source of technical terms. Wouldn't it be better to harvest them from online scientific journals? I'm currently working my way through PLOS ONE, finding hundreds of words we would never otherwise have. And I can't see how you are going to arrive at definitions in a semi-automatic manner. Perhaps you could start in a very slow way so we can see some examples. SemperBlotto (talk) 07:13, 18 January 2019 (UTC)
@SemperBlotto This is another interesting and very to-the-point piece of feedback, thank you. One intended task of my proposal is to discuss the use of Wikipedia as a bootstrap source for the seeds of new (related) terms, along with some related topics (e.g. how the generated terms are related to each other). Human supervision at this stage is aimed mainly at assessing the results of that process. However, this obviously cannot exclude from the discussion what can be generated from (open) literature, which is considered a primary source for Wiktionary too, and is often cited as a reference or in Wiktionary usage examples. I have some knowledge of your project and I would like to include some of its related results, if any, or at least a paper/project reference about PLOS, in my final (also open) publication. --Marco Ciaramella (talk) 08:31, 18 January 2019 (UTC)

Hyphens and dashes in entry titles

Hi, an editor just pointed out to me that dashes are not currently used for entry titles on English Wiktionary. Unfortunately, the only justification they managed to quote was this page: Wiktionary:Entry titles. This page currently doesn't say anything about not using dashes, and neither does the List of unsupported characters. So is there an actual policy on whether dashes can or cannot be used in entry titles? There are quite a few legitimate cases for them (as well as for hyphens obviously – as each of them has its own specific usage rules). But blindly advocating for using hyphens for everything resembling them (incl. en-dashes and em-dashes) seems to be an unnecessary simplification of typographic conventions. Cherkash (talk) 02:39, 23 January 2019 (UTC)

What are the "legitimate cases for them"? It would seem that we would just need to exercise or add search rules that fold all the hyphens and dashes into one. DCDuring (talk) 02:52, 23 January 2019 (UTC)
Or we could create redirects. — SGconlaw (talk) 03:27, 23 January 2019 (UTC)
Looks like the search engine does not automatically redirect "Tay–Sachs disease" with en-dash to Tay-Sachs disease with hyphen-minus, but it does return the hyphen-minus version at the top of the list of results because it considers punctuation characters as word separators. — Eru·tuon 04:25, 23 January 2019 (UTC)
@Cherkash: Can you give me an example of an endash- or emdash-appropriate title? —Justin (koavf)TCM 03:16, 23 January 2019 (UTC)
@Koavf: En-dash–appropriate example: Tay-Sachs disease. Cherkash (talk) 03:26, 23 January 2019 (UTC)
Good point. Perfectly valid name--please do create it in the correct form. —Justin (koavf)TCM 03:51, 23 January 2019 (UTC)
Do not create it, and don't tell people to create it. If you absolutely insist on using a special dash character it can go in the headword line. DTLHS (talk) 03:52, 23 January 2019 (UTC)
Agreed, unnecessary. --{{victar|talk}} 04:08, 23 January 2019 (UTC)
@DTLHS: Why? Why would we be opposed to proper typography, especially when we can trivially source it? —Justin (koavf)TCM 04:10, 23 January 2019 (UTC)
So put it in the fucking headword line. Why the fuck should we waste our time creating entries with trivial punctuation differences when instead we could just say every entry uses a hyphen and be done with it. DTLHS (talk) 04:15, 23 January 2019 (UTC)
Calm down, @DTLHS! Why so much anger? Please keep it civil. Dashes are no more special than hyphens. And they are standard punctuation in their own right, with their own usage patterns. What's your reason for insisting on avoiding them? Cherkash (talk) 04:20, 23 January 2019 (UTC)
@DTLHS: No one is asking you to do anything and you don't have to be rude to me. —Justin (koavf)TCM 04:22, 23 January 2019 (UTC)
@Cherkash: You're wrong, em-dashes are more "special", as you say, because they aren't typically intended to be used in URLs and may even cause some encoding problems. They would definitely be f**k-all annoying for people typing in the URL. It's a bad idea all around. --{{victar|talk}} 04:31, 23 January 2019 (UTC)
Evidence for your claims, @victar? As far as I know, URL encoding schemes, as well as the Wiki engine, handle dashes and any other non-ASCII characters gracefully. Other Wikis (e.g., Wikipedias in many different languages) also have no problem with them. Cherkash (talk) 04:57, 23 January 2019 (UTC)
URLs have to encode en-dashes and em-dashes as %E2%80%93 and %E2%80%94, respectively, so you're actually suggesting Tay%E2%80%93Sachs_disease as opposed to Tay-Sachs_disease. One concern I have with it, other than it being a total hassle to type for zero benefit, is all the places where Lua mw.ustring.find functions, etc., might not include these special characters. --{{victar|talk}} 05:22, 23 January 2019 (UTC)
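The percent-encodings quoted above can be checked with a short sketch. It assumes a title is turned into a URL path segment by replacing spaces with underscores and percent-encoding the UTF-8 bytes; the helper name encode_title is hypothetical and is not part of MediaWiki.

```python
from urllib.parse import quote, unquote

EN_DASH = "\u2013"  # U+2013 EN DASH
EM_DASH = "\u2014"  # U+2014 EM DASH


def encode_title(title: str) -> str:
    """Hypothetical helper: spaces become underscores, then the
    remaining non-ASCII characters are percent-encoded as UTF-8."""
    return quote(title.replace(" ", "_"))


hyphen_url = encode_title("Tay-Sachs disease")
en_dash_url = encode_title(f"Tay{EN_DASH}Sachs disease")

print(hyphen_url)      # hyphen-minus is left as-is
print(en_dash_url)     # en-dash becomes %E2%80%93
print(quote(EM_DASH))  # em-dash becomes %E2%80%94

# Decoding restores the original character, so the encoded form is
# only a transport detail, not a different title.
assert unquote(en_dash_url) == f"Tay{EN_DASH}Sachs_disease"
```

Browsers usually display the decoded form in the address bar, so the encoded form mostly shows up when a link is copied as plain text.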

On Wikipedia, for titles with a dash, the title with a hyphen instead usually redirects to the proper title; for example, Tay-Sachs disease redirects to Tay–Sachs disease. We can do the same here. A bot could create such redirects where they do not already exist.  --Lambiam 12:13, 23 January 2019 (UTC)
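As a rough illustration of the redirect idea, the sketch below folds dash-like characters to a hyphen-minus and lists which hyphenated redirects are still missing. The set of characters in DASH_CHARS and the function names are assumptions made for the example; this is not an existing bot, and it does not call any MediaWiki API.

```python
# Dash-like characters folded to hyphen-minus (an assumed, non-exhaustive set):
# hyphen, non-breaking hyphen, figure dash, en-dash, em-dash,
# horizontal bar, minus sign.
DASH_CHARS = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
_FOLD_TABLE = str.maketrans({ch: "-" for ch in DASH_CHARS})


def fold_dashes(title: str) -> str:
    """Return the title with every dash-like character replaced by '-'."""
    return title.translate(_FOLD_TABLE)


def missing_redirects(titles: list[str]) -> list[tuple[str, str]]:
    """List (redirect_source, target) pairs for dashed titles whose
    hyphenated variant does not already exist as a page."""
    existing = set(titles)
    pairs = []
    for title in titles:
        folded = fold_dashes(title)
        if folded != title and folded not in existing:
            pairs.append((folded, title))
    return pairs


pages = ["Tay\u2013Sachs disease", "cat"]
for source, target in missing_redirects(pages):
    print(f"{source!r} -> {target!r}")
```

The same folding function could serve the opposite arrangement, with content kept at the hyphenated title and the dashed title redirecting to it; only the order of each pair would change.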

It looks like other dictionaries tend to normalize em- and en-dashes to hyphens. I think we should include entries for both the hyphenated and dashed versions when both exist, but for sanity's sake I think we ought to keep all content at the hyphenated version (using the "proper" punctuation in the header). There is no reason to exclude the dashed versions. The URL concerns are a red herring: we already have lots of page titles which don't play well with URLs (see: the majority of the world's languages), and even so the em- and en-dashes are valid in URLs anyway (–Sachs_disease). - TheDaveRoss 13:55, 23 January 2019 (UTC)
Most of the entries get there in the first place because someone types in a search term and clicks on the redlink, and most people have no clue about how to produce any kind of dash on their keyboard. That means that most new entries will be "wrong" in a very subtle and non-obvious way, and end up being moved. It's bad enough that we have case sensitivity to mess people up, but at least there we have a very practical reason that people can understand. I think the reason for the depth of emotion on this issue is that it represents a type of prescriptivism, which the community here instinctively distrusts. I don't really want people going around changing double quotes to Smart Quotes and apostrophes from straight to curly ones, and this feels like the same sort of thing. Chuck Entz (talk) 14:55, 23 January 2019 (UTC)