Open main menu

Wiktionary β

Wiktionary:Beer parlour

Wiktionary > Discussion rooms > Beer parlour

Lautrec a corner in a dance hall 1892.jpg

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives edit

July 2017

Change "proscribed" to "considered incorrect"Edit

Reading through the discussion at Wiktionary:Tea room#.22alot.22_is_NOT_correct_English. finally gave me the resolve to propose something that has bothered me for a long time.

Let's change the "proscribed" label to "considered incorrect".

  • Proscribed is not a common word. I consider myself quite well-read, and I had never come across this word before encountering it on Wiktionary. An Ngram shows more usage of the word than I would have expected, but when you look at Google Books results since 1970, you see that its use is confined to academic and technical works, such as journal articles, textbooks and legislation. It appears in very few works targeted at a general audience, which is surely the audience we are targeting here at Wiktionary.
  • "Considered incorrect" would be better than "proscribed" in that it does not give the suggestion that we are the ones prescriptively proscribing the word - we are simply noting that many, if not all, sources consider the term incorrect.
  • It looks extremely similar to a word with essentially the opposite meaning. I couldn't think of many pairs of differently-spelt English words that look any more similar when written down in lowercase (other than M/RN pairs like "bum" and "burn" perhaps).
  • This information that this label conveys is especially important (I'd even say essential) for English learners and non-native speakers, but because it is conveyed using a word that they most probably do not know, it is going to be lost on them.

Sure, the definition is only a click away at the glossary. But why should we make people learn an extra word to be able to use our dictionary properly? It's silly. Let's do away with it.

I'm inclined to propose a vote along the lines of "changing the display of {{label|en|proscribed}} to (considered incorrect)". This, that and the other (talk) 06:15, 1 July 2017 (UTC)

I am persuaded by your reasoning here. I would support the wording considered incorrect. I wonder what @Dan Polansky would think of this proposal, given that he would prefer us not to use the proscribed label. — Eru·tuon 18:34, 1 July 2017 (UTC)
I seem to prefer often deemed incorrect; "considered" is okay but "deemed" is shorted. The addition of "often" reinforces the idea that the deeming is not done by Wiktionary editors. See also Wiktionary talk:Votes/2016-10/Removing label proscribed from entries#Other label name. --Dan Polansky (talk) 19:02, 1 July 2017 (UTC)
I’d rather we kept a distinction between “considered incorrect by language authorities” (= proscribed) and “considered incorrect by speakers in general” (= nonstandard). — Ungoliant (falai) 18:47, 1 July 2017 (UTC)
I agree the distinction should be kept, though I'm sympathetic to the point that users may not know the word and hence the info may be lost on them. It's hard to find a label that keeps the distinction and is concise and able to be put into all entries that use "proscribed". "Considered incorrect by authorities" or " some authorities" is wrong if only one (but e.g., the official or dominant/influential and notable) language authority proscribes the term, e.g. the Académie française, the Duden, maybe the OED, and "some" is wrong if most or all authorities proscribe the term.
For similar reasons, "often" should not be included in the text that is automatically displayed: not all terms are "often" considered incorrect by authorities: some may be considered incorrect by all authorities (this seems especially likely if a language has one or more central authorities), others may only be proscribed by some authorities while other authorities approve of them, in which case we use "sometimes proscribed", which would become "sometimes often considered..." or "sometimes often deemed...". And probably the most frequent occurrence is that one or more authorities proscribe a term and others don't mention it, which makes it debatable whether it is "often" considered incorrect.
(Ultimately, using "proscribed" and linking to the glossary like we do may be the best option, despite its drawbacks.)
An idea based on the name of the category which "proscribed" currently categorizes into is "authoritatively disputed" or "authoritatively deemed incorrect", but I don't like the sound of either of those; "authoritatively" seems liable to be misunderstood.
- -sche (discuss) 18:17, 2 July 2017 (UTC)
I would support changing "proscribed" to "considered incorrect", but I also agree with Ungoliant that it's useful to distinguish between whether something is only "officially" considered incorrect, or whether most speakers would think it a mistake. Ultimately, I think the ideal is to put that information in a usage note, which allows for further elaboration. One can't learn the subtleties of a word's usage from a label. Andrew Sheedy (talk) 19:47, 3 July 2017 (UTC)
I strongly support the use of the more common words, despite the greater length. DCDuring (talk) 02:39, 4 July 2017 (UTC)
"Official" incorrectness vs. "popular" incorrectness is not in fact a binary distinction: words can even be incorrect in some registers while being preferred in others. Don't we have the ====Usage notes==== section for this kind of detail? --Tropylium (talk) 15:17, 23 July 2017 (UTC)

@DCDuring, Andrew Sheedy, -sche, Ungoliant MMDCCLXIV, Dan Polansky, Erutuon I've created a vote at Wiktionary:Votes/2017-07/Changing the wording of the "proscribed" label. The discussion at the talk page may interest you. This, that and the other (talk) 10:16, 9 July 2017 (UTC)

Vote -- Requests for documentationEdit

Based on Wiktionary:Tea room/2017/June#"the Variety -er", I created Wiktionary:Votes/2017-06/Requests for documentation. --Daniel Carrero (talk) 10:48, 1 July 2017 (UTC)

July Lexisession: flightEdit

Is it a spin?

Monthly suggested collective task is to collect words about flight. In the category of Wikisaurus about travel and movement, there is nothing about motion in the air, and it is the same in French Wiktionary, so it seems like a good topic for this month - it could soar!

Yay! let's do a barrel roll!

By the way, Lexisession is a collaborative experiment without any guide nor direction. You're free to participate as you like and to suggest next month topic. If you do something this month, please let us know here or in Meta, to let people know that English Wiktionarians are doing something on this topic. I hope there will be some people interested to reach the altitudes!   Noé 11:13, 1 July 2017 (UTC)

I spruced up a little bit the Spanish entries volar and volador. That's my good deed of the month done, then. Also added an Asturian entry - vuelu. --Recónditos (talk) 11:13, 8 July 2017 (UTC)
Great! Thank you! ¡Muchas gracias! I updated the [[Meta page to display a shorter version of the passed editions. There is no mention if people did not ping me or wrote a note on the beer parlour, so feel free to let me know or to enhance the Meta page. LexiSession is getting a year old soon and it's time to look back and make some improvement in the formula   Noé 15:59, 3 July 2017 (UTC)
Cleaning up उड़ना (uṛnā). —Aryaman (मुझसे बात करो) 16:43, 4 July 2017 (UTC)

Category:Coinages by language (tentative name)Edit

I'd like to have a category for words which are known to have been coined by a specific person (example: evolutionarily stable strategy). There is Category:Neologisms by language, but I don't think all neologisms have necessarily a well-defined author. --Barytonesis (talk) 13:46, 2 July 2017 (UTC)

Every word is a neologism and a coinage, so I think neither category should exist. —CodeCat 14:52, 2 July 2017 (UTC)
Only a few words have a clear author, so a coinage category may be justified. — Dakdada 15:57, 3 July 2017 (UTC)
So "coinages by known individuals"? (Or named individuals; or groups; or...) Equinox 16:09, 5 July 2017 (UTC)

Proposal: automatically link all links without a section to the English sectionEdit

There have been a lot of efforts in recent times to make sure that terms in non-English are wrapped in a template that tags them as such and adjusts the link target appropriately. Thus, I think it makes sense if all links, by default, link to English. This should make it easier for definition writers, because they can link words in a definition without worrying about where that link goes. The template {{def}} was created to alleviate this issue, and people have been adding {{l|en}} to definitions as well which is even worse. Moreover, a global solution would affect links in etymologies and in other places too.

This proposal of course only affects links to entries in the main namespace. It's also explicitly meant to be applied only in places where English text is expected, so it wouldn't be used in lists such as Derived Terms. Those would still use {{l|en}} to tag them, as before. —CodeCat 17:21, 2 July 2017 (UTC)

What are you proposing? Some javascript to automatically make links point to #English? DTLHS (talk) 17:31, 2 July 2017 (UTC)
I think so. Unless there's another way. —CodeCat 17:47, 2 July 2017 (UTC)
I'm unsure. How expensive is js that only tags links in certain sections (for example, you seem to suggest not applying it to Derived terms")? what is the actual benefit, given that English is already the top section (where a user lands) on almost all pages where an English section is present? How does that benefit compare to the drawback that many bare wikilinks that are not to English terms will be mislabelled? (For example, users sometimes use simple wikilinks to link to German or Russian words if they're long enough that the users think it's unlikely there'll ever be any other language section on that page.) - -sche (discuss) 18:31, 2 July 2017 (UTC)
People are currently using {{l|en}} in definitions, so that suggests that those people find a need for such section links. TabbedLanguages links to the last-used language section whenever a link has no section, which ends up always going to the wrong section when a link is in a definition, etymology, or anywhere else that has running English text. Perhaps only the behaviour of TabbedLanguages should be changed. —CodeCat 18:48, 2 July 2017 (UTC)
Support. — Ungoliant (falai) 18:34, 2 July 2017 (UTC)
Might be OK, but not using expensive JS. DCDuring (talk) 19:05, 2 July 2017 (UTC)
Tentative support I can imagine that there may be some mul use cases but I agree that they are probably going to be English definitions. If JavaScript seems like too much of a headache, just have a bot do it--that way it works for users with scripts disabled. —Justin (koavf)TCM 19:32, 2 July 2017 (UTC)
This seems like a solution without a problem to me; English is already where links go, since they land on the top of the page and English is the first language section. In the cases where Translingual precedes English that is likely the desired solution anyway. - [The]DaveRoss 12:06, 3 July 2017 (UTC)
Again, if that's the case, why do people use {{l|en}} in definitions? —CodeCat 12:28, 3 July 2017 (UTC)
You will have to ask them, but this proposal does not prevent anyone from using {{l|en}} incorrectly. - [The]DaveRoss 12:40, 3 July 2017 (UTC)
True, but I figured if they thought it was necessary, then I'd rather solve it in this way than by having {{l|en}} in definitions. Do you think we should disallow putting {{l|en}} in definitions? —CodeCat 13:23, 3 July 2017 (UTC)
@TheDaveRoss: There are a number of reasons why I (and others) often (not always) use {{l}} rather than bare links in definitions, most of which are already mentioned elsewhere in this thread: (1) if the English word is spelled the same as the foreign word being glossed (e.g. French correct, then a bare link won't provide a link at all, but will merely write the word in bold; (2) sometimes Translingual, not English, is the top entry on the page; (3) in Tabbed Browsing, following a link without an explicit language marking takes you to the same language you were just looking at if it's there, rather than the top entry (e.g. if you're at French corriger and click on a bare link to [[correct]], you will be taken to correct#French, not correct#English. —Aɴɢʀ (talk) 21:57, 3 July 2017 (UTC)
That is fair, I am making no judgment about whether or not it is acceptable to use {{l|en}} in definition lines. If that is a problem then I think there are other possible solutions that don't involve creating a pervasive new scripted process. It is also possible to achieve the same result in the limited cases where it is necessary using standard wiki-markup, e.g. [[correct#English|correct]]. This can even be enforced by bots since it is a very regular situation. As far as the Tabbed Browsing issue, I don't use the feature so I can't speak much about that, but it seems like a bug in Tabbed Browsing which we should not fix by changing the default behavior of the site. - [The]DaveRoss 12:27, 5 July 2017 (UTC)
@CodeCat I use {{l|en}} in FL definitions for words that share a page with the English translation. For example, the French entry for correct includes a link to the English section for the word so that the reader does not have to scroll up, past the Dutch section and the second half of the English section, in order to see the word. This is especially useful for obscure words that have a more full definition in the English section, and/or are several languages down the page. Is that what you're talking about? Andrew Sheedy (talk) 19:56, 3 July 2017 (UTC)
Hmm, this debate will probably never end :). I think it would preferable to use the same (explicit, unambiguous & extensible) mechanism to link to other entries, regardless of the target language. “English is at the top of the page” means relying on an implementation detail of the current wiki presentation. Fixing it on the client-side with Javascript isn't exactly a good solution. But those [[square bracket]]s are just too popular... – Jberkel (talk) 22:35, 3 July 2017 (UTC)

Deleting template defEdit

FYI, consistent with Wiktionary:Votes/2016-07/Placing English definitions in def template or similar, I proposed to delete {{def}} at WT:RFDO#Template:def. --Dan Polansky (talk) 20:48, 2 July 2017 (UTC)

Changing auto-generated categories at bottom of pageEdit

Hey all, I've been searching the Help pages and haven't found an answer to this. How do I edit the categories at the bottom of a page when they are apparently generated automatically? In particular, overstudious is listed as a 4-syllable word when it actually has 5 syllables. How do I correct this? Thanks for any help. BirdHopper (talk) 21:47, 3 July 2017 (UTC)

@BirdHopper: These are made by templates. In this case, it is {{IPA}}. "Oh-ver" is two and "stood-yuz" is two more, so it generates Category:English 4-syllable words. You may be thinking that it's "oh-ver-stood-ee-yuz" which is five. Since words can be pronounced different ways, it can be in both Category:English 4-syllable words and Category:English 5-syllable words but I don't know that this template has the option to add it to two categories at once presently. —Justin (koavf)TCM 21:51, 3 July 2017 (UTC)
Very interesting! I suppose I can understand how it could be pronounced with 4 syllables. From a GenAm standpoint, the 4-syllable variant is rare, which is probably why I didn't even consider there could be an alternative. I'll just let the issue go, then. Thanks for the insight! :) BirdHopper (talk) 22:06, 3 July 2017 (UTC)
@Koavf: Oh, the IPA template does it! Okay. I just added some syllable breaks (dots) to overstudious and it picked it up as having 5 syllables rather than 4. I don't mean to impose my limited experience of the world on everyone else, but the transcriptions, as written, do have 5 syllables. Now I'm curious what would happen if someone entered a 4-syllable version. I think I'll leave it as-is for now, but it's cool to know more about how that system works! :D BirdHopper (talk) 22:31, 3 July 2017 (UTC)
@BirdHopper: I don't have an example off-hand but I know that some entries have multiple instances of {{IPA}} and are in multiple categories because of it. If you put in both, it will be in both--again, even the same word can be pronounced differently and so will have different IPA transcriptions. Rather than replace the one, maybe have both? Actually, it's probably just the one that's correct. Saying it out loud seems wrong. —Justin (koavf)TCM 23:08, 3 July 2017 (UTC)
More detail: syllable counts are done by Module:syllables. It has a list of English diphthongs, and /iə/ is on that list, because New Zealand has /iə/ as a diphthong in words like here. So, to make the syllable-counting function understand that it's not a diphthong, you have to add syllable breaks. (This would be simpler if {{IPA}} were told what accent the transcription represented, and used that to determine which list of diphthongs to use.) I went through a lot of entries with /iə/ using AutoWikiBrowser and added syllable breaks a while back; I guess I missed this word. (Oh, I see the pronunciation was added recently.) — Eru·tuon 23:40, 4 July 2017 (UTC)
@Erutuon: Thanks for the extra information. Very interesting. I agree that an accent specification would be useful in these cases. For now, I'll have to be a little more aware of diphthongs in other accents and add syllable breaks if necessary. BirdHopper (talk) 16:46, 5 July 2017 (UTC)

" most common surname in the United States in 2010" (Xin)Edit

Why do we want this information? Wyang (talk) 22:23, 4 July 2017 (UTC)

That seems wildly specific and virtually impossible to maintain. It's also probably not something that someone is looking for when looking at this word/phrase/term/entry. Unlike--e.g.--Nguyen, which is notably wildly popular in Viet Nam and is worth mentioning for context. —Justin (koavf)TCM 00:03, 5 July 2017 (UTC)
It seemed to me that if we were going to include surnames we ought to try to include some information about those surnames, such as how common they were and in what demographics. If someone has a decent dataset for demographic information outside of the United States I think they should feel free to add that as well, I do not have that information. As far as maintaining it, the US Government publishes the data in a machine-readable format every ten years with the census, it is fairly trivial to update it. - [The]DaveRoss 12:15, 5 July 2017 (UTC)
I think this is the same situation as the similarly problematic template of {{en-rank}}this is not dictionary material. As the entry itself demonstrates, it is actually composed of multiple etymologies, and it would be much more useful if the surname template can be modified to say A surname of Chinese origin. Statistics showing how many people in the United States bear these surnames (and what ethnicities they are) is inconsequential in a dictionary. Wyang (talk) 12:47, 5 July 2017 (UTC)
What is or is not dictionary material is obviously subjective, and if the consensus is that frequency and demographic information about names is not worth including then I will, of course, defer to that consensus. The other conversations I have had about including these things have been positive.
Re "of Chinese origin," that information should, hopefully, be represented in the Etymology section, but it might not hurt to have it echoed concisely in the "definition" line. - [The]DaveRoss 13:08, 5 July 2017 (UTC)
Comparing it to en-rank supports this; what words are core language words and what words aren't is certainly dictionary material. I'm not sure how I feel about this; it's specialized dictionary material, which tends to move it to the edge of what we we cover, but at the same time tends to say it's not clearly over the edge.--Prosfilaes (talk) 06:02, 8 July 2017 (UTC)

Alternative forms & quotationsEdit

Should a quotation of an alternative form/spelling be placed on the main lemma page, or on the form page? (e.g. should quotations of the term huomo – obsolete spelling of uomo – be placed on the former or the latter's page? – GianWiki (talk) 00:18, 5 July 2017 (UTC)

I tend to decide this on a case-by-case basis. In this case, we're dealing with an obsolete spelling of an extremely common word, so I would add citations to huomo, because what's being attested is the specific spelling with an h, not the existence of the word uomo itself. But for rare words that are attested in multiple spellings, I'd put the citations all together in a single entry, so the reader can see that the word definitely exists but is spelled in a variety of ways. —Aɴɢʀ (talk) 10:27, 5 July 2017 (UTC)
Isn't this part of the reason we have a citations namespace? bd2412 T 13:14, 5 July 2017 (UTC)
I agree. They should be placed there. —CodeCat 13:43, 5 July 2017 (UTC)
If a spelling is RFVed (for example, if someone disputes that huomo exists as an alternative spelling of uomo), citations of it must be put in its entry—or less often on a citations page to which it then links—to prove it meets CFI. If a spelling is rare, some people do this pre-emptively. Everything else tends to be subjective / less agreed upon.
Some people add the earliest uses of English words to the lemma entries, even if the citations use other spellings—sometimes even if the citations are other languages, like Middle English (I see this even with Chaucerian examples that aren't the earliest) or Old English (few editors do this; it seems nonstandard/removable). Some people might put famous uses of words in any spelling on the lemma entries, too. Sometimes citations of one {{standard spelling of}} something are put on the entry for the standard spelling that has had content centralized on it.
But in general I would put citations on the Citations: page for the spelling they use (linked to and from the lemma's citations page via {{also}}) or on the lemma's citations page. - -sche (discuss) 15:31, 5 July 2017 (UTC)

Join the strategy discussion. How do our communities and content stay relevant in a changing world?Edit


I'm a Polish Wikipedian currently working for WMF. My task is to ensure that various online communities are aware of the movement-wide strategy discussion, and to facilitate and summarize your talk. Now, I’d like to invite you to Cycle 3 of the discussion.

Between March and May, members of many communities shared their opinions on what they want the Wikimedia movement to build or achieve. (The report written after Cycle 1 is here, and a similar report after Cycle 2 will be available soon.) At the same time, designated people did a research outside of our movement. They:

  • talked with more than 150 experts and partners from technology, knowledge, education, media, entrepreneurs, and other sectors,
  • researched potential readers and experts in places where Wikimedia projects are not well known or used,
  • researched by age group in places where Wikimedia projects are well known and used.

Now, the research conclusions are published, and Cycle 3 has begun. Our task is to discuss the identified challenges and think how we want to change or align to changes happening around us. Each week, a new challenge will be posted. The discussions will take place until the end of July. The first challenge is: How do our communities and content stay relevant in a changing world?

All of you are invited! If you want to ask a question, ping me please. You might also take a look at our the FAQ (recently changed and updated).

Thanks! SGrabarczuk (WMF) (talk) 14:50, 5 July 2017 (UTC)

Well documented languages and TagalogEdit

Can someone please add Tagalog again to Wiktionary:Criteria for inclusion/Well documented languages? It was removed without a proper process. From reading Wiktionary:Criteria for inclusion/Well documented languages, the minimum would be a discussion in Beer parlour, whereas the removal was indicated to be driven by a RFV discussion as indicated in diff.

I do realize some think this is too formal. But as Wiktionary:Votes/pl-2017-05/Modern Latin as a WDL 2 shows, what some think to be consensus often turns out to be something else when a proper discussion or vote is created. --Dan Polansky (talk) 15:49, 5 July 2017 (UTC)

Etymology before PronunciationEdit

Hello again! As I've been adding audio, I've noticed a few pages where the Etymology section is placed after the Pronunciation section, as in gadgetry. This goes against Wiktionary:Entry_layout#List_of_headings. I know that entry layout is flexible, but I personally prefer consistency so I'm tempted to "fix" these issues. Can I assume that Wiktionary:Entry_layout is up-to-date and reflects current consensus regarding layout? I recall someone's user page (I don't remember who) that mentioned that the Entry Layout page needs updating. I'm always hesitant to start making edits when a set of guidelines might not be current.

There are other cases where I've seen Etymology after Pronunciation, as in chess. Here, it makes sense to have Pronunciation first as it is common to both etymologies. Just saying that, because I know there are always exceptions to the rules. However, even in this case, there is a guideline at Wiktionary:Entry_layout#Etymology where, again, pronunciation comes after/below etymology. In the case of chess, one would have to duplicate the pronunciation section.

I know these are just guidelines, and nothing is black and white. I'm just looking for some other opinions, or maybe a pointer to discussion about layout that I'm not aware of yet, before I start hacking away. Thanks! —This unsigned comment was added by BirdHopper (talkcontribs).

Yes you can change the layout. I wouldn't go out of your way to fix thousands of pages by hand since this can be fixed automatically if anyone cares to do so. DTLHS (talk) 18:00, 5 July 2017 (UTC)
Okay. And thanks BTW for adding a signature for me. That's the second time I've done that in as many days. Oops. BirdHopper (talk) 18:25, 5 July 2017 (UTC)
I always put pronunciation before etymology. That way it's consistent if there is one word with multiple etymologies. —CodeCat 18:57, 5 July 2017 (UTC)
@CodeCat: But one word can have the same etymology and two different pronunciations--e.g. perfect (purr-fict and pur-fekt). —Justin (koavf)TCM 20:15, 5 July 2017 (UTC)
There's two etymology sections on that page. Also, the "tense" noun is missing an etymology. —CodeCat 20:44, 5 July 2017 (UTC)
Just because there are two sections, doesn't mean there should be. BigDom 06:58, 11 July 2017 (UTC)
There should be as many sections as there are etymologies, of course. —CodeCat 12:08, 13 July 2017 (UTC)

Inline referencing definitions in English entriesEdit

I think that, in general, we should not be inline referencing English definitions of English words. Not using references has largely been our practice. We use attesting quotations, not references; for English words, references carry no weight as per WT:ATTEST.

I have removed an inline reference in abbate but was reverted. What do you think? --Dan Polansky (talk) 12:35, 6 July 2017 (UTC)

I agree that we do not (with the exception of what a few newcomers have done) and should not add <ref>s to definitions, at least not as references for the definitions. The definitions need to be based on how the terms are used, as indicated by citations, as you say. I agree with your edit to abbate (although ideally inline refs like that should be moved to "Further reading"). I have sometimes seen users add references to {{defdate}}s; that might be OK. I have also seen references added to context labels like "proscribed" and "offensive", but in those cases I think it is better to leave the label bare (unreferenced) and add the references to a usage note. - -sche (discuss) 21:05, 7 July 2017 (UTC)
I don't like references to {{defdate}}, but AFAIK there is no consensus for removing them, or else I'd remove them as well. This was a reference to the definition itself. I think a further reading item pointing to offline The Shorter Oxford is pretty useless for our readers, and I would prefer not to have it there, but let t be now. The presentation of the reference is from a horror dream: "“abbate” in Lesley Brown, editor-in-chief; William R. Trumble and Angus Stevenson, editors, The Shorter Oxford English Dictionary on Historical Principles, 5th edition, Oxford; New York, N.Y.: Oxford University Press, 2002, ISBN 978-0-19-860457-0, page 3." It's a winner in a competition about how to make a reference specification as long as possible while providing close to nothing of value of the reader. --Dan Polansky (talk) 21:11, 7 July 2017 (UTC)

TabbedLanguages edit: default to English for unmarked linksEdit

TabbedLanguages currently sends you to the last-visited section, whenever you click a link that doesn't include a language section. I propose that this be changed so that it sends you to English by default, or if there is no English, to Translingual, and if there's no Translingual either, then to the last-visited section. Thanks to the efforts of various editors to add {{l}} and such to unmarked non-English links, and Daniel's work to fix all instances of {{term}} missing a language, most links to non-English terms are appropriately tagged. Thus, by far the most unmarked links in any non-English section are for English words; sending the user to English is only very rarely wrong, and when it is, it's always a result of a non-English term that has not yet been appropriately tagged. —CodeCat 14:12, 6 July 2017 (UTC)

Makes sense. --Dan Polansky (talk) 14:19, 6 July 2017 (UTC)
I agree that this should be fixed, my only comment is that perhaps Translingual should be the priority. Not a big deal since there aren't that many pages with both. - [The]DaveRoss 14:21, 6 July 2017 (UTC)
On a page such as hotel, it would be very undesirable for the link to go to Translingual by default. —CodeCat 15:32, 6 July 2017 (UTC)
I agree it would be better for plain-linked [[hotel]] to take you to hotel#English, not whatever language you were last reading, nor hotel#Translingual, nor the top of the page (which will take a non-logged-in user to the table of contents only). Doing this would obviate the need for the unpopular {{def}}. —Aɴɢʀ (talk) 17:11, 6 July 2017 (UTC)

Enabling Page PreviewsEdit

CKoerner (WMF) 15:02, 6 July 2017 (UTC)

Which language section would it default to? Could it be changed via preferences? —Aryaman (मुझसे बात करो) 16:40, 6 July 2017 (UTC)
It would make sense if it defaulted to the language section given in the link, or to English (or the first section when there is no English) for plainlinks. I don't think it would be that useful for it to be configurable, because most non-English links have the language section specified (and the ones that don't, should). --WikiTiki89 17:28, 6 July 2017 (UTC)
Is this similar to the "Lupin" a.k.a. "Navigation" pop-up gadget? I have often wanted a version of that gadget that would fetch enough of the page that it would consistently fetch at least the first definition of the first (or specified) language section. - -sche (discuss) 21:12, 7 July 2017 (UTC)
For entries with {{wikipedia}} etc. I often see no substantive content at all from the popups we now have. I don't know whether an image is what we really need, rather than more - lots of - definitions. This could be very good. As with many improvements, configurability (suppressing graphics in my case) would be nice, but not at a high performance/server-load cost. DCDuring (talk) 19:43, 8 July 2017 (UTC)
(PoS header and definition lines seem essential; etymology header would be nice to indicate how much content there might be beyond what the page previews might be showing. Others might prefer other headers or content.) DCDuring (talk) 19:48, 8 July 2017 (UTC)

Wiktionary:Votes/pl-2017-07/Vote references in policiesEdit

Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Vote references in policies. --Daniel Carrero (talk) 18:15, 7 July 2017 (UTC)

motî, iOS dictionary app releasedEdit

This is a follow-up post to Looking for beta testers for new Wiktionary iOS app from November last year. There wasn't a great deal of interest in testing so unfortunately I didn't get much feedback. However the app is now publicly available in the App Store. It works offline, it's free, and it doesn't have ads (and never will). Right now there are only 10 languages but I plan to add more in later versions. The idea is to continuously update it based on recent dumps. – Jberkel (talk) 09:12, 8 July 2017 (UTC)

This looks really nice! Id love to use it, but I have Android. —Aryaman (मुझसे बात करो) 13:30, 13 July 2017 (UTC)
I'd love to work on an Android version but will focus on iOS first. I also considered doing a simple HTML5/mobile web version, but the offline storage limits are still too low for the (quite heavy) dictionary data. – Jberkel (talk) 07:42, 14 July 2017 (UTC)
Good to heard news of your project! I have Android on my phone too, but I think a better mobile app than the one made for Wiktionary now is a good news!   Noé 12:12, 23 July 2017 (UTC)

What is the purpose of {{catlangname}} and {{topics}}?Edit

What does {{catlangname|ru|calques}} do that [[Category:Russian calques]] doesn't? Similarly what does {{topics|ru|Electricity}} do that [[Category:ru:Electricity]] doesn't? I gather there are shortcuts {{cln}} and {{C}}, but there's also the category shortcut [[CAT:...]]. Benwing2 (talk) 22:06, 8 July 2017 (UTC)

Stops HotCat from being usable. Also saves a bit of typing if there are many categories? - -sche (discuss) 02:21, 9 July 2017 (UTC)
Sort keys. —CodeCat 11:24, 9 July 2017 (UTC)
@-sche "Stops HotCat from being usable" - how is that a good thing? A genuine question because I find HotCat very useful. BigDom 05:57, 11 July 2017 (UTC)
I believe -sche was being sarcastic. That sounded like a criticism against the templates. Like this: "We should stop using these templates, they stop HotCat from being usable". --Daniel Carrero (talk) 06:00, 11 July 2017 (UTC)
Or "Please can a JavaScript wizard make HotCat work properly"... — Eru·tuon 06:06, 11 July 2017 (UTC)
Yes! --Daniel Carrero (talk) 06:08, 11 July 2017 (UTC)
I would like to get rid of {{topics}}, but the only way we can do that is if categories automatically sort entries the right way. This is a feature we've been waiting on for years. Right now, {{topics}} is essential for reconstruction pages, which otherwise all get sorted under R for Reconstruction. It's also necessary for mainspace languages since we have custom sort keys. —CodeCat 12:06, 13 July 2017 (UTC)

Compound and fiction etymology categoriesEdit

Is there a particular reason that all the top etymology categories for types of compounds are top-level etymology categories? That is, the following categories:

I would propose that these should be placed under Category:Compound words by language. The by-language categories already work this way.

We similarly seem to have Terms derived from [work] categories such as Category:Terms derived from Harry Potter by language as top-level etymological categories rather than, as expected, children of Category:Terms derived from fiction by language. --Tropylium (talk) 13:28, 9 July 2017 (UTC)

Agree. That's what I suggested here. --Barytonesis (talk) 13:32, 9 July 2017 (UTC)
It would be better not to put subtypes of compounds under Category:Compound words by language, but rather under another category. The x by language is supposed to contain only language-specific categories for x. See, for instance, how Category:Lemmas by language does not contain Category:Nouns by language, Category:Verbs by language, Category:Adjectives by language; instead these are placed in Category:Lemmas subcategories by language. So I've gone and moved the primary subtypes of compounds by language to Category:Subtypes of compounds by language, as @Barytonesis proposed on the talk page mentioned above. — Eru·tuon 23:48, 9 July 2017 (UTC)
While a category like Category:Subtypes of compounds by language is necessary, it could be renamed; perhaps Category:Types of compounds by language would be better, or something else? — Eru·tuon 23:55, 9 July 2017 (UTC)

Sound changes: categories and etymologiesEdit

I'm thinking there should be categories for the sound changes that English words particularly have undergone. For instance, terms that have undergone yod-coalescence (nature, nation, tune, idjit, whatcha) or yod-dropping (lute, new, figure, beautiful: obviously both of these vary by dialect), terms affected by the horsehoarse, Marymarrymerry, cotcaught, or weak vowel mergers. The same could be done for sound changes in other languages.

English is in an odd situation because these sound changes are usually not reflected in spelling, and hence they are not visible in etymologies.

Categories could be easily added by {{accent}} ({{a}}) in pronunciation sections, because they already sometimes mention sound changes (see, for instance, before § Pronunciation). But I think etymology sections should also contain information about changes in pronunciation that aren't reflected in spelling. — Eru·tuon 23:38, 9 July 2017 (UTC)

Synchronic regional differences in pronunciation have no place in etymologies, in my opinion: they're part of the history of the lect, not of the term itself. That's not to say that a regional form which is spelled differently than other forms due to one of the sound changes in question shouldn't mention it- but the etymology for Mary should say nothing about the Mary-marry-merry merger. If you think about it, any random sequence of letters of the right shape to be interpreted as containing such phonemes will reflect the changes when individuals read them aloud (e.g. *morpliger will reflect differences between rhotic and nonrhotic lects), so it's not about the history of the term. The place for such things is in the pronunciation section, as part of illustrating the regional variation for the term. Chuck Entz (talk) 02:51, 11 July 2017 (UTC)
I agree, I don't see how the kind of information you mention could sensibly be incorporated into etymology. It would make no sense to mention in the etymology of Mary that it has undergone the Mary-marry-merry merger, for example, since (1) that statement is false for some number of speakers, and (2) it's not etymological information. - -sche (discuss) 02:59, 11 July 2017 (UTC)
Listing examples of sound changes sounds like something best suited for an appendix. English could definitely use one, perhaps also various other languages with "minor" unpredictable spelling rules. See, for example, Appendix:Hungarian words with ly. --Tropylium (talk) 15:23, 23 July 2017 (UTC)

Arabic script CSS font stack proposalEdit

I ran into some problems with the current Arabic font stack which seems to be rather nonsensical at the moment. I dont know what the logic is behind it. Here's a proposal I came up with for what to do about it: User:Radixcc/ArabicFontStackTest It's a little hard to figure out what's going on with some fonts because I can't find a @font-face directive anywhere for including fonts currently in the CSS. — Radixcc 📞 16:44, 10 July 2017 (UTC)

Which fonts to use for Arabic has been discussed before; one of the issues that complicates things is that not all fonts display sequences of diacritics (vowels + shadda, shadda + vowels) well. But it has been several years since the last discussion, maybe fonts have improved. Perhaps it would be illustrative to check how Wikitiki's "Arabic font test" examples display in the fonts you propose, and the fonts that were previously rejected (to see if those have improved). Pinging some users who participated in the long discussion I linked to: @Wikitiki89, Atitarev, Mzajac. - -sche (discuss) 17:30, 10 July 2017 (UTC)
Ok I added the diacritic tests to my page. Now that I get to comparing them it seems like the problem with Droid Arabic Naskh is that it appears a bit larger than the others. The whole Arabic font situation sure is a headache. — Radixcc 📞 02:37, 11 July 2017 (UTC)

Focus search box by default on most pagesEdit

I am sure this has come up previously, although I didn't find any recent discussions on the Beer Parlour. An OTRS email suggested that it would be very useful for the user if the search box had initial focus on most pages. The German Wiktionary has had this feature implemented for years, and I copied it here for those who would like to test it out. Just add importScript('User:TheDaveRoss/searchFocus.js') to your common.js page to see what it is like. It does not focus the search box on a few pages where that would obviously be undesirable, such as edit pages and log-in pages.
Main question: should we implement this for all users by default? It is not without downside, but it is pretty handy most of the time. - [The]DaveRoss 19:26, 10 July 2017 (UTC)

My comment in the previous discussion was that (AFAIK) focusing a control always forces the page to scroll to a point where the control is visible. This can be very annoying, e.g. if you are visiting water#Occitan. A text box having focus might also block other shortcut keys that would normally scroll, such as PgUp/PgDn. Equinox 19:30, 10 July 2017 (UTC)
The scrolling to focused control problem is an important one, and I can see a few possible ways around that but it is a problem with the current implementation. One would be to only focus if the page URL does not include an anchor, which is crude but effective. There are probably cleverer solutions as well, but I would rely on people who have done webdev and understand how focus affects things in various browsers.
The page up and down, as well as the arrow keys, are still functional for scrolling with this enabled. - [The]DaveRoss 19:40, 10 July 2017 (UTC)
This SO Answer may be applicable, as long as it doesn't do the whole scroll up and down dance. - [The]DaveRoss 19:46, 10 July 2017 (UTC)
We could make the search box scroll with the page. DTLHS (talk) 19:44, 10 July 2017 (UTC)
User could also be advised of the focus-search-box key (Alt+Shift+F in Chrome, might possibly vary by browser, and presumably not available on mobile). Equinox 20:01, 10 July 2017 (UTC)

Count me in as opposed to the idea--we don't need to control the users' browsers or behaviors any more than we already do. Scripts which unexpectedly take away control or focus are a real nuisance to me. —Justin (koavf)TCM 20:45, 10 July 2017 (UTC)

One could argue that this isn't unexpected, the current behavior is what is unexpected, but I agree that poor implementations of focus change are a real nuisance. - [The]DaveRoss 12:26, 11 July 2017 (UTC)

Actually, that "feature" (and iirc also one other stupid script) is the reason why I (as a german) am using enwikt instead of dewikt. So, pretty please don’t do it here. --Nenntmichruhigip (talk) 19:07, 26 July 2017 (UTC)

temp:head or language-specific template?Edit

I never seem to know which markup is preferable to use: {{head|fr|suffix}} or {{fr-suffix}} (I'm only using French as an example)? I just did this, is this all right? --Barytonesis (talk) 22:30, 10 July 2017 (UTC)

Language-specific templates only really make sense if they add something significant, which is not the case in your edit. So I think your edit was right. —CodeCat 23:00, 10 July 2017 (UTC)

Not bolding the initials of abbreviations, acronyms and initialismsEdit

At some point, I'd like to create a vote to incorporate this rule in WT:EL:

"Abbreviations, acronyms and initialisms can't use bold letters like this: armoured combat vehicle in ACV. The correct would be simply armoured combat vehicle."

I don't have actual numbers, but I believe this proposal likely reflects an unwritten rule already in practice. Most affected entries don't use the bold letters anyway, but sometimes I find a few entries that do.

If this passes, it would be kind of consistent with this 2010 vote: Wiktionary:Votes/pl-2010-03/Bolding letters in initialisms (based on Wiktionary:Beer parlour/2010/March#Bolding letters in initialisms). All participants voted "Oppose bolding", but I believe this simply means that no rule was effected. Apparently, EL was not edited in any way based on that vote. --Daniel Carrero (talk) 07:06, 11 July 2017 (UTC)

Though I think the bolding has been overused, there are occasions when the bolding makes clear how abbreviations of multi-word expressions ("MWE"s) are constructed from the components, where it is not immediately apparent. Similarly for some blends. IOW, we may have a preference, but a vote seems inappropriately rigid. A more complex proposal that attempts to address the exception I've identified is likely to be harder to understand, have surprising unanticipated consequences, and make for more rigidity. Less legalism, more use of dump-processing to support reviews of possible problematic overuse, misuse, etc seems more wiki-like. If this flexibility is not you one's taste, perhaps WikiData is a better project. DCDuring (talk) 12:14, 11 July 2017 (UTC)
In the entry for ABQ bolding selected letters would be very handy. In fact the display forced by {{abbreviation of}} makes the desirable use of bolding impossible to implement while getting the benefits of {{abbreviation of}}. Note also that in this case the abbreviation is not even one of an MWE. DCDuring (talk) 12:26, 11 July 2017 (UTC)
@DCDuring: Maybe the wording could be something like: "bolding of initials is generally discouraged". Assuming we want some entries to have it, but not all or most. --Daniel Carrero (talk) 14:12, 12 July 2017 (UTC)
Why not just start off WT:ELE with the imprecation: "Try to do a good job of formatting." Another, probably more productive approach would be to eliminate all the bad formatting in existing entries so there are fewer bad examples for contributors to follow. This would probably be more productive than working on yet another vote that doesn't leave us in a better place than we are now. DCDuring (talk) 15:01, 12 July 2017 (UTC)
We'll eliminate all the bad formatting from entries as soon as someone writes an algorithm that can tell us what bad formatting is. Or as soon as we rewrite our entries so that they are actually parseable by a computer without inventing strong AI first. DTLHS (talk) 04:40, 13 July 2017 (UTC)
We don't have to do anything hard. We can identify entries that use the several templates used for abbreviations and also contain an emboldened capital letter followed without a space by one or more lowercase letters. Most of these will be for initialisms or acronyms for which, IMO, there is not sufficient justification for bold. The other cases may need manual review. As we have more or less standardized on the templates involved, we should thus easily identify many of the cases. Once this has been done, a dump could be analyzed for all remaining instances of such or similar used of bold for parts of words, for manual review. Obviously if there is no consensus on the simpler cases, we can't proceed. DCDuring (talk) 05:49, 13 July 2017 (UTC)
I remember this was discussed before and most people disliked the bolding. Unfortunately no idea when/where. Equinox 17:04, 12 July 2017 (UTC)
Other than the 2010 discussion I linked in my first message above? (I noticed you started that discussion.) --Daniel Carrero (talk) 17:10, 12 July 2017 (UTC)

Language sectionsEdit

I like very much the presentation of languages in articles in ( example page). Would it be an enchantment to have it in all wiktionaries? I mean, in every page, instead of language sections, to have a table that the user will select the language to see. That way when a user comes to very heavy page with lots of language sections will not be obscured by other languages. I see they only have a <language> tag which probably does most of the work. Maybe in such an extension Languages do not have to be in accordance with iso codes, in order to have the ability to add non standard language (if this is desirable). As of my understaning this works by creating a subpage of a language and just displaying it depending on user preferences (which we may or may not use in wiktionaries). The user has the freedom to choose any language to see. --Xoristzatziki (talk) 14:15, 11 July 2017 (UTC)

There is a similar feature available here, it is called Tabbed Languages. You can enable it in Preferences > Gadgets if you would like to see it in action. - [The]DaveRoss 14:23, 11 July 2017 (UTC)
Can someone remind me why Tabbed Languages isn't enabled for all logged-out users? Entries with several language sections are basically a cluttered mess without it. This, that and the other (talk) 21:14, 12 July 2017 (UTC)
It was actually voted on and passed, but then nobody did anything about it. It should be done now. —CodeCat 12:03, 13 July 2017 (UTC)

Strategy discussion, cycle 3. Let's discuss about a new challengeEdit

Hi! It's the second week of our Cycle 3 discussion, and there's a new challenge: How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways? You can suggest solutions here. You can also read a summary of discussions that took place in the past week. SGrabarczuk (WMF) (talk) 15:36, 11 July 2017 (UTC)

We already don't require "reliable sources", so I think we're ahead of the curve on that one. —CodeCat 12:10, 13 July 2017 (UTC)

Creating redirects to xx-IPA templatesEdit

Is it an accepted practice to create a redirect to an xx-IPA template (e.g. {{hu-ipa}} redirects to {{hu-IPA}})? I am copying a conversation from User:Liggliluff's talk page:

Hi, what is the purpose of the redirect to {{hu-IPA}}? --Panda10 (talk) 16:43, 13 July 2017 (UTC)

Because {{fa-ipa}}, {{ko-ipa}} exists, and if people are used to these, it'll be easier for them to find the other, and it's quicker and easier to not having to shift case.
But then, the other templates doesn't have lowercase redirects: {{ar-ipa}}, {{ca-ipa}}, {{cs-ipa}}, {{eo-ipa}}, {{et-ipa}}, {{fi-ipa}}, ...
And then you got: {{grc-IPA}}/{{grc-ipa}}
I believe the standard naming convention is xx-IPA. But I will bring up the subject at Beer Parlour.

--Panda10 (talk) 18:43, 14 July 2017 (UTC)

Yes, the standard convention is xx-IPA. A few years back I moved all templates with deviating names to xx-IPA as long as they were luacized templates that automatically generated pronunciation information. Most redirects are there because of the page moves. There's nothing particularly wrong with having redirects from xx-ipa names, but there's no particular reason for them either. Just use xx-IPA. —Aɴɢʀ (talk) 08:17, 15 July 2017 (UTC)
Thanks. --Panda10 (talk) 13:58, 15 July 2017 (UTC)

CFI and Poorly-Attested Varieties of Well-Documented LanguagesEdit

The whole distinction between LDLs and WDLs was intended to protect entries for lects with limited corpora of written texts, and yet there are large numbers of dialectal terms even in English that are hard or impossible to verify under the current rules.

For one thing, people have always tended to write only in the standard lect and only speak in the other lects (or at least not write anything that gets durably archived). Add to that the lack of standard spelling, which means that any single variation is less likely to be attested often enough, and you have the equivalent of many LDLs embedded within WDLs.

There's also the matter of historical variation in depth of attestation: modern technology has made it easier to produce, distribute, capture and preserve language, and attitudes about various sublects, not to mention tolerance in general for lects other than the standard ones, have changed over time.

Is there any way we could modify CFI to take this into account? Perhaps we could specify in the WDL list which sub-lects are well-documented, and exempt all the others from the WDL requirements. Either that, or add general parameters for which types of sub-lects should be exempted or not exempted. Chuck Entz (talk) 20:46, 14 July 2017 (UTC)

I definitely agree. It's much harder to attest uncommon variants of a language, and even very informal levels of language can be difficult to attest, for the same reasons. Andrew Sheedy (talk) 03:56, 15 July 2017 (UTC)
I agree as well. We should protect dialectal terms and old hapax legomena (e.g. لسپردرک (laspardarak)). --Vahag (talk) 08:19, 15 July 2017 (UTC)

Entries by (talk)Edit

This user has been creating lots of entries for generic molecular formulae. I seem to remember that we don't accept these. Or do we? SemperBlotto (talk) 13:05, 16 July 2017 (UTC)

  • Most of his other entries are of rather poor quality - I have half a mind to block him. SemperBlotto (talk) 13:25, 16 July 2017 (UTC)
Now the user is adding phobias, some of which seem to be barely attested, and others of which just get a lot of mentions. But the user is probably adding this content in good faith, maybe not knowing that "mentions" don't meet CFI, so I think advising them on their talk page is better than blocking. - -sche (discuss) 18:02, 16 July 2017 (UTC)
Block them as a BrunoMed sock. At least I think that's who it is. Look for assembly-line-style use of the same verbiage whether it fits the entry or not. I'm not 100% sure, because I don't remember the geolocation details of their previous socks- except that they all geolocated to Croatia, as this one does. Chuck Entz (talk) 01:26, 18 July 2017 (UTC)


Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Gallery. --Daniel Carrero (talk) 11:38, 17 July 2017 (UTC)

"Obsolete" forms that were never really usedEdit

The discussion that started here and the contributions of this anon give rise to a problem that we need to solve – what do we do with word forms that were never naturalised and are they to be considered "obsolete"?

Context: Romanian underwent a change of writing systems in the late 18th century. Transylvanian scholars adapted the Latin alphabet to the Romanian language, using orthographic rules from Italian. The Cyrillic alphabet remained in gradually decreasing use until 1860, when Romanian writing was first officially regulated. At the time, countless mixed alphabets were introduced, some were even used simultaneously. If you were to read texts from this period, it wouldn't come as a surprise if one word was written using several alphabets; it all depended on who you were reading.

Adding to an already difficult linguistic period, there was also a tendency in the works of several scholars to re-Latinise the language ad absurdum (for instance Dicționarul limbii române, 1871-1876, by August Treboniu Laurian and Ion C. Massim).

E.g. the word băiețel ("little boy") was written as baiatellu in the aforementioned dictionary.

The spelling is completely subjective to the ideas and beliefs of the scholars who wrote it. It does not in any way, shape or form describe how the word was actually pronounced or written by the public.

Therefore, my question is should word forms such as baiatellu be included under Alternative Forms as obsolete even if they were never actually used? The form has indeed citations and would technically fulfil minimum requirements for inclusion, but it somehow feels wrong to add it considering that it was used only in a tight-knit circle of scholars and authors of the time. I have similar hesitations when it comes to forms without diacritics (e.g. -țiune vs. -tiune), because it would cause disarray amongst Romanian entries and possibly other languages too.

Any input is highly appreciated (@Redboywild, @Word dewd544). --Robbie SWE (talk) 09:57, 18 July 2017 (UTC)

Maybe a "hypercorrect" gloss? We've had a few such English entries where e.g. an æ spelling has been added that doesn't stand up to scrutiny. Equinox 12:20, 18 July 2017 (UTC)
I would tag them as both "obsolete" and "rare". If there's a standard explanation that you would use in a number of entries, you could also create a template. Chuck Entz (talk) 13:40, 18 July 2017 (UTC)
I'm sorry to have to be a massive pain, but the implications of said solutions are daunting. It would give every Romanian entry several alternative forms, most of which would be artificial and/or unknown to a majority of Romanians. These forms, especially the made-up Latin forms from the late 19th century, would erroneously suggest that Romanian was morphologically closer to the other Romance languages than it actually was. We would basically be accepting counterfactual efforts from some scholars to revamp the historical evolution of the Romanian language. I would personally not go anywhere near forms that popped up during this period – only veritable alternative forms and attested archaic/obsolete forms such as nație for națiune, pâne for pâine (DEX is pretty good at mentioning these alternative and archaic/obsolete forms together with veritable sources from prose and poetry), etc., deserve to be mentioned, IMHO. I think the problem is exacerbated by the lack of written sources in Romanian dating from before the 16th century. It makes it hard to create a historical timeline for the Romanian language – where we have Middle English and Old English, in Romanian we have nada. --Robbie SWE (talk) 16:13, 18 July 2017 (UTC)
It's up to you to decide what to work on. If a word appears in print it can be included, provided it is properly tagged, even if it was used by an author promoting historical revisionism. DTLHS (talk) 16:41, 18 July 2017 (UTC)
If there are three independent citations of forms like you mention (i.e. they don't just appear in the work of one author), then someone who wants to spend the time adding them can do so. I would suggest adding a few sentences at WT:ARO that describe the issue ("in the 1800s circles of scholars proposed many spellings for such-and-such reasons that never caught on and are now obsolete...") — perhaps WT:ARO#Moldovian_and_Cyrillic_Romanian (where the allowance of Old and New Cyrillic forms is explicit in Wiktionary:Votes/2011-10/Unified Romanian) can be generalized into a section on spellings. Then, one could make a qualifier template that links to that explanatory section, to put after those spellings when they're listed in alternative forms, and one could also make a "form of" template to use in the entries for the spellings themselves. Maybe the wording could be "obsolete respelling of X proposed in the 1800s"? - -sche (discuss) 17:31, 18 July 2017 (UTC)
I'm personally of the mind that we shouldn't really bother too much with these. It's just going to add unnecessary confusion for those who aren't very familiar with the language and its historical evolution, or otherwise just take a lot of effort to explain the (rather obscure) context of these forms of the words. At any rate, I won't really be involved with this, as I still have other things to work on. Word dewd544 (talk) 17:58, 18 July 2017 (UTC)
It's definitely within our ambit. If we're worried about adding unnecessary confusions, we should figure out how to record the information in a way that isn't confusing. This is hardly a problem limited to Romanian; few natural languages had the spelling standard that persists today (if the community has, indeed, decided on a spelling standard) upon the birth of writing in that language. Glancing at Shakespeare's First Folio, it seems we have much of the old 17th century spelling, but not linked from the modern spelling in any way.--Prosfilaes (talk) 21:33, 18 July 2017 (UTC)

Ok then. Humour me for a minute – let's play a game of what-if.

What if I were a scholar, specialised in linguistics with a strong proclivity for English. Let's say I hate foreign influence on the English language – Anglo-Norman, Latin and other Romance languages have ruined this Anglo-Saxon gem and nothing would tickle my fancy more than to cleanse the language from these aberrations. In a Tolkienesque manner I author a book proclaiming my agenda and later, a voluminous dictionary where I completely refurbish the English language – vocabulary, grammar and morphology, you name it, have all been Anglo-Saxified. Several fellow colleagues agree with me, write books about my work, cite me frequently and some even continue my purification crusade. Others criticise my work and call me a nutcase (and rightly so, if you ask me), nonetheless plenty of quotes, but not so many headlines (you know, the media is too busy covering Trump's latest tweet or something like that). Flash forward 150 years, someone finds my work and thinks "Wow, English sure has changed – I've never seen these words and archaic spellings before. I think I'm going to add them to Wiktionary as obsolete forms". These contributions are easily cited because finding citations is a piece of cake, so they pass WT:CFI. Mind you, no one else – authors, newspapers, mass media, or that Instagram celebrity who is famous for doing nothing – has ever used the words in my dictionary.

Back to the present. If the consensus is that the postulation above is feasible and something we should accept, then I think I'm going to need Prozac and a call to my shrink cause the world has gone mad I tell you! --Robbie SWE (talk) 18:48, 19 July 2017 (UTC)

All words in all languages. As long as they pass CFI, tag them as obsolete, rare (and maybe even make a special label or template with an informative link that says following the abandoned orthography of Dr. So-and-so). I'd only link to them from a lemma in an autocollapsed box with qualifiers. —Μετάknowledgediscuss/deeds 18:56, 19 July 2017 (UTC)
How would they be easily cited if no one used them? Three independent cites is not a trivial hurdle. There have been quite a few English spelling reformations, but few of them can offer even one real printing in the spelling; we count uses, not mentions. The Deseret alphabet, supported by the local government, could arguably not reach that level for anything.
There's a lot of marginal language use. People are welcome to not spend their time on anything they don't feel is worth it. But if someone wants to record this history of Romanian, it's entirely in our ambit.--Prosfilaes (talk) 21:42, 19 July 2017 (UTC)
"What-if"? See w:Linguistic purism in English. We've had problems with otherwise very good contributors- even admins- trying to push this. We document everything that's been actually used (as opposed to mentioned), but we explain what it is, and we don't allow uncommon forms as translations or in definitions, nor do we usually link to them from the main forms. That way someone who runs into it somewhere (e.g. Google Books) knows what it is, but we aren't promoting it. Chuck Entz (talk) 14:13, 20 July 2017 (UTC)
The point of my somewhat overdramatic "what-if" story is to exemplify what I believe to be an absurd stance – the situation for Romanian is that we have word forms coined by scholars, mentioned within their like-minded networks but not actually used by anyone else. I don't think it is in our best interest to add these forms as alternative forms in main namespaces (like the anon did), because it suggests that they were common at the time. I'm not going to work with these forms, but I dread that someone else will find them extremely interesting and add them to existing Romanian entries. --Robbie SWE (talk) 17:54, 20 July 2017 (UTC)
If they're not actually used, then they're not relevant. You say "coined by scholars, mentioned within their like-minded networks but not actually used by anyone else" which avoids the important question of whether they were actually used by anyone. If they were, then we should have entries on them. I believe we should add alternative forms on them, that we should link all alternative forms that are citable, with appropriate notes, but that's not the important thing.--Prosfilaes (talk) 19:26, 20 July 2017 (UTC)
Romanian forms with -tiune and (silent) ending u were used - and not just mentioned - and are attestable as for WT:CFI. If there are doubts, please use WT:RFVN. Those old spellings are similar to for example old English spellings which are included in the English wiktionary too. (It's not necessarily hypercorrect, and even English spellings with æ or œ are not necessarily hypercorrect.)
If -tiune and (silent) ending u were rare, then it could also be because Romanian was rarely written or rarely written in Latin characters in the 19th century, and not just because -țiune and u-less spellings were the common forms. Anyway, as others pointed out there could be more informative labels than just obsolete.
Latinising spellings, as -tione instead of -tiune/-țiune in the dictionary by Laurianu and Massimu, probably aren't attestable. If they unexpectedly are attestable, then the label simply could/should be more informative than just obsolete, for example it could be [[Wiktionary:About Romanian#Spelling|Latinising spelling]]; obsolete, rare/uncommon. Also dates could be added in the label like 19th century Latinising spelling, or inventors could be mentioned if there are any and if they are famous like Latinising spelling following Laurianu and Massimu. (If the inventors are not famous, then it's not really help- and useful to mention them in a label.) - 18:58, 23 July 2017 (UTC)
I think there's a general consensus that there's not a problem creating entries for these things. The real argument seems to be about linking them to entries using standard spellings, which is not an issue resolvable by WT:RFVN.--Prosfilaes (talk) 17:43, 24 July 2017 (UTC)

Adding DemoticEdit

Can we add Demotic (the stage of Egyptian between Late Egyptian and Coptic, not the Greek vernacular) as a language (perhaps egx-dem)? It’s cropping up in a lot of Coptic etymologies (e.g. ϣⲉⲣⲓ (šeri), ϩⲁⲓ (hai), ϩⲟⲟⲩⲧ (hoout)) and some others (e.g. lily) with no clear way to link to it. The script and transliteration are different from (hieroglyphic/hieratic) Egyptian, as is the grammar and a good part of the lexicon, so that splitting it off wouldn’t result in significant duplication of content. Traditional lexicography keeps the two separated (cf. the Wörterbuch der ägyptischen Sprache vs. the Demotisches Glossar) with good reason. — Vorziblix (talk · contribs) 16:30, 19 July 2017 (UTC)

Sounds reasonable to me. —Aɴɢʀ (talk) 21:53, 19 July 2017 (UTC)
@Angr Since I don’t have the requisite admin rights to edit the module, could you (or any other admin) add the following to Module:languages/datax:
m["egx-dem"] = {
        canonicalName = "Demotic",
        otherNames = {"Demotic Egyptian", "Enchorial"},
        scripts = {"Latinx", "Egyd"},
        family = "egx",
        ancestors = {"egy"},
        wikipedia_article = "Demotic (Egyptian)",
and add the line
   ancestors = {"egx-dem"},
to Coptic in Module:languages/data3/c? (The tabs might need to be fixed if not copied from source.) Thanks. — Vorziblix (talk · contribs) 00:36, 20 July 2017 (UTC)
Added. DTLHS (talk) 00:47, 20 July 2017 (UTC)
Either way is fine, but looking over results from e.g. Google Books or web search, unqualified ‘Demotic’ in English almost always means Egyptian Demotic (or simply the adjective) and Egyptian Demotic is almost always called simply ‘Demotic’, whereas Greek Demotic is generally specified as such. Context also makes it rather unlikely that the two would be confused, especially given that we don’t have Demotic Greek as a language or dialect separate from (Modern) Greek. However, if consensus favors changing the name, it should be fairly easy to do. — Vorziblix (talk · contribs) 05:15, 20 July 2017 (UTC)
Yeah, it's common enough to call it just "Demotic" (like also the script Egyd).
But on the subject of naming conflicts, we have both Category:Egyptian languages and Category:Egyptian language, i.e. a family and a language have the same name. Wiktionary:Families advises that this should be avoided, but does it actually cause any problems other than in etymologies where "from Egyptian" (compare "from Germanic") and "from Egyptian" would be indistinguishable? If that's the only issue, it seems like it can be worked around without renaming anything. - -sche (discuss) 09:28, 21 July 2017 (UTC)

Vocalisation of laryngeals, againEdit

Can anyone please help me deal with Victar in Reconstruction:Proto-Indo-European/h₂reh₁- and Reconstruction:Proto-Indo-European/Hreh₁dʰ-? The two given reconstructions make no sense. A sonorant in a zero-grade root becomes syllabic, this is standard PIE. So this means that the laryngeals next to it certainly don't become syllabic. Syllabic sonorants in Germanic develop an epenthetic -u- in front of them, which is what would be expected in such a form. The fact that something else is found implies that the reconstruction is wrong. How do I explain this? I'm tired of being forced into an edit war in order to keep dubious information out of Wiktionary. Clearly there is no consensus to include it, so why should it be included anyway? —CodeCat 21:20, 19 July 2017 (UTC)

It's definitely unexpected for HRHC- to make the second laryngeal syllable rather than the R, but maybe someone's discovered a new sound law by which (word-initial?) HRHC- surfaces as post-laryngeal RəC- > RaC- in Germanic rather than the normally expected R̥̄C- > uRC-. Are there any PGmc words that do start with uRC- < HRHC-? All the uRC- words I can find in CAT:Proto-Germanic lemmas (*umbi, *und, *under, *unhtaz, *unseraz, *urbą) seem to come from *(H)R̥C-, not *(H)RHC-. —Aɴɢʀ (talk) 21:49, 19 July 2017 (UTC)
uRC from RHC: *kundaz, *kurną, *gulþą, *hurną, *hulliz, *þunnuz, *spurą. Kroonen notes for *bladą < *bʰl̥h₃tóm that the -a- must be secondary since it can't reflect an inherited form of the root in any grade. There may be other cases of such "impossible" grades with laryngeal-final roots in Germanic. —CodeCat 19:53, 20 July 2017 (UTC)
None of those are word-initial, though. —Aɴɢʀ (talk) 21:19, 20 July 2017 (UTC)
These are sourced from Kroonen, so in the absense of alternative analyses, I'm not sure what else are we supposed to do here. Perhaps we might add a question mark, and explain the actual issue in the PGmc entries themselves (once they have been created). --Tropylium (talk) 15:35, 23 July 2017 (UTC)
We're not required to go with Kroonen. This is one of the reasons why I opposed blindly following sources in the past. Sometimes they really do lead you somewhere nonsensical. My own interpretation of the situation is that Kroonen is probably essentially correct, but that the derivation is post-PIE. They would have occurred at a time when laryngeals were no longer consonants, but the laryngeal nature of certain roots was not yet entirely lost. A derivation like *bladą is only possible if speakers somehow "knew" that a (or a predecessor) was the vowel to be used in the zero grade of such laryngeal roots, which in turn must have arisen by analogy with CHC-shape zero grades where a is the regular development. However, it is important to note that there is no a in the past plural of strong verbs anywhere, even in verbs of laryngeal roots. Instead, classes 6 and 7, where most laryngeal roots are, have no zero grade altogether. —CodeCat 15:44, 23 July 2017 (UTC)
Most of what we list in PIE root entries under derivations are preforms (projections into PIE) and not proto-forms (comparative reconstructions) anyway. A single reflex in a single branch, say Celtic *sutus < "*séwH-tus" < *sewH- typically does not warrant reconstructing *séwHtus for PIE itself. This is after all why we (and also other reference works) normally list such descendants under just the PIE root, not under any actual PIE term.
So the wider question is: do these pre-forms have to adhere to canonical PIE grammar, or is it acceptable to give pre-forms that clearly were not PIE and indicate later formation? (Note that, from an Indo-Hittite viewpoint, this would also have to include quite a bit of morphology.) We do want to link later formations from PIE entries somehow, and the current approach seems like a workable compromise. Maybe we can add a disclaimer to WT:AINE about forms in derivatives-of-roots lists. --Tropylium (talk) 13:03, 26 July 2017 (UTC)
Normally, the actual reconstructable proto-forms are red/bluelinked and can have their own page, while the projections are unlinked. It becomes difficult when there is actually no possible PIE preform, like in the case of *bladą. We should mention in this case that it's not a PIE preform. —CodeCat 13:37, 26 July 2017 (UTC)

Wiktionary meetup 2, United StatesEdit

I'll be in Sandusky, Ohio a lot this summer. I can set up a meeting there with someone who would be able to go to northern Ohio. We'll meet for ice cream or lunch or something. I don't care what we do. I just want to meet a Wiktionarian!

(I could also go to Cleveland, Columbus, Toledo, or any other city within that general area.) Reply or post on my talk page and we'll exchange contact info or whatever if necessary, and figure out where it is in Ohio you want to meet. PseudoSkull (talk) 00:21, 20 July 2017 (UTC)

Pinging Ruakh.​—msh210 (talk) 00:25, 23 July 2017 (UTC)
Thanks for thinking of me, but I live in the Seattle area now. —RuakhTALK 04:46, 23 July 2017 (UTC)

WT meetup, SpainEdit

Anyone finds themselves near Barcelona this summer too. Send me a message. --Recónditos (talk) 15:36, 20 July 2017 (UTC)

Damn I wish I could meet you! PseudoSkull (talk) 15:40, 20 July 2017 (UTC)
Haha, I have always wondered whether Wonderfool's Spanish location was a lie or not. Also whether he actually married an heiress. I FORGET NOTHING. Equinox 00:33, 23 July 2017 (UTC)
An heiress? Lol, no. --Recónditos (talk) 12:24, 23 July 2017 (UTC)

Strategy discussion, cycle 3. Challenge 4Edit

Hi! The movement strategy discussion is still underway, and there are four challenges that you may discuss:

  1. How do our communities and content stay relevant in a changing world?
  2. How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways?
  3. As Wikimedia looks toward 2030, how can we counteract the increasing levels of misinformation?
  4. and the newest one: How does Wikimedia continue to be as useful as possible to the world as the creation, presentation, and distribution of knowledge change?

The last, fifth challenge will be released on July, 25.

If you want to know what other communities think about the challenges, there's the latest weekly summary (July 10 to 16), and there's the previous one (July 1 to 9).

If you have any questions, you may ask here (please, remember to ping me). The FAQ might be helpful as well.

Bot request: create entries for Japanese verb and adjective formsEdit

Please someone use a bot to create the entries for all Japanese verb and adjective forms, if that's OK.

I've been trying to learn Japanese and I think maybe these entries would be helpful.

Unless people consider these entries unwanted for some reason. --Daniel Carrero (talk) 16:08, 20 July 2017 (UTC)

Not knowledgable in Japanese, but I believe all inflections of all words in all languages should be added to Wiktionary regardless. So, in all technicality, they should be welcome. PseudoSkull (talk) 19:45, 20 July 2017 (UTC)
I think that we have to decide what forms we want first. Also, there are conflicting views.
This article describes a set of conjugation rules widely used in order to teach Japanese as a foreign language. However, Japanese linguists have been proposing various grammatical theories for over a hundred years and there is still no consensus about the conjugations. Japanese people learn the more traditional "school grammar" in their schools, which explains the same grammatical phenomena in a different way with different terminology (see the corresponding Japanese article). (w:Japanese verb conjugation)
Because the Japanese language is written without space, different grammar systems tend to have different notions on what constitutes a word. The 学校文法 (gakkō bunpō, school grammar) system tends to cut sentences into smaller pieces to help understand the development of the language. It is used in Japanese schools and dictionaries, but is not designed for a foreign audience who have no experience with the language. A new grammar called 日本語教育文法 (nihongo kyōiku bunpō, Japanese-language education grammar) has been devised since 1960s. It simplifies the “school grammar” system a lot and is widely used in learning materials for non-native speakers. The difference is that the former provide “stems” used to form words and the latter provide prefabricated forms to be used in sentences. (Appendix:Japanese verbs)
In Japan, adjectives and verbs alike all have 未然形, 連用形, 終止形, 連体形, 仮定形, and 命令形 (see our current entries), while for foreigners, verbs have dictionary form, a-form, i-form, u-form, e-form, o-form, and te-form, while adjectives are something else.
(Actually, I've been thinking about this. See [1].) —suzukaze (tc) 01:54, 21 July 2017 (UTC)
(@Eirikr, Atitarev, Dine2016, TAKASUGI Shinji, Fumiko Take, Wyangsuzukaze (tc) 02:36, 21 July 2017 (UTC))
We should avoid adding bound forms because there is no consensus between traditional grammarians and modern linguistes. The following forms may have their own pages:
Negative 書かない 食べない 来ない しない
Volitional 書こう 食べよう 来よう しよう
Polite 書きます 食べます 来ます します
Past 書いた 食べた 来た した
-te 書いて 食べて 来て して
Condition 書けば 食べれば 来れば すれば
Imperative 書け 食べろ 来い しろ
TAKASUGI Shinji (talk) 03:41, 21 July 2017 (UTC)
@Shinji, some questions for you:
What do you mean by "bound forms"? By some interpretations, all of the above except the Imperative are "bound forms".
  • If you mean the causative and passive, I think omitting these does a disservice to that portion of our user base who might be beginner-level studiers of Japanese. The rules for the passive, for instance, and whether to add れる (-reru) to the 未然形 (mizenkei, irrealis or incomplete form) or られる (-rareru) instead, are easy enough once you know them. But for anyone who doesn't know the rules, this kind of information is typically best presented in conjugation tables. And someone running across a verb form like 食べさせられました (tabesaseraremashita, was made to eat something, polite causative-passive past tense) may well not know that the lemma form is 食べる (taberu, to eat).
I'm not necessarily advocating that we create entries for forms like 食べさせられました, but I do think we need to ensure that a user searching for "食べさせられました" can somehow find their way to the lemma entry at 食べる, and have access (via tables, or links to other pages here or on Wikipedia) to the information needed to make sense of the longer conjugated forms. Perhaps having full entry pages is the best way to do this. Perhaps instead we just need to include the conjugated forms somewhere within the lemma pages. Or perhaps there's an altogether different approach. My concern is ensuring that users are able to find what they need.
Also, do you have any opposition to the inclusion of other polite conjugations, such as the past form -ました (-mashita), or the volitional form -ましょう (-mashō)? Various materials targeting English-speaking learners include the polite forms not as a single row, but as a whole column, showing each of the conjugations. ‑‑ Eiríkr Útlendi │Tala við mig 18:17, 21 July 2017 (UTC)
I meant non-final forms such as mizenkei (ex. 書か-, 食べ-). I should have said stem or radical probably. — TAKASUGI Shinji (talk) 23:55, 21 July 2017 (UTC)
Why not improve on the search function instead? Forms of verbs and adjectives are never ending for agglutinative languages. If the search string contains kana and is Japanese-looking, compare it with a repertoire list of all existing Japanese lemmas and their autogenerated forms. Output terms which are most similar to the search string, with their definition, sorted by increasing Levenshtein distance between the search string and term. That would suggest 食べさせる (tabesaseru)―the causative form of 食べる (taberu, to eat)―as the closest match for 食べさせられました (tabesaseraremashita). I think a similar approach is used by the online Korean dictionary Daum. Wyang (talk) 22:27, 21 July 2017 (UTC)
I think it's a great idea to add Japanese inflected forms - verbs and adjectives. They already exist in CAT:Japanese verb forms and CAT:Japanese adjective forms. There's a lot of work though but it can be done with a bot. Care should be taken if a form coincides with another word, as in hiragana spelling of  () (koi) - こい (koi). I support the same for Korean verbs and adjectives and other languages. Recently Persian verb forms were added. The work on search function can be done in parallel. I also think that the forms in the inflection tables should be wikified (linked) as in the majority of inflection tables for other languages. --Anatoli T. (обсудить/вклад) 02:37, 22 July 2017 (UTC)
  • WF would quite like to do it. I remember once, about 10 years ago, WF wrote a bot to add inflected forms of Ancient Greek verbs. Knowing nothing about the language, and with only a smattering of botting experience behind him, he was promptly blocked. --Recónditos (talk) 12:22, 23 July 2017 (UTC)

I want to note that Weblio includes inflected forms in their searches. They provide it theirselves. —suzukaze (tc) 21:13, 31 July 2017 (UTC)

Sorting VietnameseEdit

I just noticed that we don't have automatic category sorting for Vietnamese, which has an extremely diacritic-rich writing system. Should we? How does it work? Are the tone diacritics ignored for sorting purposes, so that à ả ã á ạ are all sorted as a? What about the non-tone diacritics? Are ă/â ê ô/ơ ư sorted together with a e o u, or are they sorted separately? And what about đ? Is it equivalent to d for sorting purposes, or are they separate? —Aɴɢʀ (talk) 10:56, 21 July 2017 (UTC)

For pinging purposes: our currently active editors who claim some knowledge of Vietnamese are @Wyang, Atitarev, Fumiko Take, HappyMidnight, Monni95, MuDavid, Mxn, PhanAnh123. —Aɴɢʀ (talk) 11:01, 21 July 2017 (UTC)
Thanks for the ping but I can only confirm the order "a, á, à, ả, ã, ạ" provided by Stephen below. In case it's not obvious, a, ă and â are separate letters (in this order), also the correct order for similar letters is: d, đ; e, ê; o, ô, ơ; u, ư. Digraphs (gi, kh, ng, nh, ph, th, tr) are not separate letters. --Anatoli T. (обсудить/вклад) 04:14, 22 July 2017 (UTC)
This was discussed previously at User talk:Fumiko Take#Sort. Vietnamese dictionaries sometimes have different practices of sorting the diacritics and tones, but I think the method proposed in the linked discussion is a good one to use. That will require that Module:links and Module:languages/data2 provide customisation for sorting so that the sort key can be generated externally by a sorting function (Module:vi-sort of sorts). Wyang (talk) 11:06, 21 July 2017 (UTC)
Based on the thread you linked to, I think at the very least we should edit Module:languages/data2 to strip the tonal diacritics. I can do that right now if there are no objections. Categories already ignore capitalization for sorting purposes for all languages. Anything beyond that would go beyond my editing abilities, but at least I can take the first step. —Aɴɢʀ (talk) 11:42, 21 July 2017 (UTC)
@Wyang: I've modified Module:languages so that the sort_key value in a language's data table can be the name of a module that contains a sortkey-generating function. The function (currently) must be named makeSortKey and it is automatically supplied the arguments text, langCode, scCode, the same arguments that are supplied to transliteration modules. That should allow you to create a Vietnamese sortkey-generating module. — Eru·tuon 19:11, 21 July 2017 (UTC)
My attempt (without knowing of Erutuon's edits):
	sort_key = {
		from = {
			'à', 'ả', 'ã', 'á', 'ạ',
			'ằ', 'ẳ', 'ẵ', 'ắ', 'ặ',
			'ầ', 'ẩ', 'ẫ', 'ấ', 'ậ',
			'è', 'ẻ', 'ẽ', 'é', 'ẹ',
			'ề', 'ể', 'ễ', 'ế', 'ệ',
			'ì', 'ỉ', 'ĩ', 'í', 'ị',
			'ò', 'ỏ', 'õ', 'ó', 'ọ',
			'ồ', 'ổ', 'ỗ', 'ố', 'ộ',
			'ờ', 'ở', 'ỡ', 'ớ', 'ợ',
			'ù', 'ủ', 'ũ', 'ú', 'ụ',
			'ừ', 'ử', 'ữ', 'ứ', 'ự',
			'ỳ', 'ỷ', 'ỹ', 'ý', 'ỵ',
			'ă', 'â', 'ê', 'ô', 'ơ', 'ư',
			'([1-5])([^%s]+)', -- move tone number to end of syllable
			'([a-z₁₂₃]+)([^a-z₁₂₃1-5]+)', -- add tone 0 to syllables that are not followed by a number
			'([a-z₁₂₃]+)$', -- add tone 0 to syllables that are followed by the end of the string
		to   = {
			' ',
			'a1', 'a2', 'a3', 'a4', 'a5',
			'ă1', 'ă2', 'ă3', 'ă4', 'ă5',
			'â1', 'â2', 'â3', 'â4', 'â5',
			'e1', 'e2', 'e3', 'e4', 'e5',
			'ê1', 'ê2', 'ê3', 'ê4', 'ê5',
			'i1', 'i2', 'i3', 'i4', 'i5',
			'o1', 'o2', 'o3', 'o4', 'o5',
			'ô1', 'ô2', 'ô3', 'ô4', 'ô5',
			'ơ1', 'ơ2', 'ơ3', 'ơ4', 'ơ5',
			'u1', 'u2', 'u3', 'u4', 'u5',
			'ư1', 'ư2', 'ư3', 'ư4', 'ư5',
			'y1', 'y2', 'y3', 'y4', 'y5',
			'a₁', 'a₂', 'e₂', 'o₂', 'o₃', 'u₃',
			'%1' .. '0' .. '%2',
			'%1' .. '0',
It can transform the string Tuyên ngôn toàn thế giới về nhân quyền của Liên Hợp Quốc ; công bằng ; Đại ; Ác-si-mét into tuye₂n0 ngo₂n0 toan1 the₂4 gio₃i4 ve₂1 nha₂n0 quye₂n1 cua2 lie₂n0 ho₃p5 quo₂c4 ; co₂ng0 ba₁ng1 ; d₁ai5 ; ac4 si0 met4.
It's a shame that Lua error in Module:languages at line 348: data for mw.loadData contains unsupported data type 'function'; using a function as the third parameter for gsub may have made dealing with diacritics easier.
edit: forgot about 'y' —19:49, 21 July 2017 (UTC)
suzukaze (tc) 19:45, 21 July 2017 (UTC)
@Suzukaze-c, Wyang: I'll see if I can convert that long series of replacements into a function in Module:vi-sortkey, unless either of you is working on a function now. It might be more efficient to first decompose, then handle the diacritics. — Eru·tuon 20:22, 21 July 2017 (UTC)
I've added a subscript 0 for unmodified vowel letters (that is, a plain vowel letter with or without a tonal diacritic), to make sure that the modified ô and ơ sort directly after plain o. Otherwise, I wonder if modified vowel letters would sort in unacceptable positions. (Hypothetical example: ngôn ngo₂n0 should sort directly after ngon ngon0, but perhaps it would sort after ngoy because would sort after y. So I think ngon should have the sortkey ngo₀n0.) But I don't know how sortkeys work, when non-alphabetic characters are involved, and I could be wrong. Does anyone know if the subscript 0 is needed? — Eru·tuon 21:45, 21 July 2017 (UTC)
Great work on Module:vi-sortkey, thanks. I'm not working on this at the moment, so please feel free to make any changes. Not sure about the sorting algorithm in Lua either, but a good method of testing whether the entries are properly sorted would be to check whether the {{der3|lang=vi}} output using a large number of Vietnamese words is correct. Wyang (talk) 22:20, 21 July 2017 (UTC)
I think I might have been wrong about needing subscript 0, but I'm getting confused now. If someone could look at the documentation page of the module and figure out if the function is working, I would appreciate it. You can process a list of words using the showSorting function on the documentation page of the module. — Eru·tuon 22:58, 21 July 2017 (UTC)
Okay, yeah, my reasoning above was wrong. ngôn should sort after both ngon and ngoy, as ô is a different letter from o. So I think the sort order is fine now. But if someone could confirm, that would be great. — Eru·tuon 23:15, 21 July 2017 (UTC)
Not sure if this will be helpful or not. There is confusion in Western software for Vietnamese in regard to sort order. In Microsoft Word 2010, the order is given as a, à, , ã, á, . In Microsoft Excel 2010, it is: a, á, à, ã, , . These are incorrect. The MS Word 2010 order comes from the physical order of the Vietnamese tones on a Vietnamese keyboard. The order of the keys is not the sort order.
The alphabet, in correct order, is: a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y (the twelve vowels being: a, ă, â, e, ê, i, o, ô, ơ, u, ư, y). The six tones are: a, á, à, ả, ã, ạ, in this order. Therefore, the vowel a, including its associated forms ă and â, take up eighteen places in the sort order:
a, á, à, ả, ã, ạ
ă, ắ, ằ, ẳ, ẵ, ặ
â, ấ, ầ, ẩ, ẫ, ậ
Altogether, the 12 vowels plus 6 tones take up 72 places in the sort order. —Stephen (Talk) 01:18, 22 July 2017 (UTC)
@Stephen G. Brown: I've added the order of tonal diacritics that you describe to Module:vi-sortkey. @Fumiko Take gave the "Microsoft word 2010" order in the talk page discussion linked above, saying that it was used by the Institute of Linguistics of Vietnam and Vietnam National University Publishing House. I can't verify either claim, but the order can be changed easily if necessary. — Eru·tuon 01:58, 22 July 2017 (UTC)
Interesting, I didn't know it was used for Word. Any way, I consulted those huge dictionaries published by those institutions, but there don't seem to be any online copies or previews, so I guess you'll just have to take my word for it. ばかFumikotalk 06:02, 22 July 2017 (UTC)
I'm not sure if there's such thing as a "correct" order. Normally, whenever I recite the tones, it's "ngang, sắc, huyền, hỏi, ngã, nặng", which is how I learned them at grade school. But the dictionaries seem to use either the Tang-poetry-inspired order, or that which parallels with the four tones of Middle Chinese (ngang and huyền - level; hỏi and ngã - rising; sắc and nặng - departing/checked). ばかFumikotalk 06:09, 22 July 2017 (UTC)
MS Word is a word processing program, which is what would be needed to compile, edit, and print big Vietnamese dictionaries. It's likely that the Institute of Linguistics of Vietnam and the Vietnam National University Publishing House used MS Word to produce those dictionaries. Twenty-five years ago, they would have had to sort all of the entries by hand, which is a huge job. Usually they had to write each entry on a card, which they then stored in long card-file boxes designed for the purpose. They moved the cards around by hand to achieve sorting, and then they would type the information from the cards. Today, they can use computerized sorting, which is accurate and almost instantaneous. Those institutions and publishers probably accepted the MS Word word order. To do otherwise would have been difficult and expensive. So what does this mean? Maybe Vietnam is accepting this new word order as an official one. You are our expert for Vietnamese, Fumiko, so the decision is yours to make. If the Institute of Linguistics prevails on MS to use a different sort order in the next edition of MS Word, then it will be easy for us to change our word order as well. So whatever you decide is okay with me. —Stephen (Talk) 12:02, 22 July 2017 (UTC)
I'm not aware of any respectable source that uses the "ngang, sắc, huyền, hỏi, ngã, nặng" order (most dictionaries I've seen that do are from inferior publishers who can't even decide whether to use "từ điển" or "tự điển", so it's safe to just disregard them altogether), so I guess you'll just have to go with the "ngang, huyền, hỏi, ngã, sắc, nặng" order. ばかFumikotalk 07:04, 23 July 2017 (UTC)

I've added the sortkey module to the data table for Vietnamese. It currently uses the order given by @Stephen G. Brown, rather than the order of the Institute of Linguistics, but it can be switched easily if @Fumiko Take wants to go with the other order. — Eru·tuon 04:56, 23 July 2017 (UTC)

DotsiesEdit Huh. —Justin (koavf)TCM 06:45, 22 July 2017 (UTC)

It totally misses on the fact that the human brain is good at recognising shapes. —CodeCat 10:03, 22 July 2017 (UTC)
Sort of fun. Did the creator lose interest? No tweets since 2013. Equinox 10:08, 22 July 2017 (UTC)
 --Daniel Carrero (talk) 10:30, 22 July 2017 (UTC)

"Proverb" ain't a part of speechEdit

Just a thought: remember how we started to get rid of the Initialism and Abbreviation headers because they aren't actually parts of speech (e.g. BBC functions as a proper noun)? - though we still have lots of relics like TLA. Shouldn't we also get rid of Idiom and Proverb on the same grounds? Obviously it's good to know when something is a proverb (and we could use the normal categories for this, maybe a {{lb|en|proverb}}), but Proverb definitely isn't a PoS. And I never really knew what Idiom was good for. We would still have Phrase as a wastebasket taxon for anything that doesn't fit into another PoS. I'm not too bothered either way, but it feels consistent and logical, especially if we're moving towards some semantic (WikiData?) model where a PoS header needs to represent an actual PoS. What is your opinion? Equinox 08:59, 23 July 2017 (UTC)

Can we not call them Sentence? —CodeCat 10:44, 23 July 2017 (UTC)
They are not guaranteed to be sentences though the entry is likely to be the core of a sentence. DCDuring (talk) 11:55, 23 July 2017 (UTC)
I think that "Proverb" is a more-or-less perfect name to describe a proverb. --Recónditos (talk) 12:18, 23 July 2017 (UTC)
Yeeeees, the question is more whether we should put it in a gloss. "Football" is a good gloss for a lot of your sports journalism trash but you don't put that between the double equals signs. Equinox 12:31, 23 July 2017 (UTC)
I don't see any advantages in putting Phrase instead of Proverb, are there any proverbs which aren't phrases? Crom daba (talk) 13:50, 23 July 2017 (UTC)
The common core of a proverbial expression that takes many forms could be a non-constituent, ie, not a phrase. I don't remember whether we have made entries of that kind. DCDuring (talk) 16:20, 23 July 2017 (UTC)
  • I'm fine with "Proverb", although I'd support getting rid of "Idiom" (which is usually supplanted by {{lb|xx|idiomatic}}). —Μετάknowledgediscuss/deeds 16:22, 23 July 2017 (UTC)
Support highly. Please remember that a section heading for POSes are for POSes and not for anything else. Should we also consider "formality" a POS, for instance? PseudoSkull (talk) 02:39, 24 July 2017 (UTC)
The argument in favour of keeping proverb as a part of speech is that they are typically used in a more isolated fashion than other phrases. They can always(?) stand alone, whereas other phrases are woven into a sentence no differently than any other word. I think "Phrase" should go, however, and I can't recall ever seeing "Idiom", but that seems out of place to me. "Phrase" and "Idiom" don't inform the reader how a term/expression is used. "Proverb" on the other hand, does. Andrew Sheedy (talk) 04:33, 24 July 2017 (UTC)
  • A note: Idiom is already explicitly forbidden by WT:ELE, so we don’t need any changes to start getting rid of it. — Vorziblix (talk · contribs) 09:09, 24 July 2017 (UTC)
Proverb, letter, suffix, prefix, symbol, definitions... the easiest solution is not to get rid of “proverb” as a part of speech, but to stop calling our definition-section headings “part of speech” in the first place. — Ungoliant (falai) 15:31, 24 July 2017 (UTC)
I'd support saying "Definitions section" instead of "POS section" in the future. EL could be edited to arrange that if people want. Some entries (Chinese I believe) even have "Definitions" as a POS header. --Daniel Carrero (talk) 15:34, 24 July 2017 (UTC)
I think that would be a good idea. Chinese does use a definitions section sometimes because (I think) many words have ambiguous functions, e.g. with many nouns easily being used as adjectives or adverbs. —Aryaman (मुझसे बात करो) 19:20, 26 July 2017 (UTC)

I would say that a proverb is a kind of set phrase, as hypernym. At least so I would say in Spanish, French and Catalan terminology with its equivalents frase hecha, phrase faite, frase feta. --Vriullop (talk) 07:57, 27 July 2017 (UTC)

Japanese pitch accent requests by Special:Contributions/

Is it fair to bulk-request Japanese pitch accents, as in インボイス (inboisu)? They are not so readily available in dictionaries. Hardly present in online dictionaries and occasionally available in printed dictionaries and paid apps. --Anatoli T. (обсудить/вклад) 09:38, 23 July 2017 (UTC)

Yes, we should definitely include them. If they are not readily available, that's even more reason for us to provide them. —CodeCat 10:43, 23 July 2017 (UTC)
I've added the ones I could find in Daijirin. Wyang (talk) 11:01, 23 July 2017 (UTC)
@Wyang Thanks. For me, accessing Daijirin has become cumbersome. BTW, shouldn't [ìńbóꜜìsù] show "m" for consistency? --Anatoli T. (обсудить/вклад) 11:42, 23 July 2017 (UTC)
That thing (whatever it is called) is based on the romanisation (inboisu). Wyang (talk) 11:44, 23 July 2017 (UTC)
I've been using Weblio辞書 for Daijirin accents, but unfortunately it doesn't show which vowels are devoiced. --Dine2016 (talk) 16:01, 23 July 2017 (UTC)
@Dine2016 Thanks. I forgot about this resource. It's the only one online, I think. I purchased Daijirin for 17 AUS$ but my android is now malfunctioning and I have switched to an iPhone. Unfortunately, there is no licence transfer and I am not sure if I use this current phone for long. It's a problem with purchased apps. --Anatoli T. (обсудить/вклад) 07:19, 24 July 2017 (UTC)
Any Australian IP that does mass "theme" edits with difficult languages like that is probably an Awesomemeos sock, though I can't be certain enough to start playing whack-a-mole with them. It looks like they've figured out how to keep from always geolocating to the same place (though in this case they probably actually are in Sydney), but their approach to editing is pretty distinctive. Chuck Entz (talk) 03:30, 27 July 2017 (UTC)

Request for adminshipEdit

The main motivation is to be able to edit javascript pages (i.e. gadgets, MediaWiki:common.js, other's javascript pages etc.). Unfortunately, template editor does not allow me to to edit js pages.

What I am going to do:

  • General cleanup of javascript infrastructure.
    • Mainly that includes moving stuff from one place to another.
  • Extract gadgets from MediaWiki:Gadget-legacy.js and also make disabling legacy gadgets in preferences not result in a catastrophe.
  • Modernize gadgets (that is, use jQuery, clean up code, drop deprecated code, etc.)
  • rewrite LangMetadata (currently defined in MediaWiki:Gadget-TranslationAdder.js) to use modules rather than hardcoded data.
  • elimiate the use of langrev subtemplates in MediaWiki:Gadget-TranslationAdder.js and possibly add a better autocomplete.
  • Eliminate JsMwApi in favor of mediawiki's own Api.

Let's make Wiktionary great again!

Any objections?--Dixtosa (talk) 12:01, 23 July 2017 (UTC)

We definitely need someone who is willing and able to tackle these issues. There are a few more open issues with translation tables as well:
  • The conversion of the translation adder code to not rely on a fixed table structure, but instead use the translations-cell CSS class which was added to {{trans-top}} some time ago.
  • Deprecation and removal of {{trans-mid}} in favour of CSS-based balancing, which also includes the removal of all balancing-related features from the translation adder. This relies on the previous step.
  • Migrating translation tables to use vsSwitcher, which doesn't need a surrounding div.
  • Redoing the "favourite languages" feature of translation tables, so that favourite languages are shown as a reduced translation table in the table's collapsed state, rather than in the header of the table. This relies on the previous change, since the older NavFrame system does not allow for content to be shown in the collapsed state, whereas vsSwitcher does.
CodeCat 15:53, 23 July 2017 (UTC)
Wiktionary:Votes/sy-2017-07/User:Dixtosa for admin DTLHS (talk) 15:58, 23 July 2017 (UTC)
I'd be particularly in favor of any JUS improvements that resolved the intermittent, chronic problem with loss of the show/hide controls and sometimes other functionality implemented in JS. DCDuring (talk) 16:24, 23 July 2017 (UTC)
Sounds great. I can get on board with "MWGA". I second @DCDuring's comment. — Eru·tuon 21:31, 23 July 2017 (UTC)
I am not sure exactly how bot work relates to adminship. But Dixtosa is a name that I trust. So sure. Equinox 22:03, 23 July 2017 (UTC)

Wiktionary:Votes/2017-07/Rename categoriesEdit

Based on the discussions linked in the vote, I created Wiktionary:Votes/2017-07/Rename categories. This is a large project, so this vote will start in two weeks and then it will end in two months. --Daniel Carrero (talk) 13:35, 24 July 2017 (UTC)

Limiting user vote creationsEdit

Is there a limit to how many votes of user can create in a given time? If not, I think there should be. --Victar (talk) 19:17, 25 July 2017 (UTC)

What limit you would like, exactly? --Daniel Carrero (talk) 19:27, 25 July 2017 (UTC)
I think we just need to have every vote approved by at least five editors or so (in the BP or maybe elsewhere, depending on the topic) before it can be created. --WikiTiki89 20:25, 25 July 2017 (UTC)
How would brigading be dealt with? -- 20:41, 25 July 2017 (UTC)
What exactly do you mean by brigading? --WikiTiki89 20:48, 25 July 2017 (UTC)
How to ensure a neutral assessment of the eligibility of a vote? Should votes be made about votes? -- 23:15, 25 July 2017 (UTC)
To clarify, if five editors (or however many we decide) want to have a vote and a hundred editors don't, we would still have the vote, because those five editors approved it. --WikiTiki89 14:51, 26 July 2017 (UTC)
I agree, I think each vote should get pre-approval. I think users should also only be able to create max 2-3 votes per month. --Victar (talk) 21:13, 25 July 2017 (UTC)
I think if each vote is pre-approved, it we won't need any rate limit. --WikiTiki89 23:12, 25 July 2017 (UTC)
You say that, until someone puts in 10 vote proposals. --Victar (talk) 23:17, 25 July 2017 (UTC)
If five other editors approve each one, what's the issue? Anyway, I don't think we need formal vote proposals. If everything is done right, the issue should already have an ongoing discussion before it is decided that there needs to be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)
How would the pre-approval process work? Wiktionary:Votes/pl-2015-09/Coauthoring policy votes suggests this: "The proposed requirement that all policy votes have at least one coauthor, that is, a distinct individual who at the very least makes one edit to the descriptive section of the voting page before it starts, even if just to list themselves as a contributor." That vote was created in 2015 and never started. As of today, the vote does not meet its own requirements to start: it doesn't have two contributors yet. --Daniel Carrero (talk) 12:08, 26 July 2017 (UTC)
By five (or however many we decide) editors mentioning in the discussion of the issue that there should be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)

At the moment nine different votes are running, created by five different users. Do we need a votes "watchdog"? I think there should be a limit on how long a vote runs for, some run for two months. DonnanZ (talk) 06:18, 26 July 2017 (UTC)

I created this two-month vote: Wiktionary:Votes/2017-07/Rename categories. It has not started yet. I think it was a good idea because it's a large proposal. It gives more time for people to read, think and discuss about it. @Dan Polansky sometimes creates two-month votes too. In my opinion this is not an issue, but I can change the "Rename categories" vote to one month if people prefer. --Daniel Carrero (talk) 09:11, 26 July 2017 (UTC)

Too much time is wasted on these trivial votes. Wyang (talk) 09:18, 26 July 2017 (UTC)

@Wyang What votes would you say are trivial? --Daniel Carrero (talk) 09:36, 26 July 2017 (UTC)
Sorry, I mistyped the ping, here it is again: @Wyang. --Daniel Carrero (talk) 09:37, 26 July 2017 (UTC)
Most of the votes running now. The whole idea of creating a vote after every discussion is just wrong. It is continuing to encourage uninformed self-assurance, over critical analysis of the issues. There have been many examples of counterproductive decisions made in the past as a consequence of relying on collective ignorance; superficially having a decision made by such majoritarian democracy looks good, but it could be really damaging in the long run. An example is the decade-long merge-split-merge vacillation of Chinese. Making it worse is the verbosity of many of the votes, such as Wiktionary:Votes/pl-2017-07/Gallery and Wiktionary:Votes/2017-07/Rename categories. I certainly would not want to read 14,835 bytes for a vote, and should not have been given the chance to in the first place. So much work on entries and developing new gadgets and functionalities could have been finished if the time reading the votes is diverted. Wyang (talk) 10:02, 26 July 2017 (UTC)
You mentioned collective ignorance concerning the merge-split-merge of Chinese, so what about having a rule like this: "Only people knowledgeable in [language] (as evidenced by the number Y of edits in [language]) are permitted to vote in issues concerning [language]." What do you think about that?
Apart from the Chinese issue, how are the "Rename categories" and "Gallery" votes "trivial"? They are votes for major changes. I don't claim them to be perfect, they could have problems to solve, but they can't be "trivial". Are there any better ways to try to implement these projects without votes? I've been trying to work under this limit: 1 vote per week. Some people seem to prefer it that way, although the idea of having that formal rule itself failed that vote.
Votes are often much smaller and easier to read than discussions. True, Wiktionary:Votes/2017-07/Rename categories is 14,835 bytes -- but it was based off Wiktionary:Beer parlour/2017/June#Proposal: Clean up, rename and replace "en:" → "English" in all categories which is 31,924 bytes and could still grow. It's OK if you don't want to read it, but please don't vote oppose on "TL;DR" grounds (although you still have that right). It's great to be able to discuss things on the BP, but for major proposals, votes have this advantage: it should be easier to judge the merits of a specific proposal in the vote (often detailing how exactly a policy would be edited) rather doing something out of a discussion with multiple proposals and where people not always give clear support, oppose, etc. for each idea, and can change their minds in the middle of the discussion. --Daniel Carrero (talk) 10:31, 26 July 2017 (UTC)
Maybe, just maybe, the matters aren't worth "resolution". When this very point is mentioned in the discussion, it is often ignored by the vote advocate. TL;DR is usually a somewhat polite way of saying: "Not worth my time". DCDuring (talk) 11:27, 26 July 2017 (UTC)
Do you have any examples of votes that aren't worth "resolution", and/or votes where that point was mentioned in the discussion and was ignored by the vote advocate? --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)

For what it's worth, in Wiktionary:Votes/pl-2016-11/Voting limits the "Proposal 2" failed. It was about implementing this regulation to limit vote creation: "The same person cannot create more than one vote in the span of 7 days. (For example, if someone creates a vote on December 9, then they must wait until at least December 16 before creating another one.)" --Daniel Carrero (talk) 09:27, 26 July 2017 (UTC)

We may need another vote on that.
It seems to me that votes are poor substitutes for longer-running consensus decisions. They seem to involve forcing resolution of disagreements for the sake of doing so or for the sake of enabling some kind of often premature standardization. With the passage of time some of these matters resolve themselves, others can be resolved more easily as contributors gain more knowledge. Votes force a discussion to take place whether or not participants have had actual experience with the "problem" being addressed. The proposals themselves are often quite amateurish, making the discussion mostly a matter of correcting gross errors and little time to discuss a mature proposal. Most discussion should take place before the vote is initiated. If no one cares enough to participate in the discussion perhaps the matter isn't of sufficient importance or is a "solution" to a non-problem. DCDuring (talk) 11:19, 26 July 2017 (UTC)
If I'm not mistaken, you seem to be talking about the votes concerning "External links" and "Further reading". It could be other votes too. I do think the "commons" links doesn't fit the "Further reading" and it's still a problem. The use of "Further reading" was an improvement otherwise, in my opinion. Let me know if you are talking about other votes. --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)
You know, you could create votes with vote-limiting proposals. --Daniel Carrero (talk) 11:46, 26 July 2017 (UTC)
Even though you said "We may need another vote on that.", I don't think there's anyone creating more than 1 vote every 7 days so the vote-limiting rule that failed the vote is being de facto followed even if it's not a formal rule. Of course, you could be thinking about different vote-creation limitations that we could discuss. --Daniel Carrero (talk) 11:59, 26 July 2017 (UTC)
Nobody would bother with this if there weren't a basic consensus that we now had too many votes. The simplest way to avoid a vote on this would be to recognize that consensus. DCDuring (talk) 12:51, 26 July 2017 (UTC)
True, we seem to have consensus on that. What happens now? --Daniel Carrero (talk) 13:02, 26 July 2017 (UTC)
We have a vote on whether or not we have too many votes. —Aɴɢʀ (talk) 14:53, 26 July 2017 (UTC)
Naturally not now, but at some point later I could create a vote on whether or not we have too many votes.
Suppose we want to create a vote to implement @Wikitiki89's proposal: "have every vote approved by at least five editors or so". Do we need approval from five editors or so to create that vote itself? --Daniel Carrero (talk) 15:15, 26 July 2017 (UTC)
Even if we don't need it, would it hurt to wait until we have it? --WikiTiki89 15:19, 26 July 2017 (UTC)
All we need is to enforce existing rules. We already require prior discussion before a vote. I certainly oppose "votes out of the blue" the way Dan has often created them. —CodeCat 15:24, 26 July 2017 (UTC)
@Wikitiki89 Of course not. We could also ask: "Are there five people willing to approve the idea of creating a vote for the proposal of requiring all future votes to be approved by five people first?"
@CodeCat By "Dan", are you referring to me? --Daniel Carrero (talk) 15:27, 26 July 2017 (UTC)
No, the actual Dan. —CodeCat 15:28, 26 July 2017 (UTC)
Sorry, my mistake. In the future, I'd like to create a vote with the proposal: "require prior discussion before a vote". This would serve as a confirmation vote. I dispute the notion that we do have this rule, but if it passes, this will become a formal written rule. --Daniel Carrero (talk) 15:32, 26 July 2017 (UTC)
Are you actually asking, or are you pointing out that we could ask? I am in favor of this rule, however I think it's too early to create a vote. Let's discuss it more. --WikiTiki89 15:40, 26 July 2017 (UTC)
I'm just pointing out that we could ask. I agree that it's too early to create a vote. I agree with this too: let's discuss it more. --Daniel Carrero (talk) 15:46, 26 July 2017 (UTC)
What should happen now is that, say, a vote or two is removed from the list and proposers show some basic self-restraint, so we don't waste time making a rule that shouldn't be required. DCDuring (talk) 18:32, 26 July 2017 (UTC)
@DCDuring: One vote per week, per person, at most looks good to you? What are the one or two votes that you would like to remove from the list? --Daniel Carrero (talk) 18:37, 26 July 2017 (UTC)
This isn't the first time someone has objected to the constant stream of votes. The solution isn't to pin you to a maximum, it's for you to have some responsibility and create fewer needless votes. (And please don't now drag this into yet another 20 paragraphs of criminal-lawyer-Daniel asking "prove to me which ones are needless". Everyone knows.) Equinox 18:40, 26 July 2017 (UTC)
Which of the current votes are needless? This is a reasonable question, it took fewer than 20 paragraphs to ask. --Daniel Carrero (talk) 18:46, 26 July 2017 (UTC)
You are making AGF very difficult for me. As to my preferences, I'd prefer that several of the proposals that you have proposed and continue to favor, that seem like they will or might win, but which I oppose, be withdrawn. I am not sure whether I would also like it if you simply noticed that you are losing credibility with every argumentative response and acted to preserve whatever credibility remains and even restore it or that you continued on your current path, which might lead to none of your proposals passing and a change of climate on this page. DCDuring (talk) 19:19, 26 July 2017 (UTC)
Geez, I was just asking. I know I write argumentative responses sometimes, I don't think that's necessarily a bad thing. But don't you think you write argumentative responses too? When I see your name in the recent changes, responding to a discussion or vote where I participate, I always think before I read your words "here we go, it's time to read some more criticism against what I did again".
Most of the votes I created have passed, some don't and I try to learn from them. It's true that if I lost credibility and none of my proposals passed, this would be a strong incentive for me to stop or avoid creating votes.
Of all the current votes, you voted in 7. You supported the vote for Dixtosa to become an admin. You voted "oppose" in all the other 6 votes pages, half of which were created by me (one per request, which I also opposed eventually). I don't think we can just withdraw the votes that I created and you voted oppose. You mentioned "that you have proposed", so do you have anything against me personally? What would it take for you to support a vote? --Daniel Carrero (talk) 19:56, 26 July 2017 (UTC)

Elu PrakritEdit

This needs a code, preferably inc-elu. Alternative names are "Helu Prakrit", "Helu", and "Elu", and maybe "Old Sinhalese". Descendants include si. It is an Indo-Aryan language (inc). —Aryaman (मुझसे बात करो) 23:08, 25 July 2017 (UTC)

@Aryamanarora elu-prk exists. Madhav P. (talk) 00:30, 26 July 2017 (UTC)
@माधवपंडित: Oh, thanks! —Aryaman (मुझसे बात करो) 01:47, 26 July 2017 (UTC)
To make sure a language doesn't already exist, you can use the search box in Module:languages. — Eru·tuon 02:51, 26 July 2017 (UTC)
@Erutuon On second thougts I don't think the issue ends here. You cannot use the {{inh|si|elu-prk}} tag even though Sinhalese is its descendant. Also the hyperlink Helu links to the wiki article of some Chinese king. Helu doesn't even have its own catagory page. Madhav P. (talk) 07:57, 26 July 2017 (UTC)
@माधवपंडित: Aha... Helu is in Module:etymology languages, so it does not have a dedicated category page (except it could have the category Terms derived from Helu). It is currently considered a subvariety of Sanskrit. I can change that if it is wrong. I fixed the Wikipedia link. Is Helu the ancestor of any language besides Sinhalese? — Eru·tuon 08:17, 26 July 2017 (UTC)
Okay, Helu, from the Wikipedia article, looks distinct enough that it can't be considered a variety of Sanskrit. I promoted it to a full-fledged language and added it as the ancestor of Sinhalese. — Eru·tuon 08:26, 26 July 2017 (UTC)
@Erutuon: Thanks a lot! I think only Sinhalese descends from Helu. Helu is Middle Indo-Aryan while Sanskrit is Old Indo-Aryan. Madhav P. (talk) 08:30, 26 July 2017 (UTC)
@माधवपंडित: You're welcome. Hm, I need some more items for the data file: scripts and ancestor (if there is a nearer ancestor than Proto-Indo-Aryan). — Eru·tuon 08:33, 26 July 2017 (UTC)
@Erutuon: An immediate ancestor would be one of the closely related Old Indo-Aryan dialects very close to Sanskrit but of course it'd be undocumented. Can't say about the script... @Aryamanarora what do you think? Madhav P. (talk) 12:41, 26 July 2017 (UTC)
@माधवपंडित: It's most likely Brah, Brahmi script. Is dv Dhivehi a descendant? Wiki says it is a descendant of Maharastri Prakrit but then goes on to say sometimes it's considered a dialect of Sinhalese. —Aryaman (मुझसे बात करो) 13:42, 26 July 2017 (UTC)
@Aryamanarora: Wiki places Helu in association to, if not under Maharastri. I think these two prakrits are more closely related to each other than they are to other prakrits. Madhav P. (talk) 13:45, 26 July 2017 (UTC)
@माधवपंडित: They do seems to be, both of them drop almost all medial consonants, but imo Elu has a completely different phonetic system for the "standard" Maharastri Prakrit. But perhaps the vernacular Maharastri sounded more like Elu than we know. —Aryaman (मुझसे बात करो) 13:58, 26 July 2017 (UTC)
Wikipedia says that Dhivehi descends from Maharashtri Prakrit or Helu in different places on the page, but there are no sources for either claim. (I've added Brahmi script to Helu.) — Eru·tuon 17:48, 26 July 2017 (UTC)
elu-prk isn't a properly formatted code, it should be renamed. —CodeCat 15:25, 26 July 2017 (UTC)
@CodeCat: Would you have an alternative? It would be easy to change the code now, as it's hardly used. — Eru·tuon 17:48, 26 July 2017 (UTC)
Aryaman's original proposal. —CodeCat 17:55, 26 July 2017 (UTC)
Any objections from others? — Eru·tuon 18:00, 26 July 2017 (UTC)

Biblical Hebrew hapax legomenaEdit

The ongoing discussion about making Latin a WDL has made me wonder whether we allow Biblical Hebrew hapax legomena (and dis legomena), considering that:

  • CFI no longer considers usage in a well-known work to be sufficient,
  • we treat Biblical Hebrew and Modern Hebrew as the same language,
  • we consider Hebrew a WDL.

In principle, those three facts mean that we would exclude Biblical hapaxes and disses, except for those (like גבינה, זכוכית, and לילית) that have gone on to become regular words of modern Hebrew. How do we want to handle this situation? Shall we:

  1. ban Hebrew words used only once or twice in the entire Hebrew corpus;
  2. divide Hebrew into Modern Hebrew (he, presumably including Medieval Hebrew) and Biblical Hebrew (hbo, presumably including Mishnaic Hebrew), making the former a WDL and the latter an extinct language;
  3. consider all of Hebrew an LDL;
  4. ignore the issue and decide on hapaxes on a case-by-case basis?

Solution 2 is what we've done for Greek, which is divided into grc and el, and solution 4 is apparently what we've mostly done for Latin and what we're currently arguing over. For that reason I'd prefer NOT to apply solution 4 to Hebrew. My preferred solution is 2, but others may disagree. (Personally I think 2 is actually the only logical solution to the Latin Question as well, but this thread isn't for talking about Latin.) —Aɴɢʀ (talk) 12:45, 26 July 2017 (UTC)

This goes back to our old repealed policy of allowing a word used once in a well-known work. The reason we repealed it, is that if nobody ever used or talked about that word again, then we probably don't need to be included. Thus there are no real hapax legomena in Biblical Hebrew when you include non-Biblical Hebrew, because each of them has been discussed and used later, specifically because of its unusualness in the Bible. --WikiTiki89 14:54, 26 July 2017 (UTC)
Indeed, I suspect you are manufacturing a problem. To follow up on Wikitiki's point, can anyone find even a single Biblical Hebrew entry that would fail RFV under our current rules? —Μετάknowledgediscuss/deeds 15:12, 26 July 2017 (UTC)
There do seem to be true Biblical Hebrew hapaxes [2], but we don't have entries for them yet, either because our coverage of Hebrew skews heavily toward Modern Hebrew, or because people know they wouldn't pass RFV. The words in question may be discussed (i.e. mentioned) later, but are they used later? I know some of them are (I mentioned some above), but all of them? —Aɴɢʀ (talk) 15:43, 26 July 2017 (UTC)
I think you misunderstood me. If you consider the corpus of Biblical Hebrew alone, then of course there are true hapax legomena. But when you consider Hebrew as a whole, including later Hebrew, most, if not all, of these Biblical hapax legomena will be discussed and used again later. --WikiTiki89 15:48, 26 July 2017 (UTC)
No, I understood. My question is, are all of them used (not merely discussed) again? What about the two entries other than פלדה (which is a modern Hebrew word too) in Category:Hebrew hapax legomena? Are they used (not mentioned) at least three times across all stages of Hebrew? —Aɴɢʀ (talk) 15:57, 26 July 2017 (UTC)
Out of those three words, זדה is not actually Biblical Hebrew, but from the Siloam inscription, so it is a different situation that we might need to discuss. The other two are used at least in Modern Hebrew. --WikiTiki89 16:05, 26 July 2017 (UTC)
Then take my "Biblical Hebrew" to mean "all Hebrew from before the 4th century CE" or whatever cutoff point is customary for the line between Mishnaic and Medieval Hebrew. Maybe we can call it "Classical Hebrew". The point remains: if Hebrew is all one language, and that one languages is a WDL, and זדה is not used (as opposed to mentioned) at least three times by three different authors, then our current rules do not allow its inclusion. —Aɴɢʀ (talk) 16:16, 26 July 2017 (UTC)
Well its silly to put Biblical and Mishnaic Hebrew together on one side and Medieval and Modern Hebrew on the other side. Mishnaic Hebrew is a lot more similar to Medieval Hebrew than to Biblical Hebrew. If anything, the line would be drawn between Biblical and Mishnaic. But regardless, if you mean to talk about examples like זדה, then let's talk about those. The contradiction is between these two points: (a) In the context of Hebrew as a whole, it is not likely that someone would encounter this word and want to know what it means, and so does not need to be included. (b) If "Epigraphic Hebrew" were to be considered its own language, then this word would be included, as similar words are in ancient languages with even smaller corpi, so it doesn't make sense to exclude it just because it happens to be part of a larger language. I think we need to resolve this contradiction more generally, rather than specifically for Hebrew, as it applies to many other languages, notably the recently-much-discussed Latin issue (although that details of that case are a bit different). --WikiTiki89 18:11, 26 July 2017 (UTC)
The reason I brought up Hebrew specifically is that is the only other language I can think of besides Latin where we consider the ancient form and the modern form to be one and the same language. Other cases where the ancient form and the modern form of a language are similar enough that it's conceivable to consider them a single language (Greek, Armenian, Icelandic/Norse) have two codes, one for the ancient form and one for the modern. Although on reflection, I guess we have just one code for all stages of Arabic and Chinese as well. At any rate, what this comes down to is the absurd situation we're currently in where a large number of users are saying "Post-1500 Latin is to be treated like either a WDL or a conlang; pre-1500 Latin is to be treated as an extinct language; but they're both the same language", and I wanted to see how we handle parallel situations. It does look like זדה currently does not meet CFI, but I bet if someone were to nominate it for deletion on those grounds, most people would vote to keep it, because generally we do keep words found only in inscriptions of ancient languages. —Aɴɢʀ (talk) 18:28, 26 July 2017 (UTC)
I don't know why you're only considering "ancient" and "modern". English is also a good example: Early Modern English had a lot of forms that we don't include, that we probably would include if it had been its own language. And there are many other languages with this sort of situation. --WikiTiki89 18:43, 26 July 2017 (UTC)

Mansi varietiesEdit

We have been getting a decent influx of Mansi lemmas recently, thanks to @Martinus Poeta Juvenis. This might be a good point to consider if we should treat Mansi as one language or as several.

The Mansi varieties are very different from each other: there are almost no cases where a standard Northern Mansi word has the same shape as its the Southern Mansi cognate, and sometimes they are very different indeed (e.g. 'gristle' is Southern /nʲeːrkɤː/, Northern /ńaːriɣ/). In many cases, reconstructions of Proto-Mansi are also available in literature (in this case *ńī̮rɣɜ or *ńē̮rɣɜ). The only written variety is Northern, and its spelling system mostly cannot be extended for other varieties (e.g. there are no signs for /ɤː/, /æ/ or /ɒ/). Inflection differs too: compared to Northern, Southern Mansi has no dual, but has the accusative and comitative cases. A few scholars by now consider "Mansi" to be a language family with up to four individual languages (Northern, Southern, Western, Eastern).

I would suggest:

  • reserve the code mns for Northern Mansi, which is the only living variety;
  • create new codes at least for Proto-Mansi (mns-pro?), Southern Mansi (ugr-sms?) and Central Mansi (ugr-cms?).

I'm not sure if separate Western and Eastern codes are needed at this point: they're a dialect continuum, and we may need a more general Wiktionary discussion at some point about what we want to do with linguistic field data covering dozens of closely related unwritten varieties. Treating everything as a separate language seems ineffective.

pinging also: @Panda10, @Neitrāls vārds, @Mulder1982 and just in case, @Alcenter. --Tropylium (talk) 13:34, 26 July 2017 (UTC)

Are there any attempts at latinisation for those non-literate Mansi varieties? For example I use transcription schemes given in "The Mongolic languages" for normalizing various phonetic spellings of East Yugur, Baonan, Daur, Mogholi and Khamnigan. I've also contemplated making an ad-hoc one for Sary-Yugur, but maybe it would be going too far. Crom daba (talk) 17:33, 26 July 2017 (UTC)
Most dialects have reasonably standardized linguistic transcription schemes, but they're per individual dialect, not dialect group. E.g. the verb 'to stay': Southern koľt-, Eastern: Lower Konda χoľt-, Middle Konda kʷoľt-, Upper Konda kʷuľt-, Western: kuľt-, Northern: χuľt- (= literary хульт-); or the noun 'mold': Southern ka͔šək, Eastern: Lower Konda xāšγə, Middle Konda kē̮səγ, Western: Pelymka kašša, Vagilsk kē̮šša, Northern: xāssi (= literary ха̄сси). It would seem like overkill to add separate entries for all variants. --Tropylium (talk) 19:05, 26 July 2017 (UTC)

If it provides practical benefit, addition of Proto-Mansi should be completely uncontroversial, you can probably ask -sche.

My personal opinion: languages that do not have a literary standard do not count as "languages". Collecting their data under a single entry (which is for the variety with an established literary standard) also makes for easier navigability as opposed to when they are scattered across different entries.

IMO, Cyrillization from a scholarly transcription like UPA should be reserved as a last resort measure, like, for example, I was able to provide attestation to these genitalia terms in Erzya (дёрть (djortʹ), etc.) that this anon added only through cyrillizing Paasonen's UPA entries because the Russian dictionaries are scrubbed of anything sex-related. In my defense there are definitionless auto-cyrillizations in Oahpa of most of Paasonen's UPA entries (courtesy of User:Rueter I presume) so I'm still in the clear. An example of such a last resort situation. But this shouldn't be the norm, imo.

Tangentially related, perhaps you, Tropylium, would be willing to make an UPA-IPA conversion module? More specifically: just write down the UPA-IPA correspondences. I started making Module:mdf-IPA for quick conversion of Paasonen's UPA into IPA (I just copied CodeCat's et-IPA module.) I think this could be handy.

Perhaps the transcription-only varieties can be listed under Pronunciation? In UPA, or converted to IPA. Japanese, for example, mixes at least 2 transcriptions (pitch + IPA), e.g., 計画. Neitrāls vārds (talk) 18:22, 3 August 2017 (UTC)

References section only for <references/>?Edit

I was under the impression that under recent policy changes, the "References" section should only be used for <references/>, i.e. to show inline references that are present elsewhere in the entry. However, User:Gamren has pointed out that our policy doesn't actually say so. So what is going on? —CodeCat 10:45, 27 July 2017 (UTC)

But under the prevailing regime, we have no policies that haven't been voted on. In each case, what has been voted on is the wording of a specific proposal. DCDuring (talk) 12:49, 27 July 2017 (UTC)
We allow "References" sections with simple bullet points instead of <references/>, as per Wiktionary:Votes/2016-12/"References" and "External sources". The vote did propose to require always using <references/> in "References" sections, but @This, that and the other and @Tropylium opposed the idea of introducing that specific limitation. --Daniel Carrero (talk) 13:22, 27 July 2017 (UTC)
I see. I'm not sure if I understand the difference between the sections then. What would I use to refer to another dictionary which contains an entry for the term? —CodeCat 13:35, 27 July 2017 (UTC)
In the vote I linked above, please see the comments of Tropylium, TTO and @I'm so meta even this acronym (and maybe others). I'm not saying I personally agree or disagree with them, but by voting that way they helped to shape the regulations as they are now. --Daniel Carrero (talk) 13:47, 27 July 2017 (UTC)
Sorry, I did not answer your last question properly. When you want to refer to another dictionary which contains an entry for the term, please use "Further reading". --Daniel Carrero (talk) 13:48, 27 July 2017 (UTC)
Even if that dictionary is used to "prove the validity of what is being stated", and in which readers may "verify the information available"? Writing "Further reading" instead of "References" is not much more work, it just seems counter-intuitive.__Gamren (talk) 16:38, 27 July 2017 (UTC)
Obviously not in the carefully considered opinion of those who supported the proposal, which they have carefully studied and for which they had their own clinical experience and good evidence. DCDuring (talk) 16:47, 27 July 2017 (UTC)
But, as was discussed before, we use quotations to attest words. If we wrote "References" just to link to the same word in external dictionaries, this would make it sound like we know that the word exists because it's in those dictionaries.
We can use the "References" to "prove the validity of what is being stated" and "verify the information available" when we are making statements in etymologies and usage notes, for example. --Daniel Carrero (talk) 16:50, 27 July 2017 (UTC)
Okay. So "references" may support everything except 1) that the term exists, 2) that it is of the specified POS, and 3) that it means what we say it means? Then, e.g. diff, diff are erronous? Most Danish entries, at least, are like this (DDO seems to be very frequently linked-to here). Perhaps a bot can be taught to recognize the string ===References===\n* {{R:DDO}} and equivalents?
For Greenlandic affixes, I have rarely been using dictionaries (since both DAKA and its ancestor Oqaatsit are crap for those purposes) but mostly Bjørnum's and Nielsen's grammars (on e.g. -lior, -isag, -suaq), both of which have lists of affixes, as references to both meaning and morphological behaviour (see both Usage notes and the headword line). Is this also wrong?
As an afterthought, what if there is a word in an LDL that has no quotations, but is found in approved dictionaries? Are these latter then still not to be called references?__Gamren (talk) 20:24, 27 July 2017 (UTC)
To answer that question, I'd like to use @Angr's words from this discussion (except I don't speak Ancient Greek so I'll just trust him on the examples): "And ideally (but admittedly totally unrealistically), we should be writing our own definitions "from the bottom up", i.e. on the basis of citations, rather than taking them from other dictionaries. For example, we should be saying that μῆνις (mênis) means "wrath" not because LSJ tells us that's what it means, but because we observe that that's what it means in "Μῆνιν ἄειδε, θεά, Πηληιάδεω Ἀχιλῆος οὐλομένην"."
Yes, I believe diff, diff are erroneous. those Please use "Further reading" even when a word in an LDL that has no quotations, but is found in approved dictionaries.
I don't edit in LDLs, I'm just trusting the judgement of people who participated in discussions and votes and who edit in LDLs. The consensus and rules can change if needed. --Daniel Carrero (talk) 20:43, 27 July 2017 (UTC)
This discussion scares me. No comment by a single user in a vote discussion can be taken as policy. It is only the proposal voted on that is approved. If the text of policies has been altered based on those comments, the change should be null and void. If a vote can't be run properly, it should not be run at all. How many alterations of our policy pages have been purportedly made as result of a vote, but actually with reference to a mere comment? DCDuring (talk) 23:17, 27 July 2017 (UTC)
@DCDuring: To clarify: in Wiktionary:Votes/2016-12/"References" and "External sources", most people in the vote supported the whole proposal, fewer people opposed the whole proposal, some people opposed specifically the rule about requiring <references/>. The final vote count allowed for almost the whole proposal to be implemented, except the <references/> was not implemented at all. What's wrong with that? It's not a comment by a single user is taken as policy, it's quite the opposite: a few oppose votes were enough to not implement a rule. --Daniel Carrero (talk) 23:36, 27 July 2017 (UTC)
The approach you have chosen to take is to view each element of a proposal as separable, whereas they usually constitute a whole. And, of course, there is damn little analysis of how a range of actual entries would look under this or most other proposals. The introduction of spurious component proposals distorts the voting process. Instead of there being an attempt to reach a consensus view in pre-vote discussion, the vote is undertaken, apparently to force discussion of issues that would not otherwise be discussed, probably because they are of insufficient concern to attract interest. I really don't get what legitimate Wiktionary interest the process of policy ossification-by-vote is in aid of. DCDuring (talk) 00:22, 28 July 2017 (UTC)
That specific vote was discussed in Wiktionary:Beer parlour/2016/November#Suggestion: Mention on WT:EL the fact that external links ≠ references, which was a large discussion (26.849 bytes). People voted for each component separately, so I just counted their votes at face value. There's nothing wrong with that.
It's true that for Wiktionary:Votes/2016-12/"References" and "External sources" we didn't have an analysis of how a range of actual entries would look like. I agree, this could have been an improvement. (other votes do have that analysis: Wiktionary:Votes/pl-2017-07/Gallery lists 132 entries as examples) --Daniel Carrero (talk) 00:45, 28 July 2017 (UTC)

So, for the narrow purpose I outlined above -- using grammars as sources on Greenlandic affixes -- what is right, References or Further reading?__Gamren (talk) 15:57, 28 July 2017 (UTC)

Please use "Further reading" if you are just placing a link with no comments, or use "References" if you are adding references for something written in the etymology or the usage notes. --Daniel Carrero (talk) 18:02, 28 July 2017 (UTC)
@Daniel Carrero: There's nothing prohibiting the use of references in sections other than Etymology or Usage notes, though. I think references may even be necessary for definitions of affixes, and of words with a more elusive meaning like Ancient Greek γε (ge). — Eru·tuon 18:38, 28 July 2017 (UTC)
About using References for definitions themselves: if that makes more sense, fine by me. I would just like it to be written in the rules at some point if it is allowed.
It's true that References may point to sections other than Etymology or Usage notes.
@Angr, the entry γε (ge) is using only the "References" section right now with 9 links. Based on the current rules, the right thing to do would be placing all the dictionaries in a "Further reading" section, and using the "References" section only for sources that serve as evidence for something in the entry, like some material about the large "Usage notes" section. Do you see any problem with that? Maybe that entry is a special case somehow, where the "References" section is good enough as it is? I'm asking you because you edit Ancient Greek entries and you participated in link and you have been helping to add "Further reading" where applicable. --Daniel Carrero (talk) 19:01, 28 July 2017 (UTC)
What is the policy on the meaning of the term references in general? Is there also some practice reflected in WT:Glossary or Appendix:Glossary? In the absence of a specifically redefined meaning for reference ("reference work"), a contributor could refer to any work that supported any content of any part of an entry or the entry as a whole. There wouldn't seem to be any existing restrictions, nor do such restrictions seem useful, except as a device to impose controls. I fail to see why such content controls improve Wiktionary. I see some advantage to preventing there from being too many confusingly similar headings (ie, noncontent material, especially displayed in large type). DCDuring (talk) 19:40, 28 July 2017 (UTC)
One of my favorite uses of references is to enable one to compare and contrast our approach to defining a term to those taken by others. This often comes up when we try to improve actual content in an entry, which so often is limited to such mundane matters as definitions. As an example, see Wiktionary:Tea_room/2017/July#to_channel. DCDuring (talk) 20:05, 28 July 2017 (UTC)
It seems like, based on the use of the section, "References" means "Sources", though it's not defined anywhere. — Eru·tuon 20:02, 28 July 2017 (UTC)
Indeed. Nor does it seem to me to wise to so define it in any way that is restrictive as to purpose. Obviously we would exclude spam, advertising, etc on other grounds. DCDuring (talk) 20:08, 28 July 2017 (UTC)
@Daniel Carrero: AFAICT everything under γε#References is a dictionary entry of some kind, so I would simply change the header to ===Further reading===. It doesn't look like anything there is being used to reference the usage note. —Aɴɢʀ (talk) 20:40, 28 July 2017 (UTC)
@Angr: Thanks, I changed "References" to "Further reading" in that entry.
@Erutuon, DCDuring: These terms are defined here: Wiktionary:Entry layout#References, Wiktionary:Entry layout#Further reading. --Daniel Carrero (talk) 10:46, 29 July 2017 (UTC)
@Daniel Carrero: I wrote most of the usage note in γε (ge), so you should've asked me. I don't recall where I got it from, but it was probably a paper not mentioned in the list of reference works, because the LSJ is decidedly unclear on the topic of the position of the word. (It's not really that long; it just looks like it because of the quotations. It should be longer, because it's an important and hard-to-grasp word.) — Eru·tuon 18:42, 29 July 2017 (UTC)
@Erutuon: I apologize for not asking you. --Daniel Carrero (talk) 19:55, 29 July 2017 (UTC)

Accessible editing buttonsEdit

--Whatamidoing (WMF) (talk) 16:56, 27 July 2017 (UTC)

Like +1. Wyang (talk) 21:17, 27 July 2017 (UTC)
I really hate these new giant buttons everywhere. They take up too much screen space and don't integrate with the browser as well. What's wrong with using default browser buttons? If there are accessibility issues, let the browsers take care of it by having the option to change what default buttons look like. --WikiTiki89 21:24, 27 July 2017 (UTC)
I agree with Wikitiki. However, I wish not for them to be abolished completely but for there to be an option to personally disable them. —suzukaze (tc) 23:33, 27 July 2017 (UTC)

Arabic form I with middle فتحةEdit

Arabic form I verbs with فتحة in the middle consonant of the past الْمَاضِي may change it for any vowel in the middle consonant in the non-past (imperfect) indicative الْمُضَارِع. Therefore, it'd be very helpful to organize them in groups depending on which vowel or vowels they have, and add those categories to Category:Arabic_verbs. --Backinstadiums (talk) 14:09, 27 July 2017 (UTC)

I supposed this would be a fairly straightforward task, since entries are filled in using 'templates'. Am I wrong? --Backinstadiums (talk) 18:39, 27 July 2017 (UTC)

I'd imagine it could be done by Module:ar-verb, which serves {{ar-verb}}. What should the umbrella category be named, and the subategories? Perhaps "Arabic form-I past verbs by middle vowel", "Arabic form-I past verbs with the middle vowel x"? Any suggestions as to the name of the category, @Atitarev, Benwing2, others? Actually, the umbrella category should be under Arabic form-I verbs, because it applies only to form I. — Eru·tuon 18:48, 27 July 2017 (UTC)
Okay, "past vowel" is used in some of the verb categories already. So I would propose Arabic form-I verbs by past vowel, or perhaps Arabic verbs by past vowel (since only form I has variation in the past vowel), and Arabic form-I verbs with past vowel a or Arabic verbs with past vowel a. —This unsigned comment was added by Erutuon (talkcontribs) at 13:57, 27 July 2017‎ (UTC).
Categorizing by past vowel alone is insufficient, I think we should have individual categories for each past-and-non-past vowel combination (and just to point out for anyone who is unaware, this only applies to form-I verbs). --WikiTiki89 19:23, 27 July 2017 (UTC)
Sounds good, as long as you don't mean in exclusion to categories for individual past and non-past vowels. I think there should be both individual past and non-past vowel categories, and categories for combinations. For instance, كَتَبَ، يَكْتُبُ (kataba, yaktubu) could be placed in categories for "past vowel a", "non-past vowel u" and "past vowel a and non-past vowel u". There could be umbrella categories for both individual and combination vowel categories, and a master category for "Arabic verbs by vowel" or something. — Eru·tuon 19:32, 27 July 2017 (UTC)
I don't think we need the categories for individual past and non-past vowels. The past and non-past vowel pairs are interrelated and shouldn't be separated. For example, most active verbs have a-u (a being the past vowel and u the non-past), while most active verbs with gutturals as one of the last two root consonants have a-a. Some active verbs have a-i. Stative verbs usually have i-a or u-a. All other combinations are rare (for strong verbs at least). And of course these are general rules that have many exceptions; a-u verbs can be stative, i-a verbs can be active, etc. So really taking either one separately doesn't tell you much about the verb. --WikiTiki89 21:22, 27 July 2017 (UTC)

@Wikitiki89, Erutuon I have already proposed enabling users the exploration of the corpus of terms, implementing a user-friendly interface --Backinstadiums (talk) 07:23, 28 July 2017 (UTC)

@Backinstadiums: That is a cool idea, but I really have no idea how to implement it, as I've got very limited programming skills. — Eru·tuon 16:11, 28 July 2017 (UTC)

Request for help with cleanup: User:DTLHS/cleanup/lemma categorizationEdit

I have generated this report to detect entries that are not in either lemma or non-lemma categories matching their language. There are approximately 30,000 entries- feel free to remove pages if you complete them. DTLHS (talk) 01:20, 28 July 2017 (UTC)

Thank you! That's very useful. — Eru·tuon 01:35, 28 July 2017 (UTC)
Spanish is 99% done. Most on the lists were mistakes from various Wonderfools. --Recónditos (talk) 19:02, 28 July 2017 (UTC)
Many of the entries are cases where a headword template was used, but the template doesn't call {{head}}. Perhaps a separate list of all headword-line templates which do not transclude Module:headword in some form would be useful for finding these. —CodeCat 20:29, 28 July 2017 (UTC)
Many headword templates that don't use {{head}} are not categorized, so I don't know how to find them other than going through this list. DTLHS (talk) 20:32, 28 July 2017 (UTC)
Yet another thing to clean up! Uncategorised headword templates! —CodeCat 21:06, 28 July 2017 (UTC)
I've cleaned up some English entries. Some were actually FL entries. There are also multiple problems. Many are also to be cleaned up because they use Abbreviation, Initialism, or Acronym as headers. Often several PoSes are required.
Should we take the trouble to remove resolved items? That's quite tedious if whole pages are involved. DCDuring (talk) 23:54, 28 July 2017 (UTC)
Yes, please try to remove pages that you've fixed. DTLHS (talk) 23:57, 28 July 2017 (UTC)

Strategy discussion, cycle 3. Challenge 5Edit

There are only three days left (plus today) to take part in Cycle 3 of the Wikimedia strategy discussion. Insights to the last challenge our movement is facing has just been published. The challenge is: How does Wikimedia meet our current and future readers’ needs as the world undergoes significant population shifts in the next 15 years?

The previous challenges are:

  1. How do our communities and content stay relevant in a changing world?
  2. How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways?
  3. As Wikimedia looks toward 2030, how can we counteract the increasing levels of misinformation?
  4. How does Wikimedia continue to be as useful as possible to the world as the creation, presentation, and distribution of knowledge change?

On this page, you may read more, and suggest solutions to the challenges. Also, if you're interested in related discussions that are taking place on other wikis, please have a look at the weekly summaries: #1 (July 1 to 9), #2 (July 10 to 16), #3 (July 17 to 23).

In August, a broad consultation will take place, but it'll differ from what we've been conducting since March. This is your last chance to take part in such a discussion! SGrabarczuk (WMF) (talk) 17:51, 28 July 2017 (UTC)

Delete SoP compounds in languages like German and DutchEdit

This is related to the deletion discussion of Kinderleichenficker. The user who put this up mentioned that the word was disgusting and so he got a lot of "keep"s since that's not a justified argument. But he also mentioned something very justified, namely that this a purely random combination of words. And since English child corpse fucker would without doubt be deleted as SoP, the German version should also. Our policies right now don't allow for closed (one-word) compounds to be deleted as SoP. This should be changed for languages that both a) freely allow the formation of compounds and b) spell all of these compounds in one word. Kolmiel (talk) 14:16, 29 July 2017 (UTC)

  • Admittedly Kinderleichenficker is not a nice word, but it sounds like a bad idea. Maybe Duden can be used as a guide. DonnanZ (talk) 14:22, 29 July 2017 (UTC)
How does it sound like a bad idea? --Barytonesis (talk) 14:32, 29 July 2017 (UTC)
The idea to delete ALL German and Dutch (Finnish, etc.) compound words is wrong because there may are a lot of compound words that are idiomatic and would pass CFI or have already. Using English is a guide is a start but some languages just require separate CFI, which should be defined. Using other dictionaries' approach may be helpful for discussions. What makes a word in English is not the same not only in German, Dutch, Finnish, Estonian, etc. but also in all languages using scriptio continua, notably Asian languages, like Chinese, Thai, Khmer, Lao, etc. and also Vietnamese, which uses spaces after each syllable. --Anatoli T. (обсудить/вклад) 14:39, 29 July 2017 (UTC)
But the OP is not suggesting that we should delete ALL of them; he's rather saying that a mere orthographic convention shouldn't be a sufficient reason for including them all without question (which doesn't mean that he doesn't want to include any, only those which do happen to be idiomatic). --Barytonesis (talk) 14:47, 29 July 2017 (UTC)
Our mission statement is to include all words in all languages. Compound words are most definitely words, so we should obviously keep them. SemperBlotto (talk) 14:50, 29 July 2017 (UTC)
Since "words" has been deemed to not exclude any of open compounds, closed compounds, phrases, and non-constituents, would we have any basis other than whimsical or "democratic" ones for excluding, say, for excluding? DCDuring (talk) 19:28, 29 July 2017 (UTC)
I oppose this. They're still one-word compounds. It's okay for some languages to end up having more words than others because they compound differently. —Μετάknowledgediscuss/deeds 06:40, 30 July 2017 (UTC)
Past discussions seem to show no consensus for such a proposal: see Talk:Zirkusschule (mentioning also "Tanzschule"), Talk:Sportlerherz, Talk:Plastikschwanz, and Talk:neuntausendneunhundertneunundneunzig. --Dan Polansky (talk) 08:32, 30 July 2017 (UTC)
I'd really like to hear from de-4 speakers, really ONLY from them. DCDuring (talk) 12:57, 30 July 2017 (UTC)
I disagree with that. Such matters should be decided by those impacted, which includes native English speakers and en wikt editorship in general. If native English speakers find it convenient to have Tanzschule and neuntausendneunhundertneunundneunzig in the dictionary, they should be allowed to have a say and make a decision. There really isn't any secret or tacit native knowledge about the language that is inaccessible to English speakers; a ten-year-old can see what sort of beasts these German closed compounds are. --Dan Polansky (talk) 13:54, 30 July 2017 (UTC)

I would also like to add Scandinavian languages to this discussion, because they form compounds pretty much the same way as in German. I agree in principal with Kolmiel, but would also like to hear from de-4 speakers. --Robbie SWE (talk) 13:05, 30 July 2017 (UTC)

  • Well, if you're going to include non-idiomatic German (Dutch, Finnish, Swedish...) compounds (and thus forego half of our CFI, really), you're going to include every single word combination anyone was ever bothered to record in three durable sources. It's as simple as that, now everyone has to decide if they want that. For the other: English as a West Germanic language doesn't have less words. Words are syntactical units, not squiggles of (cyber) ink below a minimum distance on (cyber) paper. Child corpse fucker is as much a word (compound noun) as Kinderleichenficker and if you want to have a structured/coherent dictionary, you'll either keep both or neither. Korn [kʰũːɘ̃n] (talk) 13:34, 30 July 2017 (UTC)
    The concept "word" is ill-defined, and "syntactical units" is not a particularly functional definition for our purposes. Given the way this wiki is structured, the large scale structure is not particularly relevant or noticed, which is good because we've got many languages with one or two words, and many languages with a few hundred, and a few with hundreds of thousands, and that is far more important to the overall structure and coherence than exact rules for what words we include in German and English.--Prosfilaes (talk) 20:56, 31 July 2017 (UTC)
No, please no, pleeeease no. I beg you no. Oppose. I oppose because someone who does not have knowledge of the language and sees this word will want to look it up. I myself have done this with a Germanic language in which I was completely unfamiliar; I just picked out every single word and looked it up on Wiktionary. Plus, we already have tons and tons of Germanic "SOPs" here, so going through and trying deleting all of these would be pointless and a complete waste of time.
Look, Germanic compounds are no different than regular compounds, thought they may seem random and obviously they can be created freely. Look at, for instance, English schoolteacher. Should we delete that? Because that looks pretty "SOP" to me, except, wait, it's completely not. It's one word. English can make seemingly random compounds too; it just doesn't do that as often. See playfield, or horsefucker. As long as something is one word, it's not SOP. It's never SOP. Do not delete those because of this. Please please please please please please do not delete those.
I don't care how much or how little knowledge I have of German or Dutch, my comment will not be dismissed as irrelevant, because this can be compared to similar English compounds as well. I don't think English natives would be in favor of deleting schoolteacher or playfield. So don't delete the German compounds either. I don't care how many there are. Please just don't delete them. Please! PseudoSkull (talk) 15:48, 30 July 2017 (UTC)
These compounds are not strictly speaking SoP. They're transparent, sure — you can predict the meaning if you know the individual words well enough. But the fact that a particular meaning is expressed by a particular transparent compound is itself unpredictable, that is: lexical. It's conceivable that instead of Kinderleichenficker, German would have ended up using some other expression such as ˣNekropädophiler. Similarly, 'necrophilia' is still Nekrophilie and not something analytic like ˣLeichenfickerung.
If the concern is doing away with truly ad hoc compounds that don't have any consistent use, our existing attestation criteria should be good enough. --Tropylium (talk) 17:08, 30 July 2017 (UTC)
Uhm. Kinderleichenficker is an unidiomatic ad hoc compound, don't get any wrong ideas. This is not a situation where an entry was created which tells you 'the right word' to use, it's just a random amalgamation of word that could be exchanged for any other amalgamation of terms with the same meaning. Korn [kʰũːɘ̃n] (talk) 18:34, 30 July 2017 (UTC)
Two options besides keeping the inconsistent status quo: remove transparent German compounds spelled as one word, add transparent English compounds spelled as multiple words. I'm more inclined to go the opposite direction: include more English compounds spelled as a series of separate words, even when they are supposedly sum-of-parts. Really they aren't sum-of-parts, their meaning is just relatively predictable, as Tropylium says. — Eru·tuon 17:45, 30 July 2017 (UTC)
The status quo is not inconsistent. Words written in scripts that split words by spaces are kept. It may not correspond with certain other definitions of the word "word", but it is internally consistent.--Prosfilaes (talk) 20:24, 30 July 2017 (UTC)
Okay, your point is correct. I meant inconsistent in the sense of treating morphologically similar words in two languages entirely differently simply because of spelling conventions. — Eru·tuon 21:22, 30 July 2017 (UTC)

Even if I – to a certain degree – adhere to the all-words-in-all-languages-mantra I do see a problem here. What do we do if a common compound noun in language A is considered SoP in language B? For instance Swedish viltvårdsområde, which in English means game preserve area, would probably be considered SoP with the reason that there are an infinite amount of areas and not all deserve entries. In the case of Kinderleichenficker, I'm not against it because it is offensive or inherently repulsive – I consider it an artificial construction which makes sense morphologically, but still not worth having in a dictionary. Should it be deleted? I'm still on the fence, but it does force us to evaluate what we're doing here: all words in all languages is a valiant vision, but in the process of fulfilling it, do we stop aspiring to provide some "kind" of quality? --Robbie SWE (talk) 17:55, 30 July 2017 (UTC)

I have to say I think some of you guys have the wrong idea. game preserve area should not be an entry, but, if attested, viltvårdsområde should be an entry. Don't even compare long Germanic compounds to their English phrase equivalents and call that justification, because we literally could sit here and do that for thousands of words for years on end. Why should these entries exist? Because most anglophones I've ever met IRL can't speak another Germanic language and wouldn't even know what a "compound" in the sense of linguistics is. We should ALWAYS consider the casual reader who has no knowledge of a particular language. An anglophone can guess that we at enwikt wouldn't have an entry for school board employee, but if they didn't know what [place German/Dutch compound translation of that phrase here] actually meant at all and just wanted a translation, Wiktionary is an important resource for that. On the German/Dutch/Swedish/Danish/Norwegian Wiktionary, it might be understandable (but would still be disagreed upon by me) not to include long compounds like this, but on the English Wiktionary, we DEFINITELY need such compounds. PseudoSkull (talk) 04:19, 31 July 2017 (UTC)
@PseudoSkull: To be nitpicky, school board employee is not a phrase; it's a compound and grammatically a single word in English, just as its equivalent in German or Dutch would probably be. It is composed of three nouns, which cannot form a phrase in English.
But the point about compounds spelled as one word being harder to figure out than compounds spelled as multiple words is quite relevant, and I think it's a convincing rationale for covering German compounds but not the corresponding English ones. It's harder for a person to tease apart the components of a German compound and figure out its meaning, simply because it's written as one word. So, even when a German compound's meaning is just as predictable as a corresponding English one (for which we wouldn't have an entry), it may be more useful to have an entry for the German one because it's harder to decode. — Eru·tuon 04:49, 31 July 2017 (UTC)

I oppose; there's no clear evidence the word that started this fuss will pass RFV anyway. SoP is a complex subjective rule; in languages where compound words are clearly demarked by spaces, there's no reasons to interject SoP in there.--Prosfilaes (talk) 20:28, 30 July 2017 (UTC)

That word is irrelevant. The point is the principle, which is as clear as the sun. I'm really shocked that this self-evident proposal is going to fail. It makes me want to quit wiktionary. Or else, spend the rest of my time creating German compounds. (I could make you regret this!)
Please don't quit. --Barytonesis (talk) 11:24, 31 July 2017 (UTC)
And what about Turkish? It's an agglutinative language. Do you want to include all Turkish words, too? It's ridiculous. Kolmiel (talk) 07:02, 31 July 2017 (UTC)
Fulminating about how "self-evident" your proposal is doesn't help anything. The status quo has the advantages that telling whether a German series of letters without spaces is an acceptable entry is trivial, whereas SoP is very subjective. You can create whatever entries you want; there are a number of entries that I could create in English but don't see the value of.--Prosfilaes (talk) 20:45, 31 July 2017 (UTC)
So then why don't we get rid of SOP entirely? That way determining whether any entry is acceptable will always be trivial. And we will fill up with garbage in all our languages equally. There is no reason for German to be special just because it happens to spell certain kinds of things without a space. --WikiTiki89 20:53, 31 July 2017 (UTC)
We don't get rid of SOP entirely because we want some rule to include useful entries for multiword phrases. German is special in many ways, including having a huge corpus of transcribed text; should we handicap it so it doesn't outpace Cornish? German compounds being more opaque than compounds written in languages that use more spaces is certainly a reason to include certain German words.
I am skeptical that English and German are that much different. A Christmas Carol (chosen for convenience) uses 4,582 distinct space and punctuation delimeted words; Der Weihnachtsabend uses 5,081. Even accounting for the long tail, I doubt that German has more than double the words of English, which falls beneath the margin of error IMO. It wouldn't mean filling up with garbage by any means.--Prosfilaes (talk) 21:39, 31 July 2017 (UTC)
What does Cornish have to do with anything? I'm not complaining that German has too many entries and that that's unfair to other languages. I'm complaining that German has too many unnecessary entries, regardless of what other languages have. I don't accept the notion that not having a space makes the compound more opaque. If the compound is opaque, it's opaque whether it's with a space or not. If it's transparent, it's transparent with a space or not. --WikiTiki89 17:11, 1 August 2017 (UTC)
Soyourearguingthatscriptiocontinuaisaseasytoreadaswritingwithspaces?Particularlyforpeoplewhoarenotfluentinthelanguageinquestion?--Prosfilaes (talk) 21:37, 1 August 2017 (UTC)
That's not at all what I am arguing. In fact what I am arguing is that despite the fact that it's more difficult to read, we still shouldn't have entries for SOP terms just because they don't have a space in the original language. That's why we don't have entries for entire sentences in languages like Chinese, that are written entirely without spaces. That's also why being spelled without a space should not be in itself enough of a reason to include a compound in languages like German. --WikiTiki89 21:44, 1 August 2017 (UTC)
I prioritize usability over this theoretical analysis. We've established a principle that SoP terms in English that are spelled without spaces still can get entries. I don't see that other Germanic languages are different enough to justify making it harder for users to look up words, and discouraging people from creating German entries because it's unclear whether they'll be deleted. (There are a number of entries in English I wouldn't bother making because the rules on keeping them are unclear.)
Chinese is not relevant; in practice, each Chinese character is in effect a space-delimited word, with multicharacter words being treated much the same as multiple word phrases in English. I'm not discussing scripts that don't use spaces; they by necessity have different rules. If we were even talking about Inuit, I wouldn't hold this position. But we're talking about another Germanic language with a slightly higher propensity to compound without spaces. It doesn't justify changing clear rules that prioritize usability for rules that prioritize theoretical correctness.--Prosfilaes (talk) 22:35, 1 August 2017 (UTC)
Just to elaborate on this a bit: In Turkish, ev is "house", evim is "my house", evimde is "in my house", evimdeki is "the one in my house", evimdekiler is "the ones in my house", evimdekilerin "of the ones in my house", etc. etc. etc. etc. etc. etc. Do you see how this explodes into literally billions of words? Kolmiel (talk) 07:26, 31 July 2017 (UTC)
Actually, these Turkish examples aren't compounds (ev is a noun, the remaining parts in all examples are suffixes), so it's outside the scope of this discussion about compounds. It's probably better to start a new discussion about the many non-lemma word forms in agglutinative languages like Turkish. -- Curious (talk) 12:41, 2 August 2017 (UTC)
  • The benefits to keeping single-word compounds outweigh the negatives, IMO. Even native speakers may, however infrequently, not realize how to decompose a word (what its parts are) — that is true even in English (with unspaced compounds) — and non-native speakers will certainly face that problem; wiki is not paper and has the space to help them. Some words can be decomposed in more than one way, like Wachstube (consider also things like Hochzeit and Vollzug). And as Tropylium notes, the fact that one compound exists (and perhaps another alternative compound does not) can be lexical information, however slight — in languages like English, 'usual' collocations are sometimes given as usexes or in usage notes. And each "SOP" compound is generally considered a word, not multiple words, which puts them in our remit IMO. That's different from English "county school board employee" which is four words (even if they might be said to form a single term in some contexts, like "in this contract, the term county school board eomployee means..."). (Consider how "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" is sometimes said to be the longest German word, or a famously long German word, but you couldn't say that anything like "county school board employee" was the longest English word, because people would object that it's clearly more than one word.) As Meta says, it's OK if some languages have more words than others. - -sche (discuss) 06:24, 31 July 2017 (UTC)
  • @PseudoSkull In Dutch a school board employee would be a schoolbestuurslid. Please note how employee does not translate to werknemer as you might expect. But does it even make sense in English? Shouldn't it be a school board member?

@Robbie Did you just call me a viltvårdsområde? Seriously, I wouldn't know how to seperate that.

Vilt(game) vård(preserve) sområde(area)?
Vilt(game) vård(pre)som(serve) råde(area)?
Vilt(game) vårdsom (preserve) råde(area)?
Viltvård(game) som (preserve) råde(area)?
Viltvård(game) sområ (preserve) de(area)?
Vilt(game) vård(preserve) sområ(area) de(grammatical gender indicator)?

If you think it shouldn't be that hard, here's a little game for you to play. Figure out how to seperate the following list of Dutch words. You are only allowed to use English wiktionary and you are not allowed to just look up the whole word. All seperate words can be found on English wiktionary. All words either exist or are plausible compounds.


If a word passes RfV it should be kept in my opinion. W3ird N3rd (talk) 10:02, 31 July 2017 (UTC)

I support the notion that being spelled as a single word should not be a criterion for inclusion. --WikiTiki89 15:01, 31 July 2017 (UTC)

@W3ird N3rd challenge accepted! Only using Wiktionary, this is how I would separate the words – mind you Swedish helps a lot, so I offered their Swedish equivalents to show just how:

hoerenloperij – hoer(en) + loper(-ij) = horlöperi (N.B. a calque, doesn't actually exist in Swedish)
koeieuier – koe(ie) + uier = kojuver
oehoeëi – oehoe + ei = berguvägg
hottentottententententoonstelling – hottentott(en) + tent(en) + tentoonstelling = hottentottältutställning (N.B. a calque, doesn't actually exist in Swedish)
feeënverzamelplaats – fee(ën) + verzamelplaats = fesamlingsplats (N.B. a calque, doesn't actually exist in Swedish)
zwartkeellijstereieren – zwart + keel + -ij + ster + eier(en) = svarthalsadtrastägg (N.B. a calque, doesn't actually exist in Swedish)
vogelbekdiereierstokken – vogelbekdier + eierstokk(en) = näbbdjursäggstockar (N.B. a calque, doesn't actually exist in Swedish)
onzinwoordenopsomminkje – onzin + woord(en) + opsomminkje = nonsensorduppräkning (N.B. a calque, doesn't actually exist in Swedish)
reisspellenverzamelbox – reisspell(en) + verzamel + box = resespelsamlingsbox (N.B. a calque, doesn't actually exist in Swedish)
bibliotheekboekenlijst – bibliotheek + boek(en) + lijst = biblioteksbokslista (N.B. a calque, doesn't actually exist in Swedish)

Thank you for the distraction, but let's get back to the subject. Your examples underline the topic at hand – would they pass WT:CFI if someone decided to add them? I can't speak for your Dutch equivalents, but the Swedish ones are absolutely bogus even if they morphologically and grammatically are correct.

After reading through the comments I have to agree with @PseudoSkullviltvårdsområde (which by the way should be separated vilt + vård(s) + område) is attestable and should be included while game preserve area shouldn't. But the discussion was about Kinderleichenficker. We really shouldn't accept all compounds just because they can be created. I get the feeling that some here believe that compounds written together are more likely to be accepted while their English, French, Italian or whatever equivalents would most likely be deleted because of SoP. If this is true, that we are indulgent towards certain languages and stringent towards others, then we have to re-evaluate our policy. --Robbie SWE (talk) 19:28, 31 July 2017 (UTC)

It is true; cf. WT:LDL. The idea that every series of letters delimited by spaces is a word is facially language-neutral. There's no way to switch things around so that it will be neutral in any absolute sense, and if you're concerned about that, I think it's like rearranging chairs on the deck of the Titanic; we are vastly biased towards English in our entries, having three times as many entries for English as for German (Wiktionary:Statistics), and while generally we support major languages better, there's a lot of bias towards European languages and languages of the British Isles.--Prosfilaes (talk) 21:18, 31 July 2017 (UTC)
@Robbie SWE Very good! And I suspect you may be better at this than the average reader. You did miss a few though, but I'm surprised you got much of it right. Here is the solution, you will enjoy this:

W3ird N3rd (talk) 21:21, 1 August 2017 (UTC)

Keep. Many larger compounds are, in fact, made up of smaller compounds. If you simply treat each compound as a long string of words, it becomes unparseable nonsense. Knowing which of the parts belong together is essential for the interpretation, and this is lexically significant. To take a few examples from the collapsed list above:
  • vogelbekdiereierstok: literally "bird beak animal egg stick", which is complete nonsense. vogelbekdier and eierstok are non-SOP compounds, but you can only parse the combination properly if you know of these smaller words.
  • bibliotheekboekenlijst: literally "library book list". It one isn't as bad, but there's still parsing ambiguity. Is it a list of library books or a book list at a library?
CodeCat 17:42, 1 August 2017 (UTC)
You have the exact same ambiguity if you have spaces. Is a "library book list" (in English!) a "list" of "library books" or a "book list" at a "library"? And regarding "you can only parse the combination properly if you know of these smaller words", if only there was a dictionary where you can look up these smaller words... Oh right that's what Wiktionary is. --WikiTiki89 18:58, 1 August 2017 (UTC)
I didn't vote yet, but just to be clear: keep. As for bibliotheekboekenlijst and vogelbekdiereierstokken, neither would pass RfV but they are plausible words. A boekenlijst is (besides any list of books) a list of books that must be read as a school assignment. Following that a bibliotheekboekenlijst would probably be a list of books that must be read as a school assignment and are expected to be collected from the library. But it could indeed also be a list of all books in a library.
Vogelbekdiereierstokken doesn't pass RfV, but that's only because we don't eat them. The sentence "Scientists analyze platypus ovaries" would translate as "Wetenschappers onderzoeken vogelbekdiereierstokken". Some similar examples:
varkenslever - varken + lever, pig + liver (will pass RfV)
schapendarm - schaap(en) + darm, scheep + bowel (will pass RfV)
fietshater - fiets + hater, bike + hater (will pass RfV)
The following are perfectly valid words that will NOT pass RfV:
fietshaterhater - fiets + hater + hater, bike + hater + hater (somebody who hates people who hate bikes)
fietshaterhaterhater - fiets + hater + hater + hater, bike + hater + hater + hater (somebody who hates people who hate people who hate bikes)
fietshaterhaterhaterhaterhaterhaterhaterhaterhater (you know where this is going, right? yes, we can make infinitely long words that will never pass RfV)
Dutch: fiets + ventiel + afsluitring + dop + verf + verwijderaar + allergie + gezeur + aficionado + hater
English: bike + valve + obturator + cap + paint + remover + allergy + whining + fan + hater
Perfectly valid word to describe somebody who hates fans of whining about allergies for the paint remover used for removing the paint used on the cap of an obturator for a valve of a bike. Or something. Linguistically correct nonsense. Again, compounds should be kept but only if they pass RfV.
You have no idea how much fun Scrabble can be in these languages. W3ird N3rd (talk) 21:21, 1 August 2017 (UTC)
I don't get your argument. The fact that these compounds could have idiomatic meanings means that the ones that have idiomatic meanings should be kept because they are not SOP; this has nothing to do with the fact that they are spelled without spaces. But why should spaceless compounds that are SOP be kept? --WikiTiki89 21:34, 1 August 2017 (UTC)
Varkenslever (pig liver), schapendarm (sheep bowel) and fietshater (bike hater) are not idioms. Let me further explain with another example. On you can search a government-approved list of words that exist. Words that are not on the list may also exist, but those on the list are guaranteed to exist. An example of a word on the list is schapenboer. That's a sheep farmer. (SoP) But somebody who is learning Dutch and tries to look it up in a dictionary without SoP words will find schap(en) + boer. (because singular of schapen/sheep is schaap, not schap) Plural of schap is actually schappen but I'm not expecting somebody who is looking up schapenboer to know that. A schap is a shelf and a boer can, besides a farmer, also be a shop or salesman. So somebody who specializes in selling shelves could be called a schappenboer which is completely different from a schapenboer. And nevermind (boo! SoP!) the schapeboer which is the pre-1996 spelling of schapenboer. W3ird N3rd (talk) 22:20, 1 August 2017 (UTC)
Nevermind is a bad example. For one thing, I would say even never mind is an idiomatic phrase. And secondly, even if never mind were SOP, the fact that nevermind is spelled without a space is irregular for such an adverb+imperative combination, and for that alone it would merit inclusion. --WikiTiki89 15:52, 3 August 2017 (UTC)
  • Don't forget compound words can use interfixes such as -e- and -s- (in Norwegian anyway) to link the parts of a word. DonnanZ (talk) 17:58, 1 August 2017 (UTC)
  • Definitely oppose as a de-4. Our three-use rule is sufficient to protect us from most nonce formations (it alone is the reason why Kinderleichenficker will be deleted) in WDLs, and the fact that paper dictionaries tend not to list most semantically transparent compounds will protect us from them in LDLs. Really, if we wanted to reduce our size by eliminating forms with transparent morphology, we should start with obvious inflected forms like indicated and participles. Having forms like that is IMO more absurd than having things like Zirkusschule. But I still don't advocate deleting indicated and participles. —Aɴɢʀ (talk) 18:20, 1 August 2017 (UTC)
There is de:indicated --Peter Gröbner (talk) 18:53, 1 August 2017 (UTC)
  • By the way, to anyone (@WikiTiki89 ?) who still thinks this is a good idea: wouldn't you agree that such a rule shouldn't discriminate? So it should apply to all languages. So you should get rid of the already mentioned schoolteacher and playfield as well. Also COALMINE, schoolbus, flowerpot, cupholder, drive-in movie, drive-through, racecar, maximum-security, password, arachnophobia, lab rat and countless others. Or would you seriously propose removing the Dutch entry from schoolbus while leaving the English entry intact? W3ird N3rd (talk) 06:06, 2 August 2017 (UTC)
    Yes exactly, I've always been a strong opponent of WT:COALMINE (but that's actually a slightly separate issue that comes up given the fact that our current policy requires the inclusion of spaceless compounds). But many of the compounds you just listed are actually not SOP, and should be kept (even if they were in German). In fact, I think the reason we have this rule in the first place is because in English, it is very likely that the reason a term is spelled without a space is because it is not SOP (obviously this is not always the case). But I would definitely say that an SOP compound in English that is spelled without a space, and that is expected to be able to be spelled without a space by the conventions of English orthography, should be deleted. --WikiTiki89 15:52, 3 August 2017 (UTC)

Arbitrary section break for editing convenienceEdit

  • I don't think it's a good idea to delete the Dutch compounds, see my comments here (archived here). -- Curious (talk) 12:41, 2 August 2017 (UTC)
  • Keep/Oppose. I didn't care either way when this discussion started, but I've been won over to the inclusionist side. If I were learning German, and in my typical fashion, learning by reading books and looking up unfamiliar words in Wiktionary, I would probably get frustrated and start using Google translate or another alternative if they weren't included here. Wiktionary exists to help people decipher the meaning of words, and that includes those which have pretty transparent meanings (mainly because figuring out where the line between transparent and not transparent is isn't necessarily easy). I propose, however, that compound entries for Germanic languages use a "compound" template (if this is not the status quo already), so that their definitions would look like this (rather than giving an acutal definition):
    1. compound of vilt, vårds, and område
This would sufficiently mark an entry as SOP, IMO, and would be an immensely useful inclusion in our dictionary. Our goal should be to include as much that would help people as possible (within reason, of course, but I firmly believe this is within reason). Andrew Sheedy (talk) 15:44, 2 August 2017 (UTC)
@Andrew Sheedy: I think that is a good solution to the problems of opacity to someone who doesn't know what the components are and transparency to someone who does. — Eru·tuon 17:18, 2 August 2017 (UTC)
That's what the Etymology section should say. The definition line should give a definition. And I must say, having looked up vilt, vård, and område, I still don't know what viltvårdsområde is supposed to mean. "Wild health care area"? —Aɴɢʀ (talk) 16:05, 3 August 2017 (UTC)
I think the situation is much like with initialisms and acronyms. Technically, their expanded form is more etymological than definition material, but we include that information in the definition line, with an added gloss if the meaning of the acronym/initialism is not readily understandable from its expanded form, and the expanded form does not have an entry. I simply think we ought to treat compounds the same way. Andrew Sheedy (talk) 15:50, 18 August 2017 (UTC)
  • I oppose deleting the likes of Kopfschmerz in languages like Germans and Dutch (=keep). I might be open to deleting such compounds with high number of stems (Kopfschmerz has two), but we have enough database space, don't we? Again, only those compounds attested in use can be included, not those whose construction is merely plausible. These entries do help non-German speakers break long strings of letters separated neither by space nor by hyphen or ~ into components, and thereby render at least a small lexicographical service to the users. I oppose the notion that this issue should be decided for en wikt exclusively by Germans and Dutch, respectively: the challenge of breaking the compounds into components, if any, is above all for non-native speakers, and to casual users of the language, even those who speak the language very little. To see what others are doing, let's check e.g. Duden's Kopfschmerz[3], whose definition "Schmerz im Kopf" sounds pretty SoP to me, with respect to Kopf + Schmerz. --Dan Polansky (talk) 09:33, 5 August 2017 (UTC)
  • I'm curious about how proponents of this change would like that SOP obsolete compounds or current SOP compounds with obsolete elements be treated. Many such words are not analysable to modern speakers. (I strongly tend to oppose/keep by the way.) Lingo Bingo Dingo (talk) 13:50, 14 August 2017 (UTC)

I am going to remove auto-balancing from {{top2}}, etc.Edit

I am going to restore these to their forms pre-April of this year. Sorry, it is just too broken given the previous assumption of no auto-balancing. Benwing2 (talk) 15:04, 29 July 2017 (UTC)

  Support --Daniel Carrero (talk) 15:26, 29 July 2017 (UTC)
Where does it render badly?--Dixtosa (talk) 15:43, 29 July 2017 (UTC)
In proto-Slavic entries, for example, the descendants from the three branches of Slavic (west, east, and south) used to properly occupy their own columns; now it’s no longer so. — Vorziblix (talk · contribs) 08:13, 31 July 2017 (UTC)
This template should never have been used for that sort of organization. We should create a dedicated template for that. --WikiTiki89 14:58, 31 July 2017 (UTC)
(edit conflict) They work after a fashion as I have said before (in the Grease Pit?). Balancing is not what I call perfect. It is better to convert tables to {{der3}} {{rel3}} etc. DonnanZ (talk) 15:46, 29 July 2017 (UTC)

current Wiktionary policy on synonyms?Edit

I notice at label we have synonyms both under the definitions and as a separate header. What is the current policy on this? ---> Tooironic (talk) 00:34, 30 July 2017 (UTC)

There is none. I think it would be agreed that they should be in one location and not both however. DTLHS (talk) 03:41, 30 July 2017 (UTC)

category:English childish termsEdit

I think that this should be a subcategory of either category:English colloquialisms or category:English informal terms. I only bring this up because it seems like an unusual practice here for registers to have their own subcategories, but if nobody objects within a day or two, I’ll go ahead and make the adjustment myself. — (((Romanophile))) (contributions) 03:00, 30 July 2017 (UTC)

  Support Korn [kʰũːɘ̃n] (talk) 13:35, 30 July 2017 (UTC)
How can I consider which one is childish? Does it exist in other languages too?--Octahedron80 (talk) 13:37, 30 July 2017 (UTC)
As for your second question, yes: Category:Childish terms by language. Or if you don't consider that proof in itself (you'd be right to), then I'll give my feeling as a French speaker: yes, we do have the same in French. --Barytonesis (talk) 14:16, 30 July 2017 (UTC)
Here is my attempt. Hopefully I did not break anything. Oh, and the topic title should have been category:Childish terms by language, but hopefully this isn’t a big deal. — (((Romanophile))) (contributions) 06:39, 31 July 2017 (UTC)
@Romanophile: That's not going to do anything. Category structure is handled by a submodule of Module:category tree, which you can get to by clicking "Edit category data" on the category page. — Eru·tuon 06:51, 31 July 2017 (UTC)
This seems an odd change. For one thing, I'm not sure if I would automatically think to look for "childish terms" under "colloquialisms" or "informal terms". But I suppose those are more or less accurate descriptions of childish terms, as children don't tend to speak formal English, unless they're unusually precocious or bookish, and adults don't tend to speak formal English to children. Also, there might be other "terms of usage" subcategories that could be put under another subcategory of "terms by usage". (That's not really an argument against, just why this one?) I feel the category "terms by usage" needs to be divided up somehow, because there's such a long list of subcategories, but I'm not sure what the master subcategories should be. — Eru·tuon 07:03, 31 July 2017 (UTC)
I don't really see the problem. I'm just deeply, deeply disappointed that Category:English hillbilly terms doesn't exist. W3ird N3rd (talk) 07:22, 31 July 2017 (UTC)

Generate Template:audio parameters by defaultEdit

The second parameter of this template is pretty silly. It has no default, so you have to provide it, but all people ever put in is "Audio". Why not just make that the default, so we can get rid of the parameter in the majority of cases?

The first parameter could conceivably also be generated by default, if the audio files follow a standard format. On gezinnen for example, the template has all it needs to generate the link itself.

Finally, I propose to modify {{audio}} so that it alternatively accepts the language as the first parameter, like many other templates already do. —CodeCat 14:13, 30 July 2017 (UTC)

Replace subject word with templateEdit

I noticed that here, you just copy the word. So for the article on water, the example would be "I drink water". On the Dutch wiktionary, we use a template. So we say "I drink {{pn}}". {{pn}} is a shortcut for {{PAGENAME}}. You might wonder "so fucking what?". Well, if the spelling changes we only have to change the title. Also if we create a synonym, we don't have to alter much. So I'd like to propose English wiktionary starts using a template for this as well, it's quite handy. W3ird N3rd (talk) 18:39, 30 July 2017 (UTC)

It seems unlikely that the spelling of water is going to change in the immediate future. DTLHS (talk) 18:43, 30 July 2017 (UTC)
I'm so sorry, obviously my proposition would only apply to the example used and not to any of the other 5000000+ words on wiktionary. Even if it did, it is unlikely the spelling for any of those would ever change. Also synonyms don't exist, it's just something I made up. Why would any word have a synonym if it means the same thing? It's madness! W3ird N3rd (talk) 18:49, 30 July 2017 (UTC)
(1) if the word was inflected at all the template couldn't be used (2) spellings of words don't change that frequently, and even if they did, the usage example can still serve as an example of the obsolete spelling. DTLHS (talk) 18:56, 30 July 2017 (UTC)
Partially true. For example, Dutch wiktionary would use {{pn}}s for "waters". But if there is a change in the word itself we don't use the template. Dutch spelling does change every once in a while (1905, 1934, 1947, 1955, 1995, 2006), but that may be because Dutch spelling is regulated by law. W3ird N3rd (talk) 19:27, 30 July 2017 (UTC)
And (3) if the word is properly displayed with stress marks, macrons or breves, or other things not included in the pagename, the template shouldn't be used. — Eru·tuon 19:04, 30 July 2017 (UTC)
I'm a little bit sceptical that the need to learn yet another template to edit pages competently would be outweighed by any benefits this would bring. Equinox 18:54, 30 July 2017 (UTC)
Yes, I think we can live without this feature. SemperBlotto (talk) 18:57, 30 July 2017 (UTC)
I understand the concern. I think it's handy, especially when creating a synonym, but you all make valid points. W3ird N3rd (talk) 19:27, 30 July 2017 (UTC)
One situation where this might conceivably be useful is where a word has a zillion different spellings (djellaba, I'm looking at you!) and we'd like to use the same usex for all of them. But our current "small alt-form pages pointing to a complete full-form page" model doesn't really require that. Plus there can be situations where one spelling is used for senses that another isn't (e.g. perhaps obsolete senses of humour). Equinox 19:36, 30 July 2017 (UTC)

Clarification on References and Further ReadingEdit

Are semantic relations supposed or allowed to be supported by references?

Is showing approaches other references take to defining a term supposed or allowed to appear under References or Further reading?


What about showing that a selection of references don't have entries for an item we include? Should that go under Boasting?

What about works that serve multiple functions? Should they appear under both References and Further Reading?

Just asking. DCDuring (talk) 01:41, 31 July 2017 (UTC)

Please help review Gfarnab (talkcontribs)'s entriesEdit

He/she has been editing profusely in many languages (including Arabic, Aramaic, Chinese, Danish, German, Hebrew, Hindi, Persian, Russian, Sanskrit and Swedish), but I have serious doubts about their ability in those languages. The edits to Chinese entries demonstrate extremely poor level of understanding of Chinese. Please help review their other edits. Wyang (talk) 10:15, 31 July 2017 (UTC)

Gfarnab is clearly not a native Dutch speaker. I will fix onlangs. Are we sure the user is not using Google translate or something? W3ird N3rd (talk) 10:32, 31 July 2017 (UTC)
I've fixed all the Hindi and Sanskrit entries previously. He/she was using which has a lot of neologisms and incorrect terms. His/her Hindi seems okay, but perhaps I'm underestimating Google Translate. —Aryaman (मुझसे बात करो) 11:17, 31 July 2017 (UTC)
He or she seems to know Russian (or copying from another source?) but the usage of templates is poor and not using stress marks and automatic transliterations. Having doubts about some etymologies. --Anatoli T. (обсудить/вклад) 11:27, 31 July 2017 (UTC)
The examples in Chinese are extracted either from said language's Wikipedia, the RFA news site or literature classics (obtained from, so please tell me where have they gotten it wrong and I will fix it myself. The other possibility is that Wyang does not know classical Chinese and is not familiar with its vocabulary and syntax, in which case I highly recommend him to toll the effort because it totally pays to read Confucius, Mencius et al. -------
The Hindi examples are extracted from either news sites (like, the Wikipedia or the several Hindi grammars and textbooks I own. -------
I made the Dutch and German examples my own and I still do not know what was wrong with mines in "onlangs", but I apologize in advance if they were erroneous.
Russian etymologies I get from Vasmer, as I cite in the references. The examples I find in -------
Yes, my templates suck: I mostly copy from existing ones and other than that I mostly do not know what to do, but I am improving fast in that department!
Thank you for your attention. And cheers! -------
@Gfarnab: The Hindi usexes are great, I'm just worried about copyright status. For Sanskrit use Monier-William's dictionary please ({{R:MW}}), it's out of copyright unlike Btw, I'm sure Wyang knows a lot about Classical Chinese. —Aryaman (मुझसे बात करो) 01:00, 1 August 2017 (UTC)
@Aryamanarora: So, I just checked the book I was most often using, is this one from 1992 (ie, copyrighted), so today I have resorted to tweaking the example phrases (using आहट instead of आवाज़ as it appears in print, or गाड़ी instead of घर, etc), which amounts to changing about 15% of such short phrases: would this be ok?
re: "I still do not know what was wrong"... That's the point. You're editing in languages you don't know, so you won't be able to tell when you get things wrong. This is a dictionary. People depend on it for accurate information. If you don't already know that it's correct, don't add it. Chuck Entz (talk) 13:43, 31 July 2017 (UTC)
Nice to see you reply here. When it comes to templates, it takes a while to get used to the way things work here. I won't fault you for that, if you add good data you can let others worry about templates and wikitags.
I will tell you exactly what the problem was with your Dutch examples. I didn't check your German examples (my German is not that gut) so I can't comment on that.
"Hij heeft onlangs tot de anjers in zijn bloempot water gegeven|He has recently watered the carnations in his flowerpot"
Wiktionary:Example_sentences - Example sentences should be kept simple. "in his flowerpot" could be removed and "carnations" could be replaced with "flowers".
"Hij heeft onlangs tot de anjers in zijn bloempot water gegeven" is broken. The word "tot" should be removed to make this into a valid sentence.
Removing "tot" will only make it technically correct, the Dutch don't talk that way. In this context, the preceding question would be "Have the flowers been watered recently?" because you want to know when the flowers will need to be watered again. The answer would typically be something like "Hij heeft de planten pas nog water gegeven.". (He watered the flowers not too long ago.) Or maybe (in a more formal setting) "De planten hebben recentelijk nog water gekregen." (The flowers have been watered recently.)
I'm now really forced to think about it. I think "onlangs" is typically used for events that are not repeated often or daily routine jobs:
"De computers zijn onlangs van de laatste updates voorzien" (The latest updates have been installed on the computers recently)
"Wij hebben onlangs ons huis verkocht" (We recently sold our house)
"We hebben onlangs nog een stagiaire aangenomen" (We recently hired an intern)
As for your other example:
"Ze heeft onlangs gewend aan koude douchen 's ochtends|Lately she has grown accustomed to cold showers in the morning"
Again, too long. "in the morning" can be removed.
The most likely way to convey this message: "Ze is sinds kort gewend aan een koude douche 's ochtends.". A classic hebben/zijn swap. Don't feel bad, many Dutch people get it wrong as well. Take a look at and bookmark that site.
You can say "koude douche" (adjc noun, cold shower) or "koud douchen" (adjc verb, showering with cold water). "Ik neem een koude douche" and "Ik ga koud douchen".
Another correct way is: "Ze is ondertussen gewend aan koud douchen" (Meanwhile, she got used to taking cold showers).
Another problem here is (and I guess you couldn't know that if you're not a native speaker) that any English sentence with "lately" does not properly translate to "onlangs" in a Dutch sentence. Or at least I haven't been able to think of one, they all end up broken. So you kind of walked into a trap there. W3ird N3rd (talk) 14:48, 31 July 2017 (UTC)
Thank you for taking the time to explain that, W3ird N3rd, which I have read attentively. I will stick to examples from manuals, dictionaries and corpora if that is ok with y'all.
It isn't. You can't use copyrighted content here. Terms of Use. What sources that may be copyrighted did you use so far? We will have to take a look at anything that's less than 100 years old. W3ird N3rd (talk) 15:06, 31 July 2017 (UTC)
The 2005's "Diccionario avanzado árabe" by Federico Corriente I guess is subject to Copyright and I have used it several times (always citing the source) and the non-Biblical Hebrew examples that I used are mostly from 1982's Assimil "L'hebreu sans peine" (ie, a copyrighted work). Other than that I do not think I have used copyrighted material (as I said, I used publicly-available corpora and news-sites). Shall I erase all those?
Just because those writings or a news website are public does not mean they are not copyrighted. See Wiktionary:Quotations#Copyright for further details. You can sign your messages with ~~~~. W3ird N3rd (talk) 15:36, 31 July 2017 (UTC)
I have also found that Gfarnab's edits to Hebrew, Aramaic, and Arabic contain a lot of mistakes (even if they sometimes contain useful content). He needs to be educated about our practices. --WikiTiki89 14:59, 31 July 2017 (UTC)
I have also found that the Hebrew examples I copy from the Bible are then removed, is that some tradition of yours that I need to be educated about?
If you give me an example, I'll tell you why it was removed. --WikiTiki89 17:17, 31 July 2017 (UTC)
I can give you more than one, Mr. Wikitiki: you erased the examples I had written in צוד ( adding that the meaning was already described in the צד entrance, but you did not transfer the examples as you did in your edit of פילל (thank you for caring about that one, by the way). Plus the etymological explanation I had copied from Strong's Concordance was removed without explanation ( You have furthermore removed other examples I took the time to introduce because they were "not good anyway", which I hope you understand it does not confer any extractable lesson so that I become a better Wiktionary user. Regards, Gfarnab.
In the case of צוד, I simply did not have time. Feel free to add everything back in the appropriate places (צד for the verb, ציד for the noun). In the case of גג, if that's what Strong's Concordance said, then it's simply wrong. Just because someone else says it, does not mean it is right. Feel free to discuss further and ask about any other edits or anything else you have questions about on my talk page. --WikiTiki89 17:39, 31 July 2017 (UTC)

August 2017

travel gameEdit

Would travel game (a board game or card game that was modified to be playable by passengers during a trip) be considered a SoP? W3ird N3rd (talk) 05:34, 1 August 2017 (UTC)

Looks good to me. I'd class I spy and the number plate game as my favourite travel games from when I was a kid in a car. --WF on Holiday (talk) 23:04, 1 August 2017 (UTC)
That's actually not even the definition I meant. I meant games like chess or Ludo that have been modified (e.g. with magnetic game pieces) to be played in a car or on a train. Amazon link to clarify. W3ird N3rd (talk) 00:10, 2 August 2017 (UTC)
I thought about it and the definition I was originally thinking of (and yours as well) is SoP after all because there are other "travel" things. But travel didn't have an adjective section yet. It does now.
  1. (in a compound) An object or activity that has been designed or reworked for use while travelling.
    (object) I've packed the chess travel game in my travel bag and I've got my travel cup in the cupholder, I'm ready to go!
    (activity) Let's play a travel game. I spy with my little eye..

W3ird N3rd (talk) 04:26, 2 August 2017 (UTC)

Aaaaand it's gone. @SemperBlotto, is there a reason you just chucked the whole thing instead of moving it into an additional definition for the noun? I had looked at running (like "running man") and noticed it had an adjective section, but on closer inspection that is used for other meanings of running.. I think. I'm not even sure. I can't entirely explain why running is an adjective in all meanings mentioned but travel isn't. W3ird N3rd (talk) 06:28, 2 August 2017 (UTC)
(@SemperBlotto. —suzukaze (tc) 06:45, 2 August 2017 (UTC))
Thanks, I'm still learning how these things work. I looked it up: So it appears travel acts as a deverbal adjective. So it seems either SemperBlotto is wrong or somebody needs to remove the adjective section from exciting or I may be losing my marbles. W3ird N3rd (talk) 06:57, 2 August 2017 (UTC)
I dunno, travel in travel game sounds like the noun travel to me: a game used during travel. It's weird to try to think of it as a verb. — Eru·tuon 07:07, 2 August 2017 (UTC)
Right, so it's So "travel game" can't be added because I suspect it's SoP yet the information in travel and game don't really allow one to figure out what a "travel game" would be. And this information can't be added to travel either. Okay, my marbles are definitively gone. W3ird N3rd (talk) 07:28, 2 August 2017 (UTC)
Were I looking at this naively, it would be ambiguous to me whether this meant "a game suited for travel" or "a game related to travel" or "the travel industry" or "one of a genre of games somehow related to some definition of travel", or ..... I don't think that dictionaries should act as if they are well suited to hold users' hands as they try to figure what a phrase or sentence or larger unit of language unless there is true novelty or obscurity worse than what I have advanced as my own naive view of alternative meaning. DCDuring (talk) 19:41, 2 August 2017 (UTC)
Well, it's not a phrase, it's a compound noun. I think from what you're saying, it's (for a naive reader) not transparent. — Eru·tuon 19:57, 2 August 2017 (UTC)
Gee, lots of people would call it a noun phrase or NP. Do we have a policy about which school of labels we follow? DCDuring (talk) 21:35, 2 August 2017 (UTC)
Not that I'm aware of. The criterion of spacing bothers me because it means that if you happen to add spaces between the parts of a compound, then it suddenly changes to a phrase. So honeybee is a compound, while honey bee is a phrase. Utterly arbitrary. There has to be a more solid criterion than spelling. — Eru·tuon 22:02, 2 August 2017 (UTC)
If a multi-word expression is attestably spelled solid, we have decided that is sufficient evidence to say that phrase, usually a bare NP, is includable. That criterion is intended to shortcut our repetitive, amateurish arguments about including such terms. DCDuring (talk) 22:29, 2 August 2017 (UTC)
Huh. I was talking about criteria for whether something is a compound noun, not CFI. — Eru·tuon 22:36, 2 August 2017 (UTC)
I think that, in practice, we try to avoid academic discussions with only indirect application to Wiktionary. It seems to me a good practice. DCDuring (talk) 23:46, 2 August 2017 (UTC)
You're probably right. I'm quite annoyed by compounds being called phrases, but it isn't particularly useful to discuss. Back to the content of your post, you recognize potential ambiguity with travel game but still don't think it should be included. I find that baffling, given that English Wiktionary is used by lots of people who don't speak English well. I would imagine that at least some of them would misunderstand travel game in the ways you mention. — Eru·tuon 01:59, 3 August 2017 (UTC)
@Erutuon: To me those ambiguities are typical of those that arise in interpreting any NP/compound noun that one hasn't heard before. In normal speech, the context shows one definition to be the most relevant of all of the ones that are possible from the definitions of the component terms. I consider the situation to be illustrative of why we focus on transparency of meaning in the context in which a term is used, given the definitions of the component terms. DCDuring (talk) 07:48, 3 August 2017 (UTC)
It really, really, REALLY wouldn't be the first time I turn to Wiktionary (or any other dictionary) to look up a word that I have no proper context for. For example when something like this happens in a TV show:
So what do you hate most?
-Any travel game.
Why do you hate that so much?
And then the show continues. Maybe it's a running gag. Maybe it's a reference to something in a previous episode that I missed. Maybe it refers to some character trait. Maybe it refers to some event or tradition that I'm not aware of, like some scandal in the country where the show was made. Maybe it's just plain random.
Alternatively, some word will pop up in my head randomly but I can't remember the context I heard it in. Out of curiosity I try looking it up. What it comes down to is as simple as this: Wiktionary is useless to look up any ambiguous SoP so I'll be forced to go elsewhere. If that's your goal I'd say mission accomplished. W3ird N3rd (talk) 23:47, 3 August 2017 (UTC)
Why wouldn't it be arbitrary? Why should you be able to draw a clean line between a compound and a noun phrase? There's no clear line between a "canoe truck", a "turnip truck" and a "fire truck". Certainly, though, English words spelled without spaces are more likely to be organic unions with a unique meaning, whereas noun phrases are more likely to be spelled with spaces and have meanings obvious from the individual words.--Prosfilaes (talk) 23:24, 2 August 2017 (UTC)
I dunno, it seems axiomatic that syntactic categories (word, phrase) should be based on something other than spelling, such as syntactic behavior. If they coincide with spelling, great. Honey bee behaves no differently from honeybee, so it is in the same syntactic category. Maybe there are spaced-out compounds that could with more justification be called phrases. I agree, though, that there is something determining whether a compound can be written with spaces: if it would be too long as a single word, or its meaning is obvious from its constituent parts. At some point on the continuum of each characteristic, it's acceptable to write a word either way. But I don't think either characteristic has anything to do with syntactic category (word or phrase) either. — Eru·tuon 01:59, 3 August 2017 (UTC)
Given my background in computer science, it seems axiomatic that you lex before you parse, and that you have to figure out what a word is before we starting figuring out what stuff means. That's sometimes not possible in computer or human languages, and pauses in audio would be more reliable than spaces in text, but things should be broken into words ideally before we get into syntax.--Prosfilaes (talk) 03:20, 3 August 2017 (UTC)
@Prosfilaes: I don't know anything about computer science or quite what lex and parse mean, but what I mean by syntactic category is word, phrase, clause, or noun, verb, adjective, etc. So which things are words is connected to syntax. Anyway, from what programming I've done (mostly on Wiktionary), programming languages are far more tightly constrained and more straightforward to analyze (if not figure out what their actual purpose is) than human languages, so I don't know how much of the process is similar to analyzing the lexical or syntactic categories of human words. — Eru·tuon 21:22, 4 August 2017 (UTC)
The basic ideas of lexing and parsing used in computer languages were designed by Chomsky for use in human linguistics. The point is, we can't talk about nouns and adjectives before we figure out what words are. In both human and computer languages, you lex (split text into words and specific punctuation marks) and then you parse, and occasionally you're forced to go back and relex the text in light of the parsing. But in both cases, you do the vast majority of breaking stuff into words before you start trying to figure out the meaning. There's a reason why spaces and verbal pauses exist in languages; it's to make it easy to clearly split things into words.--Prosfilaes (talk) 23:04, 4 August 2017 (UTC)
Well, it seems my use of the name syntactic category for word and phrase got you on the tangent of lexing before parsing. I don't know, maybe syntactic category isn't the right term. I have no idea. And I don't see how lexing before parsing relates to whether compounds are words or phrases. — Eru·tuon 23:22, 4 August 2017 (UTC)
I would imagine (but I can't speak for Prosfilaes) that if "travel game" is a word in your dictionary, you can just look it up and you know what it means. If it's not in your dictionary, you will assume it's just two words and you look up travel and game. From that, some systems (like Google translate, Babel Fish, etc) could probably end up being fooled into assuming this is roadkill or a really annoying basketball game. W3ird N3rd (talk) 00:01, 5 August 2017 (UTC)
DCDuring, I hadn't even thought of those interpretations yet. Thinking about that, I realized game also means wild animals hunted for food. It would depend heavily on context whether a non-native speaker could actually make that mistake, but I think it would be funny as hell. In the text "We were very hungry because we didn't pack enough food. But at least while on this trip, we enjoyed some travel game." the "travel game" could actually be interpreted as roadkill. Bon appétit! I'm hoping Wiktionary:Beer_parlour/2017/August#Allow_more_SoP_compounds.2C_similar_to_Dutch_and_German. or another rule change based on that will fix this in the future, but I don't think I'm going to hold my breath. W3ird N3rd (talk) 02:12, 3 August 2017 (UTC)
The MWE is also a synonym of away game. DCDuring (talk) 01:08, 5 August 2017 (UTC)

order Arabic disambiguating entries orthographically, not by verbal formsEdit

Currently Arabic disambiguating entries are ordered by verbal forms instead of orthographically, which is not the optimal lexicographical approach. Thus, for ease of reference, يُوجدُ should appear just once in the page for يوجد, specifying it could belong to either verbal form I or verbal form IV. --Backinstadiums (talk) 08:47, 1 August 2017 (UTC)

It should be just as easy as modify a line of code --Backinstadiums (talk) 12:39, 4 August 2017 (UTC)

@Backinstadiums: Huh? What line of code? — Eru·tuon 18:14, 4 August 2017 (UTC)
@Erutuon: I mean it cannot be that much of fuss, just a different grouping in a specific case. If anything should be clarified further, please let me know. --Backinstadiums (talk) 20:57, 4 August 2017 (UTC)
@Backinstadiums: To do this, the template {{ar-verb-form}} would have to no longer display the form number and many entries would have to be edited (there are 35,188 entries in Arabic verb forms, some of which will contain homophonous verbs with different Form numbers). The editing part would be a lot of work, and would probably have to be done by bot, as the entries were in large part created by bot. I'm agnostic on whether the change would be helpful or consistent with Wiktionary organizational principles, and no one else has responded: @Atitarev, Wikitiki89, Benwing2? — Eru·tuon 22:22, 4 August 2017 (UTC)

Just like any issue in life, no matter how much is already done, if it's not in accordance to the optimal lexicographical approach which enables ease of reference to improve the user's usability, action must be taken on it as soon as possible not to worsen resources even more --Backinstadiums (talk) 15:39, 10 August 2017 (UTC)

August LexiSession: circusEdit

Let's go to the circus!

The monthly suggested collective task is to collect words about the circus. I've noticed that Wikisaurus:circus does not exist, and auguste is a kind of clown, so this a great opportunity to look around this topic together!

Let's stop clowning around and juggle some ideas together!

By the way, Lexisession is a collaborative experiment without any guide or direction. You're free to participate however you like and to suggest next month's topic. If you do something this month, please let us know here or on Meta, to let people know that English Wiktionarians are doing something on this topic. I hope there will be some people interested in making some contributions!   Noé 13:43, 1 August 2017 (UTC)

Here's a good start - to be added to Category:en:Circus if appropriateEdit

Circus and sideshow attractionsEdit

Maybe this is me being slightly grumpy because some people in another discussion I started don't seem to entirely grasp what I was suggesting, but aren't some of these SoP?
I personally don't have a problem with any of these and luckily I'm not a SoP nazi, but after a few RfDs this project could end up with more red links than it had when it started. W3ird N3rd (talk) 03:30, 4 August 2017 (UTC)
  • Some of these seem lame and/or SoP. On the other hand sources such as Carny Lingo show that there is a large vocabulary of great charm and linguistic interest. I doubt that we will get very far into that highly desirable content this month, but extracting a list of terms from that and similar sources would be useful for Wiktionary, IMO. I'm not at all sure that the terms fit well into the categories suggested, many better assigned to a category based on usage context, eg, Category:English circus slang or similar. Examples, barnstorm, blow a tip, blow one's pipes, build a tip, burn the lot, carry the banner, clean the Midway, cool out, bail the counter, bat away. Unfortunately, I don't know that we have a good system of such categories, instead duplicating encyclopedic-type "topical" categories. DCDuring (talk) 08:29, 4 August 2017 (UTC)
    I suppose much of this would fit in Category:English circus slang. I hope that {{lb|en|circus slang}} or {{lb|en|circus|slang}} would work. DCDuring (talk) 08:36, 4 August 2017 (UTC)
    My hopes are in vain. I hope someone can rectify the operation of {{lb}} so anyone interested can help us play along with this cross-project effort. Note that there is a considerable overlap between criminal slang and circus slang. DCDuring (talk) 08:48, 4 August 2017 (UTC)

Next steps for Wikidata accessEdit

Hello all,

Thanks to @Daniel Carrero there's now a page to centralize all the discussions and information related to accessing Wikidata data from English Wiktionary. I hope we can improve it soon with examples and documentation :)

We also suggest an enabling date for the arbitrary access: September 7th. If you have any question or concern, feel free to ask. Thanks to the people who worked on this! Lea Lacroix (WMDE) (talk) 13:52, 1 August 2017 (UTC)

Thank you. September 7th looks good to me. --Daniel Carrero (talk) 03:18, 2 August 2017 (UTC)

Best practices for Oxford -ise/-ize variantsEdit

I just made Birminghamize. What should be put at Birminghamise? —Justin (koavf)TCM 00:58, 2 August 2017 (UTC)

It is ridiculous that Wiktionary lacks a basic policy on how to handle these variant English spellings: afraid of offending others / nobody willing to take charge / deeming status quo as good enough / etc., ... so there we go - both color and colour can evolve in parallel. Wyang (talk) 06:05, 2 August 2017 (UTC)
The problem is that someone who dares to set a standard will likely get into an edit war. So nobody touches it with a pole. —CodeCat 19:49, 2 August 2017 (UTC)
This would be a perfect application of Wikidata. —Justin (koavf)TCM 23:56, 2 August 2017 (UTC)
I have never heard of Birminghamise, so unless you can find it being used there is no point in making an entry. But normally -ise verbs are labelled "British spelling" so they can appear in Category:British English forms. DonnanZ (talk) 23:45, 5 August 2017 (UTC)

Etymology giving me problemsEdit

Can someone review:

for the etymologies that I've added? All of these words are directly taken from Spanish but I've clearly not made them all correctly formatted. Also, I'm not sure if there's a different way of noting a language which is a creole based on [x] versus a language which simply adopts one word from [x]. (E.g. the difference between a Haitian Kreyol word derived from French versus using "facade" in contemporary English). Thanks. —Justin (koavf)TCM 02:24, 2 August 2017 (UTC)

The relation between a creole word and its etymon from the lexifier doesn’t fit very well with the inherited/borrowed dichotomy. We should consider adding templates for other special kinds of derivation like this and substrate “borrowings”, semi-learned borrowings, etc. — Ungoliant (falai) 02:45, 2 August 2017 (UTC)
It's good to see someone else basically ratify that. For creoles/pidgins, it's really a different matter than to go from stages of a language (Old English → Middle English → Modern Englishes) or inheritance in a family (Proto-Germanic → English). —Justin (koavf)TCM 04:43, 2 August 2017 (UTC)

French Wiktionary monthly news - ActualitésEdit


I am happy to inform you that the 28th issue of Wiktionary Actualités just came out in English!

As usual, Actualités is in English but talk about French Wiktionary and lexicography in general.

In this edition main articles are: a presentation of the Lingua Libre project to record words, a summary of a strange dictionary and a thought about lemmas and grammatical categories. And more: shorts, statistics (including new ones like the number of pages that include a link to a thesaurus) and an explanation about the Linter.

As usual, it is translated in English by non-native speakers, so it is not perfect, but it can be improved by readers (wiki-spirit as usual). Please note that we do not received any money for this publication and we are not supported by any user group or chapter. It is only written by the community. Feel free to leave us comments!   Noé 09:09, 2 August 2017 (UTC)

Allow more SoP compounds, similar to Dutch and German.Edit

So there was a discussion last month about deleting SoP compounds in German and Dutch. Now triggered by "travel game", perhaps we could explore pros and cons for the opposite. That is, allowing English SoP compounds in ways similar to the way they would be allowed in German and Dutch.
So exactly what does that mean? Put simply, if some SoP would pass an RfV and is not using any common/universal word (like "brown" or "fan") it would be allowed. This means you still can't create brown leaf or large box, but you could create burger joint and sheep farmer. Also computer chip and lab rat, those already exist but I'm not sure how they could be justified by the current rules. Optionally you could exclude any SoP with a space that is unambiguous. (like sheep farmer)

  • Pro: while it may be possible to figure out the meaning of a SoP by looking up the parts, it's not always easy. The parts may have more than one possible meaning so you need to figure out the correct meaning for all the parts.
  • Pro: in the case of "travel game", travel and game don't really make it clear what a travel game is. Travel game is probably SoP and adding the attributive noun use to travel just results in an instant revert. So basically it's impossible to describe a "travel game" on wiktionary.. in English. I could, however, describe the Dutch word reisspel.
  • Pro: fietshater (bike hater) would pass RfV and should be allowed by the current rules. It wouldn't be allowed with these new rules because hater is universal and can apply to thousands of nouns and verbs.
  • (added august 4) Pro: translations. How would you translate juice extractor (SoP) to Dutch? Juice is sap, but how to translate extractor? The correct answer is sapcentrifuge, but would you have ever guessed it? Wiktionary is useless in this case, and this example wasn't even that ambiguous.
  • (added august 5) Pro: We can delete ex-pilot. (Wiktionary:Requests_for_deletion#ex-pilot).
  • Con: there will be more entries on wiktionary.

I'm not taking a stance on this myself yet, I just think it's worth thinking about. I may not be seeing the whole picture. I haven't made up my mind yet and I think it's a good idea. I wonder what you think. W3ird N3rd (talk) 09:18, 2 August 2017 (UTC)

Sorry W3ird N3rd, strong oppose on that. As I mentioned in the discussion you're referring to, we need to have some sort of quality control around here – having more entries on Wiktionary doesn't necessarily boost our credibility if said entries are redundant. --Robbie SWE (talk) 09:57, 2 August 2017 (UTC)
There would still be a form of quality control. RfV requirements still apply and common words are not allowed. Optionally you could add that if the SoP is fully transparant (like sheep farmer) it is still not allowed, while allowing burger joint and travel game. You talk about quality control, but you actually don't have that control right now as I could create fietshater (bike hater) and you probably couldn't do a thing about it. W3ird N3rd (talk) 10:47, 2 August 2017 (UTC)
I support allowing all attestable English compound words, and no longer making it spelling-dependent. Consequently, WT:COALMINE would be superfluous, as coal mine would no longer depend on the attestability of coalmine for inclusion. —CodeCat 10:53, 2 August 2017 (UTC)

Perhaps we could deal with SOP compounds differently than with other lemmas, effectively soft redirecting them to their constituents while keeping them for consistency, maybe like
(literally) A mine from which coal is dug
(literally) An exhibition (tentoonstelling) of Khoekhoe (Hottentot) tents (tent)
While being subject to usual attestation rules and linked from translation tables as hottentottententententoonstelling f where applicable (of course we won't have a page for Khoekhoe tent exhibition to link translations from).
It could get messy with languages with transliteration though. Crom daba (talk) 13:04, 2 August 2017 (UTC)
Why don't we include any rubbish as terms and forget about CFI? Who cares about this dictionary and its reputation, anyway? --Anatoli T. (обсудить/вклад) 13:33, 2 August 2017 (UTC)
We already host all sorts of rubbish, my approach would make it more manageable and invisible in most use cases. Crom daba (talk) 14:00, 2 August 2017 (UTC)
I don't see the connection between being more inclusive of compounds and quality. The more useful lexicographical content we can provide, the better. RFV provides good quality control, alongside making sure our entries are clean and properly formatted. —CodeCat 14:33, 2 August 2017 (UTC)
@Atitarev and @Crom daba, please be aware that hottentottententententoonstelling is a terrible example that isn't even related to this discussion because it is a joke word and tongue-twister. This word is not, will not and has never been used to refer to any kind of actual exposition. I used it in the other discussion to demonstrate how hard it can be to break down Dutch compound words, but hottentottententententoonstelling isn't SoP. W3ird N3rd (talk) 14:42, 2 August 2017 (UTC)
Yes, I've read the entry, I'm merely using it to show how non-idiomatic words could be formatted. Crom daba (talk) 14:49, 2 August 2017 (UTC)
You may know that, but I think Atitarev possibly doesn't and now thinks Wiktionary will be filled with thousands of rubbish words like that. W3ird N3rd (talk) 14:54, 2 August 2017 (UTC)
oppose. There's no need for most multi-word English terms in English, and nobody will look them up.--Prosfilaes (talk) 23:02, 2 August 2017 (UTC)
Wanna bet? How can you be so sure? Nobody has any idea what passive users search for. DonnanZ (talk) 14:16, 3 August 2017 (UTC)
Also, people do look them up. Pageviews for lab rat are similar to minibar. In addition, how can you say nobody will look something up when the thing in question doesn't (or isn't allowed to) exist? And another thing: translations. We can't have a juice extractor because it's SoP. So now translate juice extractor into Dutch. Good luck with that. You will correctly find sap for juice but how are you supposed to translate extractor? Here's the answer: a juice extractor in Dutch is a sapcentrifuge. Which you could have found if you had looked up juicer (it just so happens a single-word synonym exists here, this is not always the case), but you won't find that if you're looking for a juice extractor, which is the term I'm most familiar with. The very fact is that I had to look up this example on Wikipedia: w:Juice extractor which helped me find juicer. And it's just sheer luck that a juice extractor happens to be encyclopedia-worthy. W3ird N3rd (talk) 01:10, 4 August 2017 (UTC)
This is true. Professional translators seldom need ordinary dictionaries (such as collegiate dictionaries), we want dictionaries that are mainly multi-word, such as the French-English Dictionary of Petroleum Technology. Multi-word dictionaries are the gold-standard and they command high prices. My Dictionary of Petroleum Technology cost me $115 in 1980. In my translating company, we virtually never used any of the ordinary dictionaries (such as Websters, OED, Random House, American Heritage), we only purchased and used the very expensive multi-word dictionaries. Even now that I'm retired, I never use the simple word dictionaries. Almost all the terms I ever have to look up are multi-word terms, and Wiktionary does not handle those. Translators have to equip themselves with a pile of very expensive dictionaries, and all of them multi-word. —Stephen (Talk) 03:46, 4 August 2017 (UTC)
  • I find the juice extractor argument convincing. So far our CFI mainly cover "does anyone want to look it up?" and less "might anyone want to translate it?" Been on a treasure hunt within Wiktionary for translations myself in the past. Korn [kʰũːɘ̃n] (talk) 14:33, 4 August 2017 (UTC)

General question: what does SoP compound even mean?

Compounds have a continuum of transparency of meaning, but they generally do not have a single possible meaning. If they are formed from two nouns (as for instance travel game), there are several possibilities. I'm somewhat rusty on them, but I gather that travel game is a tatpurusha, where travel is added to game to signify a particular type of game, and travel has the meaning of a particular prepositional phrase (or in Sanskrit a grammatical case. Putting aside the other forms of compound, the relationship of travel to game is unknown when you're newly encountering the word. The actual relationship, in terms of grammatical cases, is locative: "a game played during travel". But there are other possible interpretations, such as "a game consisting of travel" (like, I dunno, a long-range treasure hunt?). Meh, it's not a very good example, or I'm not very good at brainstorming about possible meanings.

There are compounds that might be clearer: for instance, bike-hater. A noun combined with hater is often the object of hater (the thing that is hated). But even there, theoretically it could mean "a hater who is on a bike".

So I don't think compounds can be SoP in the same way that regular phrases or sentences are, like "some people hate bikers". There isn't one predictable semantic relationship between the elements of a compound the way there is with phrases. In the previous case, some people is the subject of hate, and bikers is the direct object of hate: that's the only way it can go down.

So what is a SoP compound? I have no idea. I think it should be well-defined for it to serve as a CFI. — Eru·tuon 21:15, 4 August 2017 (UTC)

  • "We were very hungry because we didn't pack enough food. But at least while on this trip, we enjoyed some travel game." roadkill!
  • travel game: A game to play on a journey (like I spy or punch buggy)
  • a physical game, like chess, designed for use on a journey (magnetic pieces etc)
  • geocaching / geohashing
  • travel business (Doug Parker is a big name in the travel game)
  • Something that resembles a game with rules, despite not being designed: in the travel game, being held up for security checks is becoming less of a drag and more of a routine nowadays
  • The ability to seduce someone, usually by strategy:
Watch him. He's got a great travel game.
-He's got a WHAT?
Travel game. Basically he just takes any chick he picks up to Paris. Guaranteed success.
  • The travel game that is used by airlines where they offer cheap tickets but charge extra for additional luggage, meals, toilet visits and use of the oxygen mask is really disgusting.
  • (basketball) I'm getting tired of these travel games. They just travel for most of the game time. It's not funny anymore.
  • (childbirth) There's nothing fun about the travel game, but all that is forgotten when the mother is holding her newborn baby.
Will this suffice? ;-) W3ird N3rd (talk) 23:45, 4 August 2017 (UTC)
Then again, all of these sound valid to me, which could be an argument that compounds really are a sum of parts, or rather a product. Crom daba (talk) 16:03, 5 August 2017 (UTC)
That's the trick. They may sound valid, but most of them are completely invalid.
Unlikely to pass RfV:
  • roadkill
  • a game of basketball with lots of travelling
  • the ability to seduce someone
  • game that involves travelling
  • a questionable or unethical practice
  • childbirth
Using a universal part:
  • travel business (possibly won't pass RfV either)
  • something that resembles a game with rules, despite not being designed (possibly won't pass RfV either)
  • a game to play on a journey
  • a physical game, like chess, designed for use on a journey
So by the proposed guideline, only the last two definitions would be included. But that's not final, you could argue about exactly what should and should not be included. For example, you could argue that if a valid entry for travel game already exists, it's acceptable to add travel business (if that would pass RfV) while at the same time not allowing an entry to be created solely for travel business. W3ird N3rd (talk) 16:56, 5 August 2017 (UTC)

What about if I would word it like this:

  • SoP compounds with an irregular translation in another language are allowed. (or at least their translation section would be) This will allow juice extractor because of sapcentrifuge.
  • SoP compounds with a space or hyphen that have no irregular translations in another language, have only one meaning and this meaning can be reasonably obtained by looking up the first definition of the seperate words are not allowed. This would possibly cover sheep farmer assuming there are no irregular translations in another language.
  • SoP compounds using parts that can be universally applied (the parts are not related in any way) are not allowed, unless they are idiomatic. This excludes "brown leaf", "large box", "luxury boat" and ex-pilot but allows more cowbell. (w:More Cowbell)
  • (added august 6) Compounds without a space or hyphen that have only one non-universal part (like sockless) are only allowed if their usage is vast - far beyond the current three-independent-durably-backed-up-sources rule. Common words like hopeless or pointless should be kept, but exactly how much sense does an entry for boatless make?
  • Any entry still needs to be able to pass an RfV.

Maybe this is more clear? W3ird N3rd (talk) 16:56, 5 August 2017 (UTC)

I believe we should have some rules similar to WT:COALMINE in order to avoid unproductive RFD discussions and give contributors a chance to to predict if their new SOP entries will pass RFD. I guess many editors don't feel liked spending time on creating entries that later on might get deleted. This will slow down the rate of coumpound term entry growth which IMHO are necessary for a usable multiligual dictionary. I dont't believe that a vote to allow all attested terms currently has any chance to pass. In the past some such rules have been proposed:
  • Including all terms with lesser common single-word synonyms.
  • The lemmings priciple which would grant inclusion if a term is covered by a list of trusted dictionaries (which still have to be specified).
  • We already have some translations-only entries (Category:English non-idiomatic translation targets), however there is yet no rule to prevent their deletion. We probably want to keep them if they have idiomatic translations for a number of languages.

Matthias Buchmeier (talk) 23:32, 5 August 2017 (UTC)

From what I understand, I would have to create a vote at Wiktionary:Votes. I would probably need some help to have any chance of getting that right. You say such a vote would have no chance to pass, but if there's anything I learned from politics it's this:
  • If you want something to pass, bring it up for voting when everybody who is against it is on vacation.
  • If you want something to pass, just attach it to another bill that is being voted on that will pass. (not possible on Wiktionary)
  • If neither of those are feasible but at least a third of eligable voters are in favor, just bring it up for voting again and again and again and again. Sooner or later it'll pass because either those who are against it missed the vote, those who are against it don't vote because they figure it'll never pass anyway (that's one of the reasons Trump was able to win) or some current event or hype changes what people think and the vote passes.
There are more strategies, but these are the big ones. Once it has passed, it'll be virtually impossible to take it off the books again. W3ird N3rd (talk) 03:38, 6 August 2017 (UTC)

I just came across the following: towelless, fishless, bikeless, streetless, boxless, fireless, woodless, barless, magazineless, goldless, bronzeless, schoolless, cardless, mapless, pantless, sockless, appleless, watchless, morningless, kingless, bossless, condomless, monitorless... (this goes on endlessly) Next time you see somebody saying Dutch or German needs to be treated differently on Wiktionary, slap them in the face with this list. W3ird N3rd (talk) 07:00, 6 August 2017 (UTC)

The problem is that a lot of contributors have the justified fear that allowing all attested multiword compounds would flood the database with low quality entries. I believe that the best way to overcome this problem would be some set of well-designed inclusion rules. Matthias Buchmeier (talk) 17:48, 6 August 2017 (UTC)
I think the load of -less variants is low quality. A bunch of these wouldn't even pass RfV. So be it German, multiword compounds or just plain English -less variants of words: we need better inclusion rules. The current inclusion rules allow rubbish like boatless while prohibiting travel game. They will also allow fietshater and perhaps even bike-hater while nothing prevents lab rat from being deleted. I think the five-bullet point list I made above is at the very least a good start. But if there's no chance of any change ever becoming policy, I might as well give up. In that case a completely new wiktionary needs to be started, which would be a downright shame. W3ird N3rd (talk) 06:30, 7 August 2017 (UTC)
boatless would easily pass RfV, and is a translation of an Egyptian term (iww, with a hook above the i) that would pass your translation terms argument. I recall an old dictionary has a page of un- compounds without definitions; it hardly hurts us to give stuff like that boilerplate entries.--Prosfilaes (talk) 09:23, 7 August 2017 (UTC)
Looks like boatless is a bit of an odd duck. It's not used a lot on websites (which is what I initially checked for), but quite a few books use the word. As for the translation, I wasn't aware of that and was only referring to the RfV. I don't terribly mind having such entries around, but it just feels like insanity to have those while not allowing entries that are not nearly as obvious "because SoP". W3ird N3rd (talk) 13:30, 7 August 2017 (UTC)

Kajkavian – language, dialect or something inbetween?Edit

Recent changes to Kajkavian prove that there is a dispute in the linguistic community as to the classification of this dialect/language. I hate to see this entry be turned into a political battlefield, so let's decide once and for all – is it a dialect or language, and should this page be protected to avoid any future disputes? --Robbie SWE (talk) 10:12, 2 August 2017 (UTC)

I'd stick with the conservative option and call it a dialect, at least until it gets an army and a navy. Crom daba (talk) 10:52, 2 August 2017 (UTC)
It's still not settled. Its status has been disputed for a long time, but it has been classified as a dialect of Serbo-Croatian since about 1950 or so. Many of the Yugoslavs get very worked up about it, one way or another. I agree with Crom daba, I think we should keep it as a dialect until there is something closer to a consensus that it's a separate language. I thought about getting an opinion from User:Ivan Štambuk, but Ivan seems to be absent. I think it's been over a year since Ivan's last serious edit. —Stephen (Talk) 11:17, 2 August 2017 (UTC)
If you're interested in opinion of other Yugos, @Vorziblix, Biblbroks might respond. Crom daba (talk) 11:31, 2 August 2017 (UTC)
Opinions from other Yugos might be helpful, but only if they are linguists and are philosophically moderate. The last time we asked for Yugoslav opinions, everybody from the Serbian, Croatian, and Bosnian Wikipedias came here and we almost had a shooting war. With User:Ivan Štambuk, we knew his education and philosophy, so he was very helpful in things such as this. Ethnologue does not recognize it yet. SIL mentions it only as a literary language. I don't know what to make of that. —Stephen (Talk) 13:35, 2 August 2017 (UTC)
I don’t have a strong opinion either way, as I’m not knowledgeable enough about Kajkavian to say whether it would be more convenient to keep it merged or split. For reference, however, here’s an old discussion of this same subject with Ivan Štambuk. — Vorziblix (talk · contribs) 21:33, 2 August 2017 (UTC)
"at least until it gets an army and a navy" Wait, all I need to have my own language is an army and a navy? Why has no one told me this before! **starts gathering troops**
On a more serious note, you may want to look at and compare the West Frisian language, a dialect that relatively recently became recognized as a language. W3ird N3rd (talk) 15:18, 2 August 2017 (UTC)
We are completely indifferent to official "recognition". We consider things separate languages (and give them separate codes) based on linguistic considerations, though admittedly our results are not always consistent: we treat all Serbo-Croatian and Chinese varieties as a single language (each), but we treat Bokmaal and Nynorsk as separate languages. —Aɴɢʀ (talk) 16:23, 2 August 2017 (UTC)
In considering these types of questions, I would like us to put more emphasis on lexicographic convenience and less on "linguistic considerations"- that is, will splitting or merging these languages make it easier to maintain the dictionary? Will it make it easier for users to find information that they want? DTLHS (talk) 16:58, 2 August 2017 (UTC)
I think the Frisian case is still interesting to look at. People who only speak Dutch can barely if at all understand Frisian, but for a long time they were (for example) not allowed to use evidence in Frisian in court. It was not until 1980 that Frisian got the status of a required subject in primary schools. I think it also took a while before they got their own Wikipedia. And they are very, very, very proud of their language and it sounds like that is a factor with Kajkavian as well. If you are curious how different it really is, try The narrator is speaking Frisian, the man who appears after 14 seconds into the video is speaking regular Dutch. For written text, try versus For a long time this wasn't recognized as a seperate language. W3ird N3rd (talk) 17:08, 2 August 2017 (UTC)
We're already led by convenience, Serbo-Croatian wouldn't have won out were it not massively inconvenient to quadruple our work here. Crom daba (talk) 18:28, 2 August 2017 (UTC)
Only in some cases. We have both Scots and English and two different varieties of Norwegian (as well as just "Norwegian"). DTLHS (talk) 18:36, 2 August 2017 (UTC)

New competitionEdit

Hello. If anyone wants to play Emoji-Pictionary, I set up a game at User:WF on Holiday/Comp. As with most games I started in Wiktionary, there are probably loads of mistakes, loopholes, spellos, bad grammars and confusing instructions. But once we've got used to them, we can play happily. On a side note, I'm sure some of our Previous games could be modified by some tech-savvy folks in such a way as to allow normal people to play them. --WF on Holiday (talk) 23:18, 2 August 2017 (UTC)

Arbitrary behavior of certain administratorsEdit

There is an administrator being completely arbitrary on certain entires, as you might see here for example: [4] where he eliminates a translation of a word on the basis that he does not feel that it is a good translation, and yet the example he leaves in place about an LGBT film festival contradicts his assertion. This is, sadly, a consistent pattern and not merely one example; originally I had added "queer" as a translation while citing a specific example of it being translated that way in the name of an Israeli organization, and he eliminated it on the basis that he personally felt it did not fit and was offensive. His behavior is despotic; instead of requesting verifications he just acts as an absolute authority and is nothing but combative when I ask for simple things like justifications for his actions.

It's bad for the project because there are processes. He does not seem to be holding himself to the standards that other wiktionary users are held to, but acting as if it's his personal dictionary. He disagrees with a translation so instead of putting a RFV template on it, he just deletes it and locks the page.

Furthermore he's projecting a considerable amount in his responses, acting as if I am trying to impose my personal views when I am citing specific examples and he is citing no examples other than "I speak Hebrew," which i don't think is the way things normally go on Wiktionary? Like I speak Esperanto but I still have to justify my work on Esperanto terms, as 99% of Wikimedia users have to do.

I don't think Wiktionary was created so that certain people could impose their opinions without justifying them, and people who justify their edits by giving specific examples are treated as if they are troublemakers. I think it was created for the opposite reason and that fairness and transparency are still supposed to be important. Ligata (talk) 14:04, 3 August 2017 (UTC)

I recommend people engaging in a discussion over this take a look at the respective admin's talk page and think of the fact that Wiki-projects are known to prevent new users from joining by stubborn aggressive culture of long-term users. I also strongly advocate that the discussion here not get derailed by a smokescreen (talking about Hebrew definitions) but instead stay on topic (proper conduct and bureaucracy). Korn [kʰũːɘ̃n] (talk) 14:48, 4 August 2017 (UTC)
I had a similar issue with this travel edit. It may have been the wrong place, but I think those were some good examples. Instead of correcting it or requesting a fix/cleanup he just chucked it. In most cases that would be the end of it, but I mentioned him on this page asking to explain this. I don't expect most new users to be that assertive or to even notice their edit has been undone. He still hasn't shown up here and I thought he ignored it, but only just now do I see he did do something in response to that (or so the timelines would suggest): which is nice, but I think that would still benefit from the examples I had written. But I can't risk putting something back in that was removed by an administrator. I can understand his time is limited and he can't properly fix every mistake he finds. I get that. But isn't that what Wiktionary:Requests_for_cleanup would be for? W3ird N3rd (talk) 20:30, 4 August 2017 (UTC)
We don't even have time to resolve everything at WT:RFC as it is now (see all the archived unresolved requests). --WikiTiki89 20:38, 4 August 2017 (UTC)
Is that a valid argument for deleting/reverting edits that aren't perfect? The idea behind a wiki is that a valueable contribution doesn't have to be complete or perfect. But by reverting edits that are not perfect, you can quickly discourage any new users from hanging around. In the long term, you will indeed not have enough manpower to verify and clean edits. The cleanup request page isn't very well advertised, that may also contribute to this. W3ird N3rd (talk) 21:16, 4 August 2017 (UTC)
Some badly formatted entries are found many years after they are created. Thus, dealing with them as soon as they are noted is essential. —CodeCat 21:18, 4 August 2017 (UTC)
Some - so you just delete everything before anyone could even have a chance to fix it. If mice keep getting into your house, the solution is not to burn down your house. W3ird N3rd (talk) 01:40, 5 August 2017 (UTC)
If your house could do with some new furniture, but you can't afford any, the solution is not to fill it with mice... Equinox 10:25, 5 August 2017 (UTC)
But if many of your friends are carpenters, you might fill it with not-quite-perfect furniture and put a post-it on it to remind you something needs to be done about it, instead of sitting around in an empty house. And possibly chuck the nonperfect furniture anyway if it's still not fixed after a month. The very least IMHO is that the user who made the edit is (could possibly be partially automated) informed about what was wrong and what needs to be changed before putting that content back. Right now it's just "POOF, it's gone, and if you put it back you risk a ban". Like my examples for travel, I think they would now fit perfectly below the usage note, but I feel like it's a risk to put them back in because SemperBlotto is an administrator. I would have already done it had SemperBlotto been a regular user.
Obviously edits that you would consider mice (vandalism) are not what I'm talking about here. W3ird N3rd (talk) 13:05, 5 August 2017 (UTC)
I do sort of take your point. It's bad that we automatically revert every mess when some (10%? who knows?) messes contain something good. But the entries are public-facing. It suggests that maybe we need some kind of "limbo" or intermediate edit-o-space that allows stuff to exist before it's shown to every random visitor. I can't be the first wikidork to think of this. For now, although it's annoying, I think our approach is as good as it gets. Equinox 00:29, 7 August 2017 (UTC)
Wikipedia uses "Wikipedia:Pending changes" on controversial pages so that edits don't go live until they have been reviewed. We could perhaps apply it to all pages here, and patrol the log of pending changes needing review, instead of our current system of "patrolling" Special:RecentChanges, which some changes slip through. But the actual result might be an extremely large backlog of pending changes awaiting review. This was discussed at least once before; I don't recall many people having strong opinions, but enough opposed it that it wasn't implemented. - -sche (discuss) 06:01, 15 August 2017 (UTC)
  • Since the actions complained of are not administrative in nature, perhaps it would be better to title this section "Arbitrary behavior of certain editors". Cheers! bd2412 T 14:58, 5 August 2017 (UTC)
@BD2412 But there's a difference. If an administrator removes something, you can't put it back. Even if you slightly alter it and believe that is sufficient to fix it, you can't put it back because the user who removed it happens to be an administrator. If you do it anyway you risk a ban. This wouldn't trouble me nearly as much if a regular user had deleted it, I would just fix it and put it back without having to worry about it. W3ird N3rd (talk) 03:03, 6 August 2017 (UTC)
I don't think that's true at all. It would be a substantial misuse of administrative authority to use that authority in connection with one's own editing dispute. bd2412 T 03:06, 6 August 2017 (UTC)
This.__Gamren (talk) 08:38, 6 August 2017 (UTC)
@BD2412 User_talk:Stephen_G._Brown#Abuse_of_blocking_and_page-deleting_powers_by_SemperBlotto.3B_de-cratting_and_de-sysopping_required feels too much like that for me to risk it. While the user in question was wrong (and making silly demands), it makes it clear to me that putting back any content deleted by an administrator is risky. W3ird N3rd (talk) 09:18, 6 August 2017 (UTC)

Extinct speciesEdit

Are there any categories for extinct species, or do they go in other categories? I just unearthed Kangaroo Island emu. DonnanZ (talk) 16:02, 4 August 2017 (UTC)

A taxonomic approach would just put them in existing categories (where they exist) alongside extant species. A language-centered approach would favour putting them somewhere else, and not mixing them with extant species. —CodeCat 16:12, 4 August 2017 (UTC)
There is a convention in taxonomic names to place the symbol "" before the name unless such a symbol is not necessary due to context. (See practice on Wikispecies.) We have begun implementing the practice of putting the "" on the inflection line for entries of extinct taxa and elsewhere if the word extint is not already in a label.
English vernacular names do not use the symbol, so it is arguable that a categorical distinction might be useful for some purposes. For many purposes, however, the presence or absence of the word extinct together with the capabilities of search would be sufficient. DCDuring (talk) 22:12, 4 August 2017 (UTC)
There is no value lexical value in saying if a particular species is extinct or not, anymore than if a particular institution is defunct, or a person is deceased. All that matters is if the term still has some kind of usage or currency. —Justin (koavf)TCM 00:07, 5 August 2017 (UTC)
By what definition of lexical? Does lexical exclude definitions, ie, semantics? We have the word extinct in so many definitions. DCDuring (talk) 01:04, 5 August 2017 (UTC)

Can anyone get through to User:Jeff Weskamp?Edit

They are adding Cherokee entries with manual transliterations, even though automatic transliterations work perfect for Cherokee. This isn't really a big issue, but it's silly and so I left a message on their talk page. They don't seem to have noticed it at all, though, even after I sent another message. Is anyone able to get through to them? A user that ignores their talk is bad, even if they aren't currently causing trouble. —CodeCat 17:10, 4 August 2017 (UTC)

Jeff also edits other Native American languages, including Navajo. I have not checked all of his edits, but quite a few of them.Those I've checked always seem good, even if he adds transliterations unnecessarily. I have attempted to talk with him a time or two, but I don't believe he has ever replied to anyone. I've known other editors who try to avoid interpersonal communication, so it does not seem all that odd. Jeff just takes it to an extreme level. —Stephen (Talk) 22:11, 5 August 2017 (UTC)
Jeff is now adding improper categories to entries, so I hope they will start listening. —CodeCat 19:10, 19 August 2017 (UTC)

Languages distinguishing dotted and undotted iEdit

Recently I added some code to distinguish dotted and undotted i (Iı, İi) in Turkish and Azeri sortkeys . Till now, they were merged by being converted to lowercase (→ iı, ii) and then uppercase (→ II, II) using English rules (mw.ustring.upper). Thus, words beginning with both i and ı were sorted under I when they were categorized using templates.

Currently the fix only applies to Turkish and Azeri. Are there any other languages currently on Wiktionary that distinguish dotted and undotted i? — Eru·tuon 20:42, 4 August 2017 (UTC)

The following languages have entries with both dotted and undotted i's: Azeri, Crimean Tatar, Egyptian, English, Gagauz, German, Italian, Karakalpak, Tatar, Translingual, Turkish, Zazaki. DTLHS (talk) 21:10, 4 August 2017 (UTC)
Egyptian, German, Italian? And even English? Really? —CodeCat 21:14, 4 August 2017 (UTC)
Italian: dımlı, German: homurdanmayı, Egyptian: ḥtrı͗, English: Category:English terms spelled with ı. DTLHS (talk) 21:26, 4 August 2017 (UTC)
The Italian and German look like errors. The Egyptian is used with a combining diacritic, and it should just use a regular i. As for the English, most of them are probably better attested with a regular i and therefore should probably moved to those spellings. Regardless, English speakers would not treat i and ı as different letters, so sorting them together is correct. —CodeCat 21:31, 4 August 2017 (UTC)
@DTLHS I don't think the German entry you fixed is correct, still. In the lemma entry, the inflection table says it's the definite accusative form. —CodeCat 21:54, 4 August 2017 (UTC)
I guess, then, what I'm really asking is for which languages would we actually want the sortkeys to distinguish the two? — Eru·tuon 21:30, 4 August 2017 (UTC)

I'm going to guess that all the Turkic (and Turkic-influenced) languages in the list should have dotted and dotless i distinguished: in addition to Turkish and Azeri, Crimean Tatar, Gagauz, Karakalpak, Tatar, Zazaki. — Eru·tuon 21:51, 4 August 2017 (UTC)

(edit conflict) Judging by w:Dotted and dotless i, there's the potential in Turkic languages that use the Latin script, even as an alternative, but nowhere else except for ad-hoc use in romanization. Our entry for ı lists only Azeri, Crimean Tatar, Gagauz, and Turkish. Of course, texts in other languages can have names attested in their original spelling, but such cases are so rare that I doubt there are many (if any, at all) with dotting determining their order in any confusing way. Chuck Entz (talk) 21:57, 4 August 2017 (UTC)
I've put the languages that I listed above in a table in Module:languages. I should verify that each one actually has a regular orthographic system that uses the letters, though. — Eru·tuon 00:05, 5 August 2017 (UTC)
Okay, I looked at Wikipedia articles and Wiktionary categories, and Crimean Tatar, Gagauz, Karakalpak, Tatar, and Zazaki all seem to either regularly use dotted and dotless i, or have entries that use them. — Eru·tuon 00:31, 5 August 2017 (UTC)

Category name: "words pseudosuffixed with" or "words ending in"Edit

Which naming convention should be used for suffixlike endings: "words pseudosuffixed with" or "words ending in"? Examples for both: Category:Esperanto words pseudosuffixed with -acio; Category:Esperanto words pseudosuffixed with -enco; Category:Hungarian words ending in -ikus. --Panda10 (talk) 23:58, 4 August 2017 (UTC)

I prefer "ending with", because I haven't heard "pseudosuffix" before, but I wonder how we could prevent the creation of ridiculous categories for every sequence of letters at the end of the word: like for naming, ending with -g, ending with -ng, ending with -ing (though that's a suffix), ending with ming, ending with -aming. That is, what counts as a "pseudosuffix" or ending such that it gets to have a category? — Eru·tuon 00:03, 5 August 2017 (UTC)
I think these things (pseudosuffixes) are called formatives. Crom daba (talk) 00:26, 5 August 2017 (UTC)
Also desinence. --Vriullop (talk) 07:41, 5 August 2017 (UTC)
See also: previous discussion in July at Etymology Scriptorium.
"Desinence" means typically "inflectional" rather than derivational. With some ovelap with "formative", there's also "formant", used to refers to endings that are not known to be certainly segmentable at all (so e.g. ölyv would have a "formant" -v). "Ending in" is probably good enough a starting point, provided that we craft descriptions for these that clarifies that they are not pseudo-rhyme categories (e.g. we would not want sing in a category "English words ending in -ing").
Something that specifies the etymological origin, such as "ending in Latinate -ikus" might work. This also prevents the risk of bloat through people starting to add "ending in -X" as useless "wrapper" categories for every "suffixed with -X" category.
I'm not sure how these categories should be meshed with the pre-existing suffix categories, though. Do we put them in parallel, or as a parent category for the corresponding proper suffix category? I would lean towards the former, with crosslinks from the category description, but I'm open to arguments in other directions. --Tropylium (talk) 07:48, 6 August 2017 (UTC)

Sanskrit vs. Old Indo-AryanEdit

Currently, Module:languages lists only Sauraseni Prakrit as a direct descendants of Sanskrit. This is IMO completely misleading because there is nothing to prove that Sauraseni is any more a descendant of the Vedic dialect of Old Indo-Aryan than any other Prakrit. A simple example is Sanskrit क्षेत्र (kṣetra, region), from Proto-Indo-Iranian *ĉšáytram. The regular outcome of *ĉš in Middle Indo-Aryan is "ch". This is found in all of the Dramatic Prakrits as "chetta" (alongside a "kh" form, that likely came later as part of artificial alignment with Sanskrit), including Sauraseni. Indeed, where Sanskrit simplifies Proto-Indo-Iranian clusters to क्ष (kṣa), the Middle Indo-Aryan languages preserve the original cluster. If Shauraseni was a direct descendants of Vedic Sanskrit we would see only "khetta", no "chetta". So, that being said, we have two options.

  1. Remove Sauraseni as a Sanskrit descendant – Note that CAT:Terms inherited from Sanskrit has been cleared out with Wyang's help, so no module errors will occur. This is keeping in line with our treatment of Sanskrit as only Vedic Sanskrit (+Classical Sanskrit), not all Old Indo-Aryan.
  2. List all of the Dramatic Prakrits (Sauraseni, Maharastri, Ardhamagadhi) as direct Sanskrit descendant – This was suggested at Category talk:Hindi Tadbhava, and would involve treating Sanskrit as a dialect continuum of all Old Indo-Aryan + Classical Sanskrit. WT:ASA would have to be modified accordingly.

Personally, I think either option is better than the status quo. —Aryaman (मुझसे बात करो) 04:00, 6 August 2017 (UTC)

Pinging @JohnC5, माधवपंडित, DerekWinters. —Aryaman (मुझसे बात करो) 04:01, 6 August 2017 (UTC)
I would prefer option #2, ie, considering Sanskrit to be the entire group of mutually intelligible dialects, for the sake of convenience. Wiktionary treats Avestan, Old Norse & Serbo-Croatian as one language while in reality they're all two or more dialects. We can do the same for Sanskrit. ɱɑɗɦɑѵ (talk) 04:46, 6 August 2017 (UTC)
Not to mention none of the non-Vedic dialects are (well-)attested. And we could always have a reconstructed entry *च्शेत्र/*च्षेत्र (*cśetra/*cṣetra) if it is needed. —Aryaman (मुझसे बात करो) 06:34, 6 August 2017 (UTC)
There is already dialectal diversity within "Sanskrit". Strictly speaking even Classical Sanskrit does not descend from Vedic Sanskrit precisely, but from a parallel dialect that was not written down until later. This in mind, we could probably treat all Middle Indo-Aryan (and most of New Indo-Aryan) as descendants of "Sanskrit". Where MIA diverges from Classical Sanskrit, it would be possible to create reconstructed Sanskrit forms (similar to Category:Latin reconstructed terms). Perhaps we could outright consider merging "Proto-Indo-Aryan" into Sanskrit? Same deal as how we already equate Latin with Proto-Romance. --Tropylium (talk) 07:58, 6 August 2017 (UTC)
I agree that Sanskrit should be the collection of OIA dialects put together. However we cannot merge it with PIA because we need PIA for the Mitanni language. DerekWinters (talk) 15:11, 6 August 2017 (UTC)

making Tagalog an LDLEdit

This was supported in WT:RFVN#hagok by @Metaknowledge, Mar vin kaiser, Atitarev, Stephen G. Brown (I think). @Rgt2002, TagaSanPedroAko may also have opinions. Please discuss.__Gamren (talk) 08:19, 6 August 2017 (UTC)

I agree that Tagalog is an LDL. —Stephen (Talk) 08:42, 6 August 2017 (UTC)
I also agree that Tagalog is an LDL. --Mar vin kaiser (talk) 09:22, 6 August 2017 (UTC)
Do we have any quotations of Tagalog in use in Wiktionary? Do we know of any online corpora that we can use? Is a usable corpus to find quotations in use? Can Tagalog texts be found in Google books? What methods can a third party use to verify that Tagalog is so poorly documented that we should allow single mentions for it? --Dan Polansky (talk) 09:55, 6 August 2017 (UTC)
Yes, Tagalog is very poorly documented here in Wiktionary, but thanks for me being a native speaker of Tagalog, I am making efforts to make Tagalog a largely documented language here, from being a least documented language, or a LDL. I agree that Tagalog is still a LDL, and yes, there will be efforts to add quotations showing sample use of Tagalog words for a certain sense. Maybe finding interesting quotes in Tagalog by notable persons, if not by Tagalog-language publications, may help. -TagaSanPedroAko (talk) 11:23, 6 August 2017 (UTC)
@TagaSanPedroAko: The discussion is not about whether Tagalog is well documented in the English Wiktionary but rather whether it is well enough documented on the Internet, by which the users of the phrase mean, whether there are enough quotations of Tagalog in use (not dictionaries) to be found on the Internet. Since, these quotations of Tatalog in use is what the English Wiktionary uses for verification, per WT:ATTEST. And there is a proposal to allow single mentions in dictionaries to suffice for verification of Tagalog; single mentions do not suffice for English, Spanish, German, and multiple other languages. --Dan Polansky (talk) 11:50, 6 August 2017 (UTC)
There are a very few mainstream Internet sources for use in quotes that use Tagalog. The vast majority of Tagalog sources on the Internet will mostly be self-published, but if you can find one reliable one, like a book in Google Books or a Tagalog news website, then, here we go. I'm aware that there are reliable Tagalog (or Filipino) sources in the Net that attest use of certain words, but that will be difficult with the majority of Philippine Internet media use English. If I can dig through a reliable source, then, good.-TagaSanPedroAko (talk) 11:58, 6 August 2017 (UTC)

Unsolicited Babel requestsEdit


"Could you please add {{Babel}} to your user page? I'd appreciate it. --Dan Polansky (talk) 08:41, 5 August 2017 (UTC)"

I suppose @Dan Polansky means well, but in my book this is spam. The biggest problem I have with this is that he makes it look like it's a personal message. He says he re-types it every time he posts it, but it's still the same message every time. I wouldn't mind if he wrote a personal message for every request and explained why it would be so valuable to him to see that user getting a Babel, or if he would make it clear in the message that it's not really personal.

I personally don't appreciate these messages, but maybe it's just me. W3ird N3rd (talk) 10:17, 6 August 2017 (UTC)

  • The primary purpose of user pages it to give other editors an idea of an editor's competence in a particular language. Babel boxes are the best way of achieving this. Please add a babel box to your own user page (if and when you create one). SemperBlotto (talk) 10:22, 6 August 2017 (UTC)
I have seen plenty of users with a babel box, I thought about it and decided not to create a user page at this moment. If and when I do, I don't think I'll add a babel box. I don't really like them. W3ird N3rd (talk) 10:34, 6 August 2017 (UTC)
  • Funny how you're complaining about Dan "spamming" talk pages with something useful to the project... by spamming this forum page. —Μετάknowledgediscuss/deeds 00:44, 7 August 2017 (UTC)
  • I think you don't know what spam is. Spam means unsolicited bulk electronic messages. Being useful or not doesn't matter, although useful spam is less likely to be frowned upon. If you are getting e-mail that you didn't ask for from you local supermarket with various offers that you actually like, it's still spam. I have only brought this up here, nowhere else. I don't have any intention of posting this anywhere else either. I'm also not asking anyone to do or buy anything. You may find this pointless and you are entitled to your opinion, but that does not make this forum post spam. In my opinion the babelbox is getting enough exposure as it is. If such messages are accepted, it might lead to a slippery slope. I just wanted the community to be aware of this phenomenon, if the community thinks it's fine I'll say no more. W3ird N3rd (talk) 05:49, 7 August 2017 (UTC)
The Beer Parlour is the place to discuss these things. This discussion is not spam. That said, personally I'm OK with Dan requesting people to use babel boxes. Sometimes we need to know who speaks a certain language, and the boxes make that job easier. --Daniel Carrero (talk) 05:53, 7 August 2017 (UTC)
I think Babel boxes are a good thing,t requesting them is a good thing, and not responding constructively to such a request is a bad thing. DCDuring (talk) 06:06, 7 August 2017 (UTC)
I also think that it pays for such a request to have some explanation of the purposes served. DCDuring (talk) 06:08, 7 August 2017 (UTC)
Adding a Babel table should be our standard policy, if it's not already. A standard {{welcome}} message includes that request. If users refuse to tell other users what languages they know or they don't they should go somewhere else. Not knowing a language doesn't mean that you can't edit in that language but others editors can check your edits accordingly or monitor edits. --Anatoli T. (обсудить/вклад) 06:21, 7 August 2017 (UTC)
Technically you've fulfilled the request. You've added {{Babel}} to your userpage. Wyang (talk) 06:29, 7 August 2017 (UTC)
And now that we know that you can't speak any languages, any of your contributions will be ignored. SemperBlotto (talk) 06:42, 7 August 2017 (UTC)
[5]suzukaze (tc) 06:48, 7 August 2017 (UTC)
But, at an earlier, saner time: [6]. DCDuring (talk) 11:16, 7 August 2017 (UTC)
I don't exactly like that W3ird N3rd doesn't have a Babel box, but if the user doesn't want one don't make them feel forced to have one. Some very contributing members of Wiktionary don't have user pages at all. That said, W3ird N3rd isn't exactly spamming this forum, but I just don't feel like this discussion is appropriate for the beer parlour, especially since it's targeted at one user alone (Dan). PseudoSkull (talk) 01:49, 8 August 2017 (UTC)
It feels a bit out of place indeed, but I've looked around and Wiktionary:Information desk, Wiktionary:Tea room and Wiktionary:Grease pit were clearly the wrong places. Although this post is indeed about one user, my comment was about the phenomenon. I don't know if any other users are doing this, but what I said would apply to them all the same. My biggest issue is probably this line: "I'd appreciate it." which was repeated for all users. Maybe it's because I'm Dutch (the Dutch are known for being direct), but I just can't stand it when someone pretends to care.
Just one more thing. I mentioned the possiblity of a slippery slope. One of the reasons I don't want a babel box is because (depending on how many languages you know) it looks like a unicorn just barfed a rainbow. We all know the average Wikipedia user page looks like a Christmas tree and while it won't happen overnight, it must have started somewhere and the road to hell is paved with good intentions. It may not happen at all - but if users start pushing a template, even if this one now is a useful one, it might. I believe it would be more wise not to allow any users to promote templates this way and if it is believed the babel box isn't getting enough exposure, have the administrators decide on a way to inform users. But clearly, I'm standing alone on this one. W3ird N3rd (talk) 03:26, 8 August 2017 (UTC)
If the issue you have with the babel box is too much unicorn barf on user pages, then you could use a different method to give information on what your native language is and what your levels of proficiency are in other languages. — Eru·tuon 03:53, 8 August 2017 (UTC)
There's no slippery slope here: Wikipedia-style user boxes aren't allowed, with the exception of Babel, time zone, and maybe one or two others that provide useful information. That's the way it's been since long before I started here 5 years ago, and I doubt it will change. Chuck Entz (talk) 04:40, 8 August 2017 (UTC)
  • I also see no slippery slope. The Wiktionary community has been very careful to avoid unicorn barf.
And I also see no disingenuousness on Dan's part. I, too, appreciate it when users add Babel boxes to their user pages -- at least, when those Babel boxes are at least vaguely accurate, as they provide the community with useful and usable information on who understands which languages, and roughly to what degree. For a multilingual dictionary project, this kind of user metadata is very useful.
FWIW, W3ird N3rd's behavior comes across as immature, and willfully disrespectful of Wiktionary norms, albeit on a minor scale that's more of a slight annoyance than anything actionable. I suspect some of his (her?) reticence comes from the Wikipedia culture and a lack of familiarity with the Wiktionary project. On Dan's part, I see no spam, and nothing inappropriate in asking for a Babel box.
I hope W3ird N3rd can learn more about how Wiktionary functions, and grow to be a comfortable and productive member of the community. ‑‑ Eiríkr Útlendi │Tala við mig 06:09, 8 August 2017 (UTC)
If my contributions in the main dictionary space are not productive, I might as well stop contributing. It's not going to be all that much better in the future. I thought I was being productive, but thanks for pointing out to me that I'm not. I know you think this is immature, but why should I care? Either I really am not productive, in which case you should just think "good riddance" or I am but you insult me (at the very least that's how this comes across), in which case why should I stay? W3ird N3rd (talk) 14:21, 8 August 2017 (UTC)
  • My perspective: 1. Yes, distributing the same message electronically to a larger number of people is spam. Textbook definition. 2. I see no harm in every user receiving this spam message once as it is merely a request for a useful addendum. 3. This is a Wiki-project, not Lord of the Flies, Jante or a Catholic School in a Celtic country. Wiki itself is based on and centered around voluntary contributions. Of course the community can come together and regulate things to prevent harmful additions to the project, but demanding any user share any information on himself or add a specific thing, that is: Forcing involuntary contributions, is the fucking opposite of what this project is supposed to be and everyone who entertains that trail of thought is indeed about to open Pandora's Box and pervert Wiktionary (an open project where everyone can partake) into a generic online dictionary run by a junta of seniors. Korn [kʰũːɘ̃n] (talk) 10:11, 8 August 2017 (UTC)
Just the request isn't even what bothers me most. Had it been worded like "Could you please add {{Babel}} to your user page? The Wiktionary community would appreciate it." I wouldn't have been even close to as annoyed as I was now. I know what many here will say: "what am I complaining about, that's hardly any different at all, what sort of moron are you, yadda yadda yadda". To me this would make all the difference. It would make it clear Dan isn't personally asking me to do this, he is asking on behalf of the Wiktionary community. Which also means that if I decide to ignore it, I'm not letting Dan down personally. To me, that's a big difference. Again, I don't expect anyone to side with me. It's just my opinion. Yes it is a stupid opinion. I'm a stupid person and there's no need to further comment on that, I admit it, move on. W3ird N3rd (talk) 14:21, 8 August 2017 (UTC)
Instead of "Could you place Babel to your user page? I'd appreciate it," you wanted "Could you please add Babel to your user page? The Wiktionary community would appreciate it"? I can't see the difference and English is my native language. Dan is Czech and he does not have a perfect command of English. Most of our editors have a different language as their first language. It has never occurred to me to be offended by English comments that are not just so. I think most people write the best they can and they don't mean to offend or confuse. The reader should bear some of the load of communication by showing a more tolerance and understanding. It improves the atmosphere. —Stephen (Talk) 16:03, 8 August 2017 (UTC)
I tried to explain it, I'll do it again knowing full well it won't make a difference. If you say "Please do X, I'd appreciate it." I feel like I'm letting you down when I don't do it. (and the community may or may not care about X) If you say "Please do X, the community would appreciate it." it tells me the community in general would prefer this, I'm not letting you down personally if I don't. I wouldn't even think this difference, or at least what I perceive as a difference, would be language-dependent. I suppose not every individual would recognize this difference though. And maybe somehow I'm the only one. In which case I'm wrong and my faulty interpretation lead to a long and useless argument of misunderstanding and contempt. Well, if my understanding of the English language is that shitty I probably shouldn't be here anyway. Which was another reason I wouldn't want to add a babel box: I can't judge to what degree I master any language. W3ird N3rd (talk) 16:38, 8 August 2017 (UTC)
@W3ird N3rd: I don't think that I would feel like I'm letting anybody down by not adding a Babelbox, no matter how the message asking for it was worded. It's really not that important to discuss this imo. —Aryaman (मुझसे बात करो) 04:55, 11 August 2017 (UTC)
It seems to me your English is just fine. Personally, I disagree that Dan's phrasing was due to him being Czech. I suspect he prefers in general not to speak on behalf of "the community". But I could be wrong. — Eru·tuon 17:23, 8 August 2017 (UTC)
Indeed, I don't like to speak on behalf of community. The Babel practice is common but the appreciation is mine. --Dan Polansky (talk) 10:47, 19 August 2017 (UTC)
For example this sentence: "The reader should bear some of the load of communication by showing a more tolerance and understanding.". To me, this seems wrong. (the most obvious fix to me would seem to be to change "a more" to "a little more") It could be a joke (writing a broken sentence to prove your point), a genuine error (even a native could make mistakes) or (which would seem more likely as English is not my native language) this is correct but I just don't understand it. I also think I don't write text the way most people do today: I don't use any kind of spell checker or autocomplete. That may also result in me looking at language in a different way. W3ird N3rd (talk) 17:01, 8 August 2017 (UTC)
(An academic discussion on what is spam) "distributing the same message electronically to a larger number of people is spam": Not really. In my job, I receive job-related emails from management that are distributed to a larger number of people, and they are obviously not spam; spam filters are not designed to remove these kinds of messages. A message related to Wiktionary purpose posted in multiple instances on Wiktionary is not necessarily a spam. The definition of spam is not so simple as some people think; I don't think I have a good comprehensive definition. Being posted to a larger number of people is a component of being a spam, but that alone does not suffice. By the way, our welcome messages are much more of a spam than these requests for Babel given how long they take to read. --Dan Polansky (talk) 10:47, 19 August 2017 (UTC)

Weird arrow next to uses of {{taxlink}}?Edit

@Erutuon, DCDuring, Sgconlaw There is a weird arrow that sometimes appears next to the name of species and such that are formatted using {{taxlink}}. What's its purpose? It looks wrong, and is mentioned nowhere in the documentation. Can we get rid of it? For an example, see пога́нка (pogánka). Thanks! Benwing2 (talk) 20:44, 6 August 2017 (UTC)

I have categories to detect the conditions that cause them, which I consulted as soon as I saw "weird arrow" in the alerts. I found поганка in one of the categories and eliminated it. If they occur when you use taxlink, that means we already have an entry for the taxon involved and the template should be removed. Besides the situation of a new use of the templates there can be "many" entries that are affected by adding a new taxon or vernacular name. When I add either type of entry I try to eliminate any uses of the template in linked entries that would generate the "weird arrow". I will add something about this in the documentation for the two templates, though I don't expect it will be consulted, this being the first time it has come up, though I might be wrong. DCDuring (talk) 21:06, 6 August 2017 (UTC)
Also, I watch the category (as well as most other taxon-related categories) and would have detected the entry the next time I checked my watchlist. DCDuring (talk) 21:09, 6 August 2017 (UTC)
I remember seeing the "weird arrow" before. DCDuring, wouldn't it be sufficient for the template to place entries that require your intervention in the category, without the arrow also appearing? — SGconlaw (talk) 21:23, 6 August 2017 (UTC)
We already have such categories, which I aggressively police to keep empty.
The trouble is that it takes me quite a while to find the instances of redundant templates without using ctrl-f on the displayed text to find "=>". It is always at least a bit faster with the "=>". The problem is worst in entries with unusually large Hyponyms or Derived terms sections, with multiple L2 sections, with the use of {{taxlink}} or {{vern}} in the middle of definitions for polysemous terms or in unexpected locations.
If someone knew a way so that something displayed in the entry that optionally only an anointed few (me included) could see, we could eliminate the need for anyone to consult and grasp the documentation to eliminate the offending "=>". DCDuring (talk) 21:40, 6 August 2017 (UTC)
No idea how to do that. Maybe it could be made more understandable by replacing it with some reduced-size text like "needs attention" (compare the "Invalid ISBN" warning generated by {{ISBN}}), but I don't know whether you think that would make the warning too prominent. — SGconlaw (talk) 21:49, 6 August 2017 (UTC)
The offending "=>" can be eliminated with CSS. If we enclose this symbol in a HTML tag with a unique class name (say class="taxlink-redundant"), and create a CSS style rule that vanishes it (display: none;), which can either be placed in the HTML tag or in MediaWiki:Common.css, then the symbol can be un-vanished at will. Putting the style rule in MediaWiki:Common.css requires the help of an admin. Let me know which option you would prefer and I can give further help. — Eru·tuon 21:56, 6 August 2017 (UTC)
@Erutuon:'s solution seems great. I'm an admin. I would just need to be instruction as to what to put where so that I could still see the "=>" (which has the advantage of being easy to type and rarely used except for this purpose). The name for the style could be something like "redundant template finding aid" or a comprehensible abbreviation of that. DCDuring (talk) 22:16, 6 August 2017 (UTC)
I guess its value as a recruitment tool for proper (non-redundant) use of {{taxlink}} and {{vern}} is not much of a consideration. DCDuring (talk) 22:18, 6 August 2017 (UTC)
Why shouldn't I be taking the approach of having some red text telling folks that they should remove the offending template? I think there is precedent for that. It might even be in continuing use. DCDuring (talk) 22:21, 6 August 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── (edit conflict) Well, I suggested the class name taxlink-redundant, but you can choose a different one. Another idea: redundant-taxlink-mark? Whatever you choose, it should be made up of basic Latin and hyphens. Spaces will be misinterpreted. The code to add to MediaWiki:Common.css (the period . indicates that what follows is a class name):

.taxlink-redundant {
display: none;

And the code to add to your Special:MyPage/common.css:

.taxlink-redundant {
display: inline;

And then, in the template code for {{taxlink}}, replace <sup>=></sup> with <sup class="taxlink-redundant">=></sup>.

If you want to use a different class name, just replace taxlink-redundant in each of the three code snippets with whatever class name you choose. — Eru·tuon 22:33, 6 August 2017 (UTC)

If you want to keep the mark, how about changing it to a message with instructions that only displays in preview mode? For example, <sup class="error previewonly"><small>(Replace {{temp|taxlink}} with a regular link.)</small></sup>. Admittedly, that will be even more annoying than the little arrow thingy. — Eru·tuon 22:37, 6 August 2017 (UTC)

I was hoping to use the same class for both {{vern}} and {{taxlink}}. It might be useful for other similar applications though I don't know of any.
We have plenty of instances of the much more annoying technique used to enforce correct use of templates by displaying 80 or more characters of red text, sometimes with incomprehensible messages buried in them.
I will sleep on this before implementing and give others a chance to weigh in, but thanks for the implementation suggestion. It seems to fit the bill perfectly. I take it that CSS is not much more burdensome on server resources than HTML and doesn't raise the risk of latency problems like JS. DCDuring (talk) 23:15, 6 August 2017 (UTC)
@Erutuon: You had mentioned above that we could accomplish the optional display of default-hidden text if we "create a CSS style rule that vanishes it (display: none;), which can either be placed in the HTML tag or [] ". Where exactly would the HTML tag reside? DCDuring (talk) 18:47, 8 August 2017 (UTC)
The HTML tag that I mean is the <sup>=></sup> that appears in the template source code. — Eru·tuon 18:51, 8 August 2017 (UTC)
That seems like a better implementation, since evidently I am the only one using and virtually the only one aware of this. I could include a reference to the decloaking technique in the documentation for {{taxlink}} and {{vern}}. No adminship required either. DCDuring (talk) 20:01, 8 August 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Well, if you choose that option, you will have to use the following code in your common.css:

.taxlink-redundant {
display: inline !important;

The !important makes the style rule overrule the CSS in the HTML tag; otherwise, the CSS in the tag will win out. — Eru·tuon 20:37, 8 August 2017 (UTC)

That is what I will do. Thanks for the help. If I have problems, I will see you on your talk page. DCDuring (talk) 20:53, 8 August 2017 (UTC)

reading out "-"Edit

In results, such as 2-0, how is the hyphen spelled out? I think such a pronunciation should be added to its entry --Backinstadiums (talk) 21:16, 6 August 2017 (UTC)

Isn't it silent in many cases? The score would just be read as "two nil". I suppose on occasion it would be read "two to nil". (Also, it should really be an en dash.) — SGconlaw (talk) 21:24, 6 August 2017 (UTC)
@Sgconlaw: The singer Tee Grizzley, in his song First Day Out, says "two and o" for 2-0 at min. 3:48 --Backinstadiums (talk) 22:10, 6 August 2017 (UTC)
I agree with SGconlaw - two nil. It can be different in broadcast results, if the home team loses it would be "Team xx nil, Team yy two". DonnanZ (talk) 23:31, 6 August 2017 (UTC)
If we're talking about sports scores and the like, no one would ever say "two nil" or "two to nil" in Canada. It would be one of the following (in rough order of frequency): "two nothing", "two to nothing", "two to zero", or possibly "two zero". Andrew Sheedy (talk) 18:40, 7 August 2017 (UTC)
I just realized this is irrelevant, as the discussion is about the hyphen, not the 0, but oh well. Andrew Sheedy (talk) 19:03, 7 August 2017 (UTC)
:-D — SGconlaw (talk) 03:46, 8 August 2017 (UTC)
It could be either read as "to" or as nothing. When counting game wins rather than the in-game score, it's often read as "and" (in the US at least), as in "My friend and I are five-and-three", although this is more often done for wins-vs-losses of a single party (this is the case in Backinstadiums's song reference above, even though those are trials and not literal "games"). --WikiTiki89 18:55, 7 August 2017 (UTC)

Missing category?Edit

I can't find any category for Washington, D.C. or District of Columbia, only for the state of Washington. I guess there should be one, but what should the name be? DonnanZ (talk) 23:17, 6 August 2017 (UTC)

  • The main form is at "Washington, D.C." so I made a category for that. There are plenty of words that refer to the District. Good call. —Justin (koavf)TCM 23:29, 6 August 2017 (UTC)
Brilliant, thanks. DonnanZ (talk) 23:37, 6 August 2017 (UTC)

Share your thoughts on the draft strategy directionEdit

At the beginning of this year, we initiated a broad discussion to form a strategic direction that will unite and inspire people across the entire movement. This direction will be the foundation on which we will build clear plans and set priorities. More than 80 communities and groups have discussed and gave feedback on-wiki, in person, virtually, and through private surveys[strategy 1][strategy 2]. We researched readers and consulted more than 150 experts[strategy 3]. We looked at future trends that will affect our mission, and gathered feedback from partners and donors.

In July, a group of community volunteers and representatives from the strategy team took on a task of synthesizing this feedback into an early version of the strategic direction that the broader movement can review and discuss.

The first draft is ready. Please read, share, and discuss on the talk page. Based on your feedback, the drafting group will refine and finalize this direction through August.

SGrabarczuk (WMF) (talk) 16:11, 8 August 2017 (UTC)

Unsorted formationsEdit

I've seen Unsorted formations in descendant trees formatted as either * Unsorted formations or ; Unsorted formations. Is there a written guideline for this? --Victar (talk) 16:48, 8 August 2017 (UTC)

The standard practice is with *, so that it's listed on the same level as all other formations. —CodeCat 15:55, 9 August 2017 (UTC)
@CodeCat: Is that outlined in a guide or something somewhere? Like I said, I've seen both, so there doesn't seem be true "standard". @JohnC5? --Victar (talk) 20:47, 11 August 2017 (UTC)
@Victar: I've always used ;. On an unrelated note, Victar, please don't just start moving around entries (specifically the new entries) without discussing with anyone. I'm not convinced that was a good choice and may now have to revert all that. If the is not phonemic then it should not be included; if it is then it shouldn't be subscript. It is extremely frustrating that you just did this. —JohnC5 02:43, 12 August 2017 (UTC)
@JohnC5: I only moved two entries; not the end of the world. Also, very unrelated and should have been discussed elsewhere. --Victar (talk) 03:03, 12 August 2017 (UTC)

Quotations vs. CitationsEdit

I'd like to know the protocol for using them, as well as the differences they are meant to represent --Backinstadiums (talk) 14:43, 9 August 2017 (UTC)

I don't know if we have a standard for the terms, but I have been using the term citation to refer to sources providing evidence for information stated in entries, which are usually placed in "References" or "Further reading" sections. The {{cite}} and {{R:}} groups of templates may be used for this purpose. On the other hand, a quotation is an extract from a source that is provided as an example of the entry in use, and which is placed directly under a definition. The {{quote}} and {{RQ:}} groups of templates is used for them. For example, at merlion, there is one "citation" in the "References" section, and a number of "quotations" under the various definitions. However, note that entry pages have a tab called "Citations" which really contains quotations. — SGconlaw (talk) 15:07, 9 August 2017 (UTC)
I've been confused by this as well. Seems like they are used interchangeably. I've seen plenty of quotations from books that are over a hundred years old, providing no clue of how the entry is used today or how you could use it yourself. Personally I prefer examples. They don't come in a collapsed box, there is no question about proper citing due to copyright issues and they are designed to show how the entry is and can be used without clutter. Personally I put quotes and citations all the same on the citations page. W3ird N3rd (talk) 17:16, 9 August 2017 (UTC)
here are my unpopular opinions: quotations and citations are used interchangeably, I don't think there's a meaningful distinction. "Examples" are made up and may not reflect actual usage, there are no potential copyright issues with quoting from parts of works. The citations page is at best useless and at worst actively harmful and should not be used except to collect evidence for missing words or senses. DTLHS (talk) 17:26, 9 August 2017 (UTC)
If examples don't reflect actual usage they are likely to be bad examples. Copyright issues could arise if a quotation is too long or not properly attributed and laws for this possibly vary around the world. Quotations often are not reflecting actual (common) usage either, so I don't think that's a good reason to have them. W3ird N3rd (talk) 20:18, 9 August 2017 (UTC)
My understanding is that Wiktionary's servers are based in the USA, so it is primarily US law that must be complied with. It is unlikely that the quotations we use would violate copyright for two main reasons. First, all material published before 1923 is in the public domain in the USA and can be freely reproduced. Secondly, most of our quotations are obtained from works available on Google Books and the Internet Archive. If it is possible to view either a snippet or a full page preview of a book on Google Books, then the use of that portion of the book must be fair use under the law. Ergo, quoting an even shorter portion on Wiktionary must also be fair use. — SGconlaw (talk) 11:29, 10 August 2017 (UTC)
I wouldn't go so far as to say that availability on Google Books is indicative of anything, but the amount of text in the kind of quotes we use should fall under fair use. If the quote is too long for fair use, it's way too long for our purposes. Chuck Entz (talk) 14:06, 10 August 2017 (UTC)
In the books I've been reading lately, I've come across at least one to two dozen words in each one that we don't have entries for. Is it safe to take quotations for each of those words from the same book? How many quotations should I limit myself to to avoid violating fair use? Andrew Sheedy (talk) 17:47, 10 August 2017 (UTC)
@Andrew Sheedy First of all you should obviously check those words haven't been made up by the writer and they pass WT:CFI. There is no limit. For each word you should limit yourself to one or two quotes, there is no point anyway in having more quotations from the same work. As for quoting from the same work but on different page entries on Wiktionary, I would say that if the total amount quoted from the work is less than 5% of the entire work you have absolutely nothing to worry about. For a book that means there is no practical limit. For a poem a bit more would be allowed, some poems just might accidentally end up being entirely quoted here in small bits. As long as there's no obvious intention to violate copyright by overquoting a work, you'll be safe. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
I'd completely forgotten about the 5% rule--thanks for reminding me. And don't worry, I always make sure to find citations for words before I add them (which is the main reason I haven't gotten around to adding more...). Andrew Sheedy (talk) 17:15, 11 August 2017 (UTC)
@Andrew Sheedy not sure if you are being sarcastic, genuinely grateful, referring to the 5% rule in general or if there really is a 5% rule for quotes/citations. It's just a number I picked, it could have also been 1%, 3%, 7%, etc. The point remains the same however, for fair use (which I think includes the right to quote for the U.S.) it would generally be a reasonable safe threshold. It could be exceeded in various cases, if I wrote a review for a poem that is twice the length of the original poem, there's a good chance I could cite the entire poem in small pieces. But under 5% for all quotes combined you simply don't have to worry about it - which is the majority of the time. W3ird N3rd (talk) 18:33, 11 August 2017 (UTC)
I thought it was an actual rule (i.e. you can legally reproduce 5% of a work). Maybe it is in Canada? I'll have to look that up. Andrew Sheedy (talk) 02:40, 12 August 2017 (UTC)
Yes, Wiktionary servers are in the U.S., but Wiktionary content might be reused by people in other countries without fair use. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
Not that it is terribly relevant to the conversation, but Wikimedia servers are not all based in the US, nor should we expect that they will reside in the US exclusively in the future. - TheDaveRoss 12:53, 11 August 2017 (UTC)
Indeed irrelevant to the discussion at hand. I suspect servers outside the U.S. are caching servers, caches have different rules, but if someone wants to know more they should start a new discussion. W3ird N3rd (talk) 05:06, 12 August 2017 (UTC)
  • I think (but this just my interpretation) that citations and the citation page are perhaps meant for long quotes. "I have a dream", "Yes we can" or "Build a wall" would be a quotes. This would explain why quotes are allowed in the main dictionary space: copyright generally shouldn't apply to a quote. For example:
We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard
This is a quote and there's pretty much no chance this is copyrighted, similar to the moon not being copyrightable. However:
We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too.
Is citing Kennedy and likely can be copyrighted, so it requires proper attribution (what proper is depends on the country you are in) and/or be allowed by fair use. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
I doubt very much if the latter quotation is a breach of copyright. It is still only a small portion of the entire speech. It might be a different matter if we reproduced, say, a third or half of the speech, but that isn't what we do anyway. I agree with Chuck that the quotations we use here at the Wiktionary are unlikely to raise copyright issues. — SGconlaw (talk) 21:50, 10 August 2017 (UTC)
Fair use is a lot more permissive than the right to quote, but is specific to the United States. W3ird N3rd (talk) 03:25, 11 August 2017 (UTC)
For what it's worth, speeches made by government officials in their official capacity are in the public domain, so none of that speech is copyrighted in the US. - TheDaveRoss 12:58, 11 August 2017 (UTC)
That's true for this example, I should have mentioned that. Thanks. W3ird N3rd (talk) 05:06, 12 August 2017 (UTC)


Do we have a page which lists all entries containing "Idiom" as a headword? If not, can we get one made? I guess we prefer Verb rather than Idiom for things like take an axe to. -WF

  • Before it was declared forbidden, I've used the ====Idioms==== header in the past for set expressions using a particular word, such as at 糞#Idioms. I see that some other JA entries have these expressions listed under ====Derived terms====, which doesn't seem quite right either, as these aren't "terms", sometimes comprising even full sentences.
What is the accepted header for these items now? ===Verb=== is not applicable for most of the Japanese expressions I can think of. ‑‑ Eiríkr Útlendi │Tala við mig 17:24, 9 August 2017 (UTC)
Please limit this to English-only. “Idiom” is the conventional translation for the Chinese part of speech of chengyu. Wyang (talk) 07:42, 10 August 2017 (UTC)
Are we going to have "Haiku" as a part of speech next? —CodeCat 09:45, 10 August 2017 (UTC)
Tangentiality. Are you OK? Wyang (talk) 09:51, 10 August 2017 (UTC)
That doesn't seem a part of speech coordinate with noun, verb, adjective, but I don't know how it would be used. Are they not used as nouns, verbs, adjectives, or something else? It seems like using "Word coined by Shakespeare" as a part of speech header. Admittedly, we have "Proverb", which might be similar. — Eru·tuon 17:32, 10 August 2017 (UTC)
A proverb is generally nothing more than a sentence. —CodeCat 17:40, 10 August 2017 (UTC)
  • Great discussion, but what I wanted was a page listing all entries with {{head|en|idiom}} —This unsigned comment was added by WF back from hols (talkcontribs) at 15:26, 10 August 2017‎ (UTC).
    There's no way to get a single-page listing, but you can search the wikicode for insource:/\{\{head\|en\|idiom/. [edit: Actually, it is a single page, just because there are so few.] — Eru·tuon 20:32, 10 August 2017 (UTC)
    It's also possible to de-list "idioms" from Module:headword/data. Then it would no longer be recognised as a valid POS category and end up in Category:head tracking/unrecognized pos. —CodeCat 20:35, 10 August 2017 (UTC)
    Thanks Erutuon! That's exactly what I wanted. I'm been making my way through those pages. A little cleanup done, and a few of them sent to RFx. --WF back from hols (talk) 23:40, 11 August 2017 (UTC)
    Without a plan on what to do after that, that would be unwise. There are 2,305 entries using {{zh-idiom}} (insource:/\{\{zh-idiom/) and 4,317 entries in the category for Chinese idioms, so that tracking category would become cluttered. And the cooperation of editors who handle Chinese would be needed to get the entries moved to the proper part of speech. — Eru·tuon 21:16, 10 August 2017 (UTC)
    I really hate the mentality that everything that is “improper” in English is assuredly improper in other languages by default, and needs to be “fixed”. Idiom is a perfectly fine part of speech in Chinese, and is in fact the most common translation for Chinese chengyu. Chinese lexicography treats these as a separate category of words, and there are numerous dictionaries compiled just for words belonging to this category. The comprehensive Chinese dictionaries typically do not mark entries by their part of speech, due to the analyticity of the language. In those monolingual and bilingual dictionaries that do, these words are either marked by  成  (cheng, idiom) (primarily in bilingual dictionaries) or unmarked (in Chinese–Chinese dictionaries), in juxtaposition to  名  (noun),  动  (verb),  形  (adjective),  副  (adverb),  惯  (phrase),  谚  (proverb),  歇  (xiehouyu), etc. Examples include the Contemporary Chinese Dictionary, the Comprehensive Standard Chinese Dictionary, the Oxford Chinese Dictionary, the Times New Chinese–English Dictionary and so on. The same idiom can be used as noun, verb, adjective, adverb, etc. depending on the context in the sentence, and their use is different from that of proverbs, phrases, and xiehouyu. Wyang (talk) 22:27, 10 August 2017 (UTC)
    Ahh. If they can be used as multiple other parts of speech, then I can see the lexicographic usefulness of keeping them as they are rather than trying to list all the other parts of speech they can be used as. However, it would be helpful to distinguish them somehow from the concept of idiom in English, which is quite different. The description in the category page Chinese idioms is probably not correct. — Eru·tuon 22:53, 10 August 2017 (UTC)
    I guess those entries should also contain {{lb|en|idiom}} so they show up in Category:English idioms if they are indeed an idiom. Since there are so few that shouldn't be a problem. Some recently disappeared already so this looks like it's getting phased out. W3ird N3rd (talk) 20:47, 10 August 2017 (UTC)
    But is idiom a context in which the word appears? If not, then it might be a misuse of the label template. —CodeCat 20:52, 10 August 2017 (UTC)
    I agree that the POS should be based on how they are used and not where they come from. Thus, I would say "proverbs" should really have the POS "clauses". --WikiTiki89 21:00, 10 August 2017 (UTC)
    Possibly, but in that case the English idiom category will have to be populated in some different way. insource:/\{\{lb\|en\|idiom\}\}/ gives 210 hits. W3ird N3rd (talk) 21:34, 10 August 2017 (UTC)

MW has a new feature to see dates of coinagesEdit (koavf)TCM 07:30, 10 August 2017 (UTC)

Very cool. But really, the dates are sense-specific, and hence word-specific only if the word is monosemic. Wyang (talk) 07:39, 10 August 2017 (UTC)
At first I thought you meant MediaWiki, and was worried they were up to another waste of human resources. --WikiTiki89 15:42, 10 August 2017 (UTC)

-градить and other "combining form"sEdit

A bunch of Russian entries are appearing in Category:head tracking/unrecognized pos, because they use the POS category "verbal combining forms" which is not valid. They are also being categorised as verbs, which is even less correct because these forms don't actually exist. They are only found in compounds, and are thus comparable to creating cran for the first morpheme in cranberry, or liezen for the base verb of verliezen. Something should be done about these. They can't be moved to the reconstruction namespace, they are not reconstructions because they are not conjectured to exist; we know they don't exist. A valid POS should also be used so that they don't clog up cleanup categories anymore. —CodeCat 18:00, 10 August 2017 (UTC)

What's wrong with "Combining form"? Crom daba (talk) 18:54, 10 August 2017 (UTC)
Perhaps they should be recategorized as "combining forms" or have that category added. It is a recognized lemma type in Module:headword/data. I agree they don't really count as verbs in a sense. But I think you are against the idea of a combining form, because you've recategorized combining form entries that I've created. — Eru·tuon 19:03, 10 August 2017 (UTC)
A combining form is a non-lemma form that is used when combined with another morpheme. That's very different from this. —CodeCat 19:05, 10 August 2017 (UTC)
Why is it not a lemma? I see how you can say it's not a real word, but it is a lemma in that it is a form representative of a paradigm of related forms (i.e. the conjugated forms given in the conjugation table). --WikiTiki89 19:30, 10 August 2017 (UTC)
These are lemmas, I'm not disputing that. I'm saying that combining forms aren't lemmas. Most of the categories in Category:Combining forms by language contain nonlemmas, even though the categories themselves are categorised as lemmas. —CodeCat 20:15, 10 August 2017 (UTC)
Oh. Yeah, it looks like what we use "combining form" to mean is completely different from what these are. In fact I was actually in favor of removing the hyphen from these entry names. I think we need to come up with a special name for these. Something like "unused base verb". --WikiTiki89 20:20, 10 August 2017 (UTC)
There's also things like Judeo- which are a bit in between. They are of course combining forms of nouns in Ancient Greek, but in English they don't really belong to anything. Or do they? —CodeCat 20:28, 10 August 2017 (UTC)
I think that's a separate unrelated issue. **гради́ть (**gradítʹ) morphologically could have stood on its own if it existed, but it just so happens that it only exists with prefixes (although it's quite possible that it did exist in Proto-Slavic or earlier). Judeo- I would say is a combining form whose uncombined forms don't exist. --WikiTiki89 20:41, 10 August 2017 (UTC)

Wiktionary: a translation dictionary only?Edit

Should we stop pretending to be a good monolingual dictionary, for the achievement of which the wiki way ("wisdom of crowds") seems ill-suited? Would we be better of playing to what seems to be our strength: translation. This would mean "translation target" would be an automatic justification for any English entry and would upgrade the importance of phrasebook entries and common collocations. It would probably benefit from simplification of complex polysemous entries like those for technical, let alone really polysemous terms. DCDuring (talk) 19:21, 11 August 2017 (UTC)

The project should probably be forked, to support the deletionist and never-delete-anything-ist camps. Equinox 19:29, 11 August 2017 (UTC)
A "good translation dictionary" necessarily describes complex polysemous English words, so no. DTLHS (talk) 23:12, 11 August 2017 (UTC)
How can we a good translation dictionary now, then? DCDuring (talk) 04:23, 12 August 2017 (UTC)
I suspect this has been triggered (although probably not initiated) by the revert of my edit on "technical". Wiktionary:Requests_for_cleanup#technical W3ird N3rd (talk) 23:45, 11 August 2017 (UTC)
Don't take it too hard. I know that basic nouns, verbs and adjectives with multiple senses are hard and the most basic function words are harder yet. If cleaning up technical were easy, then I would have done it myself. I'm out of practice and never successfully tackled any basic function words. DCDuring (talk) 04:29, 12 August 2017 (UTC)
I think the solution, to appease both camps, is to actually allow the oft-discussed collocations section/namespace/whatever. This would allow us to be a far better translations dictionary because each collocation would have a translation section and we wouldn't have to resort to the controversial "translation target argument." The inclusionists would also be able to include far more, since much of what is hotly debated in RFD could be kept as a collocation. Deletionists could also be satisfed because there would be less pressure from inclusionists to keep SOP collocations in the mainspace. Andrew Sheedy (talk) 04:53, 12 August 2017 (UTC)
Wiktionary has included languages other than English for a long time, if not from day 1. So it's only right that Wiktionary should be translation-oriented. One inconsistency I have found regarding SoP terms is that there are entries for vegetable soup and pea soup, yet none for tomato soup, and there's bound to be translations for that. One nice touch I have just found is translations for soft-boiled egg and hard-boiled egg listed under boiled egg. DonnanZ (talk) 13:36, 12 August 2017 (UTC)
Inclusion of collocations is at most half of the solution. If, as User:DTLHS notes, "[a] 'good translation dictionary' necessarily describes complex polysemous English words", how do we improve our entries for such terms? Or is the current state of these entries good enough for translation work and for ESL learners, with native speakers mostly ignoring such entries anyway?
If we include more collocations, can we rely on the entries for collocations to share the burden of the definitions for verbs like go (go clubbing) and "particles" like abox and away?
Is it reasonable to admit that we can't really help those users who take a component-oriented approach to looking at sentences? Just as we say that users need help in determining where morphemes break in German and other compound nouns, we should also say that users can't be expected to know which meanings are only fully captured in collocations. Expecting users to wade through derived terms in go to find go clubbing does not seem very realistic. If a user knows to go to clubbing, that user probably doesn't need the go clubbing entry at all. DCDuring (talk) 16:22, 12 August 2017 (UTC)
I don't really object to making translation our focus. However, in order to be a truly comprehensive translation dictionary, I think we also have to be a comprehensive monolingual dictionary. And I don't think we're really doing as badly as you feel. Yes, we're a long way from being another OED, but we're also good enough that I'm able to use Wiktionary as my primary dictionary. Conversely, we actually suck at translations from English into other languages (even common ones like French and Spanish). I'm really not convinced it's our strength. FL to English translations tend to be much better, but even these are often lacking. The reality is that we're a work in progress on all fronts, and always will be.
Now, if we include collocations, I think we should handle them more or less as follows:
  1. Do not move definitions from main entries over to collocation entries (some duplication is fine, and people should still be able to find what the want in the main entry);
  2. Create separate entries for them, rather than hosting them in another mainspace or on the same page as any of their component words (we could potentially treat collocations like "forget about" differently from "piece of furniture", the latter having its own entry, the former sharing a page with "forget");
  3. Label them with a banner just as we do for phrasebook entries, to mark them as SOP and allow us to continue to function as a monolingual dictionary, regardless of our focus;
  4. Include full definitions in collocation entries, for clarity;
  5. Eliminate obvious information like pronunciation or etymology from collocation entries, but retain things like translations and synonyms;
  6. Link to collocations from the entries of each of their component words (excluding really basic words, like articles);
  7. Use either "Derived terms" or "Related terms" (possibly renamed) or a new "Collocations" section to host lists of collocations, subdividing the list into different categories as necessary;
  8. Allow collocations in all languages so that we can truly function as a translation dictionary: someone translating from French should be able to look up "pointe de pizza" or "se faire tuer" (or find these in the entries for pointe and pizza / faire, and tuer) and find the corresponding English collocations, "piece of pizza" and "get oneself killed".
I doubt we'll ever solve the problem of people taking a component-oriented approach to looking up multiword terms or collocations. But that doesn't mean we shouldn't try to be helpful to those who do know how to identify multiword expressions. The best we can do is list multiword terms and collocations in the entries for each of the constituent parts, and make long lists easier to navigate by splitting them up by category. Andrew Sheedy (talk) 17:46, 12 August 2017 (UTC)
I'd prefer it if we hosted collocations but they were not listed at constituent lemma pages and generally had close to zero incoming links to them. Crom daba (talk) 18:48, 12 August 2017 (UTC)
How would a person find them all then? Andrew Sheedy (talk) 21:50, 12 August 2017 (UTC)
  • One interesting aspect of mass inclusion of collocations as a new class of entry is that we would be substituting two boundaries that needed some kind of policing for one. Instead of a single include/exclude decision, we would need to decide whether to include or exclude and whether something was a collocation or an idiom. I am not confident that we would achieve any more agreement in total on these two decisions than we do now on one. Are we imagining that any collocation at all would be entered, subject to current RfV? live free or die? parlare con tono di condiscendenza? Wouldn't we be increasing the number of truly offensive items? Should we exclude full sentences that are not proverbs and not phrasebook entries? (More decisions!!!) DCDuring (talk) 19:31, 12 August 2017 (UTC)
Very true, although maybe we would be able to create a stricter set of criteria for determining whether something is SOP or not? I think a lot of RFD debates would be mostly solved if entries could be kept as collocations: those where terms are technically SOP, but not transparently so (sometimes because they use obscure senses of a word), and are not necessarily easily understood (e.g. "nature preserve"); those where an expression uses a more or less consistent word order and has become a fixed phrase; and those where the only justification for keeping an entry is its value as a translation target. I think most people could agree to keep such entries, but label them as collocations. Andrew Sheedy (talk) 21:50, 12 August 2017 (UTC)

IPA ≠ audioEdit

Entries where the pronunciation is different from that in the audio, as for example in polemic, should be automatically detected and listed --Backinstadiums (talk) 20:25, 11 August 2017 (UTC)

How do you propose we do that? DTLHS (talk) 20:28, 11 August 2017 (UTC)
@DTLHS: Auto-generated subtitles could be created using some software and then compare both columns of data. 90% of the job would be done that way, the rest could be manually reported individually as @Wyang has proposed --Backinstadiums (talk) 07:12, 12 August 2017 (UTC)
You vastly overestimate the ease of generating written transcriptions, much less IPA, from audio files. Others can probably explain better why this is so difficult. See e.g. [7]. DTLHS (talk) 07:32, 12 August 2017 (UTC)
Not automatically, but perhaps via a more accessible feedback system: “Saw an error on the page? Report it here.” Wyang (talk) 21:52, 11 August 2017 (UTC)

Merging Category:Chinese language and Category:Sinitic languages to a single category (Category:Chinese language(s)?)Edit

Sinitic languages is just another name for the Chinese languages. It is confusing to have both categories on Wiktionary. It seems there is room for improvement in the category system for macrolanguages; there are categories such as Category:Mandarin terms derived from Sinitic languages which really should be renamed to Category:Mandarin terms derived from other Chinese languages. Wyang (talk) 07:14, 12 August 2017 (UTC)

I agree that the current situation, in which we have two sets of categories for what is basically the same entity, is confusing. It would be hard to merge the two categories, however. x language is a category created by {{langcatboiler}} that uses data from Module:languages, while x languages is a category created by {{famcatboiler}} that uses data from Module:families. And currently only a language with a data file can have entries; a family cannot. I'm not sure how to merge the two in the existing system. And what code would we use for the combined entity? How can we make something be simultaneously a language and a family? — Eru·tuon 23:41, 12 August 2017 (UTC)

Language request: Old KannadaEdit

Old Kannada (Kannada: ಹಳೆಗನ್ನಡ (haḷegannaḍa)) needs to be included. Proposed code: okn. It is a Dravidian language. Immediate ancestor: Proto-Tamil-Kannada. Scripts: Brahmi, Kadamba, Kannada. Descendants: Middle Kannada -> Modern Kannada. ɱɑɗɦɑѵ (talk) 07:28, 12 August 2017 (UTC)

That seems like a reasonable language to add. Can you give any examples or indication of how different it is from Kannada kn? (Other notes: Exceptional codes need to be formatted differently, so it would have to be dra-okn. We cannot add Kadamba script because it seems that it is not in Unicode. Proto-Tamil-Kannada is also not registered as a language.) —Μετάknowledgediscuss/deeds 07:33, 12 August 2017 (UTC)
@Metaknowledge: There's a significant difference between Old Kannada & its modern descendant. It's barely intelligible with modern Kannada. Some sound changes (like the transformation of Proto-Dravidian *p to Kannada [h]) are not present in Old Kannada. The case-suffixes are also different. As for the script, I hope it'll be acceptable to create lemmas in Old Kannada in the brahmi or the kannada script. -- ɱɑɗɦɑѵ (talk) 12:26, 12 August 2017 (UTC)
@माधवपंडित There seems to be a distinction between Old Kannada and Purva Halegannada. Should we encode them separately? DerekWinters (talk) 13:27, 12 August 2017 (UTC)
@DerekWinters: I saw that as well. About 500 years of time gap. I think Pūrva-Halegannada is what we'd call pre-Old Kannada or Proto-Kannada. But the matter source is small... as it is, Halegannada is poorly documented on the internet. If i'm not wrong Proto Kannada attestations are from just a few oldest inscriptions. Perhaps we can make Proto Kannada an etymology only language, used in etymology but cannot have lemmas of its own. -- ɱɑɗɦɑѵ (talk) 13:35, 12 August 2017 (UTC)
It can be like Primitive Irish or Pictish, attested from very few sources. Personally I think it better to add it separately. DerekWinters (talk) 13:39, 12 August 2017 (UTC)

A quick update on changes of translation adderEdit

I have updated the gadget to fetch language scripts from the module. Also, it fails (gracefully of course) if the input script is not in the list of scripts from module. So, you may notice some functionality changes. Let me know if the changes are for the worse. Dixtosa (talk) 19:21, 12 August 2017 (UTC)

Is it anything to do with the annoying little +- signs that have popped up in translations sections? DonnanZ (talk) 19:30, 12 August 2017 (UTC)
Those were always there, but the spacing is off now ([8]) (Chrome) DTLHS (talk) 19:37, 12 August 2017 (UTC)
Yes. Fixed. Dixtosa (talk) 20:29, 12 August 2017 (UTC)
Also added the ability to hide the transliteration input if the language has automatic transliteration that overrides manual.--Dixtosa (talk) 13:08, 20 August 2017 (UTC)

Distinction between derived and related termsEdit

It's been a long while since I did any serious editing here. I've been updating some of the derived words sections. I noticed that the section for "language#Derived terms" looked very sparse, so I added some more entries. After doing so, I saw that many of them were already in the Related terms section.

Has policy changed lately? My understanding has always been that Derived terms is for words formed by appending affixes ("metalanguage") and compounds ("dead language", "language lab") and that Related terms was reserved for words that are etymologically related in some other way ("linguistics", "lingua franca").

I notice that the Wiktionary:Entry_layout page doesn't make this very clear and doesn't give any examples. Perhaps it could be updated?

In the meantime, I'll tidy up Derived terms and Related terms for language, but please revert if this is no longer the way things are done.

Paul G (talk) 19:38, 12 August 2017 (UTC)

Technically a derived term is also a related term. So sometimes there are lists of terms that people have just put all together under "related terms" without distinction. Your understanding matches mine and your edits to language look fine. DTLHS (talk) 19:43, 12 August 2017 (UTC)
That, too, is my understanding of the distinction between the two terms. — SGconlaw (talk) 20:06, 12 August 2017 (UTC)
Confusion can also be caused be placing some derived terms under hyponyms and others under derived terms or related terms. Personally I would like to see hyponyms done away with - I can hear the protests already. DonnanZ (talk) 20:38, 12 August 2017 (UTC)
Thanks for the responses. There seem to be a number of pages where derived terms are words formed with affixes and related terms are compounds — rock, for example — so some editors at least seem to have thought this is what the sections are for. — Paul G (talk) 20:47, 12 August 2017 (UTC)

IPA policyEdit

The (phonemic) English pronunciation keys in most of the major dictionaries (as well as the associated Wikipedia article use ⟨r⟩ as a standard phoneme. I feel that if this is the common usage it ought to be a standard policy across Wiktionary pronunciation sections. In many articles people have replaced ⟨r⟩ with ⟨ɹ⟩, ⟨ɚ⟩ et al. and while this is phonetically correct, it goes against the phenomic standard, and had created a disparate mess with little to no consistency. The best solution in my opinion is to just have both phonetic and phonemic pronunciations wherever possible, and make it a policy that ⟨r⟩ belongs in /r/ and ⟨ɹ⟩ belongs in [r], etc. This has the advantage of giving the maximum amount of information, while remaining in standard with M-W, Collins, etc. Any input would be appreciated. --Pariah24 05:07, 13 August 2017 (UTC)

I wonder if it is possible to create {{en-IPA}} to standardise the generation of IPA for English (and represent the dialectal variation; cf. International Phonetic Alphabet chart for English dialects). Having manual IPA on all 480,000+ English lemmas would be a logistical nightmare. Wyang (talk) 05:17, 13 August 2017 (UTC)
We had a discussion about this many years ago. At first I was in favor of using /r/ in the phonemic representation of English, but eventually I came around to the idea of using /ɹ/, chiefly because we are not an English-only dictionary. If we were, if English Wiktionary had only English entries, I still would prefer /r/; but because we have entries in thousands of languages, including ones where /r/ really does stand for [r], I think it's ultimately less misleading to use /ɹ/ for English. —Aɴɢʀ (talk) 07:39, 13 August 2017 (UTC)
I agree with Angr. If we use /r/ for [ɹ], readers seeing /r/ in other languages might mistakenly believe that they represent the same, or similar sounds. We fill a different niche than other English dictionaries, and as a result, our policies might differ in some areas. Andrew Sheedy (talk) 17:14, 13 August 2017 (UTC)
I agree with Angr and Andrew. Since most English speakers pronounce ⟨r⟩ as [ɹ], it's appropriate to use /ɹ/. I'd second Wyang on creating an English IPA template. — justin(r)leung (t...) | c=› } 19:28, 13 August 2017 (UTC)
There's no need for a separate template- normalization can take place in the IPA module. DTLHS (talk) 19:29, 13 August 2017 (UTC)
How can one implement a module without a template, exactly? —Aryaman (मुझसे बात करो) 20:39, 13 August 2017 (UTC)
Huh? All IPA is already processed through Module:IPA. All we would need to do is implement specific rules for English. DTLHS (talk) 20:47, 13 August 2017 (UTC)
@DTLHS: I would assume we would make MOD:en-IPA and implement it in {{en-IPA}}, just like every other language with an IPA module. —Aryaman (मुझसे बात करो) 23:03, 14 August 2017 (UTC)
What about English dialects? How do we ensure the correct symbols are used and symbols are used in a consistent manner, for say, RP? Wyang (talk) 21:34, 13 August 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I'm more concerned with just having a standard to go by than what that standard is. I guess I'll start changing /r/ when I see it, although I still think it would be helpful—especially on pronunciations that differ significantly from the standard phonemes—to have separate /phoneme/ and [phone] pronunciations. Pariah24 (talk) 23:41, 13 August 2017 (UTC)

What do you mean by "pronunciations that differ from the standard phonemes"? — Eru·tuon 23:55, 13 August 2017 (UTC)
Sorry if that's an awkward way to put it...I mean pronunciation differences in accents/dialects, and loanwords, and cases like pun and spun whose pronunciations are both /(s)pʌn/ but phonetically are [pʰʌn] and [spʌn]. It would be helpful to someone learning English to have both versions. Pariah24 (talk) 00:11, 14 August 2017 (UTC)
Ahh, I see. Phonetic transcriptions showing the exact pronunciation of stops are welcome. I think there are some transcriptions like that already. As for accents, keep in mind that many dialectal features are phonemic, because dialects do not all share the same phonological system, and so we show them in the phonemic transcriptions. You can see examples in Appendix:English pronunciation. Not shown on that page are the phonemic transcriptions for American English dialects without the horsehoarse merger. (See hoarse for an example.) — Eru·tuon 00:38, 14 August 2017 (UTC)

Regarding whether to create a separate module and template for English IPA transcriptions: I think it would be much neater than adding a lot of English-specific stuff to Module:IPA. I say a lot, because I think it would be a good idea to automatically convert between different transcription systems for RP, if we could get someone who knows enough about them. For instance, automatically displaying both the OED's more old-fashioned transcription of lot, /lɒt/, and Geoff Lindsey's more modern one, /lɔt/. @Mr KEBAB proposed something like this, but I haven't done anything with the idea yet. — Eru·tuon 00:02, 14 August 2017 (UTC)

@Erutuon: Yes I did, but if we're going to use Lindsey system here we should fully follow it, not cherry-pick some of the symbols and not others (I'm saying this because I believe I proposed a mixed system a year ago, this is not a good idea for several reasons). Mr KEBAB (talk) 00:40, 14 August 2017 (UTC)

I think it would be very nice to have something similar to what we have for Ancient Greek and Latin, with different regional or period pronunciations all indicated. The template input for English would obviously have to be the broad phonemic transcription, though, rather than the spelling of the word. We'd have to be careful, however, of cases where pronunciation variants actually represent different phonemes rather than differences of realization; in such a case we'd probably need to call the template multiple times on the page, each time with a separate phonemic transcription and corresponding generated phonetic transcriptions for various dialects – there would be parameters for which dialects/variants to include or not include. – Krun (talk) 00:51, 14 August 2017 (UTC)

An input from outside. In French Wiktionary, we had a large discussion on pronunciation and neutrality two years ago and we renewed our policy. We started by defining that phonetic information have to be based on audio recordings and have to be several to describe variety (in space, time, social groups). A phonemic information have to be based on a specific analysis, made on a specific dialect and can't stand for a whole language. There is a diversity of phonemic representation. To be neutral on this perspective is not to select and promote one analysis (equal one variety) but to give the different analysis, with sources. So: phonetic with audio sources, phonology with written sources (linguistics piece of work).
Finally, we consider the needs of the readers, and we consider they do not need dozen of phonetics and dozen of phonological representations. They want a short information, giving a usual way of pronouncing a word, consensual, as unmarked as possible, and we created a third way to indicate this specific information, with backslash signs like \θis\. This last one is provided in the first part of the page, and the other ones on the second part of the page, for people eager to have more precise information. It was quite not a huge change, but a great improvement in the frame it offer for people to add new information without colliding with existing ones. Less controversies on "false phonological representation" and more accurate descriptions. If you want to know more about this, I can help you, or translate some pieces of French Wiktionary policy   Noé 10:16, 15 August 2017 (UTC)
I think Wiktionnaire has a good system. I find it interesting how the very broad pronunciation is included in the header, but I don't like how more detailed pronunciation is relegated to the bottom of the entry and often neglected. Having a very broad transcription with everything else in a collapsable box might be a good solution. On the other hand, it would be hard to decide what transcription to use when a word has been affected by a merger or split in many dialects. Andrew Sheedy (talk) 16:07, 18 August 2017 (UTC)

Words with uncertain readingEdit

Recently the Egyptian entry jsqꜣrwnj


was added alongside our previously existing entry jsqꜣrnj


But these aren’t in fact two different attestations with two different spellings; they’re both representing a single attestation from the Merneptah Stele, where the original engraver inscribed a hieroglyph very poorly and modern authors have proposed two different readings of what it was intended to be. Do we have any policy about what to do in such a case — keep only the more plausible/widely accepted entry? Keep both? (And, if so, what would they be marked as? Alternative forms, even though they really aren’t?) — Vorziblix (talk · contribs) 10:01, 14 August 2017 (UTC)

Perhaps create an "alternative reading" template, and use it in the entry for the less widely accepted reading. Then list the less widely accepted reading in the Alternative forms section for the more widely accepted form. — Eru·tuon 16:30, 14 August 2017 (UTC)
Sounds good. For now I’ll just do {{form of|Alternative reading}} rather than an altogether new template, but if more of these start cropping up, so that categorizing them becomes useful, I’ll go for a separate template. — Vorziblix (talk · contribs) 23:18, 14 August 2017 (UTC)
Thanks, that clears things up a lot. — Vorziblix (talk · contribs) 23:18, 14 August 2017 (UTC)
Another example is ᚐᚆᚓᚆᚆᚈᚈᚋᚅᚅᚅ / ᚐᚆᚓᚆᚆᚈᚈᚐᚅᚐᚅ. In most cases, it's possible to be reasonably certain how to read an inscription, but when it's not (in individual cases), the practice does seem to be to have multiple (cross-linked) entries. Whether or not it is sensible for one of the entries to be a "form of" redirect to the other entry depends on whether the difference in reading entails a difference in meaning. - -sche (discuss) 06:30, 15 August 2017 (UTC)

Flag gadget edit requestEdit

Could an admin change the URL for the Ancient Greek flag in MediaWiki:Gadget-WiktCountryFlags.css from Flag_of_Palaeologus_Dynasty.svg to Byzantine_imperial_flag,_14th_century,_square.svg? The file has been moved, and there's been an error message in the browser console because the CSS file tries to load the file using the old name. — Eru·tuon 17:41, 14 August 2017 (UTC)

DoneDixtosa (talk) 17:55, 14 August 2017 (UTC)

What's the deal with the garbage "American Sign Language" entries?Edit

Is there an editing tool that produces these, maybe with ASL as the first in a list of languages? I don't think it's a single vandal producing all of them. DTLHS (talk) 20:41, 14 August 2017 (UTC)

@DTLHS: Do you have a link or diff? —Justin (koavf)TCM 21:09, 14 August 2017 (UTC)
People often create fully-formed ASL entries with all the usual headings, but with no actual content or definition. Yes, there is a tool that creates these, but I can no longer find it. I've seen it before. Equinox 21:11, 14 August 2017 (UTC)
  • It's the New Entry Creator; one of its defaults is ASL. —Μετάknowledgediscuss/deeds 21:35, 14 August 2017 (UTC)
    Actually, it's the second-from-the-top entry template, on the search results page, not the New Entry Creator. There should really be an AbuseFilter to take care of those. --Yair rand (talk) 01:21, 21 August 2017 (UTC)

@DTLHS: how would you improve them? --Backinstadiums (talk) 22:12, 14 August 2017 (UTC)

I don't think you understand. They're contentless entries that are deleted on sight. —Μετάknowledgediscuss/deeds 22:16, 14 August 2017 (UTC)

Adding language code 'ghc'Edit

Hi all, I am thinking it might be useful to add the code 'ghc' for the historic common written language of Ireland and Scotland, particularly in cases where it's not clear whether a word derives from Irish or Scottish Gaelic. Gherkinmad (talk) 21:33, 14 August 2017 (UTC)

I don't know what lect you are referring to or when it was used. We have codes for Old Irish (sga) and Middle Irish (mga), and those should suffice. —Μετάknowledgediscuss/deeds 21:37, 14 August 2017 (UTC)
I can kinda see the point. While, technically, Scottish Gaelic can be seen to be differentiating itself from Irish as early as the Book of Deer, for pretty much the entire Middle Ages you can't really tell between them. And everything after 1200 is currently classified as either ga or gd. So a Classical Gaelic could be seen as a useful intermediary step:
  • pgl Primitive Irish (–c.600)
    • sga Old Irish (c.600–c.900)
      • mga Middle Irish (c.900–c.1200)
        • ghc Classical Gaelic (c.1200–c.1800)
          • ga Modern Irish (c.1800–)
          • gd Scottish Gaelic (c.1800–)
That would require some refactoring, though. It would make etymologies slightly less messy: as it is, there appears to be an issue with taking a gd word back through ga to mga. This way, they could both branch from ghc. --Catsidhe (verba, facta) 21:55, 14 August 2017 (UTC)
Do we have a resolution? Gherkinmad (talk) 23:08, 14 August 2017 (UTC)
Resolution? We barely have the start of a discussion! Also, this sort of thing has been suggested before (by me at least once, IIRC) and not happened, so maybe a wider debate will have some impact. --Catsidhe (verba, facta) 23:26, 14 August 2017 (UTC)
(Without expressing an opinion on whether this is a good or bad idea,) it would be possible to add 'ghc' as an "etymology-only language" so that etymologies could refer to it, even if we don't want to add it as a "full language" with its own entries / language sections (which might duplicate many mga and ga entries?). - -sche (discuss) 06:46, 15 August 2017 (UTC)
  • @Angr is the expert, and he hasn't voiced a need for this as far as I've seen. But I'd like his thoughts. —Μετάknowledgediscuss/deeds 03:57, 15 August 2017 (UTC)
    For reference, the code was removed following this discussion in 2013. (I have no great knowledge of the subject and defer to people like Angr and Catsidhe who are familiar with the Irish language(s).) - -sche (discuss) 06:37, 15 August 2017 (UTC)
    My views haven't changed since that 2013 discussion. I think mga, ga, gd, and gv are sufficient to cover all Goidelic lects from the 10th century to today. The problem with making it an etymology-only language is that etymology-only languages are varieties of one particular existing language, but the whole motivation behind ghc is to avoid calling it either Irish or Scottish Gaelic (because it's basically both). —Aɴɢʀ (talk) 08:33, 15 August 2017 (UTC)
    For the purpose Catsidhe is talking about, it seems like it could be considered a variety of Middle Irish... but then, I don't see why branching Scottish Gaelic and Irish from ghc is any better than branching them both from mga, or why branching them from mga like we do now causes "an issue" — Catsidhe, can you explain? - -sche (discuss) 09:39, 15 August 2017 (UTC)
    No one considers Middle Irish going as late as 1800, though. Middle Irish is generally seen as ending around 1200 (much earlier than Middle English, for example), so we consider everything after that to belong to one of the modern languages, even though the literary language (as opposed to the colloquial language) is virtually identical in Ireland and Scotland until around 1800. —Aɴɢʀ (talk) 11:55, 15 August 2017 (UTC)
    Which is why I find the distinction between Early Modern Irish and Early Scottish Gaelic (to 1800) to be annoyingly artificial. There is nothing linguistic which distinguishes just about any given 14C Irish from 14C Scottish. The only way you can tell in most cases is by knowing beforehand where or by whom it was written.
    Also, having ga cover 800 years of history makes it tricky to use for both historical research and for current usage. Unless you're paying attention, it can be easy to miss that one word became moribund in the 16C, and another entered the language in the 1980s. The former case isn't going to help if you're writing a letter to someone in Gaoth Dobhair, the latter isn't going to help if you're doing Mediaeval research. --Catsidhe (verba, facta) 12:16, 15 August 2017 (UTC)
    Yes the motivation is to avoid calling the language either Irish or Gaelic, because in the case of the English word Gael we simply don't know which variety it came from, and we might be a little more honest if we simply said so. The OED has the word first in modern English in 1774/1810 from Scottish Gaelic, but I completely accept that the word might have a longer history in the language, and so I was thinking we could meet Angr half way by saying it derives from Classical Irish/Gaelic, or otherwise simply that it derives from Middle Irish. What we cannot do is say that the English spelling Gael derives from the Irish Gael, because it is in origin a simplification of Gàidheal/Gaedheal/Gaoidheal, having nothing to do with the recent Irish spelling. As the entry reads now it could give the impression that the modern English spelling derives from the modern Irish one, which we all know is not true. Gherkinmad (talk) 16:48, 15 August 2017 (UTC)
    @Angr OK the matter has been more or less resolved. However I would still advocate the ghc code for cases where there is a further intermediary stage, otherwise there are three words for Gael in modern Irish: Gaoidheal, Gaedheal and Gael, all covering the same time period. Any thoughts? Gherkinmad (talk) 16:03, 16 August 2017 (UTC)
    There will have to be three entries for modern Irish anyway, since the spellings Gaoidheal and Gaedheal were used up until the mid-20th century, long after ghc would be over. That's the main reason for my opposition to ghc: it would increase unnecessary redundancy. If we had it, we would have to have Gaoidheal in both ga and ghc instead of just ga; likewise we would have to have new ghc entries for common words whose spellings haven't changed, like fear, bean, mac, , , athair, máthair, and so on and so forth. It doesn't seem worth it to me to duplicate the effort. —Aɴɢʀ (talk) 16:15, 16 August 2017 (UTC)
    @Angr, @-sche Could we add ghc as an etymology-only language? Because as the matter stands, one would have to say that the modern Irish word is first attested in print in 1567 in Scotland, without further explanation. I just don't see a way to credit this properly without referring to a further intermediary stage of the language which is of course still Irish. Gherkinmad (talk) 16:53, 16 August 2017 (UTC)
    As was said before, etymology-only languages always have a parent language that they belong to. This parent is used, for example, in determining which section should be linked to. So which language does ghc belong to, Irish or Scottish Gaelic? It doesn't solve the problem at all, just moves it. —CodeCat 18:29, 16 August 2017 (UTC)
    It belongs to Middle Irish: Scottish Gaelic and (Late) Modern Irish could both branch from ghc whose parent language is mga. I'm sorry for pressing this so much, I just know you really have to make your case if you want to edit Wiktionary. Gherkinmad (talk) 19:28, 16 August 2017 (UTC)
    If ghc's parent language is mga, then this carries the implication that all ghc terms are mga terms. Every link to a ghc term in fact creates a link to a mga entry because of how the parent of an etymology language works. Since every link should have an entry behind it, it implies that any links in etymologies to ghc terms attested from 1200 to 1800 are implicit requests for Middle Irish entries to be created on those pages. So, you want us to create Middle Irish entries for terms attested as late as 1800? —CodeCat 20:05, 16 August 2017 (UTC)
    If that's what will eventuate, no, I don't. I apologize to all code really is not my forte. All I'd want to know (anybody) in closing is how we could accommodate the fact that Gaoidheal and Gaoidheilg (and so Gael, Gaeilge etc.) first saw print in Scotland? Gherkinmad (talk) 21:03, 16 August 2017 (UTC)

Tagalog enclitic formsEdit

In Tagalog, any word ending in "a, e, i, o, u, or n" has an enclitic form (sort of). For example, the word "malaki" (big), to say a "big person" one says "malaking tao", adding an "ng" at the end. And that goes for adjectives, nouns, verbs, all words. The question is, do we make an entry of the enclitic form for all the words in Tagalog that has them? --Mar vin kaiser (talk) 10:51, 15 August 2017 (UTC)

It sounds a bit like English -'s or Latin -que, i.e. a clitic that can be added to virtually anything. And we don't have entries for person's or virumque, so I'd say we shouldn't have an entry for malaking either, but just one for malaki and one for -ng. BTW, how do words ending in other sounds behave? —Aɴɢʀ (talk) 11:58, 15 August 2017 (UTC)
@Angr: Well, for example, the word "maliit" (small), to say a "small person" would be "maliit na tao". Actually some see the word "malaking" to be a contraction of the word "malaki" and the word "na" which links words together. One problem is that for example, the word "taong", it could mean four things,
  1. "taong" - a black veil for mourning
  2. "taóng" - water container (we don't write diacritics to indicate stress in Tagalog, so both are under the same entry)
  3. "taong" - the word "tao" (person) + "na"
  4. "taóng" - the word "taón" (year) + "na"
So my point is, shouldn't the last two be in the entry "taong" also? --Mar vin kaiser (talk) 13:23, 15 August 2017 (UTC)
@Mar vin kaiser: Well, look at butcher's: it has several meanings of its own, but the transparent one of butcher + the clitic -'s isn't actually listed. —Aɴɢʀ (talk) 13:42, 15 August 2017 (UTC)
@Angr: Good point. Although, the entry it's has it. But I do see your point. --Mar vin kaiser (talk) 13:45, 15 August 2017 (UTC)
@Angr: The reason why I feel it's important is because for example, any two words that are beside each other, the first one has to be in enclitic form, and think of the number of entries that have two words. For example, "free will" is "malayang loob", but there won't be any entry for "malayang", only "malaya". And that would go for all the other entries that have two words. --Mar vin kaiser (talk) 13:59, 15 August 2017 (UTC)
@Mar vin kaiser: It's probably at it's because in the standard written language, the one thing it's isn't is it + the possessive -'s, but only it + the contracted verb -'s. As for the headword line, that's not a problem. At the entry for malayang loob, just add |head=[[malaya]][[-ng|ng]] [[loob]] to the headword template. —Aɴɢʀ (talk) 14:13, 15 August 2017 (UTC)


Versageek has been inactive for more than a year, so per the WMF policy her checkuser rights have been revoked. The policy requires that any local wiki have two or more checkusers if they have any, my rights have been suspended as well pending our electing another. We can opt not to bother having local checkusers and simply rely on the stewards to take care of requests, or we can nominate one or more new checkusers and have some elections.
From my perspective it is not strictly necessary to have local checkusers, but it is convenient. Almost all of the work these days is keeping track of and blocking the long-term pests, and making sure we are actually blocking Wonderfool when we think we are. - TheDaveRoss 12:59, 15 August 2017 (UTC)

Having local checkusers is definitely a good thing. I'm surprised WF hasn't made any votes to encheckuserify anyone. —Μετάknowledgediscuss/deeds 16:57, 15 August 2017 (UTC)
"Encheckuserify"? Beware, lest you affixiate. — Kleio (t · c) 20:02, 15 August 2017 (UTC)
I thought User:Chuck Entz was a checkuser, since he does a good job of keeping track of the IPs/locations of various vandals. He seems like a good candidate for the position. - -sche (discuss) 19:47, 15 August 2017 (UTC)
Oddly enough, I probably wouldn't have as much to say if I were a checkuser, since I understand there are fairly strict rules about what information obtained with the checkuser tools can be disclosed and when you can use them. Right now, I get pretty much all my information from geolocating just about every IP that does something out of the ordinary and looking for patterns (that and monitoring the abuse filter logs). I'm not sure what I would be allowed to say/do if I spotted an IP that had earlier turned up in a checkuser investigation (though I could probably block them). That said, I'm game, if everyone thinks it's a good idea. Chuck Entz (talk) 02:43, 16 August 2017 (UTC)
I actually think Chuck is a great candidate, but I was under the impression that we had an old (unwritten?) rule that no one user should have all the user rights at en.wikt simultaneously. —Μετάknowledgediscuss/deeds 04:19, 16 August 2017 (UTC)
You both bring up good reasons for pause. Who else wants the job? We could nominate WF; then he'd have to ID himself to the Foundation to get the flag... ;) lol - -sche (discuss) 05:10, 16 August 2017 (UTC)
I think Chuck is a great choice as well. Re "having all the rights", I don't see a problem there. Our 'crats have a fairly limited scope of responsibility which doesn't much change how they might be able to (ab)use the CU tools. This is a different story than other wikis which have roles such as ombudsmen, abrcom, etc.
Re limiting your ability to act, I have not found that to be a problem. In the cases where an anonymous contributor is connected to a previously blocked logged-in account you may have to be somewhat oblique (e.g. not using the name of the blocked account, just saying that they are evading a block) but that is actually a fairly rare situation. - TheDaveRoss 14:56, 16 August 2017 (UTC)
Probably better to have some local ones. Equinox 19:49, 15 August 2017 (UTC)
Local is good, but what about Chuck's stated concerns. Where is it written that checkusers can't disclose publicly available info? Who can be asked about this? DCDuring (talk) 04:27, 16 August 2017 (UTC)
Perhaps we can create a new class of superuser: "Chuckuser". DCDuring (talk) 04:28, 16 August 2017 (UTC)
lol! - -sche (discuss) 05:10, 16 August 2017 (UTC)
@DCDuring: The policy dictating the use of the tool is here, and is also governed by the privacy policy and the access to nonpublic information policy. There are lots of words there, but essentially it is OK to talk about publicly available information, and it only gets tricky when your interpretation of public information is affected by nonpublic information. - TheDaveRoss 15:02, 16 August 2017 (UTC)
So Chuck's concerns are in maintaining a "Caesar's wife" standard, probably appropriate. DCDuring (talk) 22:04, 16 August 2017 (UTC)

For what it's worth, I have these user rights on Wikispecies, so I am already vetted by the WMF. I would be willing to have those tools here. —Justin (koavf)TCM 05:12, 16 August 2017 (UTC)

Metaknowledge started a vote for Koavf, and I made a comment on the discussion page there suggesting that we also vote on admin status at the same time. - TheDaveRoss 12:35, 21 August 2017 (UTC)

I would like to become a checkuser. --Daniel Carrero (talk) 07:13, 16 August 2017 (UTC)

DI CheckUser. PseudoSkull (talk) 22:09, 16 August 2017 (UTC)
@PseudoSkull, I don’t think we accept most of the specialized terminology and abbreviations used by Wikipedia/Wiktionary here, such as CheckUser, RfV, RfD, and so on, but we put them in the Wiktionary:Glossary. —Stephen (Talk) 22:27, 16 August 2017 (UTC)
If there are 3+ external citations, I would disagree. PseudoSkull (talk) 22:28, 16 August 2017 (UTC)
But let's discuss that elsewhere. Perhaps in WT:TR so that the discussion at hand can continue. PseudoSkull (talk) 22:28, 16 August 2017 (UTC)

Review of Ecjklangs (talkcontribs)' contributionsEdit

Most of these sex-related entries appear only in Urban Dictionary (OneLook backs me up on this), but some entries - such as sexcess - are somewhat citable. Not sure how durable they are though (I mean, floorcest anybody?). Anyone in the mood for a look-through? --Robbie SWE (talk) 08:47, 16 August 2017 (UTC)

Translations added by IvanScrooge98 (talkcontribs)Edit

This erroneous edit by User:IvanScrooge98 in Recent Changes attracted my attention. A quick check of their recent additions of Chinese translations shows that he/she is certainly a non-speaker of Chinese. A large proportion of their added Chinese translations were outright incorrect, others often problematic. Some recent, outright erroneous examples include: diff, diff, diff, diff, diff, diff. It's a shame that such sloppiness was not picked up earlier and was allowed to persist for such a long time. Their additions of translations in other languages also need to be thoroughly checked. Wyang (talk) 10:23, 16 August 2017 (UTC)

Hmmm… excluding the first one (from zh.wiktionary), I based the other edits, as I usually do, on the respective Wikipedia articles. I'm sorry if there's something wrong and willing to fix my mistakes. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 14:32, 16 August 2017 (UTC)
You should be careful when using non-English Wikipedias as a source because they are full of made-up garbage. Every now and then I have to remove a Portuguese translation that you add because it doesn’t meet our attestation criteria or are inaccurate. There’s no harm in using Wikipedia as a starting point when researching translations, but you should at least check Google Books. — Ungoliant (falai) 14:48, 16 August 2017 (UTC)
Guess I should more when I can. Sorry. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 15:14, 16 August 2017 (UTC)
Most of the errors in Chinese translations are in your inferred Pinyin readings and traditional/simplified forms; these are more serious factual errors. Please see if you can fix the examples above, now knowing that they contain errors. Many of your added Chinese translations are sum-of-part terms which do not warrant inclusion on Wiktionary, but that is less serious of a problem. Wyang (talk) 00:59, 17 August 2017 (UTC)
@Wyang: is diff fine, for instance? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 13:40, 17 August 2017 (UTC)
It's better, though both terms are SoP and should link to the individual components. Could you please also fix the six others? Wyang (talk) 21:54, 17 August 2017 (UTC)
@Wyang: would you mind check if my attempts are correct? Also, should I undo my additions at Warwick and Portoferraio? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:22, 18 August 2017 (UTC)
Not really, you haven't fixed the factual errors on those pages. It's all right- I will fix them. Please do not add other Chinese translations. Wyang (talk) 10:26, 18 August 2017 (UTC)

definitions vs. predicatesEdit

The entry for alt-right and the Tea Room/2017/August#alt-right discussion are recent manifestations of a failure to respect the concept of a definition. We do know how to do so, but sometimes some contributors act as if they believe that any predicate about a definiendum that they or someone else puts down in writing is a potential definition.

"Headquarters of US military imperialism" is not a definition of Pentagon, whether or not you believe the truth of "The Pentagon is the headquarters of US military imperialism".

What do we have to do to see to it that this basic notion of lexicography is respected? Would voting on a policy help? A definition style guide? DCDuring (talk) 22:02, 16 August 2017 (UTC)

Statistics for numbers of etymologiesEdit

For those of you who wanted to know, as of August 16, 2017, the largest number of etymologies any entries on the English Wiktionary has is 15. That entry is zꜣ. (Wouldn't it be amazing if you ever saw "Etymology 27", "Etymology 9432", etc. LOL ROFL LMAO) PseudoSkull (talk) 01:57, 17 August 2017 (UTC)

"Etymology 4320603, Etymology 4320604" PseudoSkull (talk) 01:58, 17 August 2017 (UTC)
You're ready for Wikidata   Noé 15:29, 17 August 2017 (UTC)
Quite impressive! — Eru·tuon 19:11, 17 August 2017 (UTC)

AWB RightsEdit

I'd like to use AWB on Wiktionary, mainly to run typo fixes, and a regex I made to split long See Also/Related terms/etc sections into columns, and to be able to search inside templates, and to dump Wiktionary data offline for faster searching and whatnot. I already have rights on English Wikipedia. Pariah24 (talk) 19:08, 17 August 2017 (UTC)

I notice that no one has even nominated you for autopatroller status, which means that all your edits are marked for review. It seems silly to have people not trusting your edits enough to stop checking all of them, but at the same time giving you the ability to make them in bulk. Chuck Entz (talk) 02:36, 18 August 2017 (UTC)
DI AWB (wiki sense). PseudoSkull (talk) 04:44, 18 August 2017 (UTC)
I really don't care if people review my edits, and the AWB policy makes no mention of that as a prerequisite. I've been editing Wikipedia for quite a while longer than I've had this account, and this "we don't trust your edits" business sounds pretty anti-AGF to me. Never had an admin say something like that to me before; do you speak this way to everyone? A simple no would have sufficed. Pariah24 (talk) 08:53, 18 August 2017 (UTC)
I am sure Chuck was not intending to offend. We have the edit patrol feature enabled here, and the general practice is that once someone has been editing for a little while the people who patrol edits notice that they make reasonable edits and don't need to be patrolled any longer. If you are not yet set to autopatrolled status it may indicate that you have not edited here sufficiently (or, sometimes, sufficiently well) to have been noticed and flagged by a patroller. I would suggest that you just continue making good edits and I am sure you will be autopatrolled an eligible for AWB in no time. - TheDaveRoss 12:21, 18 August 2017 (UTC)
Thank you. Somehow I managed to give the impression that we're so suspicious of them that we have them under surveillance, or that we think there's something wrong with their edits, or that we only talk to the cool people who already know the secret handshake... The simple fact is that AWB access requires that we know the contributor in question well enough to be sure they know Wiktionary's standards and practices well enough to avoid making mistakes, since those mistakes would be propagated much more widely using the AWB tool, and that we just don't know them well enough- yet. Chuck Entz (talk) 08:41, 20 August 2017 (UTC)

Tsolyáni languageEdit

Do we include words in this fictional language? I'm working my way through some missing French nouns and came across zaqé which our French friends define as "Troisième jour de la semaine dans le calendrier tsolyáni". SemperBlotto (talk) 05:48, 18 August 2017 (UTC)

No, see Wiktionary:Criteria_for_inclusion#Constructed_languages. DTLHS (talk) 05:50, 18 August 2017 (UTC)

Using HTML attributes instead of classes for WT:ACCELEdit

Currently, WT:ACCEL has data passed to it using CSS classes, so that the resulting Wikicode looks like this on bar: <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en">[[bars]]</b></span>. There's a few points to note about this.

  1. There's two wrapping HTML elements, span and b, even though these could easily be combined into a single b element, as long as WT:ACCEL is modified to recognise not just span elements.
  2. If step 1 is done, then there is no more need for the lang-en CSS class, because WT:ACCEL can extract it directly from the b element's lang= attribute.
  3. HTML allows you to specify custom attributes named data- followed by any text. We can use this, rather than CSS classes, to specify the inflectional data.

All in all, the line above would end up looking like this: <b class="Latn" lang="en" data-accel-form="plural">[[bars]]</b>.

What do people think of this change? @Dixtosa in particular. —CodeCat 12:54, 19 August 2017 (UTC)

Looks a bit cleaner. Equinox 12:56, 19 August 2017 (UTC)
Looks cleaner, yes, but I do not see any other benefit... yet. --Dixtosa (talk) 13:21, 19 August 2017 (UTC)
I very much like this, even if there are no benefits besides the neatness. — Eru·tuon 23:46, 19 August 2017 (UTC)
@Dixtosa Can MediaWiki:Gadget-WiktAccFormCreation.js line 20 be modified to $('.form-of').each(function(){, and line 23 to var formof_classnames = $(this).closest(".form-of")[0].className.split(' ');? This will allow elements other than span to contain the acceleration information, which facilitates step 1. —CodeCat 11:22, 20 August 2017 (UTC)
Step 1 has been completed, and the link on bar now looks like this: <b class="Latn form-of lang-en plural-form-of" lang="en">[[bars]]</b>. Step 2 can now be implemented. It might be as simple as putting something else in line 76 of MediaWiki:Gadget-WiktAccFormCreation.js, but I'm not sure what. The way the code is currently written, it only passes the classes to the function (the details parameter), not the wrapper element itself. Would replacing line 76 with lang: $(link).closest("[lang]").attr("lang"), be sufficient? Perhaps the code should be restructured so that the element itself is passed around instead of only the classes, but I will leave that to Dixtosa to implement. —CodeCat 18:34, 20 August 2017 (UTC)

Disambiguate WS entries by languageEdit

So wikisaurus:juoppo -> wikisaurus:fi:drunkard, wikisaurus:drunkard/Finnish or wikisaurus:drunkard/fi and

wikisaurus:insane -> wikisaurus:en:insane, wikisaurus:insane/en or wikisaurus:insane/English

Whether we use English or native words in the pagetitle, collisions would quickly happen as soon as someone added non-English words (which they may have refrained from out of uncertainty). I personally prefer the first scheme, because it is similar to what we use for topical categories like Category:de:Graph theory and because it does not imply the existence of useless superpages (parent page? root page? the opposite of a subpage). As for using native versus English words: the WS entry is tied to meaning seen as abstract from specific words, so I do not see why we should not use English. Are there any large synonym groups that cannot be succinctly expressed in English?__Gamren (talk) 13:23, 19 August 2017 (UTC)

I prefer Wikisaurus:English/drunkard, following the same scheme as Rhymes pages. —CodeCat 13:26, 19 August 2017 (UTC)
I prefer to keep the current Wikisaurus setup for its simplicity until it becomes obvious that collisions are an actual problem. --Dan Polansky (talk) 14:02, 19 August 2017 (UTC)
Here are some strings that might be expected to have many synonyms in more than one langauge: god (Danish/English), person, gut (Nynorsk/German), pen (Welsh/Norwegian/Mindiri/Mapudungun). Is it obvious yet?__Gamren (talk) 14:18, 19 August 2017 (UTC)
From what I have seen, collisions have not become an actual problem yet. Currently, we cater for collisions by being setup for multiple languages per Wikisaurus page, on the model of the mainspace. If you start expanding Danish part of Wikisaurus and you run into obstacles preventing you from productively expanding that part, we can see how to best remove them. --Dan Polansky (talk) 14:27, 19 August 2017 (UTC)
For reference, one of the subject home pages: Wiktionary:Wikisaurus#Multilingualism. One past discussion: Wiktionary:Beer_parlour/2009/March#Wikisaurus_-_non-English_entries - here, a suggestion was made that would lead to wikisaurus:fi:juoppo. --Dan Polansky (talk) 14:39, 19 August 2017 (UTC)
I have now edited WS:god and created WS:da:beautiful (I would be fine with CodeCat's suggestion above, as well).__Gamren (talk) 18:25, 19 August 2017 (UTC)
I support having pages like Wikisaurus:English/drunkard, per CodeCat. It would be consistent with rhymes and reconstruction pages.
--Daniel Carrero (talk) 12:42, 21 August 2017 (UTC)

employment category?Edit

Do we have a category for employment related terms like job title, trade union, severance pay, etc.? This would eminently useful IMO. ---> Tooironic (talk) 02:16, 20 August 2017 (UTC)

English names for letters of the Arabic languageEdit

Do we include these? Our page Arabic script has a table of them, but they link to the Arabic letters themselves. SemperBlotto (talk) 04:47, 20 August 2017 (UTC) (I've just added the French zhâl - hope it's OK)

Category:en:Arabic letter names DTLHS (talk) 04:48, 20 August 2017 (UTC)
So my French term seems to be wrong - I can't figure out how to correct it. SemperBlotto (talk) 04:51, 20 August 2017 (UTC)
What do you mean wrong? DTLHS (talk) 04:54, 20 August 2017 (UTC)
We're probably missing the English names of some letters if you're concerned that it's a red link. DTLHS (talk) 04:56, 20 August 2017 (UTC)
Also some of the entries currently in Category:en:Arabic letter names are not letters; they are Arabic diacritics. Wyang (talk) 04:57, 20 August 2017 (UTC)
OK, I'leave it alone - totally outside my comfort zone. SemperBlotto (talk) 05:00, 20 August 2017 (UTC)

Edits by

This IP user has been adding "Ancient Armenian" (= Old Armenian) terms as etymons for Modern or Ancient Greek terms, while deleting the old etymologies, as well as some other things. The etymologies are dubious: for all of them, because the terms were attested before the time of Old Armenian, and doubly so for some, because they are phonologically implausible (առասպել (aṙaspel) supposedly yielding μῦθος) or the etymology actually goes the other way (ῥινόκερως was calqued by ռնգեղջիւր (ṙngełǰiwr)). Not sure what to do here besides revert the edits, which I've done. It would probably be better manners to explain to him or her, but I don't feel like it. — Eru·tuon 06:42, 20 August 2017 (UTC)

I reverted an edit by this same idiot (using a slightly different IP) that added "ancient Armenian" to the etymology for an Old Armenian term- they're even further out there than you give them credit for. They're changing their IP, so I doubt they'll read anything you leave on their talk page, but it never hurts to try, I guess. That said, feel free to revert them- as far as I'm concerned, they're only one step removed from the vandals who randomly replace language headers with the names of their own languages. Chuck Entz (talk) 08:13, 20 August 2017 (UTC)


Hi there. I'm taking a month off Wiktionary to concentrate on things IRL. You won't be hearing from me at all. So, in the unlikely case that you see someone here who you think might be me, who's following my edit patterns or whatever, it won't be. Thanks. . --WF on Holiday (talk) 17:49, 20 August 2017 (UTC)