Wiktionary:Beer parlour



Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


September 2016

{{also}} template

Hello -- I noticed that of the c. 495 thousand entries which differ from other entries only in diacritic marks or capitalisation, only c. 172 thousand have {{also}} templates. Would it be worthwhile for me to add these to the remainder? Also, some dozens of thousands which do have these templates are missing a subset of the items in their respective congruence classes. Would it also be worthwhile to complete the arguments for these templates? An example is gort and ğort. Apologies for coming here with such a fiddly question. Isomorphyc (talk) 01:56, 1 September 2016 (UTC)

Yes, if you're confident you can de-diacritize and classify them correctly. DTLHS (talk) 01:59, 1 September 2016 (UTC)
If it can be done conveniently (and correctly), yes.
Some users, especially but not only in English-speaking countries, are not facile with diacritics, e.g., me. More importantly, I don't think anonymous users have access to the means we offer to overcome keyboard limitations.
IMO the most important part of the task is to make sure that on all entries that use only the no-diacritic Roman character set {{also}} includes all the entries that use diacritics that correspond to the plain entries. English, Latin, and "Translingual" are the only languages that matter to me. A smaller subset would be only lemma entries.
I'm sure there are other points of view. DCDuring TALK 02:10, 1 September 2016 (UTC)
Yes, please! I have asked before for someone to do this. Note that there may be a limit to how many arguments {{also}} can handle, and there is in any case a limit to how many we would want to display (let's discuss what that would be: more than 15 links?). For terms that would otherwise display more than that number of alsos, it is preferable to set up and link to an appendix, the way a links to Appendix:Variations of "a" rather than listing all 100+ variants directly in a. - -sche (discuss) 02:20, 1 September 2016 (UTC)
@-sche: For what it is worth, there are 77 congruence classes having more than 15 members and 129 classes having more than ten members. The largest groups are bo (19), y (19), s (20), sa (20), n (21), i (24), u (38), e (41), o (57) and a (61). My lists currently do not include differences in punctuation; the classes will be slightly larger and more numerous when this is included. The idea of creating an appendix for classes larger than ten or fifteen sounds reasonable to me, but if I create such appendices, they will provide less information than those which already exist. I would be uncomfortable also including, for example, the same sequence of letters in other scripts or Hanzi represented by the same letters in some transliteration scheme, as is currently the practice. I do believe this can be done without errors on a very large and somewhat easy to define subset of the relevant entries, mostly deferring work on scripts with which I may be uncomfortable. Isomorphyc (talk) 02:47, 1 September 2016 (UTC)
Since we're talking about the difference between no appendix and one that's not as complete as it possibly could be, I don't see the problem: this is a wiki, and others can expand on those later. Chuck Entz (talk) 03:32, 1 September 2016 (UTC)
Yes - I think that a bot could do this even better. SemperBlotto (talk) 05:39, 1 September 2016 (UTC)
It's well suited to a bot, but a bot would not be able to create the appendix pages when there are larger numbers. To do this, the appendix pages would need to follow a standard format and not have any additional information added. —CodeCat 12:25, 1 September 2016 (UTC)
But a bot could generate a list of appendix pages that need to be created and pages that need to be added to them. --WikiTiki89 12:54, 1 September 2016 (UTC)
  • I'm all in favor of this. If there's any risk of {{also}} taking more than about 8 or 9 arguments, then an "Appendix:Variations of..." should be created anyway. —Aɴɢʀ (talk) 11:26, 1 September 2016 (UTC)
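For anyone picturing the grouping discussed in this thread, here is a minimal Python sketch rather than the method anyone above actually used. It assumes that "differs only in diacritic marks or capitalisation" can be approximated by Unicode NFD decomposition, dropping combining marks, and case folding. The function names and the `max_links` default of 15 are hypothetical; the thread floated cut-offs anywhere from about 8 to 15.

```python
import unicodedata
from collections import defaultdict

def fold_key(title):
    """Key shared by titles that differ only in diacritics or case (assumed definition)."""
    decomposed = unicodedata.normalize("NFD", title)
    # Drop combining marks, e.g. the breve that NFD splits off from 'ğ'.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def congruence_classes(titles):
    """Group titles by folded key, keeping only classes with more than one member."""
    classes = defaultdict(set)
    for title in titles:
        classes[fold_key(title)].add(title)
    return {key: sorted(members) for key, members in classes.items() if len(members) > 1}

def also_arguments(title, classes, max_links=15):
    """Other members of the title's class, or None when the class is too large
    and an "Appendix:Variations of ..." page would be preferable."""
    members = classes.get(fold_key(title), [])
    others = [m for m in members if m != title]
    if len(others) > max_links:
        return None
    return others

# Hypothetical sample of page titles; "\u011fort" is "ğort".
classes = congruence_classes(["gort", "\u011fort", "Gort", "beer"])
gort_links = also_arguments("gort", classes)
```

Letters that have no canonical decomposition (such as ø or ł) are not folded by this approach, and punctuation differences are ignored, so a real run would need an explicit mapping on top of this.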

Second LexiSession: paths, roads and ways

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

The Tremendous Wiktionary User Group, a nice and open gathering of Wiktionarians, is happy to introduce the second chapter of our collective experiment: LexiSession.

So, what is a LexiSession? The idea is to coordinate contributors from different languages to focus on a shared topic, to enhance all projects at the same time! It may remind you of the Commons monthly contests, but here everyone is a winner! The first LexiSession was about cat and it was a beginning. For this second LexiSession, we offer a month - until the end of September - to pave the way! There is plenty of names for different kinds of roads, streets, avenues, and ways, and wiktionaries can be very helpful in helping people pick the correct one to describe or to translate.

English Wiktionary already have a Wikisaurus:road and a Wikisaurus:way but there is still a lot of information to provide. Well, why is it almost in alphabetical order? How to distinguish between roadway and motorway (for instance)? Is it possible to help readers with pictures or something? These are not instructions, and everyone is welcome to imagine new solutions to provide information about semantic networks and variation. Also, you may be interested to know that French Wiktionary already has eight different thesaurus about streets in eight different languages, including English.

Please share your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the following LexiSession.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later this month for an update! Noé (talk) 10:53, 1 September 2016 (UTC)

Great topic (wasn't too keen on the cats). For this session I'm especially interested in local names which are only used in a specific city or region. Also interesting would be to describe (also visually?) hierarchies of paths. – Jberkel (talk) 12:23, 1 September 2016 (UTC)
@Noé I'll try to contribute more to this one, provided school doesn't get in the way! Let's hope participation is better than last time. :) (And as a tiny note about your English, don't forget that the third person singular of have is has....) Andrew Sheedy (talk) 01:38, 2 September 2016 (UTC)
@Noé another correction, also there are plenty of names.

Hey guys, end of September: has anyone made something for this LexiSession? As we are in October now, we'll start a new session! I know a month is quite short, but it's the idea behind LexiSession. Please give a hint if you have made something thanks to this LexiSession. Don't be sad if you haven't participated this time, there'll be more to come! Noé (talk) 10:50, 2 October 2016 (UTC)

I made some small changes, but nothing substantial I'm afraid. School prevents me from being able to contribute to other projects on top of my usual edits (which are usually only to entries of words that I have had to look up). Hopefully the next time I have a holiday, I'll be able to really participate! Andrew Sheedy (talk) 18:30, 2 October 2016 (UTC)

borrowing → bor

Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor passed. Results: 14-5-3 (73.68%-26.32%) (not counted: +1 late oppose vote). Can someone please do the honors and edit the template in all entries?
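The percentages quoted above appear to be computed over support and oppose votes only, leaving the 3 abstentions out. A small Python check of that arithmetic, assuming this reading of "14-5-3" (the function name is hypothetical):

```python
def vote_shares(support, oppose):
    """Percentage of support and oppose among decided votes (abstentions excluded)."""
    decided = support + oppose
    return (round(100 * support / decided, 2), round(100 * oppose / decided, 2))

shares = vote_shares(14, 5)  # 14 support, 5 oppose, 3 abstentions ignored
```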

FYI: See Thread:User talk:CodeCat/borrowing → bor. In the discussion, I asked CodeCat first, but she said: "I don't think it's right to do it given the strong opposition." Do we need to discuss this further before doing the change? I was hoping we could go on with it and have {{bor}} used in a way more consistent with {{inh}} and {{der}}.

As I mentioned in the conversation with CodeCat, I believe I found some important numbers concerning how the templates are used. Correct me if I make any mistake in the numbers or their interpretation. {{inh}} and {{inherited}} were created together, and it appears that almost all entries that display an etymological inheritance use the shorter form. {{borrowing}} was all we had available for 5 years -- that is, the shorter {{bor}} did not exist. Then {{bor}} was created 1 year ago and about 2/3 of entries of borrowed terms already use {{bor}} rather than {{borrowing}}. This is one reason why I see a trend towards shorter names, confirmed in the vote.

In the discussion, CodeCat suggested leaving shortcuts as shortcuts and long forms as long forms. Feel free to discuss this idea. I disagree with it: people who used the longer syntax {{borrowing|it|pizza|lang=en}} in entries from 2010 to 2015 did it because it was the only format available; once the shorter {{bor|en|it|pizza}} came to exist, people started to use it. --Daniel Carrero (talk) 16:01, 1 September 2016 (UTC)
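To make the requested edit concrete, here is a rough Python sketch of the rewrite from the long form to the short form, based only on the two example calls quoted above. It is not the bot code anyone used: the function name is hypothetical, and it assumes the template has no nested templates or piped links inside it and that `lang=` is the only parameter that moves.

```python
import re

# Matches {{borrowing|...}} with no nested braces inside (a simplifying assumption).
BORROWING = re.compile(r"\{\{borrowing\|([^{}]*)\}\}")

def convert_borrowing(wikitext):
    """Rewrite {{borrowing|it|pizza|lang=en}} as {{bor|en|it|pizza}}."""
    def repl(match):
        positional, named = [], []
        lang = None
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep and key.strip() == "lang":
                lang = value.strip()
            elif sep:
                named.append(part.strip())   # keep other named parameters as they are
            else:
                positional.append(part.strip())
        if lang is None:
            return match.group(0)            # no lang= found: leave the call untouched
        return "{{bor|" + "|".join([lang] + positional + named) + "}}"
    return BORROWING.sub(repl, wikitext)

converted = convert_borrowing("From {{borrowing|it|pizza|lang=en}}.")
```

Calls without `lang=` are left alone on purpose, so anything unusual stays visible for manual review.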

Distinction between topical and context-based usage categories?

The general purpose of context labels is, as far as I can discern from what others have said, to specify the context in which a specific sense applies. Presumably, it is not understood in that sense in other contexts. However, there are a few systemic problems:

  • Context labels add categories that do not indicate this restricted context. Category:Physics, which is added when you put {{lb|xx|physics}} on a sense, has nothing to do with restricted usage. Instead, it's just a general category where all terms related to the topic of physics can go. As a consequence, some editors are led to think that context labels are just a fancy means of putting entries in topical categories.
  • Worse still, some context labels put entries into "set"-type categories, but display a topical context label. {{lb|xx|particle}} puts entries in Category:Subatomic particles while showing "physics". This is confusing when used on very widespread terms like electron, which are used far outside the "physics" context.

We already have "slang" categories, like those in Category:English slang, but we have none for jargon or restricted-context senses that are not slang. However, I think these are sorely needed. It is very valuable to distinguish senses used only in physics, from those related to physics. What can be done to remedy this situation? —CodeCat 20:25, 1 September 2016 (UTC)

I would favor using longer and more explanatory names for topical categories. I'll give a few examples. Feel free to suggest any changes.
"names of" (proper noun examples)
"names of" (place names -- subdivision(s) if they exist, country)
"names of" (common nouns) (are those acceptable?)
"relating to" (or "related to"?)
--Daniel Carrero (talk) 21:16, 1 September 2016 (UTC)
I've been "guilty" of using the context labels to categorize items, and don't agree with the current strict usage policy. The example given in WT:ELE is
{{lb|en|informal}} An [[informant]] or [[snitch]].
It says "Such labels indicate, for example, that the following definition occurs in a limited geographic region or temporal period, or is used only by specialists in a particular field and not by the general population". Informal language however is used by large parts of the general population.
Using category links to categorize is just very awkward, they're invisible and tend to be scattered around the wiki code, at the bottom of the page or somewhere else, and have maintenance problems (forgetting to remove the link when the definition is removed/changed). Conversely, labels are close to the definition, and if the label is removed then the category is removed as well. – Jberkel (talk) 11:12, 8 September 2016 (UTC)
A fundamental problem is that sometimes a topic is also a usage context and sometimes it isn't. For example, a military slang term for a civilian belongs in a usage context "military", but is not topically "military", and boat is topically "nautical" when applied to a ship, but is not used in a "nautical" context. The category problem is bad enough, but we aren't helping users notice, let alone understand, the distinction to be made between usage context and topic. DCDuring TALK 19:21, 2 October 2016 (UTC)

For French Verbs: Displaying participles in the header

I'm copying what I wrote on the discussion page for {{fr-verb}}, as I forgot that Mglovesfun was no longer active:

Would it be easy enough to have {{fr-verb}} display in a way similar to {{pt-verb}} and {{es-verb}}? This would increase consistency between French and other languages on Wiktionary (including English, Spanish, Latin, and Portuguese), which would be a big plus. I would suggest including the present and past participles. I would do this myself, but I'm not very technologically inclined.... I would love to see it implemented, though! Andrew Sheedy (talk) 01:34, 2 September 2016 (UTC)

Mglovesfun is active. His username is Renard migrant.
I'm mildly in favor. It should be Luacized as we already have a module that generates most verb forms. And I can't do that, I'm afraid. Renard Migrant (talk) 17:26, 5 September 2016 (UTC)
I generally oppose copying inflection information from inflection tables. I prefer the format used by Dutch verbs (lopen) where principal parts are shown when the table is collapsed. —CodeCat 17:28, 5 September 2016 (UTC)
I suppose that's a workable option, but I would much prefer that all the Romance languages be consistent with each other, given their similar grammar, etc. Andrew Sheedy (talk) 17:59, 5 September 2016 (UTC)
What does any of this even mean? UtherPendrogn (talk) 18:31, 13 September 2016 (UTC)
Since I suspect you know what a participle is, I'd imagine your question is what would this actually look like:

faire (present participle faisant, past participle fait)

ok? Renard Migrant (talk) 18:34, 13 September 2016 (UTC)
What I don't get is why. French present participles don't get used nearly as much as they do in English, Spanish, and Portuguese. --WikiTiki89 18:37, 13 September 2016 (UTC)
@Wikitiki89 The main reason is that one can see the conjugation at a glance without having recourse to the conjugation table. For example, if we were to have the first person singular of the indicative and the present and past participles in the header, one could look at the header for a verb like mourir or craindre and see: mourir (first person singular meurs, present participle mourant, past participle mort) or craindre (first person singular crains, present participle craignant, past participle craint).
That would allow a reader familiar with French to conjugate all composite tenses, as well as most of the present and subjunctive, among others. Also, while the present participle may not be as common in French as in other languages that use the header, the past participle is far more common. I'm also a big fan of consistency, and see no reason why French shouldn't have such a header, when it could be helpful to users like me. Andrew Sheedy (talk) 20:48, 2 October 2016 (UTC)
@Wikitiki89 (Pinging again because my signature wasn't in the same paragraph as the ping.) Andrew Sheedy (talk) 20:49, 2 October 2016 (UTC)
@Andrew Sheedy: I got both your pings. The ping and signature can be in different paragraphs. The rule is that both have to be in a new paragraph (as recognized by the diff tool). As for the conjugation, there are more important things missing from the headword line for faire than the active participle, such as the 1/2 person plural present, the imperfect, future, etc. Those should have priority over the active participle, but there are so many things you can put there that it would create too much clutter, and that is why we provide a conjugation table. --WikiTiki89 17:47, 5 October 2016 (UTC)
@Wikitiki89 Very true, but then why have a header for Spanish, Portuguese, and Latin? I'm not going to fight to have the present participle included rather than something else, but I feel like the past participle, at the very least, should be in the header. Andrew Sheedy (talk) 22:36, 5 October 2016 (UTC)
@Andrew Sheedy: I can't speak for Spanish and Portuguese. For Latin we give the four traditional "principal parts". As for French, I have no problem with giving the past participle (for some reason I thought this discussion was only concerning the present participle). We should also choose a few other select forms. What's normally given in a typical French monolingual dictionary? --WikiTiki89 14:04, 6 October 2016 (UTC)
@Wikitiki89 French dictionaries tend not to give information on conjugation, or to give it separately. In Bescherelles (conjugation books), however, the participles are typically visually distinct, as well as the first person singular of all the non-compound tenses and the first person plural of the present tense of the indicative, subjunctive, and imperative. Obviously, that's too much to put in a header, but including (a) the first person singular of the present indicative, (b) the present participle, and (c) the past participle, would allow the reader to form nearly any tense.
My attachment to the present participle is due to that last fact. For example, verbs in the second group are defined as those verbs that end in -ir in the infinitive and -issant in the present participle. In other words, by displaying the present participle, a reader would be able to see that the verb was in fact regular, saving them a look at the conjugation table. For irregular verbs in which the root changes, it is very typical for the first person singular to have one stem while the present participle has the other (which forms of the verb use which stem is fairly readily predictable). For example: mourir: je meurs, present participle mourant, past participle: mort(e)(s) (other forms of the verb: meurt, meure, mourons, mourus, mouriez, etc.); écrire: j'écris, present participle écrivant, past participle: écrit(e)(s) (other forms of the verb: écrit, écrive, écrivons, écrivis, écriviez, etc.); plaire: je plais, present participle plaisant, past participle: plu (other forms of the verb: plaît, plaise, plaisons, plus, plaisez, etc.). Note that I used the same verb forms in the same order for each of the examples for the sake of comparison. Obviously they don't all match up perfectly, but the three forms of the verb I suggested for the header cover virtually all the permutations of the verb stems between them. It's not difficult to extrapolate from them to form the rest of the conjugation. Andrew Sheedy (talk) 01:51, 7 October 2016 (UTC)
But why specifically the present participle? There are other forms that can be used to show that particular stem. I also think it's important to show the future stem at least when it's not the same as the infinitive. The stem of the present participle I think is nearly always the same as the stem of the imperfect. So why not show the past participle and the first person singular of the present, imperfect, future, and perhaps subjunctive? Perhaps we can display these only when they are not obvious. And we can even give the present participle when it is completely irregular, such as for avoir (ayant). --WikiTiki89 15:43, 7 October 2016 (UTC)
I would be fine for doing that for irregular verbs, but I feel like it would be too cluttered if that many forms were included for every verb. I agree that it would be helpful to give the first person future, as there are often extra R's added and such in that tense. The present participle would be helpful for identifying second group verbs (but then so would the present subjunctive) and for forming other parts of speech, such as adjectives, but I don't think it has to be included. Andrew Sheedy (talk) 20:58, 8 October 2016 (UTC)
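For readers trying to picture the header proposed in this thread, here is a minimal Python sketch of how such a line could be assembled from three supplied forms, together with the second-group test described above (infinitive in -ir, present participle in -issant). It is only an illustration: the function names are hypothetical, the forms are passed in by hand rather than generated by any module, and finir/finissant is an example added here, not one from the discussion.

```python
def headword_line(infinitive, first_singular, present_participle, past_participle):
    """Format the header in the style shown earlier in this thread."""
    return (
        f"{infinitive} (first person singular {first_singular}, "
        f"present participle {present_participle}, "
        f"past participle {past_participle})"
    )

def is_second_group(infinitive, present_participle):
    """Second-group test as described above: -ir infinitive and -issant participle."""
    return infinitive.endswith("ir") and present_participle.endswith("issant")

mourir_line = headword_line("mourir", "meurs", "mourant", "mort")
finir_regular = is_second_group("finir", "finissant")   # regular second-group verb
mourir_regular = is_second_group("mourir", "mourant")   # ends in -ir but is irregular
```

The second function shows why the present participle carries information the infinitive alone does not: mourir and finir both end in -ir, but only the participle separates them.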

Proto-Celtic verb lemmas

@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, and anyone else who cares: Right now we have only two Proto-Celtic verbs, *ber- (which uses the stem as the lemma) and *brusū (which uses the 1st person singular present as the lemma). Does anyone object to my settling on the 3rd person singular present as the lemma form for Proto-Celtic verbs? That's what we're already using for verb lemmas for Proto-Celtic's ancestor (Proto-Indo-European) as well as for its best attested early descendant (Old Irish). This would entail moving *ber- to *bereti and *brusū to *bruseti. Is that OK with everyone? —Aɴɢʀ (talk) 17:29, 2 September 2016 (UTC)

What is used as the lemma for modern Celtic languages? --WikiTiki89 17:34, 2 September 2016 (UTC)
The imperative for the modern Goidelic languages, the verbal noun for the modern Brythonic languages. —Aɴɢʀ (talk) 18:40, 2 September 2016 (UTC)
That seems a little strange, but then what do I know. In any case, I definitely support your proposition. --WikiTiki89 18:44, 2 September 2016 (UTC)
What about the old Brythonic languages? WT:Lemmas has nothing. —CodeCat 18:45, 2 September 2016 (UTC)
I know Welsh mostly descends from 3.sg. --Victar (talk) 18:57, 2 September 2016 (UTC)
I've been using the verbal noun for Middle Welsh, too, but I've been thinking it might be good to use the 1st person singular present (which is what the Geiriadur Prifysgol Cymru does for literary Welsh) and have the verbal noun be separate (as the verbal noun is separate for the Goidelic languages). —Aɴɢʀ (talk) 19:00, 2 September 2016 (UTC)
  Support CodeCat 17:37, 2 September 2016 (UTC)
  Abstain On one hand, PCelt's descendants are mostly 3.sg, but on the other hand, it's nice to have it in line with Latin, whose descendants are also not in 1.sg. *shrug* --Victar (talk) 18:44, 2 September 2016 (UTC)
Is there any common practice in reference works (aside from the infinitive, which some dictionaries use for everything)? Chuck Entz (talk) 19:04, 2 September 2016 (UTC)
Sounds good. I've been working on some Proto-Brythonic verbs myself. My userpage has a huge amount of WIP translations. UtherPendrogn (talk) 19:18, 2 September 2016 (UTC)
I've created a rudimentary inflection table for thematic verbs, {{cel-conj-them}}. It's still lacking many forms, as I'm not super well versed on Celtic verbs. I'd like to know especially which principal parts there are and which PIE verb stems they come from. From w:Proto-Celtic language I gather that the present, future, preterite active and preterite passive stems are principal, but their PIE origin eludes me.
The template is implemented with a module, Module:cel-verbs, and new classes can be added there fairly easily. The main issue I'm faced with is the layout of the table. The table on w:Proto-Celtic language has a lot of wasted space, I'd prefer something more compact, but I'm not sure what would work best. —CodeCat 20:00, 2 September 2016 (UTC)
  Support I don't think there's an established practice (Schumacher, for one, uses only stems), but considering Old Irish uses the 3sg too, it makes sense. I'm generally a fan of using the 3sg because it is usually the most frequent and best attested form, and in certain verbs (such as meteorological or impersonal verbs), other forms will be rare at best (though not necessarily nonexistent: for example, in the Old Lithuanian corpus a verb form like "I snow" may be attested in the context of a tale with anthropomorphised clouds). --Florian Blaschke (talk) 01:08, 3 September 2016 (UTC)
I'm a little late, but I   Support moving them to the 3sg. For what it's worth, Matasovic also only gives stems. —JohnC5 14:50, 7 September 2016 (UTC)
  • OK, I've gone ahead and moved all the verb pages (there were only four) to the third-person singular present indicative form. —Aɴɢʀ (talk) 14:46, 5 September 2016 (UTC)

I've been working on a new verb conjugation table. Please let me know what you all think. User:Victar/Template:cel-conj-table --Victar (talk) 02:40, 7 September 2016 (UTC)

I don't think it's an improvement over the existing one. —CodeCat 12:11, 7 September 2016 (UTC)
That seems certainly to be a tainted matter of personal opinion. --Victar (talk) 15:31, 7 September 2016 (UTC)
I definitely would not use MacBain's dictionary for anything. It's hopelessly out of date now, and wasn't all that up to date even when it was published. —Aɴɢʀ (talk) 13:54, 7 September 2016 (UTC)
Did he get everything right? Obviously not, but you cite the classic along with the modern. It's still a work in progress. --Victar (talk) 15:31, 7 September 2016 (UTC)

Please vote in "Poll: Description section"

Please vote in Wiktionary:Beer parlour/2016/August#Poll: Description section.

Current winners:

  • "Description" = 3 actual support votes
  • "Shape" = 2 actual support votes (my vote is calling it second best) + 1 vote in favour of this section "if we do have it" in the Oppose section.

If enough people prefer "Shape" instead of "Description", I can change the whole vote Wiktionary:Votes/2016-08/Description before it starts: it would become a vote for having a "Shape" section.

If more people prefer "Description" instead of "Shape", it would confirm that the vote can start as-is.

The current results are basically a tie with my "second best" comment weighing a bit in the direction of supporting "Description". If nobody else participates on the poll, I think I'll start the vote as-is. --Daniel Carrero (talk) 14:43, 5 September 2016 (UTC)


The following needs to be posted on WT:NFE:

* [[Module:IPA]] and {{temp|IPA}} now support an additional <code>qual''N''=</code> parameter, to place a qualifying note before a pronunciation.

CodeCat 20:28, 5 September 2016 (UTC)

  Done --Daniel Carrero (talk) 20:31, 5 September 2016 (UTC)


Can I have my sysopship back please? It's getting very frustrating not being able to properly patrol or edit protected pages. I also ask for Module:links, Module:th and Module:th-translit to be restored to the version that puts the transliteration code in Module:th-translit (where it ought to be) rather than Module:links, and ask that this be enforced by all editors. There are currently negotiations for a vote for Wyang's proposal, so it would be inappropriate for him to restore his version and continue the edit war before a vote on the matter has been held. —CodeCat 20:42, 5 September 2016 (UTC)

For the record, negotiations are happening at Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations and the vote talk page.
I support giving back the tools to CodeCat, and to Wyang too. I support restoring modules and templates to the previous version. Whatever the merits of having two separate romanizations (I might even vote support!), I believe the status quo should prevail and that the new proposal should be properly discussed before implementation, especially in case of a huge disagreement like the one that we have now. --Daniel Carrero (talk) 20:48, 5 September 2016 (UTC)
Agreed. And this also may be a good reason to implement Template Editor privileges here. —Justin (koavf)TCM 21:55, 5 September 2016 (UTC)
Support Why was CodeCat ever desysopped? --Florian Blaschke (talk) 22:20, 5 September 2016 (UTC)
There are two things that have to happen before I restore sysop rights:
  1. There has to be support from the community for it. This has been trickling in, and probably won't be a barrier.
  2. I have to be convinced that both parties will refrain from any actions that might start the edit war again.
The negotiations at Wiktionary talk:Votes/2016-08/Enabling different kinds of romanization in different locations are a start, but they mostly consist of some variant of "what about this?", followed by some variant of "you're not getting my point". We need to get beyond talking past each other and start talking about serious proposals. We also need to avoid dwelling on past behavior and start discussing what the future is going to look like. Chuck Entz (talk) 22:30, 5 September 2016 (UTC)
FWIW I am OK with restoring sysop privileges, provided both Wyang and CodeCat agree not to resume edit warring. I also think that Module:links should be restored to the status quo ante, with an appropriate vote to resolve the matter. In fact I asked Dan to create this vote in order to try to resolve what I thought was the root of the conflict between CodeCat and Wyang. As it happens, Wyang has objected to the vote for various reasons, some of which concern whether the issue of the vote is the right one to be voting on and some of which object to having a vote at all. The amount of contention here indicates we clearly need a vote but I'm open to rewording it. However, this issue is orthogonal to the issue of sysop privileges. Benwing2 (talk) 22:32, 5 September 2016 (UTC)
My only concern is the restoration of existing practice to the Thai transliteration module, and the elimination of custom code from Module:links. If that is accepted then there won't be any edit warring from me, though I do ask what course of action I should take if Wyang restores his version of the modules without a vote to support it. The reason the edit war happened in the first place was because Wyang kept reverting me and no steps were taken to stop him, and he ignored all attempts I made to convince him to stop and wait for consensus/vote. So if Wyang is sysopped again, there needs to be a contingency plan in case he does the same again; some kind of guarantee that others will also step in instead of just me. —CodeCat 22:42, 5 September 2016 (UTC)
Translation: You want us to take your side on the edit war and enforce it for you. I happen to prefer your version, but this kind of talk isn't very helpful. Chuck Entz (talk) 23:27, 5 September 2016 (UTC)
Pretty much, yes. The alternative would be endorsing Wyang's edits without a vote to show such endorsement by the wider community. That doesn't seem like a proper option given how contentious the issue is. Major changes that are contentious should be voted on, yes? —CodeCat 23:55, 5 September 2016 (UTC)
(edit conflict) One part of the problem is figuring out exactly what the status quo ante would be: this started when Wyang added his code to Module:links to implement a very useful change for Thai transliterations/romanizations. CodeCat later extensively reworked the module, in the process removing the code (I'm not sure whether she noticed the code or recognized what it was at the time). This broke a number of Thai entries and several Thai editors asked what was going on, so Wyang added the code back. It's possible that CodeCat, if she was unaware of the earlier code, thought this was something entirely new; she certainly acted as if it were. She reverted his edit, and didn't handle the dispute very well. Wyang got upset and the edit war started. Wikitiki89 came up with a compromise that moved the code out of Module:links, which CodeCat adopted, but Wyang didn't.
Do we revert it to:
  1. The state before Wyang's first edit? That would wipe out CodeCat's reworking of the module.
  2. The state before Wyang's second edit? (Dan Polansky's choice, if I understand correctly). That would break a number of Thai entries.
  3. The state after Wyang's second edit? (Wyang's choice)
  4. The state after Wikitiki89's edit? (CodeCat's choice)
The last two are the only ones that don't break anything, and either could be considered the status quo ante, depending on how you interpret Wyang's first edit. Chuck Entz (talk) 23:15, 5 September 2016 (UTC)
I don't see any point in restoring anyone's admin rights until the substance of the disagreement is resolved. As I see it, the destructive turn the conflict took is a serious matter, affecting important core software. If the talent involved in the matter cannot resolve it, perhaps someone else should. DCDuring TALK 23:44, 5 September 2016 (UTC)
There's already a vote that attempts to propose Wyang's changes so that a formal consensus can be made. But Wyang doesn't seem very cooperative in formulating the proposal, so it's mostly stuck. Since Wyang thus has no consensus for his proposed reinterpretation of transliteration modules, the status quo remains, which is that transliteration modules provide any kind of romanisation deemed desirable. This is what my and Wikitiki's edits attempted to do. If Wyang does not agree to a vote but forces his own interpretation through edit warring, what can be done? —CodeCat 23:59, 5 September 2016 (UTC)
@Chuck Entz: Hmm, when I wrote my comment I didn't check out the whole history carefully. Since the argument is about the presence or absence of a particular piece of Thai-specific code in Module:links, and if I'm not mistaken this didn't exist before the whole edit war started, then logically the status quo ante shouldn't include it. However, I don't completely understand the ramifications of this. Wyang obviously put the code there for a reason; but CodeCat and Wikitiki seem to believe that the same functionality can be achieved with this code in Module:th-translit. If this is true, then it should be taken out pending a vote to decide the underlying issues. Benwing2 (talk) 00:19, 6 September 2016 (UTC)
The reason the code was placed there by Wyang is because he believes that transliteration modules should only transliterate strictly: character by character. He therefore objects to the modification Wikitiki made, but at the same time, his reinterpretation of transliteration modules is not the agreed status quo. I argue that under the consensus interpretation, a vote is necessary for Wyang's proposal to restrict transliteration modules to just strict transliteration, and have an alternative module system/infrastructure for non-transliterative romanizations. I also believe that under this interpretation, the Thai transliteration code should be placed in Module:th-translit until a vote shows consensus to the contrary. And additionally, even if a vote passes to have separate infrastructures in our modules for transliteration and other types of romanization, the specific code for Thai does not belong in Module:links, but should be handled by said proposed infrastructure in a more general manner. —CodeCat 00:34, 6 September 2016 (UTC)
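One way to picture the arrangement described in the comment above is the toy Python sketch below. It is only an illustration under stated assumptions: the real modules are written in Lua, and every name here (`ROMANISERS`, `romaniser`, `romanise_xx`, `format_link`, the language code `xx`, and the three-letter table) is hypothetical rather than taken from Module:links or Module:th-translit. The generic linking function contains no language-specific branch; it only looks up whatever romanisation function a language has registered.

```python
# Hypothetical registry: language code -> romanisation function.
ROMANISERS = {}

def romaniser(lang):
    """Decorator that registers a per-language romanisation function."""
    def register(fn):
        ROMANISERS[lang] = fn
        return fn
    return register

@romaniser("xx")  # "xx" is a made-up language code
def romanise_xx(text):
    # Toy character-by-character table; a real module could do anything here,
    # including a pronunciation-based transcription.
    table = {"\u03b1": "a", "\u03b2": "b", "\u03b3": "g"}
    return "".join(table.get(ch, ch) for ch in text)

def format_link(lang, text):
    """Generic link formatter with no knowledge of any particular language."""
    fn = ROMANISERS.get(lang)
    tr = fn(text) if fn else None
    return f"{text} ({tr})" if tr else text

print(format_link("xx", "\u03b1\u03b2\u03b3"))
```

Whether a registered function should be limited to strict transliteration, or may return any romanisation, is exactly the point under dispute; the sketch does not settle it.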
There was no consensus. What is being repetitively cited as "consensus" is how people perceive romanisations from the angle of languages not making such a distinction. Truth is, appropriate and purpose-oriented romanisation has been the norm in languages with a script-pronunciation discordance, and it has been the consensus for these languages. See for example the differential use of transcriptions and transliterations ({{ko-etym-native}}) in 미끄럽다 (mikkeureopda), by User:Visviva who created the bulk of our Korean entries. The core issue is “why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages”, and the conclusion from the previous discussion is: "the envisageable harm is minimal and benefits are extensive". There is a demonstrable need to keep the systems separate - our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes. The issue is not whether we should use romanisation X in translations right now; the issue is whether the system should be maintained to take this need into consideration and not deliberately confuse the concepts "transliteration" and "transcription" (where they truly make a difference), so that future edits in these languages are not discouraged. Wyang (talk) 03:20, 6 September 2016 (UTC)
What happens now? —CodeCat 19:57, 9 September 2016 (UTC)
This is up to Chuck. I'm not sure where things stand currently. Benwing2 (talk) 16:17, 11 September 2016 (UTC)

Proposal: Redirect all halfwidth and fullwidth forms to their "normal" counterparts

When there are fewer active votes in the list, I'm thinking of creating a new vote for this proposal:

Redirect all halfwidth and fullwidth forms to their "normal" counterparts.

I feel this should be pretty uncontroversial, but let me know if someone has a reason to keep the halfwidth and fullwidth forms.

Previous discussions:

--Daniel Carrero (talk) 00:59, 8 September 2016 (UTC)

I have a minor objection: Why are single-character half-/full-width forms more important than words spelled with them? We obviously shouldn't duplicate all our entries in half-/full-width forms, so if we can get away without those, why can't we get away without the single-character ones? --WikiTiki89 14:01, 8 September 2016 (UTC)
Actually, CD was a redirect since 2013; I deleted it now. I agree with you about fullwidth words. I believe we don't want entries like  CD, LCD or bye bye, or even redirects like CDCD, LCDLCD, bye byebye bye. But I feel that the possibility of readers searching for single fullwidth characters is higher than for words. If a person searches for "CD" and finds out that we don't have that entry, they might try searching for " C" afterwards.
According to the pageview tool (link) the fullwidth entry got 197 views in the last 6 months. Halfwidth got 12 views. It's not a terribly huge number, but I feel a redirect to the normal forms wouldn't hurt.
In general, for any redundant Unicode characters, I feel it's good to have redirects from the alt form to the "normal" form. Based on that sentiment, I created Wiktionary:Votes/2011-06/Redirecting combining characters and Wiktionary:Votes/2011-07/Redirecting single-character digraphs. Both passed, in 2011.
For better communication, I should probably create a vote with the whole idea that I have in mind. "Voting on: Allowing all single-characters full- and halfwidth forms as redirects. Forbidding full- and halfwidth words, they should not exist even as redirects." --Daniel Carrero (talk) 17:06, 8 September 2016 (UTC)
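The rule in the draft wording above (single width-variant characters redirect, width-variant words do not) could be checked mechanically. The following is a minimal Python sketch, assuming that the "normal" counterpart is whatever Unicode NFKC normalisation yields; the function name `redirect_target` is hypothetical, and nothing here is an existing bot or Wiktionary tool.

```python
import unicodedata

# The Unicode block "Halfwidth and Fullwidth Forms" spans U+FF00..U+FFEF.
BLOCK_START, BLOCK_END = 0xFF00, 0xFFEF

def redirect_target(title):
    """Return the page a title would redirect to, or None for no redirect."""
    # Only single characters qualify; fullwidth or halfwidth words get nothing.
    if len(title) != 1:
        return None
    if not BLOCK_START <= ord(title) <= BLOCK_END:
        return None
    # NFKC replaces a width variant with its ordinary counterpart.
    normal = unicodedata.normalize("NFKC", title)
    return normal if normal != title else None

print(redirect_target("\uFF23"))        # fullwidth C
print(redirect_target("\uFF76"))        # halfwidth katakana KA
print(redirect_target("\uFF23\uFF24"))  # fullwidth "CD", a word
```

A few characters in that block may normalise to more than one code point, so a real run would probably want a human to review the resulting list before any redirects are created.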
Actually, I think the problem with many of your proposals is that you create a vote too soon. We should have a long discussion first and only after the discussion has died down and some time has passed should you create a vote (if there had been enough support). --WikiTiki89 17:23, 8 September 2016 (UTC)
Good point. But you can't always have a long discussion: sometimes, nobody, or just a few people, respond to my topic on the BP. If nobody else decides to weigh in on this topic about fullwidth characters, I believe I should create the vote anyway (eventually).
Concerning minor proposals that don't affect a lot of entries (I consider "redirect fullwidth characters, disallow fullwidth words" one of these) and minor policy edits that don't change actual regulations, I think it's okay to start a vote earlier than most other votes. But if creating votes too soon is a problem, I guess I could create a vote after the discussion disappears from the main Beer parlour page. Other proposals were discussed a lot (sometimes in multiple places) before the vote started. If you want, we can talk about specific past votes that I created, to see if I could have done any of them differently.
Then again, there are some proposals that were discussed already but I didn't create a vote for them. I see nothing wrong with creating a vote immediately for some of these, and pointing to the previous discussions. I may even create a new BP discussion just to point out that a new vote was created, and to see if everyone agrees with the wording of the vote. This is not the same as creating a new vote without discussion. --Daniel Carrero (talk) 18:13, 8 September 2016 (UTC)
I'll give you two rules of thumb: If the discussion is still going, it's too early to create a vote (unless it's an urgent matter). If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on. --WikiTiki89 18:20, 8 September 2016 (UTC)
All right, I'll have this in mind: "If the discussion is still going, it's too early to create a vote (unless it's an urgent matter)."
I partially agree with this: "If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on." In my opinion, the proposal "redirect fullwidth characters, disallow fullwidth words" is important enough to be voted on and appear on the WT:CFI as actual criteria for inclusion/exclusion of entries, but among the things that need to be voted on, this is not very important, because it affects few entries. --Daniel Carrero (talk) 19:04, 8 September 2016 (UTC)

I created Wiktionary:Votes/2016-10/Redirect fullwidth and halfwidth characters. --Daniel Carrero (talk) 13:39, 21 October 2016 (UTC)


FYI, September 19 is International Talk Like a Pirate Day. I would suggest doing the word-of-the-day as something pirate-related if possible. I think it would be great too if we can create an Appendix or Category of terms traditionally associated with pirate lore, such as "walk the plank." I know in my area (Maryland, USA) there are local businesses offering promotional discounts for customers who come in talking like pirates on September 19. I think a pirate vocabulary guide would be helpful not just for them, but for authors and storytellers as well. Nicole Sharp (talk) 05:24, 8 September 2016 (UTC)


@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, Anglom, Angr, Chuck Entz, and anyone else who cares:

Several books on Brittonic and Neo-Brittonic suggest that the name Gwydion was "Uidgen" or "Widgen" at this point in time, not Gwidyen, as here https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Gw%C9%A8d%C9%A3en . Indeed, the "gw" shift seems to have happened from NB to Old Welsh, where it became Guidgen, then in Middle Welsh Gwydyen/Gwydyon and modern Gwydion. UtherPendrogn (talk) 19:01, 9 September 2016 (UTC)

Attestations at *gwir show that the change happened in all languages and is thus of Proto-Brythonic date. —CodeCat 21:24, 9 September 2016 (UTC)
Good. As to the name https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Kadwall%E1%BB%8Dn , have I reconstructed it correctly? Some of the descendants are messy, I'm sorting them out right now. UtherPendrogn (talk) 22:02, 9 September 2016 (UTC)
The Irish descendants don't match up. Where did they get their -m-? It looks much more likely that they descend straight from the Proto-Celtic form, which was *Kat(u)wellamnos or similar. Gaulish is not a Brythonic language, it got its form of the name straight from Proto-Celtic. —CodeCat 22:11, 9 September 2016 (UTC)
I can also find nothing whatsoever of the Gaulish or Irish names, Google gives zero results. @Angr can you check this? —CodeCat 22:15, 9 September 2016 (UTC)
I got some Google Hits for the Gaulish name and variants of it, e.g. this, but it seems to be a place name rather than a personal name. I can find no trace of an Irish form "Cathfollomon". Cadwallon reconstructs Proto-Brythonic *Katuwellaunos, which in our notation would be *Kaduwellọn, from Proto-Celtic *Katuwelnāmnos. The Brythonic Catuvellauni have the same name. —Aɴɢʀ (talk) 06:28, 10 September 2016 (UTC)
Using Au or the dotted O is a matter of notation, but surely apocope is not? And why did you mention the forms? They are the ones I put.

EDIT: Oh I see now, sorry, I put the Early Brythonic form rather than the Proto-Celtic. Will rectify that and add the PC form. UtherPendrogn (talk) 12:18, 10 September 2016 (UTC)

It does not matter what form the descendant takes, surely? And I reconstructed the Irish ones thanks to the Dictionnaire de la Langue Gauloise by Xavier Delamarre. UtherPendrogn (talk) 12:22, 10 September 2016 (UTC)
Sorry, why are there Goidelic descendants under *Kadwallọn? —JohnC5 17:11, 10 September 2016 (UTC)
That shouldn't be there. Probably a mistake from copy/pasting the Celtic form. Removed now. UtherPendrogn (talk) 19:30, 10 September 2016 (UTC)

Minor edit in WT:EL § interwiki links

If no one objects, I'll remove "and are listed in the left hand side of the entry" from WT:EL#Interwiki links. Some people complained about it in Wiktionary:Votes/pl-2016-02/Interwiki links, which passed in March 2016. I'd like to do this without a new vote.

Current text: "Interwiki links are used to point to the same word in foreign language Wiktionaries, and are listed in the left hand side of the entry. To point to the page palabra in the Spanish Wiktionary, use:"

Proposed text: "Interwiki links are used to point to the same word in foreign language Wiktionaries. To point to the page palabra in the Spanish Wiktionary, use:"

--Daniel Carrero (talk) 11:47, 10 September 2016 (UTC)

  Done. Let me know if you wanted the mention of the "left hand side of the entry" back. In the vote, a few people were not very happy with that wording. --Daniel Carrero (talk) 02:14, 14 September 2016 (UTC)

Centralization of also-information

For some time, I thought it would be good to centralize the {{also}} lists in a canonical entry, which would be the diacritic-free lowercase entry if available. The canonical form entry would have the full list while each other form would only link to the canonical entry using {{also}}. For instance, kaca would have a full list while káča would only link to kaca. This would remove a maintenance overhead while bringing only a minor inconvenience to the reader; it would also make the tops of many pages less busy.

Does anyone like that idea? --Dan Polansky (talk) 09:03, 11 September 2016 (UTC)

I wouldn't object; I rarely need to see pages with accented titles, however. The obvious alternatives are (i) to have a bot regularly update the alsos (or even a template generate them on the fly?) based on a list of entry titles, or (ii) to use the Variations of __ pages like Appendix:Variations of "be" (but that's an extra click, and a waste of a page when there are very few variations). Equinox 10:30, 11 September 2016 (UTC)
See above discussion of updating contents of uses of Template:also.
If we "centralize", I would prefer that only one (or more) page(s) whose headword(s) had diacritics bore the complete list of headwords in the equivalence class. DCDuring TALK 12:05, 11 September 2016 (UTC)
The problem is that the average reader isn't going to click on the undiacriticed form if they don't see their diacriticed form there. Of course, most people are going to search the undiacriticed form to start with, but their system may have easy ways to type accents, but not macrons, háčeks, etc., so you can't rule the possibility out. Chuck Entz (talk) 14:39, 11 September 2016 (UTC)
This is true. On my German keyboard, it's easy to type â ê î ô û but not ŵ ŷ, so if I'm searching for a Welsh word with a circumflexed vowel, I'll search for the diacriticked form of the first five but the undiacriticked form of the last two. All that said, however, I'd prefer to keep the full list on each page, because you just never know where you're going to end up. —Aɴɢʀ (talk) 15:43, 11 September 2016 (UTC)
  • The current system is better than this proposal, and its main weakness can be solved by having an also-bot continuously updating. —Μετάknowledgediscuss/deeds 17:47, 11 September 2016 (UTC)
I don't think this is a bad idea, but it seems like it would be necessary to have a bot keep things updated no matter if we keep things updated on all pages (checking for new entries that have been created and need to be added to all the {{also}}s) or on one page (still checking for new entries to add to the centralized list, and for any additions of also to peripheral pages, which the bot would presumably remove). Given that, I do think the idea of having a bot update all the {{also}}s is better. Someone just needs to design and run that bot...! - -sche (discuss) 19:01, 11 September 2016 (UTC)
I thought @Isomorphyc had previously volunteered in the discussion above. I don't know whether he has all the skills, but he does run Orphicbot. DCDuring TALK 19:35, 11 September 2016 (UTC)
I think this would need 2 separate templates:
  • caca would have: "See also: Caca, caça, caçà, cáca, căca and ćaća" ({{also}} as usual)
  • Caca would have: "For more entries, see caca" ({{also-more|caca}} or something)
  • caça would have: "For more entries, see caca"
  • caçà would have: "For more entries, see caca"
  • etc.
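The scheme in the list above can be sketched in a few lines of Python. This is only an illustration, assuming that "diacritic-free lowercase" means stripping Unicode combining marks after NFD decomposition and then lowercasing; the names `fold` and `centralized_also` are hypothetical, and the fallback used when the plain form has no entry of its own is an assumption that the proposal does not specify.

```python
import unicodedata
from collections import defaultdict

def fold(title):
    """Strip combining marks and lowercase, e.g. 'káča' -> 'kaca'."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def centralized_also(titles):
    """Map each title to the entries its {{also}} line would list."""
    classes = defaultdict(set)
    for title in titles:
        classes[fold(title)].add(title)

    also = {}
    for key, members in classes.items():
        if len(members) < 2:
            continue  # nothing to link to
        # Assumption: if the plain form has no entry, use the first member
        # in sorted order as the canonical page.
        canonical = key if key in members else sorted(members)[0]
        for title in members:
            if title == canonical:
                also[title] = sorted(members - {canonical})  # full list
            else:
                also[title] = [canonical]  # "For more entries, see ..."
    return also

titles = ["caca", "Caca", "caça", "caçà", "cáca", "căca", "ćaća", "kaca", "káča"]
for title, links in centralized_also(titles).items():
    print(title, "->", ", ".join(links))
```

Letters such as ø or ł have no Unicode decomposition, so this folding would not group them with o or l; a real implementation would need an extra mapping for such cases.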
@Dan Polansky, DCDuring, Metaknowledge: Thanks for pinging me. Actually, I have the code already to do most of this, including realtime updating. The only thing I haven't totally worked out is how I will handle the appendices. It turns out there are a variety of corner cases where users have entered more information into an {{also}} template than one would want, by default, to add, for example, transliterations into other scripts. My current policy has been to retain these where they have been entered, but not to propagate them to other entries. Because of this, centralising the lists will remove the potential for this type of user-generated information. To retain flexibility, my suggestion would be not to centralise the data. I would add that I believe every method for storing this data in modules has significant drawbacks.
For the issues about typing ease raised by User:Chuck Entz and User:Angr, I think users would learn to seek out the {{also}} templates if they were consistently available. I'll test this with the pageview data three months after I have updated the templates to see if an increase in newly linked words with diacritics is seen in aggregate. But I would point out this only partly solves the typing problem because if a word with diacritics has no corresponding entry in pure ASCII, there will be no also template in the easy-to-type location. I have looked at a few newer methods of improving ASCII searchability than those I have tried so far, but that is a different topic, and everything I have looked at has drawbacks. Isomorphyc (talk) 20:19, 11 September 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── The horse may already have bolted from the stable, but is it really necessary for OrphicBot to add alternative forms of an entry in an {{also}} template when they are already listed under the "Alternative forms" section? I think that isn't very useful; {{also}} should perhaps be confined to accented forms (usually in other languages) or differently capitalized forms. — SMUconlaw (talk) 11:22, 12 September 2016 (UTC)

@Smuconlaw: Sorry for giving that impression. It did a little, but these are orthogonal changes because centralisation can be accomplished in the modules and templates without editing the pages. If the data are centralised, the important thing is only to have the actual template on each page; the arguments do not matter. I'll be glad to make the changes if necessary; they're not major. For your question: Wiktionary's normal principle is redundancy over normalisation, largely for reasons stemming out of the fact that we're not a database. Creating inter-template dependencies is not a good idea if it can be avoided. In this case, the exception you propose would also be confusing to users because each lemma in each language can have its own "Alternative forms" section, and the user would need to find the correct one out of potentially many. Moreover, the 'correct' one may not even be in the language the user is expecting, defeating the purpose of a purely orthographical index. That said, if this or other exceptions are generally preferred I will implement them. For example, User:YURi has suggested omitting {{also}} links from misspellings to correct spellings, since this is also redundant. Isomorphyc (talk) 14:57, 12 September 2016 (UTC)
Thanks for explaining. It's not really a big deal for me, but I was wondering whether it made sense in some cases to have both an "Alternative forms" section and the very same information in a "See also" statement at the top of the entry. — SMUconlaw (talk) 17:44, 12 September 2016 (UTC)

Matched-pairs — policy page

I created Wiktionary:Votes/pl-2016-09/Matched-pair entries — policy page, to implement what was discussed in Wiktionary:Beer parlour/2016/June#Redirects to matched pairs. Feel free to discuss and propose any changes. --Daniel Carrero (talk) 14:04, 11 September 2016 (UTC)

Allow for easier input from the laity.

I recently saw on TV an "educational" program that referred to an 'oyster knife' as a 'paring knife'. This inspired me to look up the term 'Shucking Knife' because this is what I have always called an 'oyster knife'. When I discovered that Wiki did not have a page or a link for 'shucking knife' I was confronted with the overly convoluted requirements that Wiki has in order to let you know that I am aware of a synonym for one of your terms. I had to 'think' much too hard.

Yes you're right it is a difficulty and yet we need to have some sort of minimum standard as well. Shucking knife definitely exists but if you look at shucking and shuck, shuck says '[t]o remove the shuck from (walnuts, oysters, etc.).' which makes me think it's possibly just a knife for shucking, in the same way that a whittling knife is just a knife for whittling, and therefore does not need an entry. But in general use, being accessible for new editors while trying to maintain consistency throughout our format is a challenge, there's no two ways about it. Renard Migrant (talk) 22:42, 11 September 2016 (UTC)

Restoration of Sysop Privileges

Given the amount of time with no action on the disputed issue, I'm prepared to restore sysop privileges to @CodeCat and to @Wyang if they will commit to not editing Module:links except for changes both agree to beforehand, at least until both agree that the conflict is resolved.

Please state here whether you agree to this. Thanks! Chuck Entz (talk) 23:58, 11 September 2016 (UTC)

Can someone else make the changes, then? If neither of us is allowed to edit it, that implies that there is a consensus for Wyang's preferred version. The reason I continue to press this is because I fear that if I don't, nothing will be done about it yet again. —CodeCat 01:39, 12 September 2016 (UTC)
@CodeCat, maybe you could provide a link to the exact revision of the module which you would say is the correct status quo? --Daniel Carrero (talk) 01:45, 12 September 2016 (UTC)
[1], [2], [3]. These three revisions ensure that the Thai transliteration code is placed in the Thai transliteration module where it belongs (according to the current consensus on treatment of transliteration modules), rather than in Module:links where it does not belong. —CodeCat 01:49, 12 September 2016 (UTC)
Do other people agree with reverting the modules to these exact versions?
I'll repeat what I said in another discussion:
  • I support restoring sysop privileges to both CodeCat and Wyang.
  • I support reverting the modules to the status quo, and in the face of this huge disagreement, I urge @Wyang to help in the creation of the vote before implementing any new proposal.
Correct me if I'm wrong: I seem to remember that some entries were already edited based on Wyang's system and reverting the modules to the status quo would break the entries. Still, IMO the status quo should prevail and the entries should be fixed. --Daniel Carrero (talk) 02:03, 12 September 2016 (UTC)
I also support restoring sysop privileges to both CodeCat and Wyang. In addition, I support restoring the modules to the status quo. Unfortunately, as Chuck pointed out, it's not totally obvious what this is, but in my mind, since the edit war specifically concerned references to Module:th in Module:links (+ supporting code), and since the references to Module:th weren't present in the module beforehand, the status quo should not include them: Specifically, it shouldn't include Module:th, 'phonetic_extraction' or the code that references 'phonetic_extraction'. Benwing2 (talk) 02:50, 12 September 2016 (UTC)
Back then there wasn't even any automated romanisation for Thai; restoring the previous version would simply wipe out the romanisations in thousands of Thai entries. I'm really confused. There was no consensus for CodeCat's edit, despite her claiming there is. I was only adding in transcription support at Module:links (which was lacking transcription support) per the consensus of the Thai editors, in a manner that is most appropriate for further editing in Thai and other similar languages. If you do not agree, voice your arguments other than voicing “I don't like it”! I spent so much effort arguing for why storing transcription and transliteration modules separate is beneficial in the long run, and what I got was non-participation and the indifferent “so what happens now?” (1, 2). Decision-making should not be like this - having people voice their opinions without having a critical appraisal of the arguments for and against makes the decisions arrived at highly prone to unintelligence. It shouldn't be the case that you can say your preference and expect it to be enacted without giving a reason. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes? If it cannot be demonstrated that the harms do outweigh the benefits for these languages and there is no willingness to demonstrate, there is no justification for enacting this opinion or restoring the “previous version” which abolishes the functionality altogether. Wyang (talk) 03:54, 12 September 2016 (UTC)
(edit conflict) We're trying to achieve a compromise here. In my book, adopting a version more heavily weighed against one side than the other side even asked for isn't a compromise. What you're asking for basically breaks a large number of Thai entries that were modified in good faith by the Thai community after Wyang provided the capability for it with his first edit. Regardless of how things are going to end up eventually, that's too much collateral damage to make it a reasonable first step toward a compromise. Remember the story of how Solomon pretended he was going to cut a baby in half in order to see from the reaction of the two claimants which was the real mother? This is like cutting the baby in half first. Chuck Entz (talk) 12:41, 12 September 2016 (UTC)
So, over at the Grease pit, @Vahagn Petrosyan had mentioned that many languages require both transliteration and transcription. Do we think that the inclusion of both, if the transcription differs, could kill two birds with one stone? —JohnC5 17:05, 12 September 2016 (UTC)
That's what Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations is supposed to address. But it's not going anywhere. —CodeCat 17:13, 12 September 2016 (UTC)
@Wyang: I have a question for you, and I'm sorry if you already explained it somewhere. I'm going to ask anyway: Given the benefits about your proposal that you explained, don't you think that Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations has a good chance to pass? More importantly, is the linked vote satisfactory for you, or would you change something in the proposal? --Daniel Carrero (talk) 04:03, 12 September 2016 (UTC)
@Daniel Carrero: I believe the answer to your question is on the vote's talk page. —suzukaze (tc) 04:05, 12 September 2016 (UTC)
OK, but Wyang may still choose to help building the vote. If the vote explains the proposal correctly and passes, it will mean we are all on the same page and understand the implemented proposal.
In the previous discussion, Chuck Entz presented a few possible versions of the status quo to choose from. Is anyone interested in discussing what exactly is the right one? If no one objects, I'll just trust CodeCat and revert the three modules to the revisions that she mentioned. --Daniel Carrero (talk) 10:04, 12 September 2016 (UTC)
Why? I have explained the reasons of my objection well enough above, and in the previous discussions. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when there is ample evidence suggesting the contrary? Nobody was interested in engaging in discussion to argue for the version that you are trying to restore. Why is reverting to a version which cannot be justified even being considered? Wyang (talk) 11:35, 12 September 2016 (UTC)
Please understand: It's not about whether the proposal is good, it's about whether other people agree with it, and are on the same page. That's why some of us are interested in having a vote, which would explain and record the proposal, and let others judge its merits. To put it another way: if the proposal is really good, the vote is probably going to pass and we'll do exactly as you proposed. --Daniel Carrero (talk) 11:57, 12 September 2016 (UTC)
We haven't had votes on the architecture of the modules, so I don't see what makes the "status quo ante" Wyang so sacred. If Wyang took the initiative to overcome a language(s)-relevant limitation of the module architecture, it seems to me that it merits our respect. If our architecture doesn't provide the required flexibility without some kind of kludges, so much the worse for the existing architecture. In this and on many other matters I favor accommodating decentralized decision-making. DCDuring TALK 12:33, 12 September 2016 (UTC)
Wyang's changes don't do anything that could not be achieved within our existing module framework. The three edits Wikitiki made to the modules, and which I proposed they be restored to, show that. The only reason he did it is because he doesn't like the framework (specifically, that transliteration modules do other kinds of romanization too). Therefore, I proposed that if he doesn't like our current consensus on what transliteration modules do and how they are used in other modules/templates, he should make a vote to change it. So far he hasn't shown any interest. Most of what has happened since then is several editors trying to get Wyang to cooperate on formulating a vote, while Wyang himself is skirting around the issue and avoiding a vote. Is this appropriate behaviour when someone's changes have been challenged? And would it be appropriate to allow said changes to remain in place when they have been challenged so heavily and the user is not prepared to let the community decide per vote on the issue? —CodeCat 13:56, 12 September 2016 (UTC)
As I said above, the only point revolved around in the “no”-camp is “I don't like it”, without any explanation given. Why do the harms outweigh the benefits if we keep them separate in these languages, when there is ample evidence suggesting the contrary? You keep citing your version as consensus, but where is the vote showing that? Using purpose-suited romanisation is the consensus for languages with a transcription-transliteration distinction ({{ko-etym-native}}, etc.). If you do not like this practice, you should bring this up in a discussion and explain your reasoning, aside from saying “I don't like it”. There is no point blaming the implementer for implementing what was already a custom in languages you are not involved in, and barring the improvement in the module infrastructure for these languages. Wyang (talk) 22:55, 12 September 2016 (UTC)
As I said before, there is no "no"-camp, just people that you need to convince. The burden of proof is on you. Once that's done, the vote should be able to pass. We are repeating the same arguments over and over. This discussion is going nowhere. I reverted the three modules to the revisions chosen by CodeCat. Feel free to discuss if I should have done something different. --Daniel Carrero (talk) 23:07, 12 September 2016 (UTC)
I reverted the edits I could revert. Discussion is still ongoing; you cannot voice your opinion and expect it to be enacted without justifying it. Any unilateral measure taken constitutes disrespect to the participants of discussion. Wyang (talk) 23:14, 12 September 2016 (UTC)
"you cannot voice your opinion and expect it to be enacted without justifying it" ... ha! I see some irony there, and it's amusing. But it may be just me. Seriously, if I did something wrong please someone step up and say what to do. I restored the modules again. --Daniel Carrero (talk) 23:21, 12 September 2016 (UTC)
You are insane. You did not even know what the contention was, and yet you feel empowered to trample on whatever modules you can get your hands on simply because you can. Wyang (talk) 23:28, 12 September 2016 (UTC)
Good grief. The diff you linked to does not indicate that I'm completely clueless about the contention. It does indicate that I was politely asking you for your opinion on the best way to word a vote. --Daniel Carrero (talk) 23:34, 12 September 2016 (UTC)
Asking for my opinion on the best way to word a vote... when it should not be relayed to a vote at all, because there is no argument input from people arguing we should confuse transliteration and transcription. There are numerous arguments for keeping the modules separate being put forth in the discussion, such as (1) our editors in these languages already implement the practice of using purpose-suited romanisation; (2) printed dictionaries in these languages use differential romanisation and deem the different modes of romanisation as suited to different purposes; (3) it conforms to existing language-specific module infrastructure developed for these languages; (4) it is prospectively designed, and does not discourage further improvements in these languages. But the arguments against? One: "I don't like it". It is unfair to use a vote to end a discussion, when one side is only interested in expressing their opinion and not giving any rationales for it. It is facilitating mindless decision-making. Wyang (talk) 23:46, 12 September 2016 (UTC)

You have failed to provide an accurate view of the opinions of other people. "But the arguments against? One: 'I don't like it'" is a straw man.

Could you please change your mind and be willing to cooperate in the vote? We could add your 4 points in the rationale. --Daniel Carrero (talk) 23:53, 12 September 2016 (UTC)

If I have failed to provide an accurate view of the opinions of other people, then could you please list the arguments against? We are still at a stage in the discussion where we are struggling to list any arguments from one side. This is way too immature to call on votes. Votes are evil. It allows such disproportionate argumentation to be easily distorted to produce an unintelligent consensus for the reason of sheer numbers only. Wyang (talk) 00:57, 13 September 2016 (UTC)


User:Daniel Carrero completely ignored the discussion and proceeded to revert the modules to a version he prefers and locked the modules. This is unacceptable bullying behaviour and shows no consideration for the rules of discussion.

(cur | prev) 23:16, 12 September 2016 Daniel Carrero (talk | contribs) . . (138 bytes) (-46)
(cur | prev) 23:15, 12 September 2016 Daniel Carrero (talk | contribs) m . . (184 bytes) (0) . . (Protected "Module:th-translit" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite)))

I urge other admins to please look into this abuse of power and take actions. Wyang (talk) 23:19, 12 September 2016 (UTC)

I don't see bullying going on. I see you refusing to coöperate with him when he seeks to help you resolve this dispute, however. It's easier to decry alleged abuses of power, but the right thing to do is work on moving forward. —Μετάknowledgediscuss/deeds 23:31, 12 September 2016 (UTC)
You might deliver that message to Dan. It certainly seems high-handed to me. DCDuring TALK 23:33, 12 September 2016 (UTC)
It's not bullying, but he shouldn't have done it. Fortunately Anatoli reverted the edits so I didn't have to.
When emotions are high is exactly the wrong time to take such actions- it's just throwing gasoline on the fire. Besides, they were completely out of process and I just don't see the consensus to act now. Chuck Entz (talk) 01:41, 13 September 2016 (UTC)
All right. --Daniel Carrero (talk) 01:46, 13 September 2016 (UTC)

No Middle Danish?

It seems we do not have categories, language codes or anything for Middle Danish. Does Wiktionary subsume Middle Danish under Danish, and if so, why? Has this been discussed before?__Gamren (talk) 14:27, 12 September 2016 (UTC)

I bet it hasn't been discussed before. We can certainly create a language code for Middle Danish if no one objects; I'd suggest gmq-mda. —Aɴɢʀ (talk) 15:13, 12 September 2016 (UTC)
How big are the differences? —CodeCat 15:38, 12 September 2016 (UTC)
Oh, we certainly can; it's whether we should. Renard Migrant (talk) 16:07, 12 September 2016 (UTC)
You can see some samples at w:History of Danish#Medieval Danish. Maybe someone who knows Danish can tell us if that is as different from modern Danish as Chaucer is from modern English. —Aɴɢʀ (talk) 16:37, 12 September 2016 (UTC)
I can tell right away that the spelling is very distinct from what is used today, but I think a modern Danish speaker could figure that out, at least. However, what is described there is what I'd call Old Danish. The definitions on that page don't really sit well with me. What it calls Old Danish is what we'd just call Old Norse, and it was written in the same time as the Old Icelandic that many more are familiar with. w:Old Norse says: "The 12th-century Icelandic Gray Goose Laws state that Swedes, Norwegians, Icelanders and Danes spoke the same language, dǫnsk tunga ("Danish tongue"; speakers of Old East Norse would have said dansk tunga). Another term used, used especially commonly with reference to West Norse, was norrœnt mál ("Nordic speech")." So even the Icelanders said they spoke Danish, at the time. —CodeCat 16:45, 12 September 2016 (UTC)
Also consider the different definition given for w:Old Swedish. Those years are closer to what I would expect for "Old Danish" as well. —CodeCat 16:46, 12 September 2016 (UTC)
I guess the next question is, how late are the words we're already calling Old Danish attested? If our Old Danish words are words/spellings attested up through the 15th century, then the reason we don't have Middle Danish is that what we're calling Old Danish developed directly into (early) Modern Danish. —Aɴɢʀ (talk) 16:57, 12 September 2016 (UTC)
There is a Middle Norwegian stage conventionally dated from 1350–1550, thus contemporary with Late Old Swedish. I think Late Old Swedish is sometimes called Middle Swedish (and Early Old Swedish consequently plain Old Swedish), but rarely, and same for Danish. However, Middle Icelandic is used for the same period. (In Faroese, it's the Old Faroese period.) --Florian Blaschke (talk) 03:14, 13 September 2016 (UTC)
Regarding chronology: Nudansk Ordbog and Den Danske Ordbog agree that (in approximate years, obviously): Old Danish lasted from 800-1100, Middle Danish 1100-1525 and Modern Danish 1525-present (DDO says 1500-present, but that's probably just a matter of precision). Regarding intelligibility: As a non-linguist speaker of Modern Danish, I cannot easily read Middle Danish, even if I can recognize cognates once I know the translation. Compare: takær bondæ annær man mæth sin kunæ oc kumar swa at han dræpær anti mannen... with Tager en bonde en anden mand med sin kone, og sker det, at han ikke dræber manden... (see also Gammeldansk Ordbog, which places Middle Danish (gammeldansk) at 1100-1515, and furthermore separates it into older and younger periods, the division being at 1350). Regarding classification: I see that we have lots of references to Middle Danish, but they usually link to Danish entries (see e.g. Storm, gilding, nettle). There is also at least one Danish lemma tryde, which I have no reason to believe exists in Modern Danish.__Gamren (talk) 16:58, 13 September 2016 (UTC)
800-1100 would conflict with the generally agreed definition of Old Norse, which was also spoken throughout that period. Essentially, if we adopt that definition, we'd have to say Proto-Norse split into Old Norse and Old Danish in the year 800, which is complete nonsense. —CodeCat 17:01, 13 September 2016 (UTC)
This is probably just a terminological question. Olddansk/runsvenska/Old East Norse was, as I understand it, one of two varieties of Old Norse, which we merge with Old West Norse (which is probably quite justified), and our Old Danish corresponds to gammeldansk, no? So the only question is whether Old Danish is the right word. The definitions I gave above correspond with our definitions (given by @Daniel Carrero, who may wish to say something) and the ones given in the WP article given above, but it is entirely possible this doesn't correspond to usage in Anglophone literature - I really wouldn't know! and I'm sorry if I made this a muddle.__Gamren (talk) 19:34, 13 September 2016 (UTC)


Birgit Müller (WMDE) 14:56, 12 September 2016 (UTC)

Transliteration nomenclature vote

I created this vote: Wiktionary:Votes/2016-09/Renaming transliteration. Please provide feedback on the talk page to help improve the vote as necessary. —CodeCat 16:08, 12 September 2016 (UTC)

If this vote passes, I assume we'll rename all pages in Category:Transliteration policies. I think this should be stated in the vote. --Daniel Carrero (talk) 16:16, 12 September 2016 (UTC)
Yes, it should. And the category itself will be renamed too of course. —CodeCat 16:18, 12 September 2016 (UTC)

WT:CFI should explicitly be for the main namespace

WT:CFI (under a heading 'scope' perhaps) should explicitly state that it refers only to the main namespace. In other words (as a specific example) *montania is not subject to the rules here. Renard Migrant (talk) 20:39, 12 September 2016 (UTC)

CFI currently states that some things go in appendices, and reconstructions go in the Reconstruction namespace. I think it's better this way. Logically, there are some criteria for inclusion in the Reconstruction namespace; if there were no criteria for inclusion, you could include anything there.
The policy says: "Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in the Reconstruction namespace, and are referred to from etymology sections." I disagree with that wording. It's true that we often say "Proto-Indo-European doesn't meet CFI", but I think this is a problematic statement. Proto-Indo-European does meet CFI, and the correct course of action is to place it in the Reconstruction namespace.
Relatedly, Reconstruction pages and some appendices follow closely the entry format so, in my opinion, both WT:EL and WT:NORM should explicitly mention exactly to what extent they apply to these pages. Related discussion: Wiktionary talk:Normalization of entries#Proposal: encompassing reconstruction pages. --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)
Reconstructions should be subject to some criteria for inclusion, but not these ones. I think any reconstruction from a reliable source should be considered a valid entry title. 'Reliable source' of course can be subject to criteria that we can all discuss before implementing. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)
Suppose we use your idea as an actual, formal rule: "any reconstruction from a reliable source should be considered a valid entry title." Where can we place the rule? WT:PROTO is a good candidate, but I don't like how it has a long encyclopedic explanation of what a reconstruction is, instead of a simple link to Wikipedia or to a help page. I prefer policy pages to contain only regulations when possible. If we can delete all this stuff, I would be glad to place (voted and approved) criteria for inclusion of reconstructions in WT:PROTO. I would also like WT:CFI to link to WT:PROTO if we do that. What do you think? --Daniel Carrero (talk) 21:41, 12 September 2016 (UTC)
I think WT:PROTO if anything isn't really a policy page at the moment. It feels more like a Wikipedia entry. It's well-written but we just don't need that much. It also doesn't really contain much actual policy. Renard Migrant (talk) 21:49, 12 September 2016 (UTC)
WT:PROTO said: "It must not be modified without a VOTE." But I did not find a vote that confirms this in the first place, so I demoted it to Think Tank. --Daniel Carrero (talk) 21:56, 12 September 2016 (UTC)
"Any reconstruction from a reliable source", without further caveats, sounds like a bad guideline for reconstruction inclusion. This would allow the inclusion of all sorts of transcription variants of the same reconstruction (which we currently generally standardize away, though allowing them as redirects). More controversially, this would also allow the inclusion of reconstruction variants — cases where all researchers agree that a proto-form is to be reconstructed as the source of data Y, but disagree on what its shape was. I would propose that such disagreements should be covered as discussion within a single entry. --Tropylium (talk) 22:13, 17 September 2016 (UTC)
If there weren't a hundred million votes already taking place, there's a couple I'd like to propose. Renard Migrant (talk) 21:00, 12 September 2016 (UTC)
What would you like to propose? --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)
On my talk page, Dan Polansky and I discussed having single words de jure meet CFI. Something like doglike doesn't actually meet CFI as it's written now. Of course nobody would actually delete it, but it would be nice to have the rules cover what actually happens. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)
Good idea. I'd probably support that. (as discussed in: User talk:Renard Migrant#CFI and idiomaticity clarification) --Daniel Carrero (talk) 21:44, 12 September 2016 (UTC)
@Renard Migrant, Dan Polansky: How many active votes do you think we should have on {{votes}}, before you feel it's OK to create the new vote for single words meeting WT:CFI? --Daniel Carrero (talk) 00:13, 13 September 2016 (UTC)
The way I see it: CFI was designed to apply only to the main namespace. Thus, it should be clear that the rules currently at WT:CFI only apply to the main namespace. Of course we need inclusion criteria for other namespaces, and these criteria may also be added to the page WT:CFI, but in a separate section from the current rules that only apply to the main namespace, or may be on its own page. --WikiTiki89 21:49, 12 September 2016 (UTC)

Stress marks and syllable marks

I've been working on putting syllable marks in lately, and I've noticed that the stress marks are interpreted as syllable marks when categorizing words by the number of syllables. When there are stress marks, do we need to put a syllable mark in front of the stress mark, e.g. should university be /ju.nɪ.ˈvɝ.sə.ti/ or /ju.nɪˈvɝ.sə.ti/? — justin(r)leung (t...) | c=› } 23:04, 12 September 2016 (UTC)
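To make the double-counting concrete, here is a minimal Python sketch (not the actual Lua module code, which is not shown in this thread). Both function names are hypothetical. The first counts every mark as a boundary, which is one assumed way the miscount could arise; the second treats a run of adjacent marks as a single boundary. Neither can count correctly when a boundary is left unmarked.

```python
import re

def naive_syllable_count(ipa: str) -> int:
    """Count every '.', 'ˈ' and 'ˌ' as its own boundary (assumed faulty behaviour)."""
    body = ipa.strip("/[]")
    return sum(body.count(mark) for mark in ".ˈˌ") + 1

def syllable_count(ipa: str) -> int:
    """Treat any run of adjacent marks (e.g. '.ˈ') as one boundary."""
    body = ipa.strip("/[]")
    # Splitting on runs of marks leaves empty strings at the edges; drop them.
    chunks = [chunk for chunk in re.split(r"[.ˈˌ]+", body) if chunk]
    return len(chunks)

if __name__ == "__main__":
    for form in ("/ju.nɪ.ˈvɝ.sə.ti/", "/ju.nɪˈvɝ.sə.ti/"):
        print(form, naive_syllable_count(form), syllable_count(form))
```

With the second function both spellings of university give five syllables, so the choice between them would not affect categorization.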

I was wondering the same thing. I asked about it to Metaknowledge in User talk:Metaknowledge#Dot together with the stress marker, and he replied there. --Daniel Carrero (talk) 23:09, 12 September 2016 (UTC)
@Daniel Carrero Thanks! Perhaps there should be something in the modules to prevent stress marks and syllable marks from being together. On a related note, should we be following the Maximal Onset Principle? — justin(r)leung (t...) | c=› } 23:48, 12 September 2016 (UTC)
I created Category:IPA for English using .ˈ or .ˌ and started populating it with any entries that seem to violate the rule that Metaknowledge described. If we really don't want a dot followed by a stress marker, then I believe the correct course of action would be fixing all entries in the category.
Concerning your question about the Maximal Onset Principle, if you directed it to me, I prefer if someone else more knowledgeable than me answered that instead. --Daniel Carrero (talk) 03:36, 13 September 2016 (UTC)
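If the dot before a stress mark is to be cleaned up, the check and the fix are both small. The following Python sketch is only an illustration of the pattern being tracked (a dot immediately followed by ˈ or ˌ); the function names are made up for this example and nothing here is taken from an existing module or bot.

```python
import re

# A syllable dot that is immediately followed by a primary or secondary stress mark.
DOT_BEFORE_STRESS = re.compile(r"\.(?=[ˈˌ])")

def has_dot_before_stress(ipa: str) -> bool:
    """True if the transcription would belong in the tracking category."""
    return bool(DOT_BEFORE_STRESS.search(ipa))

def drop_dot_before_stress(ipa: str) -> str:
    """Remove only the redundant dots; all other dots are kept."""
    return DOT_BEFORE_STRESS.sub("", ipa)

if __name__ == "__main__":
    form = "/ju.nɪ.ˈvɝ.sə.ti/"
    print(has_dot_before_stress(form), drop_dot_before_stress(form))
```

A lookahead is used so that the stress mark itself is never consumed by the replacement.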
For English, I would follow the Maximal Onset Principle for stressed syllables first, and also make sure any stressed syllable with a lax vowel has at least one coda consonant. Once the stressed syllables are maximized, the unstressed ones will take care of themselves. In other words, happy should be syllabified /ˈhæp.i/, not /ˈhæ.pi/. That said, however, I do want to reiterate something I've said many times before: syllabification in English is far from obvious, and syllable boundaries are very often perceived to be located within consonants. Evidence suggests that the /p/ of happy is not exclusively in either syllable; rather it's simultaneously the coda of the first syllable and the onset of the second. But there's no convenient way to show that in IPA. For this reason, I personally am often very reluctant to mark syllable boundaries except in cases of vowel hiatus, where it's a convenient way of showing that a sequence of two vowels isn't a diphthong (e.g. Joey vs. Joy). —Aɴɢʀ (talk) 09:33, 13 September 2016 (UTC)
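The rule described above can be sketched in Python as follows. This is a simplification under stated assumptions: it applies the Maximal Onset Principle to every syllable (not stressed syllables first), the vowel, lax-vowel and legal-onset inventories are tiny illustrative subsets chosen for the examples in this thread, and the function name is hypothetical. It also says nothing about ambisyllabicity.

```python
# Tiny illustrative inventories; a real implementation would need full lists.
VOWELS = {"æ", "i", "ɪ", "ə", "ʌ", "u", "ɝ"}
LAX_VOWELS = {"æ", "ɪ", "ʌ"}
LEGAL_ONSETS = {("p",), ("f",), ("ɹ",), ("n",), ("t",), ("v",), ("s",),
                ("f", "ɹ"), ("s", "t")}

def syllabify(phonemes, stressed=None):
    """Split a phoneme list into syllables.

    `stressed` is the index (in `phonemes`) of the stressed vowel, or None.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    boundaries = []
    for left, right in zip(vowel_positions, vowel_positions[1:]):
        cluster = phonemes[left + 1:right]
        # Maximal onset: give the next syllable the longest legal onset.
        k = 0
        while k < len(cluster) and tuple(cluster[k:]) not in LEGAL_ONSETS:
            k += 1
        # A stressed lax vowel keeps at least one coda consonant.
        if left == stressed and phonemes[left] in LAX_VOWELS and k == 0 and cluster:
            k = 1
        boundaries.append(left + 1 + k)
    pieces, start = [], 0
    for boundary in boundaries:
        pieces.append("".join(phonemes[start:boundary]))
        start = boundary
    pieces.append("".join(phonemes[start:]))
    return pieces

if __name__ == "__main__":
    print(".".join(syllabify(list("hæpi"), stressed=1)))
    print(".".join(syllabify(list("əfɹʌnt"), stressed=3)))
```

Leaving out the stress index gives the plain maximal-onset split (hæ.pi), which shows that the two competing syllabifications of happy differ only in the lax-vowel adjustment.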
I don't have very strong feelings about putting the syllable boundary marker and the stress marker next to each other. Putting them both isn't wrong, but it certainly isn't necessary. —Aɴɢʀ (talk) 10:19, 13 September 2016 (UTC)
IPA is simply wanting a way to mark ambisyllabic consonants as found in West Germanic. We could add one as a house rule. /hæ‿p‿ɪ/ or something less ugly. Korn [kʰũːɘ̃n] (talk) 11:09, 13 September 2016 (UTC)
Another possibility would be listing both: "/ˈhæp.i/ or /ˈhæ.pi/". --Daniel Carrero (talk) 11:18, 13 September 2016 (UTC)
Definitely not that. That implies there are two possible syllabifications, and worse yet, that there's a way of distinguishing them. As for how to mark it, I think if we must mark it, then /ˈhæp.i/ is the least bad option. If we do invent a house notation, I'd rather use something that takes up less space, like /ˈhæpˇi/; we could define ˇ as meaning "the previous consonant is ambisyllabic". But if I'm honest, I'd really rather just stick to /ˈhæpi/, which is unambiguous, easy to read, and makes no theoretical claims as to syllabification. —Aɴɢʀ (talk) 12:22, 13 September 2016 (UTC)
Personally, I think /ˈhæ.pi/ is better than /ˈhæp.i/, because the latter looks to me like there is meant to be an audible break between the /p/ and the /i/. I agree that because of these problems, it's better to just have /ˈhæpi/. As for putting . before a stress mark, I think it's entirely unnecessary and thus oppose it. --WikiTiki89 13:46, 13 September 2016 (UTC)
I agree with Wikitiki. I think it would be better to omit the syllable marks entirely for English. Benwing2 (talk) 14:27, 13 September 2016 (UTC)
Purely from a user perspective, I'd prefer if a dictionary would have a house notation like /ˈhæṗɪ/, rather than omit information because of minor issues. Korn [kʰũːɘ̃n] (talk) 14:42, 13 September 2016 (UTC)

Not working for two-syllable words?

I noticed that using syllable markers in IPA transcriptions now adds words to categories indicating the number of syllables the words have, but only if the words have three or more syllables. Thus, /əˈfɹʌnt/ or even /ə.ˈfɹʌnt/ does not add affront to "Category:English 2-syllable words". Why? — SMUconlaw (talk) 12:20, 26 September 2016 (UTC)

Please read the description of Category:English 2-syllable words. --Daniel Carrero (talk) 19:26, 26 September 2016 (UTC)
For the record, I oppose having obviously broken and non-working code in the mainspace, and also the passive-aggressive "supposedly"-categories that attempt to pin the blame on the existing, working entries. Equinox 21:50, 26 September 2016 (UTC)
The category title is awful, though it does do something to remove the illusion that we are anything but a work in progress. Why would we want to have categories that were conspicuously mispopulated? Should the offending code be neutered until it is emended? DCDuring TALK 00:58, 27 September 2016 (UTC)
I have removed the code. DTLHS (talk) 01:10, 27 September 2016 (UTC)
Thanks. DCDuring TALK 02:39, 27 September 2016 (UTC)
In case anyone cares, the categories were hidden, so the situation wasn't that bad. --WikiTiki89 17:39, 27 September 2016 (UTC)
Thanks. I care, but I hadn't checked. DCDuring TALK 20:39, 27 September 2016 (UTC)
I think it has been firmly established in previous discussions that the only way to have a syllabification for English words is to have a human involved. I can understand the desire not to have the syllabification markers in the IPA codes for English words. Since I am interested in the categories for 1 or more syllables, I propose that a new template such as SYL be created and the categories for English 1-syllable terms, 2-syllable etc. be filled by the new template. In the process of creating the SYL "call" I can delete the dots (.) which I have placed in the IPA template, and call the This will be harder than the previous method of just reviewing the already created list of words to see which are mis-classified, but I think this might be sufficient to handle your objections. If necessary, the code can create entries in a local user's category space. I volunteer mine. Since I don't know Lua well enough, I suggest the new SYL module use the code previously supplied by Daniel Carrero. Bcent1234 (talk) 21:03, 30 September 2016 (UTC)
Why do we need a template? Why not just add the categories manually? You don't have to delete dots unless they happen to be problematic. --WikiTiki89 21:10, 30 September 2016 (UTC)
I'd rather use a template and not add these categories manually. They are too long ("[[Category:English 2-syllable words]]") and would be a pain to write or copy/edit. We can use a shorter template like {{syl|4}} to place the entry in a 4-syllable category. --Daniel Carrero (talk) 00:09, 1 October 2016 (UTC)
Can a template be smart about the language, or should we include en as a parameter or have en in the name? Bcent1234 (talk) 21:17, 1 October 2016 (UTC)
I'd rather use a template, but since this seems to be something we can't just do (witness the removal of the previous Lua code allowing this to be a work-in-progress) I am just going to put the category call in the pronunciation section of words, and start from scratch. I value syllabification, but don't want to make waves in other folks' domains. As a group project, I support making Wiktionary useful for all who can access it. 13:45, 3 October 2016 (UTC)
@Bcent1234: You can use the template {{cln|en|X-syllable words}}, which puts the page in [[Category:English X-syllable words]]. I don't think a short template like {{syl|en|X}} is justified for this purpose. --WikiTiki89 17:52, 5 October 2016 (UTC)
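For anyone scripting this, the mapping from a language code and a syllable count to the category title is simple. The Python sketch below is an illustration only: the function name and the two-entry language table are hypothetical stand-ins, and it merely mirrors the category naming pattern quoted above with the language passed as an explicit parameter.

```python
# Hypothetical two-entry lookup; the real language data is far larger.
LANGUAGE_NAMES = {"en": "English", "da": "Danish"}

def syllable_category(lang_code: str, syllables: int) -> str:
    """Build a category title of the form 'Category:English 4-syllable words'."""
    if syllables < 1:
        raise ValueError("a word has at least one syllable")
    language = LANGUAGE_NAMES[lang_code]  # KeyError for unknown codes
    return f"Category:{language} {syllables}-syllable words"

if __name__ == "__main__":
    print(syllable_category("en", 4))
```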

What Needs to Happen

The main obstacle to resolving this dispute is that neither CodeCat nor Wyang trust the process- for good reason. In past disputes, we've had an unfortunate tendency to put out the immediate fires and then sweep the issue under the rug. Faced with this possibility, both have tried to get things the way they want them so that they don't lose out when everyone gets tired of the issue and moves on. The one thing we don't want to do is to jump in and take unilateral action- that will just confirm the worst fears of the one who loses out.

We need to resolve this now, before it becomes out of sight, out of mind. The way to do this is to get down to discussing what the new configuration should look like, in concrete terms.

Notice I said "discussing". We simply haven't gotten to the point of drafting votes, because we're still all talking past each other- any vote will most likely not address the issues needed to resolve the dispute and will just complicate things. The correct sequence is to come to a consensus, and then draft a vote, if necessary.

I can't do any more at the moment because I'm still at work and it's really late. I'll spend some time on my way home trying to come up with a way to get the discussion started. Please don't blow things up in the meanwhile... Chuck Entz (talk) 02:25, 13 September 2016 (UTC)

I would support passing additional information (such as the name of the calling template and perhaps more) to the romanization module. This would make the Thai-specific code in Module:links that started this whole dispute unnecessary. I still think that there should only need to be one romanization module even if it provides both transliterations and transcriptions. --WikiTiki89 13:50, 13 September 2016 (UTC)
Another detail that hasn't been mentioned much is that Wyang wants to pass the link target to the Thai module in order to find the transcription on the linked page. There are numerous reasons why this is a bad idea. Wyang has mentioned that the performance impact of reading the text of a page in a module is not as bad as people might assume at first, but that is not even the only issue. The romanization module must be able to romanize full unlinked sentences (such as in usage examples) and even redlinks. This cannot happen if the module depends on the existence of the link target. Not only that, but it would produce incorrect results for links with alt text, since it would transcribe the linked form and not the displayed form. --WikiTiki89 13:55, 13 September 2016 (UTC)
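The two failure cases can be shown with a toy model in Python. Everything here is a stand-in: the dictionary plays the role of pronunciation data stored on existing entries, the placeholder strings are not real romanizations, and none of the function names come from the actual modules.

```python
# Stand-in for transcriptions that could be read from existing entries.
PAGE_TRANSCRIPTIONS = {"targetform": "transcription-of-targetform"}

def transliterate(text: str) -> str:
    """Stand-in for a purely script-based romanization that needs no page."""
    return f"<translit:{text}>"

def romanize_by_target(target: str, displayed: str):
    """Look up the linked page only, ignoring what is actually displayed."""
    return PAGE_TRANSCRIPTIONS.get(target)  # None when the page does not exist

def romanize_displayed(displayed: str) -> str:
    """Romanize the displayed text; use page data only if it matches that text."""
    return PAGE_TRANSCRIPTIONS.get(displayed) or transliterate(displayed)

if __name__ == "__main__":
    print(romanize_by_target("missingpage", "missingpage"))  # redlink case
    print(romanize_by_target("targetform", "altform"))       # alt-text case
    print(romanize_displayed("altform"))
```

The target-based lookup returns nothing for a redlink and returns the romanization of the wrong string when alt text is used, while the text-based version always has a fallback.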
Is the reason for passing additional information such as the name of the calling template so that the Thai module can show a transliteration in etymologies and a transcription in translation sections? I'm opposed to doing that; I think it would be extremely confusing. Better to show both types of romanization in all places, as I've mentioned before. Allowing this would be a major user-facing change and needs a vote (that's why I had Dan create the vote). If this vote passes, then I think we should still require that transcriptions are always shown, and transliterations are also shown in the places where it's desired (e.g. etymology sections). Benwing2 (talk) 14:23, 13 September 2016 (UTC)
According to Wyang, some entries already do this. It should probably be reversed if there is no consensus for it. Though with how Wyang is, he'll put up a fuss and start another edit war. —CodeCat 14:27, 13 September 2016 (UTC)
I think that transcription and transliteration need to be separated on some level. First of all, one is conceptually an attribute of the script, and the other of the language. Thus changes to a transcription of a script will have to be applied to all trans* modules separately, making human errors likely. Second, transcription should be available for overriding while transliteration should always be automatically generated. Also, in historical languages using abjads, it should be noted that having both of these would be useful, as one is a factual shape of the word as found in the text and the other an educated guess, and both are necessary to explain some etymologies.
Regarding the question of whether both or one romanization should be displayed, I suggest that, no matter what is decided to be the default option, appropriate html tags be placed around the transliteration so that a custom .css file can hide these for users that understand the script in question (seeing anything written in Cyrillic repeated in Latin can be slightly annoying when you already are native in the script).
Yet I do not understand the details of our current implementation and why Wyang's changes are creating problems. If his way of doing this is indeed too harmful I support reverting it, but then please draft an alternative solution to this. Crom daba (talk) 17:36, 13 September 2016 (UTC)
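The suggestion about HTML tags could look roughly like the Python sketch below. The class names ("term", "transliteration", "transcription") and the function name are invented for this illustration and are not the classes the site's templates actually emit.

```python
from html import escape

def render_term(term: str, transliteration: str = "", transcription: str = "") -> str:
    """Wrap each kind of romanization in its own span so CSS can target it."""
    parts = [f'<span class="term">{escape(term)}</span>']
    if transliteration:
        parts.append(
            f'<span class="romanization transliteration">{escape(transliteration)}</span>'
        )
    if transcription:
        parts.append(
            f'<span class="romanization transcription">{escape(transcription)}</span>'
        )
    return " ".join(parts)

if __name__ == "__main__":
    # A reader who does not need the transliteration could add to a personal
    # stylesheet:  .transliteration { display: none; }
    print(render_term("дом", transliteration="dom"))
```

Because the two kinds of romanization carry different classes, the default display can be decided separately from what individual readers choose to hide.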
The alternative solution was Wikitiki's changes, which Wyang reverted over and over again and I reinstated over and over again. Contrary to what you might think, Wyang's changes actually did not establish separate transliteration and transcription. It merely bypassed the fact that the Thai transliteration module was called "translit" by putting the code that would have gone in there in Module:links instead. I argued that such code did not belong there, but it still remains there after months of bickering over it. —CodeCat 17:43, 13 September 2016 (UTC)
So what was the issue that Thai editors were complaining about? Crom daba (talk) 17:58, 13 September 2016 (UTC)
Wyang? He was complaining that transcription code should not go in a "transliteration" module, even though it's the normal practice on Wiktionary to do so. Because he didn't want to put the code where it belonged, he started messing with Module:links instead, and that's where I stepped in, and now we have this situation. —CodeCat 18:39, 13 September 2016 (UTC)
The whole point is: transcription and transliteration utilities should be separately maintained in the module system, whenever there is a foreseeable possibility that purpose-suited romanisation may be useful for the language. The argument is how to design a module structure, specifically a romanisation infrastructure, that best supports the features of these languages and therefore the wishes of the language-editing community. We are not proposing that language A should use X format of romanisation, or that Akkadian/Tibetan romanisations should be written as such, or that different modes of romanisation should be used in different locations (cf. link); these are all highly language-specific questions that need to be addressed separately and individually in discussions among knowledgeable editors. Our role here is to envisage the language-specific romanisation requirements that may be proposed, and partition our stored romanisation utilities in a way that is most regular and easiest to invoke, and in a way that does not deter editors in these languages from contributing in a way they consider most appropriate for the language.
The crux is “foreseeable possibility” of purpose-suited romanisation for a language. The reason purpose-suited romanisation is relevant is due to the different natures of the two modes of romanisation: transliteration is spelling-based, thus more etymology-oriented, and transcription pronunciation-based. The case of abjads is slightly different, but the benefit of storing utilities still applies. Why is purpose-suited romanisation and hence transliteration-transcription utility separation relevant on Wiktionary? Because:
  1. It is already being implemented in these languages ({{ko-etym-native}}). It is the consensus of the language community on how romanisations should be differentially applied. It is unreasonable to demand that the practice of using purpose-suited romanisation, which has been adopted universally in a language (you do not edit) for nearly ten years, be “reversed” without supplying any reason.
  2. Printed dictionaries do the same. The following are all the previewable Tibetan-English or English-Tibetan dictionaries on Google Books:
    Tibetan-English: 1, 2, 3
    English-Tibetan: 1, 2, 3.
All the Tibetan-English dictionaries use transliterations to romanise, and all the English-Tibetan ones use transcriptions to romanise. Why? Because different modes of romanisation are suited to different purposes – transliteration for etymology and transcription for translation from English.
  3. It conforms to the existing module infrastructure for these languages. In languages observing a transliterative-transcriptive contrast or languages where transliteration is intrinsically impossible, the transliteration-transcription distinction was strictly adhered to when the language-specific modules were designed. Where transliteration is impossible, the term “transliteration” is not broadened to mean “transcription”; we do not have Module:zh-translit and Module:ja-translit, instead we use Module:zh/Module:zh-pron and Module:ja/Module:ja-pron to handle transcriptions. Where the transliteration-transcription distinction makes a difference on a romanisation level, modules are named and maintained unambiguously; there are Module:bo-translit and Module:th-translit for transliteration, and Module:bo/Module:bo-pron and Module:th/Module:th-pron for transcription. It is the consensus of how romanisation utilities are maintained in these highly script-pronunciation discordant languages.
  4. It makes maintenance easier. Maintaining the transliteration and transcription modules separately makes whatever preference there is for the romanisation output less difficult to achieve. Seeing that abjads were raised before, if we decide to apply juxtaposed transliteration-transcription for all abjads or languages X, Y, Z, we can just add in some brief code in the links module to concatenate the outputs of transcription and transliteration modules of these languages (one can also be manually supplied), as these modules have already been recorded appropriately in language_data. If one day we would like to remove transcriptions in romanisations for languages X, Y, Z, we could simply remove the brief code added in earlier, without having to go through all the *-translit modules and delete the transcription passages, wondering whether they should be kept somewhere before they vanish.
  5. Using page parsing to achieve romanisation has no demonstrable harm. Transcription is inherently more difficult than transliteration; it is nearly perfectly automatable for certain languages (e.g. Korean) but most of the time it needs to be achieved using additional tricks, and page parsing is one of the tricks. I cited w:Wikipedia:Don't worry about performance before and I still think it is also very relevant for the technical structure on Wiktionary. The possibility of using page parsing has made us realise that it is perfectly possible to obtain both the transliteration and transcription for a word when they differ greatly, and this is very exciting. I think all the Thai editors would agree that the implementation of parsing since early this year has made their work much easier (Wiktionary:Statistics, sorted by change in #gloss definitions), and I doubt anyone would be in favour of removing this functionality and having to supply romanisations manually. Likewise for Chinese templates.
  6. Having an additional functionality module which does something useful is always beneficial, as long as it is maintained adequately. This could be said of transcription modules using parsing to obtain the romanisations. Even though it will not be able to grab a transcription from uncreated entries, or entries which have no pronunciation information, this is an indication that those entries need to be improved. In the case of Thai, having some automatic romanisation is better than having none and having to supply one manually. In the end, we aim to encompass all words in all languages and utilities have to be adapted to ensure we are at our highest efficiencies while progressing towards that goal. I'm sure the functionalities of this site won't be limited to what is present at the moment. If we want to build a Thai transliterator and a Thai transcriber to romanise a Thai passage (similar to what Google Translate is doing simultaneously to the translation), or if we want to develop a tool to romanise a Tibetan text in different ways, having an infrastructure in place which does not confuse the utilities will be essential.
Very few things are improved all of a sudden. While there is no transcription consideration in the central modules and the transcription modules are not recorded, it is most appropriate to name and maintain the romanisation utilities accurately. When the transcription modules can be recorded in language_data like the transliteration modules, the code should be migrated and rewritten. Above are my rationales for keeping the transcription and transliteration utilities separate for these languages where the different modes of romanisation are contrastive. Wyang (talk) 07:02, 14 September 2016 (UTC)
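The maintenance point above (juxtaposing the two outputs with a small, removable switch) can be illustrated roughly as follows. This is only a sketch in Python rather than Lua, and every name in it (`TRANSLIT`, `TRANSCR`, `JUXTAPOSE`, `romanise`) is hypothetical; it is not the code in Module:links, and the fallback order when juxtaposition is off is an arbitrary assumption.

```python
from typing import Callable, Dict, Optional, Set

Romaniser = Callable[[str], Optional[str]]

# Hypothetical registries, one per romanisation mode, keyed by language code.
TRANSLIT: Dict[str, Romaniser] = {}
TRANSCR: Dict[str, Romaniser] = {}

# Languages whose two outputs are shown side by side. This set plays the role
# of the "brief code" that can be added or removed without touching the
# romanisers themselves.
JUXTAPOSE: Set[str] = set()


def romanise(lang: str, text: str) -> Optional[str]:
    """Return a romanisation for text, or None if no utility is registered."""
    translit = TRANSLIT.get(lang, lambda _: None)(text)
    transcr = TRANSCR.get(lang, lambda _: None)(text)
    if lang in JUXTAPOSE and translit and transcr and translit != transcr:
        return f"{translit} / {transcr}"
    # Assumption: with juxtaposition off, fall back to whichever exists,
    # transliteration first.
    return translit or transcr
```

Because the two utilities never share code in this arrangement, switching juxtaposition on or off for a language is a one-line change to `JUXTAPOSE`.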

News from French Wiktionary

Hi all,

French Wiktionary is quite proud to publish every month a page with some fresh news about the project, Actualités. It is aimed not at contributors but at visitors and people interested in words. After 17 editions, we decided to translate our latest edition, for August, into English, to make this publication available to you. It was quite a long job, so we are hoping for your comments to know whether it is worth it, and whether we should continue to translate our next editions, or our previous editions too. Feel free to comment on any aspect of this publication; we are very open to improving it and our translation, as English is not my mother tongue. Thanks a lot to Andrew Sheedy (talkcontribs) and Pamputt (talkcontribs) for this translation! Noé (talk) 09:26, 13 September 2016 (UTC)

@Noé: Merci, mis amis (je sui americain, et no parle franc,ais...) Mis petites contributions. —Justin (koavf)TCM 13:54, 13 September 2016 (UTC)
@Koavf: In case you care, some corrections: mes amis, je suis, ne parle pas. --WikiTiki89 13:59, 13 September 2016 (UTC)
Je (ne) parle pas. UtherPendrogn (talk) 17:31, 13 September 2016 (UTC)
@Wikitiki89:, @UtherPendrogn: Merci! —Justin (koavf)TCM 22:49, 13 September 2016 (UTC)

Wikidata for Wiktionary: let’s get ready for lexicographical data!

Hello all,

The Wikidata development team will start working on integrating lexicographical data in the knowledge base soon and we want to make sure we do this together with you.

Wikidata is a constantly evolving project and after four years of existence, we start with implementing support for Wiktionary editors and content, by allowing you to store and improve lexicographical data, in addition to the concepts already maintained by thousands of editors on Wikidata.

We have been working on this idea for almost three years and improving it with a lot of input from community members to understand Wiktionary processes.

Starting this project, we hope that the editors will be able to collaborate across Wiktionaries more easily. We expect to increase the number of editors and visibility of languages, and we want to provide the groundwork for new tools for editors.

Our development plan contains several phases in order to build the structure to include lexicographical data:

  • creating automatic interwiki links on Wiktionary,
  • creating new entity types for lexemes, senses, and forms on Wikidata,
  • providing data access to Wikidata from Wiktionary
  • improving the display of lexicographical information on Wikidata.

During the next months, we will do our best to provide you the technical structure to store lexicographical data on Wikidata and use it on Wiktionary. Don’t hesitate to discuss this within your local community, and give us feedback about your needs and the particularities of your languages.

Information about supporting lexicographical entities on Wikidata is available on this page. You can find an overview of the project, the detail of the development plan, answers to frequently asked questions, and a list of people ready to help us. If you want to have general discussions and questions about the project, please use the general talk page, as we won’t be able to follow all the talk pages on Wiktionaries.

Best regards, Lea Lacroix (WMDE) (talk)

@Lea Lacroix (WMDE): Thanks to you and everyone at d: for working hard to try to integrate this project into Wikidata. —Justin (koavf)TCM 13:46, 13 September 2016 (UTC)

Open call for Project Grants

Greetings! The Project Grants program is accepting proposals from September 12 to October 11 to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

Also accepting candidates to join the Project Grants Committee through October 1.

With thanks, I JethroBT (WMF) (talk) 14:49, 13 September 2016 (UTC)

Quotation questions (redux)

Last month, we had a discussion about quotations and what should be included and where. Several contradictory opinions were expressed. I'm willing to go make the changes to the quotations I added, but I don't think we quite reached consensus there on what to do. If this is not the right place to find consensus, please advise where I should take the questions at hand. Thanks! --Flex (talk) 17:04, 13 September 2016 (UTC)

I think yet another vote might be in order, alas. It's complicated for me at least because I don't necessarily dislike citations being shown together for different forms (perhaps per user setting), but I don't think they should be stored that way. See my comments in the discussion linked above. Equinox 20:14, 15 September 2016 (UTC)
Ok, since this month is about to expire, I'll put it on next month's cooler. --Flex (talk) 17:49, 28 September 2016 (UTC)

RFDO discussion for Template:character info

I created an RFDO discussion for a high-use template. See: WT:RFDO#Template:character info. --Daniel Carrero (talk) 05:17, 14 September 2016 (UTC)

Deceased long-term user

Eclecticology, one of Wiktionary's first editors, has died; see [4]. This was announced over at w:en:WP:AN, the en:wp administrators' noticeboard. As a very infrequent visitor here, I don't know your procedures for the accounts of deceased editors, but someone should remove his account's bureaucrat rights, since Wiktionary:Votes/2015-11/Eclecticology for de-admin and de-bureaucratting concluded in favor of removing both those user rights, but somehow only the administrator right was removed. Nyttend (talk) 12:02, 14 September 2016 (UTC)

Thanks for notifying us. It appears that the account does not have any user rights at the moment. —Μετάknowledgediscuss/deeds 19:53, 14 September 2016 (UTC)
Should the accounts of deceased users be permanently blocked in order to prevent hacking? —Aɴɢʀ (talk) 21:05, 14 September 2016 (UTC)
It's been done, but I see no reason to once rights are removed (in fact, it can be quite an annoyance if the block notification turns up on all their userpages when those userpages are still useful to other editors). —Μετάknowledgediscuss/deeds 21:11, 14 September 2016 (UTC)
If they get a cross-wiki block, as far as I know it doesn't show up on their userpages. --WikiTiki89 21:17, 14 September 2016 (UTC)
Some projects block accounts of deceased editors, and others don't, while some projects do other stuff (en:wp protects their userpages and adds a deceased-user template), so I figured I'd just announce it and let you regular editors follow your procedures. Nyttend (talk) 21:57, 14 September 2016 (UTC)
Eclecticology was involved in the establishment of Wiktionary, and was Wiktionary's first bureaucrat. He also created this very forum, the Beer parlour. RIP. --Yair rand (talk) 22:38, 14 September 2016 (UTC)
In case anyone wants to see this. Here's also the first ever BP discussion. --WikiTiki89 22:55, 14 September 2016 (UTC)
Sorry to hear it. But long may he live on in the edit histories! My opinion about blocking is that yes, we should do it where it is confirmed that somebody has died, just for the sake of security. A disused account might somehow be exploited or hacked; a blocked one generally can't be. Equinox 18:21, 15 September 2016 (UTC)


I just noticed that WT:ATTEST doesn't say anywhere that a word has to be attested in the language of the entry. Oversight? --WikiTiki89 22:27, 14 September 2016 (UTC)

We must have read each other's mind because I was thinking the exact same thing. I think that should be added in. It's an assumption that none of us really sought to codify before, but you know what they say about making things idiot proof. —CodeCat 22:29, 14 September 2016 (UTC)
Probably not an oversight. Plenty of words can be attested in languages other than the language of the entry. Also, thanks for calling me an idiot. UtherPendrogn (talk) 22:33, 14 September 2016 (UTC)
If the shoe fits, UtherPendrogn. —CodeCat 22:36, 14 September 2016 (UTC)
Sometimes reports written in other languages are the only evidence about the existence and meaning of words in languages that were not reduced to writing until close to or after the time of their extinction or at least the loss of some of their vocabulary. This happens fairly often for names of organisms. Sometimes early explorers', missionaries', et al reports of the organism and a genus name or specific epithet are all that remains. I would think that some words in those languages could be reconstructed from multiple reports written in the language(s) of the explorers, et al. DCDuring TALK 23:10, 14 September 2016 (UTC)
Well here I guess we're talking about uses, not mentions or reconstructions. What you describe would pretty much be a reconstruction or maybe a mention. --WikiTiki89 23:14, 14 September 2016 (UTC)
Nothing is fool-proof, as the saying goes. And I worry that attempting to close a (debatably-existent) loophole that there's been no serious effort to game (a single user misunderstanding the rules does not strike me as a serious i.e. potentially-successful effort to re-interpret them) could cause more harm than good. What would be the effect on words in various extinct languages that are attested only embedded in works in other languages (e.g. an Ancient Greek text includes the only known few Paeonian words, a Spanish-language book gives the only known Ciguayo word)? I hope we can just rely on the majority to be as intelligent as we've been being, in discerning when a text is saying "and que is a word in French" versus when it's saying "and these are some words" and one user is just erroneously arguing "some" is French in that snippet. - -sche (discuss) 05:14, 15 September 2016 (UTC)
I'd appreciate not being called unintelligent if possible. UtherPendrogn (talk) 05:16, 15 September 2016 (UTC)
The alleged repercussions seem like a feature to me, not a bug. This might be a bigger can of worms, but I suspect that languages attested entirely by mentions perhaps shouldn't qualify for regular mainspace inclusion — not necessarily in terms of being moved to an appendix altogether, but they perhaps should be given substantially different treatment (e.g. in terms of entry layout) from better-attestable ones. --Tropylium (talk)
I'm not sure what those differences would be. Our current approach seems to handle them fairly well, actually. —Μετάknowledgediscuss/deeds 22:49, 17 September 2016 (UTC)
What I originally meant was that uses must be in the language of the entry. For mentions, I don't think it matters what language mentions them, as long as it can be deduced what language is being mentioned. --WikiTiki89 22:55, 17 September 2016 (UTC)
I think the passage on use-mention distinction covers this, and it's not a loophole. Something like "Venezia isn't a word in French" wouldn't count towards an attestation of Venezia in any language because it's not being used. Renard Migrant (talk) 23:29, 17 September 2016 (UTC)
Except that we allow mentions for some poorly attested dead languages as mentioned above. What I'm trying to say is that "I went to Venezia" cannot count as an attestation of the Italian word "Venezia", because the sentence is in English, even though this is a use not a mention (it can, however, count as an attestation of "Venezia" for English). --WikiTiki89 23:36, 17 September 2016 (UTC)

2nd Definitions vote

I created Wiktionary:Votes/2016-09/Definitions — non-lemma to edit the next piece of WT:EL#Definitions.

This is basically a minor edit that converts two simple vote links into a single line of text. For this reason, I'm just creating the vote without prior discussion.

Let me know if this should be discussed further. If needed, we may postpone the vote. (which I find unlikely, but who knows) Feel free to edit the vote and change the wording. --Daniel Carrero (talk) 12:35, 15 September 2016 (UTC)

Actually, I expanded the voted text with a few bullet points. I believe these are already established rules to be documented. Hopefully, they shouldn't be controversial. --Daniel Carrero (talk) 14:12, 15 September 2016 (UTC)

bor vs. loan

I'm thinking of creating a bot myself to implement the results of Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor. The vote passed with 14-5-3 (73.68%-26.32%) +1 late oppose.

But, as usual with template naming votes, even though apparently the tendency is the short name winning, (I voted support and I have my own arguments to back it up) there are people who voted oppose, defending the readability of the longer names. {{bor}} is a 3-letter name, like {{inh}} and {{der}} -- but "bor" does not really mean anything. Would people prefer using {{loan}} on all pages instead? --Daniel Carrero (talk) 17:55, 16 September 2016 (UTC)

We've already voted on this. It's a done deal. --WikiTiki89 17:58, 16 September 2016 (UTC)
14-4-4. Donnanz struck out his vote but it's still counted in the numbering. But whatever, a pass is a pass. Wikitiki89's right; let's not open up the issue again a minute after it's been voted on. Renard Migrant (talk) 23:49, 16 September 2016 (UTC)
Donnanz did not strike out their vote, just a statement which was part of the vote.
I'm happy with that response. I also prefer {{bor}}. I was just checking to make sure. --Daniel Carrero (talk) 00:08, 17 September 2016 (UTC)

Proposed addition to WT:NORM: the plain space (U+0020) and newline (U+000A) are the only allowed whitespace characters

Under this proposal, any other character that consists only of empty space, whether zero-width or with some width, is disallowed in the wikitext. This includes things like RTL and LTR markers, non-breaking spaces, halfwidth and fullwidth spaces, and of course the plain old tab. This change, once implemented by a bot, should reduce the number of unwanted surprises with invisible characters. We could, perhaps, also introduce an edit filter that blocks any edits containing these characters, though we'd need to make an inventory of them first. —CodeCat 23:35, 16 September 2016 (UTC)

  Support -- there's already a rule forbidding the tab, so it should be edited to disallow the others. --Daniel Carrero (talk) 23:38, 16 September 2016 (UTC)
What about HTML character entities? DTLHS (talk) 23:39, 16 September 2016 (UTC)
I think we can allow those, since they're visible to the editor. —CodeCat 23:42, 16 September 2016 (UTC)
Also, FWIW, what about a newline, which is considered to be a whitespace character? — justin(r)leung { (t...) | c=› } 23:44, 16 September 2016 (UTC)
A good point. That one is allowed of course. Though I'm not aware of any character other than a newline that looks the same as a newline. —CodeCat 23:46, 16 September 2016 (UTC)
I want to keep fullwidth spaces for Japanese. Although, I will note that MediaWiki disallows the fullwidth space in page titles and automatically changes it to U+0020. —suzukaze (tc) 00:02, 17 September 2016 (UTC)
Maybe we should allow different script-specific spaces in quotations and usage examples written in other scripts. Aside from the fullwidth space, are there other spaces like that? --Daniel Carrero (talk) 00:16, 17 September 2016 (UTC)
The whole point of the proposal was to eliminate invisible characters that people can't tell apart or reproduce. The average editor will expect that any empty space is a generic space. —CodeCat 00:29, 17 September 2016 (UTC)
We should still allow it, just only as an HTML entity or with a template. DTLHS (talk) 00:30, 17 September 2016 (UTC)
I would want to know more about the effect this would have on display of RTL scripts before supporting this. Lines with both RTL and LTR scripts can behave in very peculiar ways, and I don't want to make it worse. Chuck Entz (talk) 02:35, 17 September 2016 (UTC)
The LTR and RTL behaviour depends on control characters, mostly in Unicode category `Cf.' I believe this proposal mainly concerns whitespace, in category `Zs,' plus two pseudo-linebreaks in `Zp' and `Zl', and perhaps the control characters '\t' and '\r.' Isomorphyc (talk) 22:23, 19 September 2016 (UTC)
I intended it to include all nonprintable characters, though maybe I didn't make it clear enough. "[...] consists only of empty space, whether zero-width [...]". Control characters fit that description and indeed I would like to get rid of those, too, as they are invisible to the editor. Though of course, if it's not clear already, HTML entities for these characters are allowed by this proposal, so it's not as if we're banning them altogether, we're just banning them in their raw Unicode form because of the editing difficulties they cause.
As a side note, we have actual entries for control characters too, but they are all but inaccessible because of, predictably, technical issues. We should delete these entries. A dictionary shouldn't concern itself with encoding artefacts; you can't tell a space from a non-breaking space in a printed work, and there's no such thing as a control character in print either. —CodeCat 22:33, 19 September 2016 (UTC)
A cursory inspection confirms that we have no or very few RTL control characters, but a few thousand LTR characters which were probably superfluously copied in from various outside sources, for example in Patch and LoD. From a little bit of experimentation with Hebrew entries, I get the impression the LTR/RTL behaviour is handled below the wikitext level. My concern with HTML entities is that we don't want to degrade anyone's native wikitext typing experience by requiring native characters to be rendered either with HTML entities or inappropriate substitutes, as with CJK spaces. Since this depends on the wikitext rendering stack, I would have to do a little bit more experimentation to convince myself of this for a few other characters. But I'm sure we would agree about the result if this is roughly the principle you have in mind? Isomorphyc (talk) 23:20, 19 September 2016 (UTC)
  • Oppose as phrased, because I remember that I had to use nbsp in template switches since plain white space did not get carried over from wikitext into the actual page output. Korn [kʰũːɘ̃n] (talk) 07:13, 17 September 2016 (UTC)
  • As I understand it, the proposal still allows the use of &nbsp; in the edit box, just not an actual nonbreaking space itself. —Aɴɢʀ (talk) 08:05, 17 September 2016 (UTC)
    • In templates, if you're having trouble with a space not being shown, you should use &#32; instead of &nbsp;. The former encodes an actual space (which is what you want), the latter encodes a non-breaking space. —CodeCat 12:36, 17 September 2016 (UTC)
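The difference between the two entities can be checked outside the wiki. The following is a small Python illustration using the standard `html` module, not anything from MediaWiki itself: `&#32;` decodes to the ordinary space U+0020, while `&nbsp;` decodes to the no-break space U+00A0.

```python
import html

# Decode each HTML entity to the character it stands for.
space = html.unescape("&#32;")
nbsp = html.unescape("&nbsp;")

print(f"&#32;  -> U+{ord(space):04X}")
print(f"&nbsp; -> U+{ord(nbsp):04X}")
```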

Sounds okay in theory, but (per Chuck) we should probably investigate existing entries containing the "forbidden" chars and see whether we are overlooking any legitimate use cases. Equinox 10:37, 17 September 2016 (UTC)
If we put in an edit filter I would worry about people trying to save a page and not being able to determine why the system says they can't (even after we bot replace existing uses, people will always try to copy and paste from other sources which will inevitably include control characters). DTLHS (talk) 17:17, 17 September 2016 (UTC)
Yes, any edit filter should only tag, not block. Google Books, for example, uses RTL/LTR marks around author names a lot, so anyone trying to helpfully add citations would be blocked. But if a bot is going to make periodic cleanup runs, even a tagging edit filter seems unnecessary. - -sche (discuss) 18:06, 17 September 2016 (UTC)
I wonder if there's any way to automatically remove or replace certain characters when the page is saved. DTLHS (talk) 18:14, 17 September 2016 (UTC)
Since the software automatically replaces e.g. "a" + "combining grave" with "à"; presumably the devs could update it to automatically replace nonstandard whitespace with a regular space (but I don't know if they would). - -sche (discuss) 03:33, 18 September 2016 (UTC)
They replace "a" + combining grave, with "à" because they are defined by Unicode as equivalent (i.e. they mean the same thing). Non-standard whitespace is not defined by Unicode as equivalent to spaces. So the devs probably would not implement this special case just for us. --WikiTiki89 09:40, 19 September 2016 (UTC)
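The equivalence point can be illustrated with Python's standard `unicodedata` module, on the assumption that the software's replacement corresponds to Unicode normalization form NFC. Canonical composition turns "a" plus a combining grave into "à", but leaves the special spaces alone, since they have only compatibility mappings to U+0020.

```python
import unicodedata

# "a" + COMBINING GRAVE ACCENT is canonically equivalent to "à".
composed = unicodedata.normalize("NFC", "a\u0300")

# No-break, em, thin and ideographic spaces are not canonically
# equivalent to a plain space, so NFC leaves them unchanged.
spaces = ["\u00a0", "\u2003", "\u2009", "\u3000"]
unchanged = [unicodedata.normalize("NFC", ch) == ch for ch in spaces]

print(composed == "\u00e0", unchanged)
```

Only the compatibility forms (NFKC/NFKD) fold these spaces into U+0020, and those forms also change many characters that should be left as they are.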
  •   Oppose I use &nbsp;- and have not found a good way to dispense with it in Translingual Hypernyms and Hyponyms sections. It is also in every occurrence of all parameters of {{taxon}}, which is in virtually every taxonomic name entry. It is essential to allowing, say, a dash to follow text with a space without the dash appearing as an orphan on the following line. DCDuring TALK 03:05, 18 September 2016 (UTC)
    • &nbsp;- is an HTML entity, which we have discussed above. DTLHS (talk) 03:08, 18 September 2016 (UTC)
      I was reacting to the proposal expressed in the section title, not as it may be modified. DCDuring TALK 12:29, 19 September 2016 (UTC)
      I think the intention was in fact to allow &nbsp; when necessary in place of the literal character. --WikiTiki89 12:32, 19 September 2016 (UTC)
@CodeCat: This is good, with caveats: I'm only counting about 3500 pages affected, with about 7500 characters total (given 14 `bad' whitespace characters plus \t -- you might have a wider list, as technically '\t' is a control character, not whitespace.) About 3500 of these characters are simply &nbsp literals and can be replaced. Probably `em space' (0x2003) and `thin space' (0x2009) should be done away with-- the next two largest categories. I believe the CJK `ideographic space' (0x3000) should stay. The others are not very common. If anyone is interested here is a list (I omitted user pages, talk pages, etc.) :
  • hex,char,name,count
  • 0xa0, ,NO-BREAK SPACE,3454
  • 0x2003, ,EM SPACE,1136
  • 0x2009, ,THIN SPACE,1100
  • 0x200a, ,HAIR SPACE,687
  • 0x3000, ,IDEOGRAPHIC SPACE,609
  • 0x2008, ,PUNCTUATION SPACE,282
  • 0x2002, ,EN SPACE,165
  • 0x1680, ,OGHAM SPACE MARK,45
  • 0x202f, ,NARROW NO-BREAK SPACE,40
  • 0x2028,(),LINE SEPARATOR,39
  • 0x2005, ,FOUR-PER-EM SPACE,17
  • 0x2004, ,THREE-PER-EM SPACE,4
  • 0x2007, ,FIGURE SPACE,1
Isomorphyc (talk) 21:38, 19 September 2016 (UTC)
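A tally like the one above could be produced along these lines. This is a minimal Python sketch, not the script actually used for the list: the function name `suspicious_whitespace` is made up, and the choice of what counts as suspicious (Unicode categories Zs, Zl and Zp, plus tab and carriage return, minus the plain space and newline) is an assumption based on the thread.

```python
import unicodedata
from collections import Counter

ALLOWED = {" ", "\n"}               # the two characters the proposal permits
SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}
EXTRA_CONTROLS = {"\t", "\r"}


def suspicious_whitespace(wikitext: str) -> Counter:
    """Count each non-permitted whitespace character in a piece of wikitext."""
    counts: Counter = Counter()
    for ch in wikitext:
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in SEPARATOR_CATEGORIES or ch in EXTRA_CONTROLS:
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<control>")
            counts[(f"0x{ord(ch):x}", name)] += 1
    return counts


sample = "a\u00a0b\u2003c\td e\n"
for (code, name), n in suspicious_whitespace(sample).items():
    print(code, name, n)
```

Zero-width format characters such as the LTR and RTL marks are in category Cf and would need a separate check.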
We just need to decide which ones we want to convert to an HTML entity, and which to replace with something else like a regular space. I think the em space, thin space and hair space and most other different-width spaces can become regular spaces. The Ogham space should stay, that's actually a printable character, it represents the line on which Ogham letters are written. Also, what about zero-width characters like the LTR and RTL markers, or zero-width non-breaking spaces? —CodeCat 21:46, 19 September 2016 (UTC)
We would have to test to see that they are there accidentally. If somebody typed it with a keyboard, it should stay in most cases, I think; but if it was pasted in from somewhere, has zero width, and has no effect on the presentation, that is a good sign it should go. For what it is worth, to make this more concrete, here is an equivalent list of control characters with their counts in Wiktionary: User:Isomorphyc/Sandbox/Control Characters in Wiktionary. By inspection, it seems inappropriate to remove all of them and worthwhile to remove some. Isomorphyc (talk) 00:02, 20 September 2016 (UTC)
I think there needs to be an exception in cases where the character is part of the normal encoding of a script. The only example I can think of is Persian, where the zero-width non-joiner is used in compound words and before the plural morpheme ها ‎() (for example: شب‌ها ‎(šab-hâ)). It would be unfortunate to have to encode this as {{m|fa|شب&zwnj;ها|tr=šab-hâ}}. --WikiTiki89 14:01, 20 September 2016 (UTC)
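To make the options raised in this thread concrete, here is one possible cleanup policy sketched in Python. Nothing here has been agreed: the three groups (`TO_SPACE`, `TO_ENTITY`, `KEEP`) are assumptions assembled from the suggestions above, and the function name `clean` is hypothetical.

```python
# Different-width spaces that would simply become a plain space.
TO_SPACE = {"\u2002", "\u2003", "\u2004", "\u2005", "\u2008", "\u2009", "\u200a"}

# Characters that would be kept, but only in visible entity form.
TO_ENTITY = {"\u00a0": "&nbsp;"}

# Characters left untouched: Ogham space mark, ideographic space, and the
# zero-width non-joiner used in normal Persian spelling.
KEEP = {"\u1680", "\u3000", "\u200c"}


def clean(wikitext: str) -> str:
    """Apply the (assumed) policy character by character."""
    out = []
    for ch in wikitext:
        if ch in KEEP:
            out.append(ch)
        elif ch in TO_SPACE:
            out.append(" ")
        elif ch in TO_ENTITY:
            out.append(TO_ENTITY[ch])
        else:
            out.append(ch)
    return "".join(out)


print(clean("a\u2003b\u00a0c"))
```

A real bot run would also need to skip places where the raw character is typed deliberately, which a character-by-character pass like this cannot detect.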

Proto-Brythonic verb lemmas

Should we reconstruct verb lemmas as absolute or conjunct 3rd person singulars, as in *ėɣɨd or *aɣ < *ageti? The former is more similar to the Proto-Celtic lemmas in form, but the latter more or less became the standard 3rd person singular in the daughter languages. Anglom (talk) 02:34, 18 September 2016 (UTC)

Or maybe the first person singular, since that's the usual lemma for Middle Welsh in academic material (notwithstanding the fact that we at Wiktionary use the verbal noun instead, a situation which I've been meaning to rectify but haven't gotten around to yet). I don't know what form is usually given as the lemma for the various stages of Breton and Cornish. —Aɴɢʀ (talk) 14:35, 18 September 2016 (UTC)
I thought about that, but the 3rd singular is usually the most commonly attested form in the earlier languages, it feels a little more justified to list them that way. Anglom (talk) 15:30, 18 September 2016 (UTC)
I favour the 3rd singular as well, though I'm not decided on absolute or conjunct. I think I'd prefer the form that descends from the Proto-Celtic lemma directly, but since we already don't do so for Old Irish, the point for Brythonic is moot. Since Proto-Brythonic and Old Irish are similar in terms of development, using the same form for them makes them easier to compare. —CodeCat 12:36, 19 September 2016 (UTC)

Separating transcription from transliteration

We seem to be at an impasse on this issue, with discussion having died out again. Here are a few ideas to start discussion with:

  1. Why don't we have a separate pronunciation parameter? Not only could this be used for transcriptions, it would also be useful for disambiguating homographs like wind. The main drawback is that it could be overused/stuffed with information best left to pronunciation sections.
    The reason I bring this up is that our current romanization method routes everything through the |tr= parameter. For languages that have both transcription and transliteration, that leaves no way to tell which is being displayed. Having a separate parameter also makes it easier to set it up as a parallel to our current treatment of transliteration.
    1. |pr= seems the most logical name for such a parameter
    2. How would we distinguish between the two? I think we should leave transliteration as it is, and use a superscript in front for the transcription: (Transcr:fonɛtɪk spɛliŋ) (with the superscript linked to something informative)
  2. Either way, I don't think we should have language-specific special code in Module:links if we can avoid it: it's currently the seventh-most-transcluded page on Wiktionary, used by 4,889,303 pages. More importantly, it's often used dozens of times on a single page and in a few cases thousands of times. Just on general principles, the part of Module:links that's always executed should be only for things that are general in nature and can't be handled in more specialized routines. Even if the overhead is minimal, the clutter makes it harder to maintain. I can understand temporarily putting in a short-term kludge until a solution can be integrated into the regular module structure, but kludges have a way of growing as more special cases arise. They also are harder to understand/maintain: I don't think it would be obvious to most people that local phonetic_extraction = {["th"] = "Module:th"} has anything to do with transcription, and I'm not sure someone wanting to make changes related to transcription would look for the code where it is now.
    1. I think the best approach to integrating transcription would be to have a separate value for transcription modules in the Module:languages data submodules to parallel "translit_module"
      1. I propose naming it "transcr_module"
      2. I propose naming the entry-point function in these modules "pr()" to parallel the translit modules' "tr()"
      3. It would then be a simple matter of adding parallel code to what we have in module:links for transliteration

I obviously like my proposals, but feel free to tweak, rework or replace any or all of it. The only thing I ask is that we arrive at something concrete, and not more theoretical or who-did-what-and-why-I-don't-like-it talk. Thanks! Chuck Entz (talk) 02:18, 19 September 2016 (UTC)
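The proposal above can be sketched roughly as follows. This is Python rather than Lua, and it is only an illustration of the shape of the idea: `LANGUAGE_DATA`, `MODULES`, `annotate` and the placeholder language codes "xx" and "yy" are all hypothetical, while `translit_module`, `transcr_module`, `tr()`, `pr()`, |tr=, |pr= and the "Transcr:" label are the names suggested in the proposal.

```python
from typing import Callable, Dict, Optional


# Placeholder romanisers standing in for the modules' tr() and pr() functions.
def demo_tr(text: str) -> str:
    return f"<translit of {text}>"


def demo_pr(text: str) -> str:
    return f"<transcr of {text}>"


MODULES: Dict[str, Dict[str, Callable[[str], str]]] = {
    "Module:xx-translit": {"tr": demo_tr},
    "Module:xx-pron": {"pr": demo_pr},
}

# Per-language data: "transcr_module" parallels the existing "translit_module".
LANGUAGE_DATA: Dict[str, Dict[str, str]] = {
    "xx": {"translit_module": "Module:xx-translit", "transcr_module": "Module:xx-pron"},
    "yy": {"translit_module": "Module:xx-translit"},
}


def _auto(lang: str, text: str, key: str, entry_point: str) -> Optional[str]:
    """Look up the module recorded for lang under key and call its entry point."""
    module_name = LANGUAGE_DATA.get(lang, {}).get(key)
    if module_name is None:
        return None
    return MODULES[module_name][entry_point](text)


def annotate(lang: str, text: str, tr: Optional[str] = None, pr: Optional[str] = None) -> str:
    """Format a term with its romanisations; manual tr/pr override the modules."""
    translit = tr if tr is not None else _auto(lang, text, "translit_module", "tr")
    transcr = pr if pr is not None else _auto(lang, text, "transcr_module", "pr")
    parts = [text]
    if translit:
        parts.append(f"({translit})")
    if transcr:
        parts.append(f"(Transcr: {transcr})")
    return " ".join(parts)


print(annotate("xx", "word"))
```

The generic routine contains no language-specific branches here; everything particular to a language lives in its data entry, which is the point of the second item in the proposal.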

If we lack cooperation between our Lua module editors, we'll have the situation where transliterations and transcriptions are handled by separate modules for Japanese, Chinese, Thai, Burmese, Tibetan, etc. and have no integration with other main modules. Wyang's templates (linked to appropriate modules) like {{th-l}}, {{ja-r}}, {{zh-usex}} exist almost in a separate world. I'd like to be able to transliterate Thai or Japanese by passing Thai phonetic respelling/hiragana with spacing, capitalisation, etc. but also use the features common to other templates. --Anatoli T. (обсудить/вклад) 02:37, 19 September 2016 (UTC)
As stated elsewhere, I am very much in favor of this, though for a different reason. Vahagn and I had discussed how many languages with abjads or other writing systems require both a transliteration and transcription (Hittite, Old Persian, Mycenaean Greek, etc.). This would greatly reduce the amount of |tr= overloading necessary to represent these languages. —JohnC5 02:46, 19 September 2016 (UTC)
|tr= may mean either transliteration or transcription or a mixture of both. For most languages, including abjad-based, the transcription-like transliteration has been the preferred one. That is also the case for Thai but displaying the character sequence (i.e. the "real" transliteration) can still be used for various purposes.--Anatoli T. (обсудить/вклад) 02:54, 19 September 2016 (UTC)
I support this. Wyang (talk) 06:03, 19 September 2016 (UTC)
Sounds good, only I'd prefer it if we didn't bind transcription to phonetics, because for some ancient languages it would be preferable to write for example: (Sogdian) ૛ૣી૒ીૡ૏ો૏ૐ ‎(pš'x'rycyk) (pašaxārēčik) without going into details of what exactly were 'a', 'ā', 'ē' or 'č'. Crom daba (talk) 08:33, 19 September 2016 (UTC)
What do you mean by "preferable"? I want to know how to read/pronounce the word, so I want to see "pašaxārēčik", as would be the case for Persian and other abjads. The actual string of characters can also be useful for etymologies or for people interested in learning the script.--Anatoli T. (обсудить/вклад) 08:39, 19 September 2016 (UTC)
Perhaps I wasn't clear. It is preferable to write "pašaxārēčik" rather than "pəʃɨxaret͡ʃjək" (don't quote me on this "reconstruction"). Obviously we need both transcription and transliteration (for one, because there still aren't any free fonts for Manichaean Unicode as far as I know). Crom daba (talk) 09:05, 19 September 2016 (UTC)

AWB access

Hello. I would like to get permission to use AWB on the English Wiktionary. I will use it to update Romanian adjective templates to a new format, since it's too tedious to do manually. I've never used it before, but from what I can tell, it doesn't seem too complicated. Thank you! Redboywild (talk) 09:59, 19 September 2016 (UTC)

You look like a good candidate for AWB but unfortunately I have no idea how to give you access. Anyone know? Benwing2 (talk) 05:05, 20 September 2016 (UTC)
Never mind. All you do is edit the list on the AWB page. Done. Benwing2 (talk) 05:08, 20 September 2016 (UTC)
Thanks a lot! Redboywild (talk) 08:15, 20 September 2016 (UTC)

Statistics to guide improvements

I've been experimenting with extracting data from a Wiktionary export (enwiktionary-20160901-pages-articles.xml). Along the way, I keep generating stats to help me get a feel for how the data is organized. Many of them seem like they would be of interest to Wiktionary staff and editors who have an eye to making improvements. So I thought I'd ask if that's correct.

Here is an example stat I generated last night. From the English set, in articles that have an =English= header and a =Noun= or other PoS header, here are all the distinct headword template names I found and their counts:

  • en-PP: 1
  • en-Proper noun: 7
  • en-abbr: 510
  • en-acronym: 67
  • en-adj: 97944
  • en-adjective: 64
  • en-adv: 16002
  • en-adverb: 18
  • en-comparative of: 1
  • en-con: 164
  • en-conj: 24
  • en-conj-simple: 36
  • en-conjunction: 13
  • en-cont: 375
  • en-contraction: 27
  • en-decades: 86
  • en-det: 72
  • en-initialism: 879
  • en-interj: 1346
  • en-interjection: 45
  • en-intj: 101
  • en-letter: 53
  • en-note-upper case letter plural with apostrophe: 2
  • en-noun: 207413
  • en-number: 39
  • en-part: 16
  • en-particle: 19
  • en-phrase: 106
  • en-plural noun: 1304
  • en-plural-noun: 6
  • en-prefix: 1012
  • en-prep: 373
  • en-prep phrase: 3
  • en-preposition: 21
  • en-pron: 315
  • en-pronoun: 65
  • en-prop: 329
  • en-proper noun: 23854
  • en-proper-noun: 176
  • en-propn: 11
  • en-punctuation mark: 2
  • en-suffix: 614
  • en-symbol: 52
  • en-usage-equal: 1
  • en-verb: 26654

As you can see, there is quite a bit of redundancy. And more than a few are slated for deletion, like en-abbr.
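A minimal sketch of how a tally like the one above might be produced, assuming each page's wikitext is already available as a string. It is not the method actually used for the list: the function name is hypothetical, and the filtering by namespace and by =English= and PoS headers is left out.

```python
import re
from collections import Counter

# Capture the template name in "{{en-noun|...}}" or "{{en-adj}}": everything
# after "{{" up to the first "|" or the closing "}}".
HEADWORD_RE = re.compile(r"\{\{\s*(en-[^|{}]+?)\s*(?:\||\}\})")


def count_headword_templates(pages):
    """Count occurrences of each distinct en-* template name."""
    counts = Counter()
    for wikitext in pages:
        counts.update(HEADWORD_RE.findall(wikitext))
    return counts


# Tiny made-up sample standing in for pages from the export.
sample_pages = [
    "==English==\n===Noun===\n{{en-noun|-|adoxographies}}",
    "==English==\n===Adjective===\n{{en-adj}}\n===Noun===\n{{en-noun}}",
    "==English==\n===Proper noun===\n{{en-proper noun}}",
]
sample_counts = count_headword_templates(sample_pages)
```

Note that this counts template uses rather than pages, so its totals would not be directly comparable with a "what links here" page count.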

Some of the things I've found have motivated me to do some edits on articles with minor formatting errors. If there's interest, I'd be happy to supply more data like this. Thoughts? Jim Carnicelli (talk) 14:09, 19 September 2016 (UTC)

Are you sure about your regular expressions or other id. method? Just looking at one template, {{en-PP}}, which your listing says is used once, this special page reports that it is used on 137 pages, all but one of which are in the principal namespace.
I believe that the redundancy is principally attributable to redirects. eg, {{en-prop}}, {{en-proper-noun}}, {{en-propn}} are all redirects to {{en-proper noun}}.
How would we use these statistics for improvements? DCDuring TALK 14:51, 19 September 2016 (UTC)
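To illustrate the point about redirects, here is a small sketch that folds the counts for redirect names into their target template. The redirect map is taken from the comment above and the counts from the list earlier in this thread; the function name is hypothetical.

```python
from collections import Counter

# Redirects named in the comment above; all three point at {{en-proper noun}}.
REDIRECTS = {
    "en-prop": "en-proper noun",
    "en-proper-noun": "en-proper noun",
    "en-propn": "en-proper noun",
}


def fold_redirects(counts, redirects):
    """Add each redirect's count to its target template's count."""
    folded = Counter()
    for name, n in counts.items():
        folded[redirects.get(name, name)] += n
    return folded


# Counts copied from the list earlier in this thread.
raw_counts = {
    "en-prop": 329,
    "en-proper noun": 23854,
    "en-proper-noun": 176,
    "en-propn": 11,
    "en-verb": 26654,
}
folded_counts = fold_redirects(raw_counts, REDIRECTS)
```

With those figures the four proper-noun names collapse into one entry of 24370, and en-verb is left unchanged.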
Please bear in mind I'm new to this. I'm treading lightly because I know I'm surely missing an awful lot of context.
In generating the above list, I had already prefiltered based on the "ns" (I assume that's short for "name-space"), =English= header, and =<PoS>= header, with a finite list of the following parts of speech and pseudo-PoS: Determiner, Conjunction, Noun, Proper noun, Pronoun, Verb, Adverb, Adjective, Preposition, Interjection, Contraction, Prefix, Suffix, Affix, Particle, Numeral, Symbol, Initialism, Abbreviation, Acronym, Phrase, Prepositional phrase. So expect it to be a subset. Also, I'm using an entirely proprietary system. I'm not familiar with all the tools available within Wiktionary, so I don't expect I'll generate the same results.
Given that I've already found and corrected what I believe are minor mistakes in articles, like a missing =English= header in one case and an empty "=====" header-like line, I'm assuming there are many more such formatting errors. My goal is quite simply to help call them out in the event that others might be interested in studying and possibly correcting them. Just another set of eyes.
My personal interest in this has to do with being able to extract structured data like brief definitions and synonyms. I'm impressed so far to see that most of the term definition articles appear to follow a rigorous structure that is parsable. I'm presently focused on English single-word terms with an eye to computational linguistics tasks like part of speech tagging. I plan to make code I write freely available for creating condensed JSON-structured data. Thus far I've been able to transform Wiktionary's 4.7GB articles export into a 100MB JSON file with over 300k terms from the 4M+ articles. I'm struggling now with how best to parse out the head-word templates (e.g., "{{en-noun|-|adoxographies}}") and definition lines.
The above list is one trivial example I thought to include as an illustration. I just want to know that it's worth generating further lists. I don't want to take the time or trouble anyone if there's no interest. Jim Carnicelli (talk) 17:43, 19 September 2016 (UTC)
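As a starting point for the head-word template question above, a minimal standard-library sketch that splits a single flat template call into its name and arguments. It assumes there are no nested templates or piped links inside the call, which a real parser would have to handle; the function name is hypothetical.

```python
def parse_template(call):
    """Split '{{name|a|k=v}}' into (name, positional_args, named_args).

    Assumes a flat call: no nested {{...}} and no [[a|b]] links inside.
    """
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("not a template call: %r" % call)
    parts = call[2:-2].split("|")
    name = parts[0].strip()
    positional, named = [], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name, positional, named


parsed = parse_template("{{en-noun|-|adoxographies}}")
```

What the positional arguments mean is defined by each template's own documentation, so this only recovers the structure and not the interpretation.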
You may want to look into mwparserfromhell. It can simplify a lot of the work for you. —CodeCat 17:46, 19 September 2016 (UTC)
The headword template you mention has its code at Template:en-noun; unlike certain templates, this one is quite well documented. Templates change a lot (too often, really) so be prepared to revise your parser code very frequently. Equinox 17:47, 19 September 2016 (UTC)
Ah, thank you. I'll look into mwparserfromhell. Also, I have studied the Template:en-noun template. I appreciate how thoroughly it's documented. Some of the other head-word templates are a little less well documented. I was intrigued by finding several (e.g., Template:en-abbr) which are slated for deletion. I assume this means articles that use them still need cleanup. Jim Carnicelli (talk) 17:58, 19 September 2016 (UTC)
It was decided that "abbreviation", "initialism", etc. are not parts of speech, so we should use the appropriate PoS instead (e.g. TLC is a noun). Equinox 18:00, 19 September 2016 (UTC)

Proposal for bot redirects for numbers up to a million.

Per the outcome of various recent deletion discussions relating to numbers, I propose to bot-create about four million redirects which will point otherwise non-idiomatic numbers between 101 and 1,000,000 to Appendix:English numerals#Naming rules (short scale). The reason that this will come to about four million redirects is that I propose to redirect from:

There are other possible variations:

Basically, I'd like to have a bot redirect all commonly used ways of making all possible non-idiomatic number combinations up to one million. However, in saying this out loud, it sounds pretty crazy. Is this a bad idea? I want people who look up numbers to be taken somewhere for their trouble. bd2412 T 15:22, 19 September 2016 (UTC)

Maybe we could edit {{didyoumean}} to cause any absent page, whose title is a number, to redirect to the appendix? Like Amazing (redlink) redirects to amazing. --Daniel Carrero (talk) 15:32, 19 September 2016 (UTC)
In general, I support your approach, but I would not overload the template with this. There are other really cryptic SoPs like (S01E01) to handle... --Giorgi Eufshi (talk) 06:55, 20 September 2016 (UTC)
Strong oppose. In fact many of the higher numbers will be unattestable. But more importantly, I don't see any reason why numbers are more special than other SOP combinations. --WikiTiki89 15:37, 19 September 2016 (UTC)
Re: attestability, try picking any number at random between 101 and 1,000,000 and do a Google Books search for it. You'll be amazed at how many random references you will find to "437,214 cubic yards of material" or 808,777 hogs having been infected with a disease, or an "increase in book value of ledger assets 279,361". I can virtually guarantee that every single number up to a million (and probably for a good way up from it) is attested in some ledger, census, valuation, report, or record. bd2412 T 15:55, 19 September 2016 (UTC)
I would argue that all numbers up from 10 are SOP. 4, for example, is defined as "The cardinal number four." but really it means "A digit used to form numbers, whose value is four × 10ⁿ, where n is the digit placement counted from the right. In 432, 4 means four hundred. (don't get me started on real numbers and non-decimal bases)" --Daniel Carrero (talk) 16:02, 19 September 2016 (UTC)
@BD2412: Regarding attestability, I was referring mainly to the spelled-out forms. --WikiTiki89 16:49, 19 September 2016 (UTC)
And just to give you an idea of how crazy this idea is, we currently have 440,889 English lemma entries, and you're proposing to create 4,000,000 redirects to one appendix page. --WikiTiki89 17:46, 19 September 2016 (UTC)
The practical drawbacks to this seem larger than the small-to-nonexistent benefits, to me. Who is going to fail to know what 347654 means, but (be able to input it, and) be helped by an appendix? Who is going to fail to know what "four hundred seventy-two thousand, five hundred fifteen" means, but think to look up that whole string rather than the parts? If all numbers are bluelinks, it will drown any effort to see if e.g. a certain number happens to have an entry (due to being idiomatic), a slight drawback, but compared to a slight-to-nonexistent benefit. Having every possible number in this range, including e.g. strings identical to phone numbers, be bluelinks (which, when edited by someone after the bot, won't show up in a noticeable place like Special:NewPages) also seems like an invitation to easy-to-miss vandalism. And as Wikitiki says, why are these more deserving/needing of entries than other SOP but (or and, or or) "regularly formable" strings? - -sche (discuss) 16:42, 19 September 2016 (UTC)
  • Just on a nitpicky point, telephone numbers would not fall into the sweep of this proposal, unless 100-0000 is a phone number somewhere. It would, however, cover all the zip codes. bd2412 T 17:14, 19 September 2016 (UTC)
    Unless you leave the USA. - TheDaveRoss 17:24, 19 September 2016 (UTC)
    BD2412 cannot leave the USA (without first entering it). --WikiTiki89 18:14, 19 September 2016 (UTC)
    I am actually an American. I only ever leave the U.S. to go to Wikimania. ;-) bd2412 T 12:57, 20 September 2016 (UTC)
    Really? For some reason this whole time I've been thinking you were British... Now I'm wondering where I could have gotten such an idea. It must be that you sign half of all your posts with "Cheers!" --WikiTiki89 14:08, 20 September 2016 (UTC)
I also generally oppose doing this via redirects, however if we did something to affect search results which had the same effect I think it might be of use. - TheDaveRoss 16:51, 19 September 2016 (UTC)
If the search results can be tweaked to this effect, that would be a fine solution. bd2412 T 17:12, 19 September 2016 (UTC)
There's another practical problem here. 415 is four hundred and fifteen in the UK and four hundred fifteen in the US and it can't simultaneously redirect to both. But in general, the proposal has no merit because it proposes making redirects for things that aren't words in any language. I have plenty more specific objections, but I think that one alone is enough. Renard Migrant (talk) 17:59, 19 September 2016 (UTC)
AFAICT the proposal is to redirect all of "415", "four hundred and fifteen", and "four hundred fifteen" to the same appendix, which is technically doable, but I tend to agree it's not desirable. - -sche (discuss) 18:23, 19 September 2016 (UTC)
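To make the US/UK point concrete, here is a rough sketch that spells out a number either with or without "and". It is only an illustration of how the spelled-out variants multiply, not the appendix's naming rules: the function names are hypothetical, the range stops at 999,999, and the comma after "thousand" simply mirrors the example string quoted earlier in this thread.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n):
    """Spell 0..99, hyphenating compounds such as 'seventy-two'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def below_thousand(n, use_and):
    """Spell 1..999; use_and inserts 'and' after 'hundred'."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return below_hundred(rest)
    words = ONES[hundreds] + " hundred"
    if rest:
        words += (" and " if use_and else " ") + below_hundred(rest)
    return words


def spell(n, use_and=False):
    """Spell 1..999,999. use_and=False omits 'and', use_and=True adds it."""
    if not 0 < n < 1000000:
        raise ValueError("out of range for this sketch")
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return below_thousand(rest, use_and)
    words = below_thousand(thousands, use_and) + " thousand"
    if rest:
        if use_and and rest < 100:
            # Assumed convention for the 'and' style: "one thousand and five".
            words += " and " + below_hundred(rest)
        else:
            words += ", " + below_thousand(rest, use_and)
    return words
```

Each numeral therefore has at least two spelled-out titles before hyphenation and comma variants are even considered, which is how the redirect count grows to several per number.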
Not to mention that 415 is not only a number in English, but also in practically every other language, so it doesn't make much sense to redirect it to the appendix page on English numerals. --WikiTiki89 18:58, 19 September 2016 (UTC)
Strongly oppose creating the "wordy" ones like three hundred and sixty-seven. Frankly I think the numeric ones would be pretty dumb too but that is more arguable. Equinox 18:32, 19 September 2016 (UTC)
  • A significant benefit would be that we could dramatically reduce the number of times a new contributor tries to add full entries for the terms. And we might be able to reduce the number of discussions of some of the inane matters relating to numbers that appear in some of our discussion pages.
  • Could we accomplish the goal of directing users to appendices by some other means?
As I understand it, we could accomplish the entry-prevention goal by protecting the pages for which we think we don't want entries or, perhaps, by an edit filter. DCDuring TALK 19:06, 19 September 2016 (UTC)
Oppose; just feels totally wrong, and will add tons and tons of unnecessary entries. I think we should have numbers up through 100, plus 200, 300, ... 900, plus powers of ten above that; partly I want these entries for translation purposes, since many languages have non-SOP ways of expressing them. (Plus any non-SOP numbers of course -- 101, 411, etc.) Benwing2 (talk) 04:58, 20 September 2016 (UTC)
I agree with respect to the numbers we should have. The question is what to do about numbers we shouldn't have, but which readers may for whatever reason either look for anyway, or try to create anyway. bd2412 T 13:01, 22 September 2016 (UTC)
  • I think you mean hard redirects, the things using #REDIRECT syntax. I am not very excited even about hard redirects. Does anyone collect statistics about the number of page accesses of non-existent entries? This could give us an idea of whether to perform the redirects at least for 1 to 10,000, or the like. Since they would be #REDIRECT things, they would not show up in the number of entries, I figure. --Dan Polansky (talk) 17:14, 8 October 2016 (UTC)

Declension tables versus usage notes

I'm wondering how to treat a certain phænomenon. If certain grammatical forms replace other forms, or create new ones, should that be put into the declension table or the usage notes?
Examples: German subjunctive forms are now used as imperative forms, for phrases like "let's go". And most importantly for me: Low German optative forms replace, piece by piece, Low German preterite forms in the course of 400 years. So should I add the optative forms as alternative forms into the declension tables or make a note about this as usage notes? Korn [kʰũːɘ̃n] (talk) 12:16, 20 September 2016 (UTC)

If it's something that applies to all or most verbs across the board, then it shouldn't be in a usage note as the usage note would have to appear on every single verb entry. Maybe there could be a footnote within the inflection table itself saying something like "Increasingly used as the preterite" or whatever. —Aɴɢʀ (talk) 12:41, 20 September 2016 (UTC)
Sorry, yes, when I say usage note, I do mean one in the table. Cf. vri. Korn [kʰũːɘ̃n] (talk) 12:53, 20 September 2016 (UTC)
I think that's fine, especially for a historical language. For a modern language we might not want to list all obsolete forms in inflection tables. (Though TBH I do have a tendency to put obsolete inflected forms in Irish declension tables, so maybe I'm being hypocritical.) —Aɴɢʀ (talk) 15:12, 20 September 2016 (UTC)


@Angr, Chuck Entz, Anglom, JohnC5, CodeCat, Wikitiki89 Should we include some Proto-Nostratic words? If so, how would they be organised? We obviously can't put them as ancestors to PIE and Native American words without extensive proof that they're linked, of which there is little...? Some words could definitely be linked though, like PIE heu and Native American iw, both originating from a common ancestor (PN?). UtherPendrogn (talk) 20:10, 20 September 2016 (UTC) https://en.wiktionary.org/wiki/User:UtherPendrogn/k%CA%BCo An example of a word. UtherPendrogn (talk) 20:18, 20 September 2016 (UTC)

Nostratic is silly, founded on extremely poor data and poorer assumptions, and flies in the face of what rigour historical linguistics may claim. If there is sufficient reason to compare a form of unclear etymology with one in another language with no sure relationship, that is acceptable, but by no means should Nostratic "terms" be linked to or given serious consideration. —Μετάknowledgediscuss/deeds 20:19, 20 September 2016 (UTC)
Is there a better accepted ancestor to PIE? UtherPendrogn (talk) 20:22, 20 September 2016 (UTC)
Not really. Pre-PIE features are postulated based on internal reconstruction, but there's no higher node phylogenetically that has acceptance in academic linguistics. —Μετάknowledgediscuss/deeds 21:25, 20 September 2016 (UTC)
Does this mean I should stop making Sino-Caucasian entries? Crom daba (talk) 23:46, 20 September 2016 (UTC)
In my opinion, yes. Even if that's phylogenetically valid (which I doubt), it can't really be reconstructed to the standards expected by most historical linguists. —Μετάknowledgediscuss/deeds 00:23, 21 September 2016 (UTC)
What do you mean by "Native American iw"? Are you referring to Amerindian? Having worked a little with Uto-Aztecan and Yuman, I'm more than a little skeptical about that. There are former American Indian phyla such as Hokan and Penutian that have been mostly abandoned for lack of evidence (though there's evidence for some of the subdivisions)- the trend seems to be going away from unification rather than toward it (except for Dene-Yeniseian). As for Nostratic itself, everyone who believes in it seems to have a different combination of constituent families. Chuck Entz (talk) 03:36, 21 September 2016 (UTC)
Yeah, I've often wondered about Dene-Yeniseian. I had to read Vajda's paper in college and found it very convincing. Also, I believe there was recently a paper showing genetic evidence that the two peoples spent a significant period in the Bering Strait before splitting East and West. I remember that from a discussion with some professors I met from Diné College who also recalled a time when a Ket speaker came to the Navaho nation and discussed apparent cognate words in the two languages. But then again, all of this still remains too circumstantial. —JohnC5
I'm friends with an Athabaskanist who told me all the Athabaskanists she knows are pretty much convinced by Dene-Yeniseian. But it's definitely the exception rather than the rule for new suggestions of high-level groupings to be accepted by the wider linguistics community. —Aɴɢʀ (talk) 12:13, 21 September 2016 (UTC)
Do we have any Athabaskanists working on here? If any trusted Athabaskanist wanted to begin adding PDY forms, I'd be prepared to make a code for it and point PY and PND at it. —JohnC5 14:21, 21 September 2016 (UTC)
As I recall, some earlier BP discussions settled on basically the following rules of thumb for forms in Nostratic etc. macrolanguages:
  • the comparisons themselves can be mentioned in etymology appendices for PIE etc., if properly cited;
  • they cannot be created as their own reconstruction entries, with the special exception of Proto-Altaic;
  • they cannot be mentioned in mainspace entries.
I would support a compact appendix (or set of appendices) that listed the members of alleged Nostratic etymological groups together with reconstructions used by different authors, though (as said, no two groups of Nostraticists substantially agree on anything, so e.g. Illich-Svitych's Nostratic ≠ Dolgopolsky's Nostratic ≠ Bomhard's Nostratic). For that matter, I would even support proto-entries as soon as you can provide two unconnected sources (not e.g. from one scholar + one of his students) who can both agree on what the term's descendants are and what its reconstruction should be ;)
Re OP though, nobody considers Amerind to be "Nostratic". "Amerind" itself is a hypothetical macrofamily of a similar size as Nostratic; what you'd use to link them is "Borean" or perhaps "Proto-World" (the likes of which should probably be banned entirely from Wiktionary, being another order of magnitude more speculative than the likes of Nostratic or Amerind or Sino-Caucasian). --Tropylium (talk) 20:42, 22 September 2016 (UTC)
  • I recall reading that the emerging evidence from archaeology is painting a picture of multiple waves of migration from the Old World to the New over the span of thousands (tens of thousands?) of years, which would seem to make any such "Amerindian" family quite moot. ‑‑ Eiríkr Útlendi │Tala við mig 21:28, 22 September 2016 (UTC)

I plan to clean house in WT:RFD.

There are a number of months-old RFDs that have received little or no discussion. I'm giving fair warning that I plan to close all of these as no consensus in the next few days, unless an actual consensus develops quickly. Cheers! bd2412 T 13:04, 22 September 2016 (UTC)

The default in RFD is no objection, since the proposer themselves is generally in favour of the deletion. With no response, that's 100% in favour, therefore delete. —CodeCat 13:32, 22 September 2016 (UTC)
It's not that straightforward in the first half-dozen discussions. They have at least one half-hearted objection to deletion. What then? bd2412 T 13:43, 22 September 2016 (UTC)
Then it's no consensus. —CodeCat 14:16, 22 September 2016 (UTC)
Which causes the entry to be kept, I believe. --Daniel Carrero (talk) 14:19, 22 September 2016 (UTC)
That means an erroneous entry might be kept by virtue of sufficiently great user apathy towards the topic. If we turn it around, correct entries might go for the same reason. We don't have a better alternative, huh? Jury duty or something. Korn [kʰũːɘ̃n] (talk) 15:20, 22 September 2016 (UTC)
I've done my jury duty. --WikiTiki89 15:35, 22 September 2016 (UTC)
Inspiring. (Not sarcasm.) Maybe we can have a (collapsed or optional or something) list of RFDs/RFVs without any replies (= With only one signature.) in the watchlists? Like we have with the votes. Korn [kʰũːɘ̃n] (talk) 16:33, 22 September 2016 (UTC)
Not all my votes were to delete. I hope I didn't accidentally vote twice. DCDuring TALK 18:06, 22 September 2016 (UTC)
I don't think there's a problem with erroneous entries being kept, as they can just be sent to RFV, where the default is to delete. And I think it's better to err on the side of keeping an SOP entry when in doubt, than to delete it just because not enough people care. Andrew Sheedy (talk) 21:41, 23 September 2016 (UTC)
RFD is for redundant entries, not for erroneous, for which we have RFV. So echoing Andrew Sheedy. --Dan Polansky (talk) 17:09, 8 October 2016 (UTC)
Not really: no objection and the only poster the nominator => no consensus since one person does not consensus make; that is my position. --Dan Polansky (talk) 16:56, 8 October 2016 (UTC)

User:Embryomystic form-of edits

Embryomystic has been fiddling around with form-of entries for a while now. Some of it is ok, but they've also replaced the perfectly-valid {{plural of}} with {{inflection of}} just for the sake of it. Now, they have their eyes set on Spanish and Portuguese, and seem to be replacing {{masculine plural of}} and similar generic templates with some language-specific templates that do the same thing. I objected to this but was ignored, so I'm bringing it to wider attention here. Generic templates should always be used if possible, and replacing them with custom templates for no reason is pointless. —CodeCat 19:44, 22 September 2016 (UTC)

You were not ignored, just disagreed with. I didn't create the Portuguese templates, but I find them useful, and I've been adding them to Portuguese adjective form entries that don't have them, and just recently I created parallel Spanish and Italian templates. I realise now that when I started doing something similar with Catalan that I was stepping on your toes, and I didn't object to you reverting Catalan entries, as you yourself had made similar templates for Catalan, but I don't really see why there's a problem with adjective forms being sorted into relevant subcategories as the Portuguese ones have been for some time now. embryomystic (talk) 19:50, 22 September 2016 (UTC)
Subcategorising non-lemma forms is mostly a pointless exercise that nobody benefits from, and therefore it's not worth the increased complication introduced by not using generic templates. —CodeCat 19:57, 22 September 2016 (UTC)
By this logic, should we delete Category:English adjective comparative forms? --Daniel Carrero (talk) 20:02, 22 September 2016 (UTC)
I wouldn't oppose it, unless someone can come up with a real use case. To me, this is no different from categorising Latin verb forms as "1st person forms", "singular forms", "indicative forms", "active forms" and so on. Categorising for the sake of it, not because anyone is ever going to have a use for it. Subcategorising lemmas is useful, but non-lemmas not really. —CodeCat 20:08, 22 September 2016 (UTC)
As someone whose browsing as a user trying to find words was more than once hindered by non-exhaustive categorisation, I'm leaning towards too much rather than too little. Korn [kʰũːɘ̃n] (talk) 20:36, 22 September 2016 (UTC)
  • Hear, hear. If there's no harm in having a category, I say keep it -- it's highly likely that someone somewhere has found it useful. ‑‑ Eiríkr Útlendi │Tala við mig 21:31, 22 September 2016 (UTC)
    • That's good and well, but then why create a language-specific template? If we all agree that such categories are useful, then surely they are useful regardless of language. Therefore, this functionality could be integrated into Module:form of and the language-specific templates done away with. I still oppose it, but if it's going to happen, it might as well be done right. —CodeCat 21:39, 22 September 2016 (UTC)
    • Also, as for it not being harmful, consider that one particular user created one category for every single form of Turkish nouns and verbs, a few years ago. It was a huge mess, resulted in dozens of useless categories. Since I'm assuming we don't want to repeat that, the question is how much is too little, how much is too much, and how much is just right. Personally, I think just about none at all is just right. —CodeCat 21:45, 22 September 2016 (UTC)
      Whatever is preferred, bear in mind that WT:ACCEL creates entries using the generic templates. --Q9ui5ckflash (talk) 13:24, 23 September 2016 (UTC)
      WT:ACCEL can be customized to use custom templates. But I agree, that if we want these, they should be language independent. That doesn't mean we need to use them for every language, but if it makes sense for Spanish, Portuguese, and French, then why not use a single template for all three languages? --WikiTiki89 13:33, 23 September 2016 (UTC)
      That certainly makes sense to me. embryomystic (talk) 23:30, 23 September 2016 (UTC)
      I don't think it makes sense for these languages either. —CodeCat 23:40, 23 September 2016 (UTC)

Let's get rid of the "Quotations" header

Wiktionary:Quotations says "Longer lists of quotations may find a more appropriate place in a separate section, as they would hamper readability for people only interested in the definitions." In this case, I think that the quotation really belongs on a separate citations page. The point of citations pages is to avoid cluttering up the entry with information that is not directly relevant to the words and definitions, but may still be useful for some readers (and for WT:CFI). So I propose abolishing this practice/header altogether, and moving its contents to the citations page. —CodeCat 21:37, 22 September 2016 (UTC)

I think I would support that. Equinox 21:39, 22 September 2016 (UTC)
I support removing the "Quotations" header, and adding {{seemoreCites}} in individual senses. This past vote might be relevant: Wiktionary:Votes/2016-02/Removing "Quotations". --Daniel Carrero (talk) 21:47, 22 September 2016 (UTC)
I also support. I don't think it is used terribly often as it is. - TheDaveRoss 21:56, 22 September 2016 (UTC)
Support wholeheartedly. The quotations sections are little more than clutter. Andrew Sheedy (talk) 21:57, 22 September 2016 (UTC)
Support. I've always found it weird that this header was even there, and it's annoying to see it on random entries. PseudoSkull (talk) 22:03, 22 September 2016 (UTC)
Oppose using citations page to hold citations that could easily go under the definition line. DTLHS (talk) 00:29, 23 September 2016 (UTC)
I always thought that the quotations used with definitions were just a selection of all the citations found on the citation page. That is, that one is a subset of the other. —CodeCat 01:09, 23 September 2016 (UTC)
I don't know what other people think citations pages are for. In my mind it is for quotations of as yet to be defined terms and for senses that are being researched, and the contents should be moved to the main entry if it is possible. DTLHS (talk) 01:15, 23 September 2016 (UTC)
I figured citation pages were just for collecting all the citations, the more the better? —CodeCat 01:18, 23 September 2016 (UTC)
Like I said, this might just be me. I would be interested to know what other editors think citation pages should be used for. DTLHS (talk) 01:19, 23 September 2016 (UTC)
My opinion is this:
  • The Citations: page should be used to collect an indefinite number of citations, the more the better. Getting citations from the internet is okay too, if the sense is already attestable through durably archived media such as Google Books.
  • Each sense should have only a small number of citations in the main page, which should preferably be representative and unambiguous, concerning that particular sense.
  • It would be nice if the citations in the entry were always a subset of the citations in the Citations: page, if the Citations: page is a big one. It is normal to add a new citation in an entry without copying it in the Citations: page, and I'm okay with that if there are only one or a few quotations.
  • The "Quotations" section in entries seems to be useless. If it is used simply to point to the Citations: page, the link could be added below each sense, when applicable. If it contains one or more quotations where it is unclear to what sense they belong, they can't be "representative and unambiguous" as suggested above and should be in the Citations: page until we figure out what to do with them.
  • Usage examples and quotations complement each other, so I oppose if people remove usexes just because the entry/sense has quotations.
--Daniel Carrero (talk) 01:34, 23 September 2016 (UTC)
I agree that usexes and quotations are complementary. I think Wiktionnaire does an exceptional job of illustrating definitions with a balance of both (relatively speaking, we are rather lacking in this area). Andrew Sheedy (talk) 01:38, 23 September 2016 (UTC)
I think of quotations in entries as just a special kind of usex: a usex that's attested and cited from another work. They're meant to illustrate the use of the word in that meaning, using an example from "out in the world" instead of something we made up. I don't think one should be favoured over the other, we should simply pick what works best in the particular situation. If none of the cites illustrate the use particularly well, a made-up example would do better. —CodeCat 17:54, 23 September 2016 (UTC)
I've always understood citations pages to be used the way CodeCat describes, in addition to hosting citations for senses that have yet to be added (or where the intended sense is unclear). I think they should eventually hold as many citations as is practical, to demonstrate as wide a range of use as possible (including various time periods, regions, registers, and genres). Andrew Sheedy (talk) 01:36, 23 September 2016 (UTC)
Quotations For what it's worth, that's how I think of Citations as well: that namespace has a large chronology of uses from which we pick a handful of particularly illustrative ones to show in the definition in the main namespace. —Justin (koavf)TCM 02:58, 23 September 2016 (UTC)
Question. The title of the thread is "Let's get rid of the 'Quotations' header", but could anyone explain for me exactly what the "Quotations header" is and where it appears? It seems from some comments that it is not the "[quotations ▼]" link that is seen next to some definitions, but if not that then what? Mihia (talk) 17:45, 23 September 2016 (UTC)
@Mihia: The entry abyss has a "Quotations" section. It contains the text:
It is a section, like "English", "Noun", "Etymology", "Pronunciation", etc. --Daniel Carrero (talk) 17:51, 23 September 2016 (UTC)
I see, thanks. My comment then would be that if the "Quotations" section was not there to provide a link to the "Citations" page, then probably many people would not notice that the Citations page existed. However, I don't know on what basis you would put actual quotations in that section rather than using the inline "[quotations ▼]" method. Mihia (talk) 19:27, 23 September 2016 (UTC)
There are places where the section contains quotes as well, see halcyon. - TheDaveRoss 14:42, 28 September 2016 (UTC)
So on what basis is it decided whether to put quotations below the definition ("[quotations ▼]"), or in a "Quotations" section, or on a separate "Citations" tab? Mihia (talk) 20:05, 28 September 2016 (UTC)
Support. - -sche (discuss) 18:58, 23 September 2016 (UTC)
Support, and use the citations namespace for citations where meaning is not clear or is for a definition we don't have yet. Wherever possible, citations go under the sense they are supporting. Renard Migrant (talk) 16:52, 24 September 2016 (UTC)
Oppose using a Beer parlour discussion for something that did not make it in a fairly recent vote: Wiktionary:Votes/2016-02/Removing "Quotations". It looks like unintentional forum shopping. --Dan Polansky (talk) 16:48, 8 October 2016 (UTC)

Vote about not nesting headings inside stuff

Based on Wiktionary:Beer parlour/2016/August#Proposed addition to WT:NORM: headers cannot be nested inside things, I created Wiktionary:Votes/pl-2016-09/No headings nested inside templates or tags. --Daniel Carrero (talk) 04:01, 23 September 2016 (UTC)

IMO your vote is not well-phrased. What you want to disallow is something like this (i.e. where the header is surrounded by newlines):
Something like this: {{foo|==English==}} where there aren't any newlines isn't such a problem, and might conceivably actually occur. (In general, embedded newlines in templates cause lots of parsing problems, even using mwparserfromhell.) Benwing2 (talk) 05:22, 23 September 2016 (UTC)
@CodeCat, do you wish to comment here? This was her idea. In any event, I hope people don't start using {{foo|==English==}} without discussion, it seems weird and without precedent. Is there any possible use for this, even a hypothetical one? --Daniel Carrero (talk) 18:37, 23 September 2016 (UTC)
I recall we do it on talk pages. But this proposal is for entry space only (I don't say mainspace because entry space includes Reconstruction: too), so it doesn't matter. —CodeCat 18:54, 23 September 2016 (UTC)
Just in case, I added a note in the vote to remind people that the proposal only affects entries. --Daniel Carrero (talk) 18:57, 23 September 2016 (UTC)
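For bot writers following this thread, here is a minimal Python sketch of the distinction Benwing2 draws: a header on its own line while a template call is still open is flagged, whereas an inline {{foo|==English==}} is not. The function name `nested_headings` and the simple brace counting are assumptions made for illustration; this is not how any existing bot or the vote text defines the rule, and it does not try to handle tags, comments or nowiki.

```python
import re

# A heading line: the same run of 2-6 equals signs at both ends of the line.
HEADING = re.compile(r"^(={2,6})[^=].*?\1\s*$")


def nested_headings(wikitext):
    """Return (line_number, line) pairs for heading lines that begin
    while at least one {{...}} template call is still open."""
    depth = 0  # number of currently unclosed "{{"
    found = []
    for number, line in enumerate(wikitext.split("\n"), start=1):
        if depth > 0 and HEADING.match(line):
            found.append((number, line))
        # Naive brace bookkeeping; good enough for a sketch only.
        depth += line.count("{{") - line.count("}}")
        depth = max(depth, 0)
    return found


sample = "{{foo|\n==English==\n}}\n==French==\n{{foo|==English==}}"
print(nested_headings(sample))
```

Only the header on line 2 of the sample is reported; the top-level ==French== and the inline use on the last line are left alone.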

Proposal for an extension to a few different page creation templates in the case of the search query containing more than one word

I propose a new addition to one search query template and one creation template. Now, honestly, I'm not really sure how someone would do this, but I'm sure that you Lua experts out there on this site probably could have some idea.

The rationale of both proposals is that if we do this, I believe the number of (especially new) users creating SOP entries in ignorance of WT:CFI may decrease. I know that a lot of you may be thinking "Oh, well we already linked to CFI so I'm assuming that the creators of every entry are going to sit there and read that entire page to find the part about SOP (and fully understand it)." Let's face it; people don't read terms of service, etc., pages all that often, especially not fully. People are eager to go ahead and start creating entries. So we should at least include a little more right in front of their faces. And almost the entire WT:RFD page is dedicated to finding out whether or not a multiple-worded entry is SOP, so perhaps we really should include something that mentions SOPs in these two templates.

I can't find the actual templates themselves on here after I've searched, so fill me in on their titles please.

The extra texts will not appear if the query does not have 2 words or more. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)

Proposed text 1

For large airplane:

"Wiktionary does not yet have an entry for large airplane.

  • You may Create this entry or add a request for it.
  • You can also look for pages within Wiktionary linking to this entry. This may help if, for example, large airplane is an inflected form of another word; although Wiktionary does not have the entry for large airplane, the base form may be listed as linking to this entry.
  • If you think this may be a misspelling, try browsing through our indices (e.g., the index of English words) for the correct spelling.
  • Perhaps there is a page large airplane in our sister encyclopedia project, Wikipedia.

Try searching Wiktionary:

  • If you have created this page in the past few minutes and it has not yet appeared, it may not be visible due to a delay in updating the database. Try refreshing the page, otherwise please wait and check again later before attempting to recreate the page.
  • If you created a page under this title previously, it may have been deleted. Check for large airplane in the deletion log. Alternately, check here.
  • Please also check large and airplane separately, as the definitions of those terms may collectively give you the meaning of large airplane."

For big strong girl:


  • Please also check big, strong, and girl separately, as the definitions of those terms may collectively give you the meaning of big strong girl."

For five-edged:


  • Please also check five and edged separately, as the definitions of those terms may collectively give you the meaning of five-edged." PseudoSkull (talk) 16:23, 24 September 2016 (UTC)


Proposed text 2

For large airplane: "Wiktionary does not yet have an entry for large airplane.

  • To start the entry, type in the box below and click "Save page". Your changes will be visible immediately.
  • If you are not sure how to format a new entry from scratch, you can use the preload templates to help you get started.
  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of large airplane does not equal the sum of the definitions of large and airplane. "

For big strong girl: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of big strong girl does not equal the sum of the definitions of big, strong, and girl. "

For five-edged: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of five-edged does not equal the sum of the definitions of five and edged. " PseudoSkull (talk) 16:23, 24 September 2016 (UTC)


General comments

General comments about both proposals as a whole should go here. I wanted to bring it up here before possibly starting votes, especially since someone might have better wording for the new additions than I did and might want to reword. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)

If it's possible to have the "no entry" page point users to the entries on the individual components of multi-word strings, then doing so is a good idea. But I don't know if it's possible without adding some javascript, which however we could probably do (I think some javascript is what adds a link to the search-results page when you search for a term and it's present as a translation of something). - -sche (discuss) 22:17, 26 September 2016 (UTC)
I am reluctant to implement this kind of change unless we have real evidence that it will make a difference. You have said "I believe [it] will decrease". How do you know? AFAIK, the issue is that a lot of anonymous users just don't read anything before they type. Changing the warning text won't change their behaviour. Equinox 22:43, 26 September 2016 (UTC)
I would also add that most people who create SoP entries don't do so out of what you call "ignorance", but because they disagree with policy and believe those entries should genuinely exist. Equinox 22:45, 26 September 2016 (UTC)
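To make the mechanics of the proposal concrete, the following Python sketch builds the extra bullet from proposed text 1 for a given title. It is only an illustration: the function names are invented here, splitting on spaces and hyphens is an assumption drawn from the three examples (large airplane, big strong girl, five-edged), and the real message would have to be implemented in Lua or JavaScript on the site, as noted above.

```python
import re


def component_words(title):
    """Split a title on spaces and hyphens.
    Returns [] when the title has fewer than two components."""
    parts = [p for p in re.split(r"[\s\-]+", title.strip()) if p]
    return parts if len(parts) >= 2 else []


def join_words(words):
    """Join two or more words as 'a and b' or 'a, b, and c'."""
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def sop_hint(title):
    """Return the extra bullet text, or '' for single-word titles."""
    words = component_words(title)
    if not words:
        return ""
    return (
        f"Please also check {join_words(words)} separately, as the definitions "
        f"of those terms may collectively give you the meaning of {title}."
    )


for example in ("large airplane", "big strong girl", "five-edged", "airplane"):
    print(repr(sop_hint(example)))
```

Single-word titles yield an empty string, matching the condition that the extra text does not appear unless the query has two words or more.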

Not seeing See also

Searching for "Gamergate" I typed gamergate into the Wiktionary search box. I clicked on the first entry and then scrolled down to the definition section. It did not refer to Gamergate, so I added a separate definition for that term. I was quickly reverted. Why? Because there is already a separate Gamergate page. I'm guessing my behavior is not uncommon for most casual users.

Has the project given any thought to either - (a) putting the "see also" information in the appropriate definition section (particularly helpful if there are multiple language definitions) rather than at the top of the page or (b) combining lower case and capitalized versions of words into one article? Butwhatdoiknow (talk) 13:17, 25 September 2016 (UTC)

Generally, I tend to assume that readers can see things that are at the very top of the page. That said, there is a solution to this, which I have implemented by adding a ===See also=== section pointing to the same link. By the way, I saw the definition you tried to add, and it was pretty clearly biased. That's not acceptable on Wiktionary regardless. —Μετάknowledgediscuss/deeds 16:57, 25 September 2016 (UTC)
Μετάknowledge - First, thank you kindly for making the change.
Second, I ask that you reconsider your assumption that many casual readers, focusing on the definition section, will not lose sight of everything else, including something at the beginning of the entry - particularly when they arrive at a page for the exact word they are looking up (or so they would assume, not noticing the capital/lower case difference). If you do so then I further request that you consider working to make it standard practice to do what you did for gamergate in all cases where there are separate capital/lower case pages.
Finally, I ask that you keep Hanlon's razor in mind when you consider whether a proposed definition is biased. In my case I tried in good faith to fit the opening paragraphs of the Wikipedia article into a single sentence. You evidently concluded that I failed in this attempt. But that is no reason to go immediately into chastisement mode. Butwhatdoiknow (talk) 00:35, 26 September 2016 (UTC)

Bot-replacing Template:etyl + Template:m with Template:der

In the past, I already proposed and then performed a bot run to do a replacement where {{etyl}} had "-" as the second parameter. I'd now like to do the same, but more generally with all instances of {{etyl}}, replacing with either {{cog}} or {{der}} depending on the second parameter. This doesn't add or remove any information, as "der" adds to the same categories as "etyl". However, it does make things a lot easier for future editors who want to replace "der" with "bor" or "inh" as appropriate, because then it's a matter of changing the three letters of the template name. —CodeCat 13:23, 25 September 2016 (UTC)

Support. --Daniel Carrero (talk) 13:49, 25 September 2016 (UTC)
  • What will you do about things like From {{etyl|de|hu}} thieves' argot {{m|de|Fühbar}}.? —Μετάknowledgediscuss/deeds 16:51, 25 September 2016 (UTC)
    • Nothing. —CodeCat 17:23, 25 September 2016 (UTC)
      • I dunno, I've been using CAT:etyl cleanup as a way of finding terms for which a decision needs to be made whether they're inheritances or not. I've been working on the assumption that if an entry uses {{der}} it means someone has deliberately made the decision not to use {{inh}}, but that if an entry uses {{etyl}} it probably means no one stopped to think about the difference. But if your bot empties the Etyl cleanup categories automatically, then I'll have no way of knowing which entries have already been thought about and which haven't. —Aɴɢʀ (talk) 19:18, 25 September 2016 (UTC)
        • You can use the derivation categories. Granted, they won't be emptied out, but if you go through them systematically in alphabetical order, you'll eventually cover them all. —CodeCat 19:49, 25 September 2016 (UTC)
          • But the derivation categories include everything using {{der}}, regardless of whether a human editor deliberately used {{der}} instead of {{inh}} or a bot automatically used {{der}} without considering {{inh}}. The derivation categories will be far too big for me (or anyone else, probably) to feel any motivation to work through them. As a result, inheritances will stay in the derivation categories indefinitely, thus rendering completely useless the distinction we only fairly recently decided to make between inherited and noninherited terms. —Aɴɢʀ (talk) 20:22, 25 September 2016 (UTC)
            • I agree with Angr here. We should create a new template, perhaps {{autoder}} or {{ader}}, which does the same as {{der}} except use a different cleanup category. Benwing2 (talk) 21:22, 25 September 2016 (UTC)
  • Oppose this automatic change -- I also agree with Angr here. Any instance of {{etyl}} + {{m}} is pretty clearly the old format, and can thus be easily identified as an entry that needs conversion. Meanwhile, any instance of {{der}} is impossible to distinguish from an intentional use of {{der}}, and thus cannot be easily identified for any further processing.
FWIW, I often come across JA entries where we used the {{etyl}} + {{m}} templating in the past, because that's what we had, and now we need to use {{bor}} as the term is clearly a borrowing (such as スプーン ‎(supūn, spoon) which has already been converted, or タオル ‎(taoru, towel) which hasn't yet). ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 25 September 2016 (UTC)
Support, done in the right way of course. No rush, better to have a separate {{etyl}} and {{m}} than a broken entry. Renard Migrant (talk) 17:51, 26 September 2016 (UTC)
Even if the aim is to get rid of {{etyl}}, it seems we could indeed use something like Benwing2's {{ader}}: this would allow non-etymologically focused editors to add etymologies without having to research if they are "borrowings" or "derivatives" or what. --Tropylium (talk) 20:16, 28 September 2016 (UTC)
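As a rough illustration of the replacement under discussion, the Python sketch below rewrites only the simple case where {{etyl}} is followed by nothing but whitespace and then {{m}} in the same source language. It is not CodeCat's actual bot code; the regular expression, the function name `convert` and the `der_name` argument (which lets the output use something like Benwing2's hypothetical {{ader}}) are assumptions made for this example. Cases with intervening text, such as the "thieves' argot" one, are left untouched, in line with the "Nothing" answer above.

```python
import re

# {{etyl|SOURCE|DEST}}, whitespace only, then {{m|SOURCE|...}}
ETYL_M = re.compile(
    r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}(\s+)\{\{m\|([^|{}]+)\|([^{}]*)\}\}"
)


def convert(wikitext, der_name="der"):
    """Merge adjacent {{etyl}} + {{m}} pairs into {{der}} (or {{cog}}
    when the second parameter of {{etyl}} is "-")."""

    def repl(match):
        source, dest, _gap, m_lang, rest = match.groups()
        if m_lang != source:
            # Language codes disagree: leave for a human to look at.
            return match.group(0)
        if dest == "-":
            return "{{cog|" + source + "|" + rest + "}}"
        return "{{" + der_name + "|" + dest + "|" + source + "|" + rest + "}}"

    return ETYL_M.sub(repl, wikitext)


print(convert("From {{etyl|la|en}} {{m|la|aqua||water}}."))
print(convert("Compare {{etyl|de|-}} {{m|de|Wasser}}."))
print(convert("From {{etyl|de|hu}} thieves' argot {{m|de|Fühbar}}."))
```

Passing `der_name="ader"` would keep bot-converted entries distinguishable from deliberate uses of {{der}}, which is the concern Angr and Eiríkr raise.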

"In other projects" in the sidebar

Requested feedback Entries such as Wikipedia have "in other projects" in the sidebar and link to Wikipedia articles on a topic. For some reason, this entry only links Danish, Dutch, English, and German articles. Why? There are definitely articles on Wikipedia in other language editions of the encyclopedia. For that matter, there is material on Wikipedia on (e.g.) Commons. Why are these languages displayed? If the thinking is that these are all Germanic languages, then why not Scots (which is mutually intelligible)? Can someone explain this to me or direct me to policy discussion about it? —Justin (koavf)TCM 01:34, 26 September 2016 (UTC)

The {{wikipedia|lang=xx}} template is what puts them there. --WikiTiki89 14:48, 26 September 2016 (UTC)
@Wikitiki89: Excellent. Are there any best practices about this? E.g. should we have one for c: as well? —Justin (koavf)TCM 19:26, 26 September 2016 (UTC)
I don't see why you would want to link to Commons- just include the image or audio file on the page. DTLHS (talk) 02:50, 27 September 2016 (UTC)
To play the devil's advocate, consider this argument in favour: the entry foot shows one photo of one foot (which seems like just the right number to me), but what if you wanted to see more photos of feet? A link to Commons helps with that. —Μετάknowledgediscuss/deeds 03:11, 27 September 2016 (UTC)
@DTLHS: To encourage users to edit cross-wiki and to spread more knowledge. Why would we link to an encyclopedia? —Justin (koavf)TCM 03:12, 27 September 2016 (UTC)
Are you talking about doing it automatically whenever something is transcluded from Commons? Because I think we would need a Mediawiki extension for that- not something we could do ourselves. Otherwise you can just use {{commons}} and {{PL:commons}}. DTLHS (talk) 03:16, 27 September 2016 (UTC)
@DTLHS: Well, that is actually a good question, since having inline templates seems to make the sidebar links redundant. Why do we have both? —Justin (koavf)TCM 05:24, 27 September 2016 (UTC)

{{bor}} and {{inh}} should also categorize into "Foo terms derived from Bar"

Using e.g. {{bor|fr|en|foo}} puts the term into CAT:French terms borrowed from English but not CAT:French terms derived from English. This seems transparently wrong. I will change this myself unless there is really strong objection. Benwing2 (talk) 02:48, 27 September 2016 (UTC)

But Category:French terms borrowed from English is a subcategory of Category:French terms derived from English. --Daniel Carrero (talk) 02:50, 27 September 2016 (UTC)
Hmm. This is true but hardly obvious. I'm an experienced Wiktionarian and didn't notice this. What I did notice is that Category:French terms borrowed from English and Category:French terms derived from English are in quite different parts of the tree. It still seems very wrong to me that changing from {{der}} to {{bor}} removes a term from Category:French terms derived from English, because the term is still derived from English. Benwing2 (talk) 02:56, 27 September 2016 (UTC)
Well, we do categorize English nouns in both Category:English nouns (which is more specific) and Category:English lemmas (which is more generic). I'm not sure if I agree with your proposal, because I was under the impression that it's clear enough that "borrowed from English" is a subset of "derived from English". Maybe if more people want that, it wouldn't be harmful... It would occupy more space on the list of categories of an entry, but it could also help navigation. --Daniel Carrero (talk) 03:05, 27 September 2016 (UTC)
First of all I do think it's a problem that dog isn't in those categories. But secondly, it's not a useful distinction the way you think it is because {{etyl}} categorizes into the "derived by" category which isn't necessarily a non-borrowing. Benwing2 (talk) 13:32, 27 September 2016 (UTC)
This is a question that has a risk of turning into quite the slippery slope. Suppose that a user does not know or care about the differences between the individual Slavic languages: should we do them the favor of additionally duplicating the contents of categories like Category:English terms borrowed from Polish and Category:English terms derived from Old Church Slavonic in it?
Or suppose that a user does not care for a language distinction that we make on Wiktionary. Should we provide for them a way to group together e.g. the contents of Category:Zenaga terms derived from Latin, Category:Tarifit terms derived from Latin etc. as Category:Berber terms derived from Latin?
Duplicating words in parent categories is however basically a manual work-around to what is a software problem. What seems to be really being sought here is the ability to view all terms contained ultimately inside one category, even after subcategorization. We already have the ability to preview sub-subcategories, so previewing subcategory terms should probably be possible too…? --Tropylium (talk) 20:05, 28 September 2016 (UTC)
Oppose as well. The subcategorisation already indicates that one is a subset of the other. —CodeCat 13:54, 27 September 2016 (UTC)
Who cares? Not like anyone reads these categories. It's an administrative issue only. Renard Migrant (talk) 14:31, 27 September 2016 (UTC)
That's not true. I've seen users at WT:FB and communicating through email say that they look through our categories for words. —Μετάknowledgediscuss/deeds 15:52, 27 September 2016 (UTC)
  • Strong support. I have been meaning to ask for this myself. --WikiTiki89 17:41, 27 September 2016 (UTC)
I have been under the impression that the "X derived from Y" categories are by now essentially obsolete, and that we should aim to clean them up into more specific etymological ones (with "X borrowed from Y" and "X inherited from Y" categories as the initial step): after all, we have been cleaning up instances of {{etyl}}. So I would oppose going right back. There may be room for double categorization for directly-borrowed vs. indirectly-borrowed terms, but at the very least, inherited vs. non-inherited terms should not be in the same category (where applicable).
— In terms of the category "tree" shown at the top, having to locate specific etymological categories by language family trawling such as French language » Terms by etymology » Terms derived from other languages » Indo-European languages » Germanic languages » West Germanic languages » English always seemed like a pain to me, and I am happy that {{bor}} does away with this. --Tropylium (talk) 19:51, 28 September 2016 (UTC)
Derived terms category is not obsolete, it's the category for words borrowed into older stages of the language and for calques. Korn [kʰũːɘ̃n] (talk) 08:23, 29 September 2016 (UTC)
  • I can't overstate my support. Very frankly, I always thought the absence of the supercategories was a bug (!) caused by oversight and I'm shocked that anyone would actually support this situation. I regularly browse categories as a user. And, to name a recent example from my life, if 'Romanian given names derived from Greek' are not listed under 'Romanian given names', this dictionary becomes a, pardon my French, fucking hassle to use, since the category 'Romanian given names' does not list - as I would certainly expect from the name - all Romanian given names on this site. Why would we (the editors) make me (the user) click through 15 categories and actively prevent me from having an overview, and why would we exclude data from showing a category which it already is part of? Korn [kʰũːɘ̃n] (talk) 08:23, 29 September 2016 (UTC)
Strongly support categorizing borrowed and inherited terms also into the derived-terms category, as well as into their more specific category. Alternatively, "derived terms" could be left as the exclusive category of "der" if another macro-category were set up ("English terms from Spanish"?). I recall this being discussed before or while the current threesome of templates was being set up. Often I only want a list of terms from language X present in language Y, and don't care how they got there (direct borrowing, chain of borrowing, etc). - -sche (discuss) 20:51, 29 September 2016 (UTC)

So is this enough consensus to start implementing the complete categorisation or should this go into a vote? Korn [kʰũːɘ̃n] (talk) 22:22, 7 October 2016 (UTC)

I think it needs a vote... In this discussion, the proposal clearly has consensus but there are some people opposed to the change. I created Wiktionary:Votes/2016-10/Populating "derived from" categories with borrowed and inherited terms. --Daniel Carrero (talk) 11:43, 8 October 2016 (UTC)
Scratch that, I implemented the proposed change. Feel free to revert, discuss, etc. --Daniel Carrero (talk) 14:35, 25 October 2016 (UTC)
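The difference between the old and the proposed behaviour can be sketched in a few lines of Python. This is not the code of the actual etymology module; the function name, the `also_derived` switch and the tiny `LANGUAGE_NAMES` table are all made up for the example.

```python
# Hypothetical sample of language codes; the real data lives elsewhere.
LANGUAGE_NAMES = {"fr": "French", "en": "English", "la": "Latin"}


def etymology_categories(template, dest, source, also_derived=True):
    """Return the category names a {{der}}/{{bor}}/{{inh}} call would add.

    also_derived=False models the behaviour criticised in this thread;
    also_derived=True models the proposal (specific + "derived from")."""
    dest_name = LANGUAGE_NAMES[dest]
    source_name = LANGUAGE_NAMES[source]
    derived = f"{dest_name} terms derived from {source_name}"
    specific = {"bor": "borrowed", "inh": "inherited"}.get(template)
    if specific is None:
        # {{der}} only ever adds the generic category.
        return [derived]
    categories = [f"{dest_name} terms {specific} from {source_name}"]
    if also_derived:
        categories.append(derived)
    return categories


print(etymology_categories("bor", "fr", "en", also_derived=False))
print(etymology_categories("bor", "fr", "en"))
```

With the switch on, changing {{der}} to {{bor}} no longer removes a term from the "derived from" category, which is the point Benwing2 makes above.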

Vote about disallowing triple-braced template parameters in entries

Based on Wiktionary:Beer parlour/2016/August#Proposed addition to WT:NORM: no template parameter expansions, I created Wiktionary:Votes/pl-2016-09/No triple-braced template parameters in entries. --Daniel Carrero (talk) 05:02, 29 September 2016 (UTC)

Some may be false positives or inside of comments. DTLHS (talk) 05:50, 29 September 2016 (UTC)
Chinese was a false positive due to a couple of stray extra braces. Most of the others look like bad substs. I don't see the point in a vote to outlaw something that seems to be strictly unintentional. Chuck Entz (talk) 07:03, 29 September 2016 (UTC)
The purpose is to not make the bot responsible if it mangles badly-formatted entries, thereby making bots easier to write. —CodeCat 12:38, 29 September 2016 (UTC)
So this is about blame? If someone doesn't comply with the rule, then the fact of the matter is that the bot may make matters worse. I understand that good programming practice is to test input for conformity to what the program needs and to bypass what does not conform. DCDuring TALK 13:05, 29 September 2016 (UTC)
Wikitext is completely freeform. Anyone can write anything at all in it. If we don't put checks on it, bots become impossibly complicated to write. The simpler the format, the easier it is to understand for both humans and computers, which will attract more new editors. —CodeCat 13:09, 29 September 2016 (UTC)
If people aren't even aware that they're leaving this stuff in the entry, they're not going to be influenced by any rules, especially since there's usually no way to spot this without looking at the wikitext after a save. I think the best way to deal with this is a filter that warns and tags, but doesn't disallow (we don't want to make editing impossible for entries with existing hard-to-fix examples). There may be some intricacies regarding the timing of template expansion vs. the timing of the filter, so it may require some tinkering to keep it from producing false positives, but it should be doable. Of the editors mainly responsible for the list above, @Rajasekhar1961 and @Aryamanarora are relatively inexperienced, but @Equinox and @Renard Migrant are veterans and would know better than to do this on purpose. Chuck Entz (talk) 14:06, 29 September 2016 (UTC)
Could always make an abuse filter which warns/prevents saving an entry with such a formation. Making the wikitext more uniform with no downside is a no-brainer for me. - TheDaveRoss 13:40, 29 September 2016 (UTC)
Would people understand the message enough to realize that triple curly brackets are a problem? Impenetrable technical explanations followed by a rejection of one's edits doesn't sound terribly nice. —suzukaze (tc) 17:53, 29 September 2016 (UTC)
Perhaps something along the lines of "Your edit contains invalid markup; check for the use of triple braces ("{{{example}}}") which should be removed and resubmit. If you need further assistance check in at the Grease Pit." It could also just be a flag to prompt further scrutiny and keep track of when the syntax is used. - TheDaveRoss 18:00, 29 September 2016 (UTC)
Completely agree with DCD. Requiring a "useful assumption for parsers" suggests that people want to use crappy, non-watertight parsers that don't do proper validation of what they are parsing. In practice, having a rule won't guarantee the rule is actually followed (since entries are free text, as stated). And if a bot mangles 1000 entries then we still have 1000 mangled entries, even if we can conveniently say "oh well it was the fault of that Chinese IP that made one edit in 2007". Equinox 13:34, 29 September 2016 (UTC)
The one at stop is being used inside <math></math> tags with some TeX meaning and not as a template parameter. --WikiTiki89 14:46, 29 September 2016 (UTC)
The one at stop does not accurately reflect the source text. It should not have been "{{{1}}}". --Daniel Carrero (talk) 14:52, 29 September 2016 (UTC)
That's beside the point (but if you want to fix it, coalescing, dyonically, and sbottom also do that). --WikiTiki89 15:20, 29 September 2016 (UTC)
I have investigated and found that most of these come from improperly substed templates. Is there any way to create an edit filter that can detect the use of subst:? Also some of these come from the "New Entry" links that you get when searching for an entry that doesn't exist (specifically, all except for Basic, Noun, 3rd person, and Participle contain these template parameters, but they all probably need some cleanup). --WikiTiki89 15:20, 29 September 2016 (UTC)
Yes gab, merir, my bad, you used to be able to subst: {{wikipedia}} but it's been modified so you can't. Could an admin change that? It's ridiculously easy to do. Renard Migrant (talk) 17:59, 29 September 2016 (UTC)
I hope you mean {{w}}. --WikiTiki89 18:36, 29 September 2016 (UTC)
Yeah, that's the one! Renard Migrant (talk) 19:39, 29 September 2016 (UTC)
I've fixed it, by the way. --WikiTiki89 20:00, 29 September 2016 (UTC)

About that list of entries: Wikitiki89 fixed some, Renard Migrant fixed others, I fixed the rest. --Daniel Carrero (talk) 06:37, 10 October 2016 (UTC)
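For anyone who wants to rerun such a check, here is a minimal Python sketch of a scan for triple-braced parameters that tries to avoid the false positives mentioned above (HTML comments and <math> tags). It is not the query DTLHS used and not an abuse-filter rule; the patterns and the function name are assumptions. In particular, requiring that the opening braces are not preceded by a fourth brace is only one guess at how to skip stray-brace cases like the one Chuck Entz describes.

```python
import re

# Regions where triple braces are not template parameters.
IGNORED = re.compile(r"<!--.*?-->|<math>.*?</math>", re.DOTALL)

# {{{name}}} or {{{name|default}}}, not preceded by a fourth "{".
PARAM = re.compile(r"(?<!\{)\{\{\{([^{}|]+)(?:\|[^{}]*)?\}\}\}")


def triple_braced_parameters(wikitext):
    """Return the names of apparent template parameters left in wikitext."""
    visible = IGNORED.sub("", wikitext)
    return [match.group(1) for match in PARAM.finditer(visible)]


print(triple_braced_parameters("See {{{1}}} and {{{lang|en}}}"))
print(triple_braced_parameters("<math>x^{{{1}}}</math> <!-- {{{2}}} -->"))
```

A check like this could feed the warn-and-tag filter suggested in the thread, leaving the decision about each hit to a human editor.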

Coptic construct states

How should Coptic construct states be entered (examples at ⲣⲟ ‎(ro) and ⲟⲩⲱⲙ ‎(ouōm) - this affects verbs, nominals and prepositions)? Coptologists have the convention of adding a hyphen after nominal state forms and equal signs to pronominal state forms, but the equal signs aren't going to work. So should those forms be hyphenated or entered bare? Lingo Bingo Dingo (talk) 13:31, 29 September 2016 (UTC)

I suggest what we do for Hebrew, which is display the hyphen (or equals sign) in the link, but not in the target of the link. Thus:
ⲣⲟ (ro) m (nominal construct state ⲣⲉ- or ⲣⲁ-, pronominal construct state ⲣⲱ=, plural ⲣⲱⲟⲩ=)
We should probably create {{cop-noun}} to make this easier. --WikiTiki89 15:24, 29 September 2016 (UTC)
See {{=}} by the way, also something like 1== in a template should produce an equals sign. Renard Migrant (talk) 17:51, 29 September 2016 (UTC)
In fact I had to use that trick to get {{=}} to display! Renard Migrant (talk) 17:56, 29 September 2016 (UTC)
I didn't need to do that in my example above. --WikiTiki89 18:34, 29 September 2016 (UTC)
No and I wasn't claiming you did, just pointing out that that's what {{=}} is for. Renard Migrant (talk) 20:52, 2 October 2016 (UTC)
Good idea, though maybe there should be hyphens in the target since they're usually directly attached to other forms (dissimilar to construct states in Semitic languages) and many aren't stand-alone forms. Lingo Bingo Dingo (talk) 13:22, 1 October 2016 (UTC)
@Lingo Bingo Dingo: Usually or always? That's an important question. Pronominal construct states perhaps don't need their own entries, but rather a table such as at בן listing out the full pronominal forms. --WikiTiki89 17:55, 5 October 2016 (UTC)
@Wikitiki89 Coptic used continuous script, but modern publications separate words. Nominal states vary in the modern convention depending on context, the editor's preference and word class (monosyllabic prepositions are usually connected to their nominals, for nouns and verbs that's more rare), pronominal states are always prefixed to a pronominal suffix, whether it is a noun, verb or preposition. Lingo Bingo Dingo (talk) 12:17, 6 October 2016 (UTC)
@Lingo Bingo Dingo: Oh. It seems that the "modern conventions" are very similar to Hebrew, so we can follow the same approach. --WikiTiki89 14:06, 6 October 2016 (UTC)
@Wikitiki89 Fine with me. Tables don't have to be a priority though, as pronominal suffixes are relatively regular in Coptic. Lingo Bingo Dingo (talk) 11:42, 10 October 2016 (UTC)
@Lingo Bingo Dingo: If it's that regular, it should be pretty easy to make a table template. --WikiTiki89 15:44, 10 October 2016 (UTC)

Grants to improve your project

Greetings! The Project Grants program is currently accepting proposals for funding. There is just over a week left to submit before the October 11 deadline. If you have ideas for software, offline outreach, research, online community organizing, or other projects that enhance the work of Wikimedia volunteers, start your proposal today! Please encourage others who have great ideas to apply as well. Support is available if you want help turning your idea into a grant request.

I JethroBT (WMF) (talk) 19:52, 30 September 2016 (UTC)

October 2016

Initialisms etc

Our current policy on initialisms fails to give guidance (or have I missed it?) on the appropriate format. Entries with the "Initialism" header are labelled as "Entries with non-standard headers". I have been adding and occasionally editing entries as "nouns" or the appropriate POS.
  Do we have policy? — Saltmarshσυζήτηση-talk 10:11, 1 October 2016 (UTC)

The absence of explicit guidance leaves us with only the other PoS headers. One thing that the Acronym and Initialism headers did was provide pronunciation guidance (eg, for WHO: W-H-O or who). DCDuring TALK 11:12, 1 October 2016 (UTC)

syllable marks in English pronunciation

I think they don't belong. It's especially problematic in cases like daughter written (US) /ˈdɔ.tɚ/, which would tend to imply that /t/ isn't flapped, which is false. Benwing2 (talk) 21:07, 2 October 2016 (UTC)

Or at least, they should be used only when they clearly convey something useful, as in nitrate vs. coatrack, where the tr in the middle of the two words is pronounced quite differently in one vs. the other due to the morpheme boundary in the latter. Benwing2 (talk) 21:11, 2 October 2016 (UTC)
The syllable marks are useful in my experience for non-native students of English. Some education systems and languages encourage counting and recognition of syllables. English is so arbitrary that explicit syllable marks help the newcomer.

By the way, there is more discussion of syllables at https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2016/September#Stress_marks_and_syllable_marks Bcent1234 (talk) 13:36, 3 October 2016 (UTC)

Pluralization of Acronyms and Initialisms

For an entry like DINK, it is not clear to me whether we want to create an entry for the plural form, which I would usually write as DINKs. This reflects the plural nature and the fact that the word is based on an acronym/initialism. I don't know if there is any received practice on whether 1) to create the entry at all, 2) to name the entry with the mixed case, and 3) to provide cross-linking from this new entry to the plural word of similar pronunciation (dinks). Could someone give me some guidance? Thanks! Bcent1234 (talk) 13:42, 3 October 2016 (UTC)

@Bcent1234: I created DINKs, because it is a plural word (abbreviation) citable in Google Books ( https://books.google.com/ ). It's like PC -> PCs. If "DINKs" were not citable, we would not be able to create the entry. --Daniel Carrero (talk) 19:58, 3 October 2016 (UTC)

why long marks in Canadian English?

Why does Appendix:English pronunciation indicate long marks for Canadian but not American English? This makes no sense. Canadian English is largely the same as American English and doesn't have any clear distinction between long and short vowels, any more than American English does. Using separate symbols means we can't write (North America) or (US, Canada) or (cot-caught merger, Canada) or similar, even though the two dialects are phonemically identical in most words. Benwing2 (talk) 21:39, 2 October 2016 (UTC)

Apparently @QuartierLatin1968 (interesting pattern of contributions) added this detail when copying the table from w:International_Phonetic_Alphabet_chart_for_English_dialects Crom daba (talk) 03:12, 3 October 2016 (UTC)
Sorry if including length marks led to difficulties! Yes, I don't contribute much to Wiktionary; I'm more on other projects. Cheers, QuartierLatin1968 (talk) 04:10, 4 October 2016 (UTC)
I am a fan of including length marks in American English for the phonemes /ɑː/, /iː/, /uː/, and /ɔː/. --WikiTiki89 18:29, 5 October 2016 (UTC)
Why? There's no noticeable length on any of these phonemes, any more than any others. Benwing2 (talk) 20:18, 5 October 2016 (UTC)
I don't think that's entirely true. It may be partially true, in that length is not phonemic, but I for example pronounce beet slightly longer than bit and bead a lot longer than bid. Anyway, the way I see it is that /iː/ is just a symbol for a phoneme and the length mark is just part of the symbol, and so choosing /iː/ over /i/ simply makes it more consistent with our UK pronunciations. And now that I mentioned the UK pronunciations, the length situation in the UK is actually not that different from the situation in the US (especially when you take internal variation into account for both countries) except for the fact that there are instances where length is the only distinguishing feature in the UK (like in dared /dɛːd/ vs dead /dɛd/ for at least some speakers). --WikiTiki89 20:35, 5 October 2016 (UTC)
It seems extra cruft to me. Also, in UK English, the lax phonemes /æ ɛ ɪ ɒ ʌ ʊ/ are quite short, shorter than the US phonemes and noticeably shorter than the tense phonemes, whereas e.g. /ɑː/ and /ɔː/ are noticeably long. In the US, however, there's no obvious length difference at all between e.g. bat, bot, bought, nor bad, bod, bawd, so writing /æ/ but /ɑː/ is misleading. We will have to distinguish UK and US English much of the time anyway so I don't see why it helps that much to distort US notation to accommodate UK notation. Benwing2 (talk) 20:54, 5 October 2016 (UTC)
It's not a distortion. Just because the length difference doesn't hold for all vowels in all environments doesn't mean we should do away with it entirely. --WikiTiki89 21:01, 5 October 2016 (UTC)

Minimal Difference Pairs

Is there an existing template or practice for linking words whose spelling differs by one letter, such as bill and bull? Similarly, is there an existing template or practice for linking words whose pronunciation differs by only one sound (at least in some dialects), such as tinned and tend? These are important in teaching a language, so that the student knows the importance of pronunciation and knows what other words they may be misinterpreted as saying. I know we have homophones to warn if two words sound similar. Do we have a standard way to describe these other relationships? Bcent1234 (talk) 13:53, 3 October 2016 (UTC)

Spanish I had actually just started User:Koavf/Appendix:Spanish terms distinguished by similar letters a few days ago. If others think it is a good idea to have these, I would agree. —Justin (koavf)TCM 14:08, 3 October 2016 (UTC)
I was thinking these words would be linked in the word entries, not a separate page, much like homophones are. By the way, would the perro / pero pair qualify in Spanish? Bcent1234 (talk) 14:14, 3 October 2016 (UTC)
Exactly This is what I had in mind: differences like <l>/<ll> or <r>/<rr> or <n>/<ñ>. But this brings up the question of inclusion criteria for "minimal difference". Is it just one letter or phoneme? Bill/bull and also lose/loose? What about el/él? I assume that we want to restrict these pairs to a given language so that we aren't linking trivial things across languages like the Anglicized facade to façade, since that will be in the etymology or alternative spellings anyway if they have an actual relationship. —Justin (koavf)TCM 14:27, 3 October 2016 (UTC)
For my students, I want words that are minimally different in writing or pronunciation but which yield a totally different meaning, as they cause problems for the student because it is a distinction the non-native speaker hasn't learned is significant. I don't know whether there is a difference between the Anglicized facade and façade, as I don't really speak French, so I can't contrast the English word's meaning with the French word's meaning. Bcent1234 (talk) 15:43, 3 October 2016 (UTC)
Hungarian: I collect them in Appendix:Hungarian pronunciation pairs. A while ago I did link these entries to each other in the Pronunciation section, but other editors did not think it was a good solution. Maybe this time we can come up with a better way. --Panda10 (talk) 17:02, 3 October 2016 (UTC)
Since these are largely subjective (in terms of which groups to create, as well as sometimes what belongs in a group) I think that appendices are a much better choice than main entry space. Such relationships could quickly overwhelm entries. - TheDaveRoss 17:24, 3 October 2016 (UTC)
I personally don't see these words as subjective, as they are created by a mechanical process from an existing word: simply identify a phoneme or a letter, and replace it by another phoneme or letter where the end result is also a word. This mechanical process is not any different from that which is used to create an anagram of a word. There is a part which is difficult to automate, as an anagram may be a word for some folks and not for others, just as a minimal change to a word may not be a recognizable word for some folks and might be one for others. But this isn't really much different from the way homophones might not sound alike in one dialect and can in another. Bcent1234 (talk) 22:12, 3 October 2016 (UTC)
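As a rough illustration of the mechanical process described above, here is a minimal Python sketch for the spelling-only case. It assumes a plain list of words in a single language as input; the function name `minimal_pairs` and the sample list are made up for this example and do not correspond to any existing Wiktionary tool. It only covers substituting one letter for another, so pairs that differ by an inserted letter (pero / perro, lose / loose) are not found.

```python
from collections import defaultdict


def minimal_pairs(words):
    """Return sorted pairs of same-length words differing in exactly one letter."""
    buckets = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # Key on everything except position i; words sharing a key
            # can differ only in the letter at that position.
            buckets[(word[:i], word[i + 1:])].add(word)

    pairs = set()
    for group in buckets.values():
        ordered = sorted(group)
        for index, first in enumerate(ordered):
            for second in ordered[index + 1:]:
                pairs.add((first, second))
    return sorted(pairs)


# Hypothetical sample list, for illustration only.
sample = ["bill", "bull", "ball", "pero", "perro", "lose", "loose"]
for first, second in minimal_pairs(sample):
    print(first, second)
```

A phoneme-based version would need pronunciation data per dialect rather than spellings, which is where the harder judgment calls come in.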

Personally, I can understand making an appendix if there were only a few minimal change words, but I see there as being a LOT of them. I can understand not having them in a single page as the page would grow very large very quickly. I guess you could use the browser "search page" capability to find a word and where it appears in a chain/groups. It just seems more inefficient as a single word may appear in multiple chains/groups. Bcent1234 (talk) 22:12, 3 October 2016 (UTC)

I don't see a need for this. --WikiTiki89 18:30, 5 October 2016 (UTC)
  • I support adding a 'Minimal pairs' header to our list of standard headers somewhere in the -nyms section. Let editors who want to invest their time do that. I'd have had use of that kind of thing many times in the past. I'm thinking of spelling only for a start. Korn [kʰũːɘ̃n] (talk) 20:27, 5 October 2016 (UTC)
I agree that spelling is the natural basis for the process/method that would create them. Could someone explain the idea and ramifications of doing this in an appendix? As you can see from my comment earlier, I thought making an appendix would involve all the minimal pairs for every word being mentioned in a single document. We currently distribute the cost of the anagrams links and the homophones links in each word. It may not change the cost, but it would make it only visible when you are in a minimal pair grouping. Bcent1234 (talk) 21:50, 18 October 2016 (UTC)
We already have users complaining that they can't find the definitions (especially in English). We also have many complaining about how long it takes to download larger pages, ie, those entries with short headwords that appear in multiple languages that are highly likely to appear in minimal difference pairs. Hiding some of the material that only a linguistics major would love addresses issue one, but not issue two. DCDuring TALK 22:30, 18 October 2016 (UTC)

Third LexiSession: police

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

The Tremendous Wiktionary User Group, a nice and open gathering of Wiktionarians, is happy to introduce the third chapter of our collective experiment baldly named LexiSession.

So, what is a LexiSession? The idea is to coordinate contributors from different languages to focus on a shared topic, to enhance all projects at the same time! The first LexiSession was about cat, the second about roads and ways. For this third LexiSession, we offer a month - until the end of October - to deal with the police! There is a substantial amount of slang and police codes, including abbreviations of services, and covering them can help people to better understand this domain.

English Wiktionary already has a Wikisaurus:police but there is still plenty of work to do. If you're up for this LexiSession, please indicate your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the next LexiSessions.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later this month for an update! Noé (talk) 14:19, 3 October 2016 (UTC)

Update: A thesaurus of police in French has been created. Info about this LexiSession has been spread to a large range of other Wiktionaries, but you are still welcome to translate this message for our non-English-speaking mates on other projects, if you can :) Noé (talk) 07:24, 7 October 2016 (UTC)

Pronunciation and Etymology

I have made it a practice to put the Pronunciation section under the Etymology section, but in a word like luma I think it is more readable when placed first, as the pronunciation is common to both Etymologies. Is there a consensus on this? Bcent1234 (talk) 16:38, 3 October 2016 (UTC)

Personally, I believe it makes sense to place etymology before pronunciation (except in Japanese) and in my experience English entries often use that format. (I don't have actual numbers.) I don't think there's a consensus on this. WT:EL#List of headings has "Etymology" before "Pronunciation", but this has not been voted on yet. --Daniel Carrero (talk) 17:06, 3 October 2016 (UTC)
I put pronunciation before etymology, because that way, the order is the same whether there is one etymology section or several. It's more consistent that way. —CodeCat 17:10, 3 October 2016 (UTC)
By the way, note that the word is "etymology" not "entymology" (possibly you are getting it mixed up with entomology). Mihia (talk) 17:58, 3 October 2016 (UTC)
Thanks. I fixed my typos. I'm aware of the two words and difference of meaning (history & bugs) but struggle sometimes Bcent1234 (talk) 18:10, 3 October 2016 (UTC)
I put Etymology first, then Pronunciation, because the pronunciation may change due to the Etymology. For instance, English record ‎(noun) and record ‎(verb) each have distinct Etymology sections with nested Pronunciation sections. However, in the event that multiple Etymologies have identical Pronunciations, I will often place the Pronunciation outside (i.e. first) in order to conserve space on the page. Leasnam (talk) 18:14, 3 October 2016 (UTC)
I find that it's much more common across languages for the same pronunciation to apply across all etymologies, than for the pronunciations to differ. So this is the case that we should base it on. Which is why I put pronunciation before etymology. The first section after etymology should always be the POS section, except when there are etymology-specific pronunciations. The same logic applies to alternative forms as well, but since we now have a vote to allow putting them under the POS section, that point is moot. —CodeCat 18:20, 3 October 2016 (UTC)
There was a long discussion about this a few months back. The consensus was that the editors of each language should decide which order is the most reasonable for their languages - and I think, but this is a part I'm not at all sure of, that within one language, the order should be fixed. If you want to search for it, I believe it was in the Beer Parlour and involved Latin, Russian and Japanese examples, the Latin one being the entry auraria, if I'm not mistaken. Korn [kʰũːɘ̃n] (talk) 19:56, 3 October 2016 (UTC)
@CodeCat I have complained to you about this before. If existing entries do things a particular way, you need to follow that way even if you think a different way is more logical. In particular, if a given language tends to put etymology before pronunciation, you need to follow that. Benwing2 (talk) 20:12, 3 October 2016 (UTC)
Indeed. A lack of consistency makes Wiktionary harder to use, and also makes us look very disorganized. Now, on that note, CodeCat also tends to split nouns and verbs with a common origin into two etymologies on the basis of one having come from the other and thus having a marginally different origin. I don't strongly disagree with this, but it leads to a lot of inconsistency. Am I justified in merging these etymology sections, or is there a sufficient lack of consensus on this that CodeCat's position is equally valid? Andrew Sheedy (talk) 02:39, 4 October 2016 (UTC)
There already is consensus that consistency should take a second place to reasonable order, though. For me, the information that a word only secondarily derives from a homophone is relevant information and keeping it under a separate etymology seems cleaner to me. Korn [kʰũːɘ̃n] (talk) 07:37, 4 October 2016 (UTC)
I'm not sure if consistency should take a second place to reasonable order, (I mean, judging entries case-by-case, right?) but it does sound like something other people may be likely to support. Either way, I don't think there's evidence of actual consensus for that yet. Please let me know if this was discussed before. --Daniel Carrero (talk) 08:02, 4 October 2016 (UTC)
@Daniel Carrero, Bcent1234 Wiktionary:Beer_parlour/2016/January#About:_Pronunciation_1.2C_Pronunciation_2.2C_Pronunciation_3 - You actually initiated that discussion and created a failed vote from it: Wiktionary:Votes/2016-02/Multiple_pronunciation_sections#Decision. Korn [kʰũːɘ̃n] (talk) 09:16, 4 October 2016 (UTC)
Yes, but the discussion was basically about numbered pronunciation sections, like "Pronunciation 1". I don't think it is a great indicator of what we do concerning entries which have non-numbered etymologies and pronunciations. --Daniel Carrero (talk) 09:23, 4 October 2016 (UTC)
A good part of the discussion was about the relationship and order of the Pronunciation and Etymology headers, it's the only conversation relevant to the question here of which I know. Korn [kʰũːɘ̃n] (talk) 10:45, 4 October 2016 (UTC)
Fair enough. --Daniel Carrero (talk) 10:54, 4 October 2016 (UTC)
I didn't realize this was an issue. I thought our longstanding practice was to put ===Pronunciation=== after ===Etymology===, unless there are multiple etymologies and all have the same pronunciation, in which case ===Pronunciation=== comes before ===Etymology 1===. —Aɴɢʀ (talk) 21:04, 3 October 2016 (UTC)
I believe the long discussion mentioned above (the one with auraria as an example entry) was Wiktionary:Beer parlour/2016/January#About: Pronunciation 1, Pronunciation 2, Pronunciation 3, which was followed by Wiktionary:Votes/2016-02/Multiple pronunciation sections. The discussion was basically about the existence of numbered pronunciation sections: "Pronunciation 1", etc. The order between non-numbered etymology and pronunciation was at best a secondary issue in that discussion. --Daniel Carrero (talk) 07:56, 4 October 2016 (UTC)

Suggestion: Rule in EL about not linking back misspellings

I suggest adding this rule to WT:EL (section WT:EL#Alternative forms) eventually. Apparently, this is a rule we already follow, so I figure it shouldn't hurt to formalize it:

"Misspellings link to the correct spellings, but correct spellings do not link back to misspellings. Don't link marshmallow (correct spelling) to marshmellow (misspelling)."

--Daniel Carrero (talk) 19:31, 3 October 2016 (UTC)

I heartily agree with this policy. Students don't need to know how many ways there are to misspell a word. They can come up with those on their own. Linking back to the correct spelling is useful, as it takes the reader to the proper page. Is this more than just a #REDIRECT? Or is there a need to have a true page for the misspelled word? Bcent1234 (talk) 19:48, 3 October 2016 (UTC)
But theoretically one can misspell an already misspelled word. At least there are two-level deliberate misspellings: pr0n is a deliberate misspelling of pron, which in turn stands for porn.--Giorgi Eufshi (talk) 07:53, 4 October 2016 (UTC)
I would oppose transforming all misspellings into redirects. To be fair, even if we decided to do that, pr0n is a deliberate misspelling and thus a word on its own right, and I believe words like that could be "spared" and kept as normal entries. Still, there are probably a few entries for misspellings which can't be redirects because they are spelled the same as other normal words. The Portuguese entry trás is both a normal preposition and a misspelling of a verb form. --Daniel Carrero (talk) 04:06, 5 October 2016 (UTC)
This rule has nothing to do with entry layout and does not belong in WT:EL. I don't think it really needs to be codified at all, it hasn't been a problem. --WikiTiki89 18:32, 5 October 2016 (UTC)
I agree that WT:EL is not the right place for this. But I tend to feel that the misspelling thing has grown beyond what it should be, and we now have a lot of misspelling entries that don't really deserve to exist. Equinox 19:48, 5 October 2016 (UTC)
This isn't even about them existing, but about linking to them. I don't think we have an epidemic of that and I don't think anyone would object to removing those links. --WikiTiki89 19:49, 5 October 2016 (UTC)
I'm under the impression that "entry layout" would encompass "what we should put and not put in an entry". For example: Should we add translation tables, and where? Don't add translation tables inside Finnish entries! -- Because WT:EL says we shouldn't! --Daniel Carrero (talk) 22:00, 5 October 2016 (UTC)
I see your point. However, I still don't think we need to have an explicit policy about this. --WikiTiki89 22:05, 5 October 2016 (UTC)

I'm afraid the current WT:EL#Alternative forms might imply that we should list misspellings in the alternative forms section, because it is listed equally with other entry variations. The current text was voted approved at Wiktionary:Votes/pl-2015-10/Entry name section as part of the "Entry name" section, but I moved it to the current section without a vote some time ago, because it seemed to make more sense. If we clarified that misspellings should not link back to entries, I believe it would be an improvement. --Daniel Carrero (talk) 21:58, 5 October 2016 (UTC)

Unless anyone objects to this change, that is no longer a problem. --WikiTiki89 22:05, 5 October 2016 (UTC)
I support your change, it looks good. I agree that my concern above is no longer a problem.
I'd still like to create a vote eventually, introducing that rule about not linking back misspellings, just because it's something we already do. This does look like a good "layout" rule. --Daniel Carrero (talk) 22:09, 5 October 2016 (UTC)
Why can't we focus on the issues that matter first? --WikiTiki89 22:16, 5 October 2016 (UTC)
I agree that this rule is not very important -- still, many of my EL votes are about formalizing unwritten rules that we already follow. I created most of the 2016 votes (WT:VTIME) and a chunk of the votes in previous years. If we wait for all big problems to be solved we won't ever get to tackle small issues. I could even say: "I'm satisfied that we have reviewed all the current pronunciation text and our list of POS sections, and that EL even finally mentions that prefixes and suffixes usually have a hyphen in the entry title. I'm so happy that I will look for some small stuff to solve now." At the moment, I'd say that Wiktionary:Votes/2016-07/Request categories and Wiktionary:Votes/pl-2016-09/Placement of "Alternative forms" 2 (weaker proposal) are examples of really important current votes, and some other votes that I created may be less important compared to them. I prefer to have a small number of major votes at a time, because they require more thought and discussion and are harder to pass. Adding a simple "don't link back misspellings" should be a no-brainer, in my opinion. --Daniel Carrero (talk) 22:45, 5 October 2016 (UTC)
  • I agree with having common misspellings link or redirect to the correct spelling. However, it concerns me that having misspellings as headwords means that these get picked up in word lists and presented as if correct. For example, if you type bizzare into onelook.com, it lists the Wiktionary entry with no indication that it is misspelled. Someone looking for confirmation of spelling might take that as such and not look further. Mihia (talk) 00:17, 7 October 2016 (UTC)
  • You're right. I agree that this is a valid concern. Redirecting all misspellings when possible would fix that problem. When an entry is both a misspelling and an actual word, we could use some soft redirect like "You may be looking for marshmallow." or something. --Daniel Carrero (talk) 00:29, 7 October 2016 (UTC)

I guess there are some entries that have both misspelling and right-spelling senses in one and they must be linked unavoidably. Found one: apart --Octahedron80 (talk) 00:39, 7 October 2016 (UTC)

CFI and idiomaticity clarification

Based on User talk:Renard Migrant#CFI and idiomaticity clarification, I created Wiktionary:Votes/pl-2016-10/CFI and idiomaticity clarification. --Daniel Carrero (talk) 04:02, 5 October 2016 (UTC)

Creative Commons 4.0

Hello! I'm writing from the Wikimedia Foundation to invite you to give your feedback on a proposed move from CC BY-SA 3.0 to a CC BY-SA 4.0 license across all Wikimedia projects. The consultation will run from October 5 to November 8, and we hope to receive a wide range of viewpoints and opinions. Please, if you are interested, take part in the discussion on Meta-Wiki.

Apologies that this message is only in English. This message can be read and translated in more languages here. Joe Sutherland (talk) 01:34, 6 October 2016 (UTC)

About the smallest discussions

Based on this request by @Korn, I created Wiktionary:Smallest discussions and added a new "smallest discussions" box in the watchlist. Feel free to discuss/revert/etc.

Note: There's a minor bug that could be annoying. The displayed entries are only properly formatted if the headings have spaces between the title and the equal signs. So, == example == works but ==example== would not work. All listed entries have spaces in the headings as I described, so this bug could go unnoticed for a while. I don't know how to fix it. --Daniel Carrero (talk) 10:29, 6 October 2016 (UTC)
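For anyone curious what a spacing-tolerant heading match could look like, here is a minimal sketch in Python. It is only an illustration of the idea: the actual module is not written in Python, and the names `HEADING` and `parse_heading` are invented for this example. The assumption is that a heading line is a run of one to six equals signs, a title, and the same run of equals signs again, with optional spaces around the title.

```python
import re

# One to six "=" signs, optional spaces, the title, optional spaces,
# then the same run of "=" signs again (back-reference \1).
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def parse_heading(line):
    """Return (level, title) for a wikitext heading line, or None otherwise."""
    match = HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


print(parse_heading("== example =="))  # spaced form
print(parse_heading("==example=="))    # unspaced form
```

With a pattern like this, both == example == and ==example== yield the same level and title, so the formatting step does not depend on how the heading was typed.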

Can we make the watchlist box collapsible? BTW, I plan on going through the module and reworking it, including fixing the section header bug. --WikiTiki89 13:58, 6 October 2016 (UTC)
Personally, I slightly prefer the un-collapsed watchlist box, but it's fine if other people want it collapsed. Thank you for fixing the module and reworking it. Apparently you successfully fixed the bug that I mentioned above. --Daniel Carrero (talk) 19:44, 6 October 2016 (UTC)

About WT:SD

WT:SD was an old, barely used redirect to Category:Candidates for speedy deletion. I created CAT:SD for it. I'd like to avoid having redirects from WT: to Category: when possible. I edited all pages that were using this shortcut and pointed it to Wiktionary:Smallest discussions. I suppose it's OK? --Daniel Carrero (talk) 19:44, 6 October 2016 (UTC)

I'll repeat what I said on your talk page: Don't forget that these shortcuts are not only for links but also for the search bar. So the number of pages that use it is only half the picture. --WikiTiki89 21:37, 6 October 2016 (UTC)
Point taken. I don't suppose I should revert what I did? CAT:SD really is better than WT:SD. It's unlikely that many people used "WT:SD" in the search bar to mean Category:Candidates for speedy deletion, otherwise chances are they would use it more in actual discussions. At least, in the last few months, barely anybody even accessed that specific redirect page. (see access counter) --Daniel Carrero (talk) 21:55, 6 October 2016 (UTC)
Why is it "better"? No one ever needs to discuss Category:Candidates for speedy deletion. It's just a page that admins check once in a while. I have restored the deleted WT:CSD, and re-added both shortcuts to the category page. We can re-check later to see whether CAT:SD is actually more popular than the old ones. --WikiTiki89 22:01, 6 October 2016 (UTC)
The thing with the shortcuts and the search bar is another thing that nobody tells you in an easy to find manner. Let me just mention user:Korn/draft again. Korn [kʰũːɘ̃n] (talk) 23:12, 6 October 2016 (UTC)

September News of French Wiktionary

Hi all,

French Wiktionary is publishing a monthly page with fresh news about the project, named Actualités. In August, we started to translate our editions into English, to give more visibility to what is going on at French Wiktionary. So after August Actualités, here is September Actualités. The translations have been made by Pamputt and me, and probably contain mistranslations. So, be gentle on the language; it is still a wiki and it is collaboratively improvable. We are very interested in any comments you may share about this publication and would like to know what may be of interest to you for the next edition! Noé (talk) 07:37, 7 October 2016 (UTC)

I just wanted to highlight the article on Synonyms which also serves as a metric for development/maintenance of the Thésaurus. Useful pseudo-objective concept. - Amgine/ t·e 15:02, 7 October 2016 (UTC)

Dutch nouns with gender-based meanings

This topic came up while editing zegel (cf Talk:zegel). Genders are a tricky topic in Dutch, some nouns can have different meanings depending on the used gender, in this case het zegel ‎(seal) and de zegel ‎(stamp). Right now both senses are in one entry, with the gender mentioned in the definition line. Is there a better way to achieve this? Maybe split into two complete separate entries? – Jberkel (talk) 09:49, 7 October 2016 (UTC)

You can have two separate noun headings to take into account different genders, yes. Renard Migrant (talk) 17:12, 7 October 2016 (UTC)
That's what we normally do. See Swahili spika, for example. —Μετάknowledgediscuss/deeds 17:51, 7 October 2016 (UTC)
But those have different etymologies. They have different noun classes, which is equivalent to having different declensions in, say, Latin. The choice of noun class is an inherent part of the borrowing process; why else would there be two outcomes? In fact, it's equally possible that only one of them was actually borrowed, and the second was derived from the first. The etymology doesn't currently clarify this. —CodeCat 19:52, 7 October 2016 (UTC)
Well, they do come from different senses of speaker. It's unparsimonious to suggest that one was derived from the other, considering the senses involved. —Μετάknowledgediscuss/deeds 21:04, 9 October 2016 (UTC)

Standardizing Template:calque

I happened to be using this template to note a few Latin grammatical terms (for instance, optativus) that are calques of Greek terms. The template is rather annoying to use: it doesn't take the "from" language in the second parameter, the term in the third parameter, the link text in the fourth parameter, and the translation in the fifth parameter, as {{der}}, {{inh}}, and {{bor}} do, but instead uses |etyl lang=, |etyl term=, and |etyl t=. I think it should be standardized to use the same parameters as the other etymology templates. Would anyone disagree? I see it uses Module:etymology/templates, so I am not sure how to make the changes myself — Eru·tuon 18:50, 7 October 2016 (UTC)

I agree, but let's discuss what the interface should be before making any changes. --WikiTiki89 18:53, 7 October 2016 (UTC)
@Wikitiki89: Well, if this is what you mean, how about this, for the example I gave above: {{calque|la|grc|εὐκτική||related to wishing}} > Calque of Ancient Greek εὐκτική ‎(euktikḗ)? — Eru·tuon 19:06, 7 October 2016 (UTC)
The current setup is {{calque|fr|etyl lang=en|etyl term=light year}}. I'd say the new setup should be {{calque|fr|en|light year}}, but that won't work right away because the older setup, {{calque|année|lumière|etyl lang=en|etyl term=light year|lang=fr}} is still accommodated. Perhaps we need a new template with the new setup during the period of transition; {{cal}} is currently a redirect to {{calque}} but it's used on only a handful of pages. Maybe we could separate {{cal}} for now, make it follow the new setup, correct the 13 pages it's used on, and then gradually migrate uses of {{calque}} with the old setup to {{cal}} with the new. Then, once no pages are using the old setup anymore, we can move {{cal}} back to {{calque}} (deleting the old template and leaving a redirect) in order to achieve our status quo of having a short-name template redirect to a long-name template. —Aɴɢʀ (talk) 19:08, 7 October 2016 (UTC)
@Erutuon: Don't forget that the current {{calque}} template supports a lot more features, such as showing the component parts in the calquing language as well. The new template would either have to handle that or we would have to figure out how to reformat that outside of the template. --WikiTiki89 19:11, 7 October 2016 (UTC)
@Wikitiki89 Hmm. If that functionality should be supported, then perhaps the syntax would be {{calque|la|grc|εὐκτική||related to wishing|εὔχομαι|-τικός}}, but I don't quite understand how it works in {{calque}}... — Eru·tuon 19:18, 7 October 2016 (UTC)
That feature is already deprecated in the current version of the template. It's better to use the variety of morphology templates we have such as {{affix}} and {{compound}}. —CodeCat 19:20, 7 October 2016 (UTC)
We don't need to create a new template, we can make the existing one support both old and new usage in the same way that {{bor}}, {{prefix}} and {{suffix}} do. In each of those templates, the presence of lang= is tested for, and if it's absent, then the new parameter format is used, otherwise it falls back to the old one. {{calque}} already does this too, but only with lang=, not with the other parameters. However, etyl lang= is set as a required parameter by the module, so we have the guarantee that all existing uses have that parameter. This means that we can use its presence to switch between old and new behaviour in the same way. If etyl lang= is present, then use the old parameters, otherwise use the new ones. —CodeCat 19:26, 7 October 2016 (UTC)
Yes, I see now that it doesn't do anything too fancy anyway. I completely agree then. And I agree with CodeCat that we don't need a new template. --WikiTiki89 19:30, 7 October 2016 (UTC)
I've implemented my proposal now. All existing entries should still work, but see optativus, which now uses the new parameter format. That said, I'd like it if @Erutuon added a Latin etymology to the entry as well, to show which elements the word was constructed from when calquing. —CodeCat 19:47, 7 October 2016 (UTC)
@CodeCat Done. — Eru·tuon 19:56, 7 October 2016 (UTC)
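The fallback described in this thread (use the old named parameters when etyl lang= is present, otherwise read the new positional ones) can be sketched roughly as follows. This is only an illustration in Python, not the code in Module:etymology/templates; the function name parse_calque_args and the returned tuple layout are hypothetical.

```python
def parse_calque_args(positional, named):
    """Return (lang, source_lang, term, alt, gloss) for a {{calque}} call.

    positional: list of positional parameter values, in order.
    named: dict of named parameter values.
    Hypothetical helper; the real module is written in Lua.
    """
    def pos(i):
        # Positional parameters are 1-based; missing ones read as "".
        return positional[i - 1] if len(positional) >= i else ""

    if "etyl lang" in named:
        # Old format: source language and term are named parameters.
        # The oldest variant also passes lang= instead of positional 1.
        lang = named.get("lang") or pos(1)
        return (lang, named["etyl lang"], named.get("etyl term", ""),
                "", named.get("etyl t", ""))

    # New format: same positional layout as {{der}}, {{inh}} and {{bor}}.
    return (pos(1), pos(2), pos(3), pos(4), pos(5))
```

Because etyl lang= is required in the old format, its presence alone is enough to tell the two formats apart, so no transitional template is needed.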

Deprecating glosses as the fourth positional parameter of {{m}} and {{l}}

I have just added the parameter t= as an alias for gloss= for templates {{m}} and {{l}}. We have already been using t= for this purpose in templates such as {{der}} and {{cog}}. I think having this shorter alias should enable us to transition away from using the fourth positional parameter for this. Thus, instead of {{m|fr|école||school}}, we will have {{m|fr|école|t=school}}. The main advantage of this is that we will no longer have to deal with the confusing empty parameter in between, which often causes a lot of errors. And especially, it enables us to more logically arrange the parameters when a transliteration is involved, for example {{m|he|בַּיִת|tr=báyit|t=house}} is much more logical than any of the current possibilities of {{m|he|בַּיִת|tr=báyit||house}}, {{m|he|בַּיִת||tr=báyit|house}}, or {{m|he|בַּיִת||house|tr=báyit}} (all of which actually do occur). What does everyone think? --WikiTiki89 19:55, 7 October 2016 (UTC)

I know this section looks scary because of all the template code, but it affects everyone and needs input. --WikiTiki89 18:08, 10 October 2016 (UTC)
I support deprecation of the parameters "gloss" and even the fourth parameter in favor of a short parameter, like "t". --Z 18:19, 10 October 2016 (UTC)
I'm not yet convinced this is necessary. Benwing2 (talk) 18:30, 10 October 2016 (UTC)
I support that.--Dixtosa (talk) 18:31, 10 October 2016 (UTC)
By the way, this was discussed before, back when we were first converting our link templates to Lua (I think). However, I cannot seem to find that discussion (maybe someone else remembers where/when that was?). What I do remember is that people did support it, but we didn't go through with it because I guess no one thought of using t= for it. --WikiTiki89 19:09, 10 October 2016 (UTC)
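If the fourth positional parameter were deprecated, existing uses would have to be migrated. The following is a minimal sketch of such a rewrite, assuming a single {{m}} or {{l}} call with no nested templates or piped links inside it; the function name move_gloss_to_t is hypothetical and this is not an existing bot script.

```python
def move_gloss_to_t(template):
    """Rewrite {{m|...}} or {{l|...}} so a fourth positional gloss becomes t=.

    Simplifications (assumptions): the input is one template call, it has
    no nested templates or links, and any argument containing "=" is named.
    """
    inner = template.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        return template
    parts = inner[2:-2].split("|")
    name, args = parts[0], parts[1:]
    if name not in ("m", "l"):
        return template

    positional, named = [], []
    for arg in args:
        (named if "=" in arg else positional).append(arg)

    has_gloss = any(a.split("=", 1)[0].strip() in ("t", "gloss") for a in named)
    if len(positional) != 4 or not positional[3] or has_gloss:
        return template  # nothing to move, or a gloss is already named

    gloss = positional[3]
    positional = positional[:3]
    # An empty third parameter was only a placeholder, so drop it.
    if positional[2] == "":
        positional.pop()
    return "{{" + "|".join([name] + positional + named + ["t=" + gloss]) + "}}"
```

Named parameters do not shift positional numbering, so all three Hebrew variants quoted above have the same positional list and come out in the single form with tr= followed by t=.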

Duplication of definitions for spelling and other minor variants

It seems obvious to me that definitions should not be duplicated across entries for spelling variants and other minor variations; for example, pedestrianise and pedestrianize, or Down syndrome and Down's syndrome. However, sometimes in the past (probably quite a while ago) when I have tried to merge definitions for such entries, the merge has been reverted with an explanation that the Wiktionary convention is to keep the definitions separate. I do notice, though, that certain entries such as labour and labor, which previously had duplicate definitions, now have the definitions all in one place. May I assume that common sense has now prevailed, and that it is OK to merge all definitions to one of the variants? Mihia (talk) 20:20, 7 October 2016 (UTC)

I hope so. —CodeCat 20:24, 7 October 2016 (UTC)
@CodeCat What would be the best way to accomplish this? Transclusion? A template? A bot which monitors duplicate definitions? I have to admit, I have wondered this in the past myself... —Justin (koavf)TCM 20:37, 7 October 2016 (UTC)
colo(u)r has been particularly contentious since the early days; see Talk:colour, Talk:color. Equinox 20:39, 7 October 2016 (UTC)
I always use US spellings even though I'm British just to have some consistency. I'd happily lemmatize just color and not colour if for no other reason than consistency. Renard Migrant (talk) 23:17, 7 October 2016 (UTC)
I agree (actually, I use ours sometimes and theirs sometimes, depending on the topic or usage region) but I think our various resident Anglo-Saxonists would be horrified. Relatedly, I believe WP has a policy of using (or at least not subsequently changing) UK spelling for UK topics, and so on; such a rule could in theory be applied to some kinds of dictionary entry. Equinox 23:21, 7 October 2016 (UTC)
Should they be called Anglo-Saxophones? DCDuring TALK 23:31, 7 October 2016 (UTC)
Ideally there should be a way of creating an entry called, for example, "colour or color", so that there is no preference except for the order. "color" and "colour" should then both point to that. Usage of the spellings can be explained somewhere in the single article. Failing that, I believe that the person creating the entry should choose where to place the definitions, where the topic is not obviously nation-specific. I do not support making all headwords American spellings by policy since that will give the impression that Wiktionary is an American English dictionary. Mihia (talk) 00:10, 8 October 2016 (UTC)
Unfortunately, though, that doesn't work with Wiktionary's multilingual aspect. There are entries for other languages at both color and colour. Andrew Sheedy (talk) 00:37, 8 October 2016 (UTC)
I agree that is an obstacle, but maybe there is some way around it. In any case, the present situation with colour and color, where all the content -- definitions, translations, and the rest of it -- is duplicated, is clearly ridiculous, in my opinion. Mihia (talk) 01:11, 8 October 2016 (UTC)
While this may sound crazy, what if we made a list of all words with a pondian spelling difference, and divided it evenly so that an equal number of American and British spellings would host the main entry? How we split them would probably be fairly arbitrary, but it would ensure that American and British spellings get equal treatment.
On a side note, perhaps we should change the wording of definitions of British/American spellings when they link to the other form of the word (rather than duplicating content). For example, the definition line of honour would be changed to: (British, Canadian and Irish, Australian, NZ, and South African) See honor for definitions. It would thus no longer be implied that one was just an alternate (and perhaps inferior) form of the other. Andrew Sheedy (talk) 01:22, 8 October 2016 (UTC)
Heh, then every time we added one single new word with a spelling difference, we'd have the same argument about what to do with it. Equinox 14:01, 8 October 2016 (UTC)
Haha, true, though I can't imagine there are many words with these sorts of spelling differences that we have yet to add (for English, anyway). Andrew Sheedy (talk) 17:10, 8 October 2016 (UTC)
@Equinox: In case you are wondering, the relevant page on Wikipedia is w:WP:ENGVAR. —Justin (koavf)TCM 13:56, 8 October 2016 (UTC)
I oppose any merge of definitions that is from a higher-frequency variant to a lower-frequency variant. That is where contention arises, since some people wanted the variant that is oldest in Wiktionary to be the main one, which can turn out to be the lower-frequency one. --Dan Polansky (talk) 16:06, 8 October 2016 (UTC)
My preference is that whoever adds the first form gets to choose the primary form. That would probably produce about 50% left- and right-pondian main entries. SemperBlotto (talk) 16:12, 8 October 2016 (UTC)

Are misspellings lemmas?

I was under the impression that they were not, and that misspellings should use {{head|xxx|misspelling}}. "Misspellings" is also listed under nonlemmas in Module:headword. We define lemma as "The canonical form of an inflected word", so is a misspelling canonical? DTLHS (talk) 01:11, 8 October 2016 (UTC)

A misspelling may be inflected; it can have a plural, verb conjugations, and so on. That makes them fit that definition, for the same reason alternative forms in general are considered lemmas. Furthermore, for some languages we show the inflections of words in the headword line, which necessitates the use of a headword-line template that places the entry in the lemmas category. —CodeCat 01:15, 8 October 2016 (UTC)
It should be removed from the list in Module:headword if so ("misspelling" isn't a part of speech anyway). DTLHS (talk) 01:19, 8 October 2016 (UTC)
It's only in the list because many entries already used it when the list was made, and I didn't want to flood Category:head tracking/unrecognized pos. —CodeCat 01:20, 8 October 2016 (UTC)
It may be possible to clean this up with a bot. --Octahedron80 (talk) 01:30, 8 October 2016 (UTC)

I like the idea discussed above of redirecting all misspellings to the main entries when possible, so that sites that use Wiktionary do not treat misspellings such as "marshmellow" and whatnot as correctly spelled entries. --Daniel Carrero (talk) 01:54, 8 October 2016 (UTC)

A possible disadvantage of this (if I am correctly understanding "redirecting") is that people may not notice that they have been redirected, and, for hard-to-spot misspellings, may not become aware that what they originally typed was misspelled. Mihia (talk) 01:57, 8 October 2016 (UTC)
What about something like {{no entry}}? DTLHS (talk) 01:58, 8 October 2016 (UTC)
That sounds like a good idea to me. --Daniel Carrero (talk) 02:01, 8 October 2016 (UTC)
That doesn't address the concern that they should be lemmas, possibly with inflected forms listed. DTLHS (talk) 02:02, 8 October 2016 (UTC)
As a random example, we have the misspelling aqcuire but we don't have entries for aqcuired, aqcuiring and aqcuires... Should we? I don't see a lot of value in linking "acquire" to its misspelled conjugations, but that may be just me. Maybe we could create entries for all the misspelled conjugations and just link them to the correctly spelled conjugations, which sounds like a great way to handle entries for misspelled conjugations (but I'm not saying that we should have them in the first place...) It's my opinion, at least. If possible, I would want Category:English lemmas without any misspellings, because it is our "index" in a way, and having blatantly wrong entries there seems harmful, to some extent. We can't expect everyone to check all entries individually when navigating, to make sure that they are not defined as "misspelling of" something. --Daniel Carrero (talk) 02:28, 8 October 2016 (UTC)
Yes. Misspellings are, and must be treated as, second-class citizens, or we will descend into farce. Equinox 09:30, 8 October 2016 (UTC)
Misspellings are lemmas in that they are not inflected forms. The word "canonical" in the definition of lemma is misleading; it is "canonical" only in that it is the form chosen to be in the dictionary whereas the other forms are absent from a traditional dictionary. Traditional dictionaries focus on words as lexemes, not words as inflected word forms, and for that purpose, they do favor one form type, which they declare to be the "lemma". Misspellings are second-class citizens in that they are declared as misspellings, and do not contain their own definitions proper. Soft redirects such as that in aqcuire, with definition line # {{misspelling of|acquire|lang=en}}, have been the usual practice and make sense to me. --Dan Polansky (talk) 16:16, 8 October 2016 (UTC)

Non-lemmas for misspellings

e.g. we have a misspelling entry "yooman" for "human", or "digg" for "dig"; we should not allow inflections like "yoomans" or "diggs". Isn't that a voted policy? I can't find it. I thought we had to use {{head|en|misspelling}}, and not allow inflected misspelling entries. But CodeCat just reverted me here: [5]. If what I describe isn't policy, and I'm mistaken, then it probably should be. Equinox 09:29, 8 October 2016 (UTC)

Just noticed the thread above appears to deal with the same topic... Equinox 09:30, 8 October 2016 (UTC)
I can assure you that disallowing entries for misspelled inflections is not a voted policy, at least not yet. If you visit WT:VTIME, click "Show other boxes" and do a Ctrl+F for "spell", you are going to find some voted policies about misspellings, but this is not one of them. (but maybe you already did what I said)
Should we disallow all misspelled inflections? Surely there are some common misspelled inflections that we would want to keep? I just dislike the idea recently proposed of having to create separate entries for inflections and conjugations of every "lemma" misspelling. Just because we have aqcuire, it does not mean that we should automatically create aqcuires, aqcuired and aqcuiring. --Daniel Carrero (talk) 12:31, 8 October 2016 (UTC)
Indeed, #Are misspellings lemmas? above is for a similar topic. --Dan Polansky (talk) 16:24, 8 October 2016 (UTC)

Extend Description vote?

I'm repeating what I said in the vote talk page. Wiktionary:Votes/2016-08/Description is going to end in 2 days, and currently has only 8 participants (5-3-0). Maybe we should extend it by 1 month?

Currently, the vote would fail. If 1 more person supported it, it would pass. Either way, this small turnout is not a great indicator of consensus. --Daniel Carrero (talk) 11:52, 8 October 2016 (UTC)

Perhaps the low turnout is an indication that folks aren't that interested. Or perhaps the mistake is to have votes during Summer in the northern hemisphere. DCDuring TALK 12:03, 8 October 2016 (UTC)
That does not really answer my question, but I remember that last time in August 2016, you opposed extending a vote. Maybe it's understandable if people are uninterested in this vote, because it basically only affects some Translingual symbols. In my experience from previous years, I did not notice any big change in turnout depending on the season of the year, but in any case it's not summer yet. --Daniel Carrero (talk) 12:14, 8 October 2016 (UTC)
I have noticed a lower participation during Summer. Also early Fall sometimes seems to lead to lower participation by the educators among us.
Votes about trivial matters will not get much participation. DCDuring TALK 13:45, 8 October 2016 (UTC)
Thanks for the information. Don't you agree that this proposal required a vote? I don't think that we can introduce a new heading without a vote. --Daniel Carrero (talk) 13:50, 8 October 2016 (UTC)
I support extension of the vote since the participation has not been great so far and the result is close to passing. An extension opens the proposal to greater scrutiny, and the threat of result picking or "fishing" for results is greatly overrated, IMHO. --Dan Polansky (talk) 16:27, 8 October 2016 (UTC)
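The arithmetic behind "would fail now, would pass with one more support" can be checked with a short sketch. It assumes the two-thirds majority mentioned elsewhere on this page for formal votes, that reaching exactly two thirds counts as passing, and that abstentions are left out of the ratio; the function name vote_passes is hypothetical.

```python
from fractions import Fraction


def vote_passes(support, oppose, threshold=Fraction(2, 3)):
    """True if support / (support + oppose) meets the threshold.

    Abstentions are not passed in because they are assumed to be
    excluded from the ratio. Exact fractions avoid rounding issues
    at the boundary.
    """
    total = support + oppose
    if total == 0:
        return False
    return Fraction(support, total) >= threshold
```

With the 5-3-0 tally the ratio is 5/8, which is below 2/3, so vote_passes(5, 3) is False; one more support gives 6/9, which equals 2/3 exactly, so vote_passes(6, 3) is True under the inclusive-threshold assumption.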

Derived terms vote

Based on Wiktionary:Beer parlour/2016/September#{{bor}} and {{inh}} should also categorize into "Foo terms derived from Bar", I created Wiktionary:Votes/2016-10/Populating "derived from" categories with borrowed and inherited terms. --Daniel Carrero (talk) 12:00, 8 October 2016 (UTC)

  • Wouldn't there be something to be learned from low participation in other votes? DCDuring TALK 12:04, 8 October 2016 (UTC)
What's your point? --Daniel Carrero (talk) 12:06, 8 October 2016 (UTC)
Votes on trivial matters lead to low participation in all votes. Too many votes on trivial matters is likely to lead to less willingness to take the trouble to make an informed decision on subsequent votes. DCDuring TALK 13:49, 8 October 2016 (UTC)
How many of the current votes in the watchlist box are on trivial matters? --Daniel Carrero (talk) 13:59, 8 October 2016 (UTC)
Not every BP discussion needs to be turned into a formal vote. We can do a poll in the BP or just assess the consensus from the discussion itself. --WikiTiki89 15:52, 10 October 2016 (UTC)
@Daniel Carrero: Let's make another rule of thumb: Only create votes for things if the desire for a vote has already been expressed in the discussion by at least a few editors. --WikiTiki89 16:02, 10 October 2016 (UTC)
Why is that needed, even as a rule of thumb? --Daniel Carrero (talk) 16:15, 10 October 2016 (UTC)
Because if no one's asking for a vote, then no one wants a vote. You'd think that would be common sense, but you don't seem to get that. So I'm stating it explicitly for you as a guideline. Don't forget that the only reason you went on a vote-creating spree for our policy pages is because we already had discussions where these votes were requested (we were trying to approve a huge draft bit by bit). But you seem to have improperly taken that momentum to other issues. --WikiTiki89 16:18, 10 October 2016 (UTC)
What you said did cross my mind before you explained it further, but I disagree with you and I may find it a bit difficult to state my case if you just give your rule of thumb like you are obviously right. Maybe I could have just said: "no, thanks"?
I created a lot of votes to edit WT:EL because EL was garbage 1 year ago and votes are required to edit the policy. I just want Wiktionary:Entry layout to reflect reality. I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do, which barely needs a discussion, in my opinion. --Daniel Carrero (talk) 17:57, 10 October 2016 (UTC)
What I meant, is that it was decided that we wanted to overhaul the policy pages and that we needed to create a lot of votes for that. And then you created the votes. Now, you're creating votes for things that we never decided we needed votes for. "I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do", really? What is this vote that you just created in this discussion? --WikiTiki89 18:05, 10 October 2016 (UTC)
I'll reply to your last question, but let me get to an important point first: I'm fine with withdrawing the current "Derived terms" vote and just implementing that categorization change if you want. I just re-checked the discussion to make sure if there's a consensus... By my count, there are 6 supports, 3 opposes and 1 abstention. I am counting myself as a support. If that discussion were a formal vote, and thus requiring a 2/3 majority, it would barely pass (and could fail if more people voted). My plan was to proceed with the vote to make sure there's a consensus here, but I don't care about it anymore.
We were discussing general guidelines about creating votes. You quoted my statement: "I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do". In 2016, I created 12 votes for new policies or practices and 25 votes attempting to formalize current practices (and 1 admin vote). I'm not counting unstarted, unfinished, and withdrawn votes. Even if the current "Derived terms" vote is about a new practice, my statement that you quoted was truthful. --Daniel Carrero (talk) 18:42, 10 October 2016 (UTC)
Like I said, most of your votes about formalizing current practices were sort-of pre-approved, which is fine. The problem is you've gone beyond the pre-approved area, where our "normal" practices are that we create a vote when we decide that we want to create a vote. One editor should not just go and spontaneously create a vote. By the way, I'm not trying to be mean or anything, I'm just trying to help you stay in the good favor of the community, because very many of us are annoyed at all the votes. --WikiTiki89 19:04, 10 October 2016 (UTC)
Ok, that's nice of you, thank you. Are you annoyed at all the votes? Would you like me to withdraw the current "derived terms" vote and maybe edit the module to implement the discussed categorization change? --Daniel Carrero (talk) 19:10, 10 October 2016 (UTC)
I think the vote can be withdrawn. Whether or not to implement the change should continue to be discussed in the original BP discussion. --WikiTiki89 19:15, 10 October 2016 (UTC)
I withdrew the vote. --Daniel Carrero (talk) 19:20, 10 October 2016 (UTC)

"Famous bearers" section on names?

Would anyone be interested in this? So for the name "Abraham", you might have "Abraham Lincoln", "Abraham Woodhull" and "Abraham Van Helsing" (fictional) as famous bearers. UtherPendrogn (talk) 16:43, 8 October 2016 (UTC)

This seems like it might have more relevance on Wikipedia (which already has lists of this sort), as who had what names has no lexical significance. Andrew Sheedy (talk) 16:50, 8 October 2016 (UTC)
Some people don't have Wikipedia entries though, or they have the wrong name (Alaric and not the correct Gothic Alareiks). UtherPendrogn (talk) 16:58, 8 October 2016 (UTC)
True, but the thing is that we already mention famous people in name entries. Alaric, for instance, is already defined as a king of the Visigoths, and there's nothing stopping anyone from defining Alareiks (Alareiks) as the same (or at least mentioning his name in the definition line) in Gothic, provided it is attestable in that language. Andrew Sheedy (talk) 17:02, 8 October 2016 (UTC)
I don't think the reconstructed *𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐍃 ‎(*alareiks), which is not attested in Gothic, is necessarily more 'correct' for English speakers than the accepted Latinization, Alaric. Anyway, I concur with Andrew Sheedy -- this seems like it would belong on Wikipedia, not mainspace Wiktionary. — Kleio (t · c) 19:49, 8 October 2016 (UTC)
The person not having a WP entry is an argument for creating a WP entry, not for adding anything here. Equinox 19:54, 8 October 2016 (UTC)
I think it’s important to mention the famous bearer in some cases. If I walk up to a random English-speaker on the street and say “Socrates was cool”, the person is expected to know that it refers to the Greek philosopher, not to any of many people called Socrates. — Ungoliant (falai) 13:01, 18 October 2016 (UTC)

Thinking of systematically adding missing pronunciations to English words

I'm thinking of running a bot to clean up English word pronunciations and add missing pronunciations based on the CMU Pronouncing Dictionary. Some issues that I'd appreciate comment on:

  1. The dictionary says it's for "North American English". It does make the cot-caught distinction (thankfully) but doesn't make the Mary-merry-marry distinction. I think it's OK to tag it as "US" or "General American". Agreed?
  2. I'm thinking of having a bot make certain substitutions in accordance with how our standards dictate what to do (Appendix:English pronunciation). Example is /r/ -> /ɹ/. Another possibility is removing long marks from pronunciations specifically tagged as US or General American.
  3. I think syllable divisions should normally not be shown in English words because there's so often an ambiguity as to how to divide syllables. What do people think of having a bot remove them? Should I leave them alone?
  4. On the other hand, the dictionary indicates primary and secondary stress directly on vowels, when we probably need to indicate them on syllables. What should the rules be as to where to put the stress mark? This entails deciding how to divide syllables. It seems clear that VCV should be divided VˈCV but it gets trickier with VCCV. One possibility is to divide most VCCV as VCˈCV but divide VˈClV and VˈCɹV as long as the Cl and Cɹ are possible syllable onsets (which excludes e.g. /dl/, /tl/, /ʃl/, /sɹ/, /nl/, /mɹ/ etc.). For VCCCV and VCCCCV a decision will probably have to be made based on what are possible syllable onsets, but what about cases like VCsCV, e.g. /pɑɹsli/ and /ɛkstɹə/? My instinct is to divide them /pɑɹ.sli/ and /ɛk.stɹə/, i.e. put the /s/ with the following syllable as long as it's a possible onset (which excludes /sɹ/ in particular).
  5. The dictionary doesn't distinguish /ə/ and /ʌ/. I'm thinking we should use /ə/ in unstressed syllables, and /ʌ/ in syllables with primary or secondary stress. Reasonable?
  6. The dictionary doesn't distinguish /ɚ/, /ɝ/ and /əɹ/. One possibility is to use /ɝ/ in syllables with primary or secondary stress, /əɹ/ before vowels in syllables without stress, /ɚ/ not before vowels in syllables without stress. On the other hand IMO the distinction between /ɚ/ and /ɝ/ is largely spurious in GA; maybe we should just use /ɚ/ consistently. I do think we should distinguish /əɹ/ from /ɚ/; at least, aberration /ˌæbɚˈeɪʃən/ looks strange to me.
  7. /l̩ m̩ n̩/ or /əl əm ən/? The CMU dictionary writes /əl əm ən/ but I could convert them if necessary. Appendix:English pronunciation isn't clear about this.
  8. /ɪ/ vs. /ə/ in unstressed syllables: The CMU dictionary does make this distinction but I can't tell if it's consistent or random (e.g. they write Abigail /ˈæbəˌgeɪl/ but Abilene /ˈæbɪˌlin/). They do write consistent /ə-/ for unstressed a- and consistent /ɪ-/ for unstressed e-. For words like recorded they give two pronunciations: /ɹəˈkɔɹdəd/ and /ɹɪˈkɔɹdɪd/. In my speech there's no obvious distinction between these two sounds in the vast majority of cases (excepting certain cases like Rosa's vs. roses which are clearly different), and I kind of doubt that this distinction is salient in General American (witness e.g. the vast confusion between affect and effect). My instinct however is just to keep whatever they have.

Benwing2 (talk) 21:02, 8 October 2016 (UTC)

Systematically adding pronunciations would be nice, but cleaning up existing ones is fraught with risks. No single source can adequately account for all of the variation, so you might end up regularizing away legitimate variants that would be better glossed as such rather than eliminated. Also, it may not be enough to know that a pronunciation is wrong without knowing what it's an error for. As for using general rules: some would probably work, but there's too much that depends on things like morpheme boundaries for me to feel safe in general about running the modification part on autopilot. English is such a dynamic, multifaceted phenomenon that talking about "correcting" things makes me nervous. Chuck Entz (talk) 21:52, 8 October 2016 (UTC)
I agree that the bot should only add missing pronunciations, not attempt to change existing ones. As to your questions: (1) I'd tag it "General American"; "US" is ambiguous and should be avoided (not all US accents are GenAm). (2) Yes, have the bot follow our conventions for representing GenAm. (3) I've long been opposed to indicating syllable boundaries in English, but I think I'm in the minority here. (4) Maximize the onsets of stressed syllables when indicating stress placement. (5) I agree; /ʌ/ in primarily and secondarily stressed syllables and /ə/ elsewhere. (6) I support /ɝ/ in primarily and secondarily stressed syllables, /ɚ/ in unstressed syllables, and /əɹ/ in unstressed syllables before a vowel. All three variants are illustrated in murderer: /ˈmɝdəɹɚ/. (7) I'd follow Kenyon and Knott here: /l̩/ after all consonants; /n̩/ after alveolar consonants, otherwise /ən/; /əm/ everywhere. Thus /ˈkækl̩/, /ˈbʌtn̩/, /ˈtʃɪkən/, /ˈɹɪðəm/. (8) I'd let the bot just follow CMUPD here; individual entries can be cleaned up later as necessary. —Aɴɢʀ (talk) 22:16, 8 October 2016 (UTC)
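Rules (5) and (6) as stated above can be sketched as a small mapping over CMU-style symbols, where vowels carry a stress digit (0 unstressed, 1 primary, 2 secondary) and AH and ER are the two symbols that need splitting. This is an illustrative sketch only: the function names are hypothetical, the input sequence in the usage note is an assumed example, and only AH and ER are converted while everything else is passed through unchanged.

```python
def split_stress(symbol):
    """Split 'AH0' into ('AH', 0); consonants get stress None."""
    if symbol[-1].isdigit():
        return symbol[:-1], int(symbol[-1])
    return symbol, None


def reduced_vowels_to_ipa(symbols):
    """Convert AH and ER according to stress; leave other symbols as-is."""
    parsed = [split_stress(s) for s in symbols]
    out = []
    for i, (base, stress) in enumerate(parsed):
        # A following symbol is a vowel if it carries a stress digit.
        next_is_vowel = i + 1 < len(parsed) and parsed[i + 1][1] is not None
        if base == "AH":
            out.append("ʌ" if stress in (1, 2) else "ə")
        elif base == "ER":
            if stress in (1, 2):
                out.append("ɝ")       # stressed
            elif next_is_vowel:
                out.append("əɹ")      # unstressed, before a vowel
            else:
                out.append("ɚ")       # unstressed, elsewhere
        else:
            out.append(symbols[i])
    return out
```

For an assumed input of M ER1 D ER0 ER0, the three ER symbols come out as ɝ, əɹ and ɚ in that order, which matches the murderer example given above.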
Are /ɚ/ and /ʌ/ actually used? Korn [kʰũːɘ̃n] (talk) 22:19, 8 October 2016 (UTC)
Yes, of course they are. —Aɴɢʀ (talk) 22:28, 8 October 2016 (UTC)
It's not that of course, really. People keep using /u/ for the US, but it's shifted so thoroughly to /ʉ/ that [u] is basically used as a marker for non-native accents in media. ps.: Oh, I made a typo above. I meant [ɝ]. pps.: I'm retracting my example with [u], I remembered some people using it. Korn [kʰũːɘ̃n] (talk) 23:08, 8 October 2016 (UTC)
What's extremely rare outside of Wiktionary is using /ɹ/ rather than /r/ for the English r-phoneme. Probably most reference works that render American English in IPA use /ɜr/ for the nurse vowel and /ər/ for the letter vowel, but /ɝ ɚ/ do have some usage as well (e.g. Kenyon & Knott, PEAS). —Aɴɢʀ (talk) 10:22, 9 October 2016 (UTC)
@Angr In general I think your suggestions are fine. When you say "maximal onset" do you simply mean that anything that can be an initial cluster should get grouped in the following rather than preceding syllable? Benwing2 (talk) 16:53, 9 October 2016 (UTC)
Yes; anything that can be a word-initial onset cluster can be a stressed syllable onset cluster. —Aɴɢʀ (talk) 16:56, 9 October 2016 (UTC)
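The "maximal onset" rule for stress marks can be sketched as follows: before each stressed vowel, take the longest run of preceding consonants that could begin a word and put the stress mark in front of it. The LEGAL_ONSETS table below is a deliberately incomplete, hypothetical sample, and the (segment, stress) input format is an assumption made for the example.

```python
# Incomplete sample of word-initial clusters (assumption, for illustration).
LEGAL_ONSETS = {
    ("p", "l"), ("b", "l"), ("k", "l"), ("f", "l"), ("s", "l"),
    ("p", "ɹ"), ("b", "ɹ"), ("t", "ɹ"), ("d", "ɹ"), ("k", "ɹ"),
    ("f", "ɹ"), ("θ", "ɹ"), ("ʃ", "ɹ"),
    ("s", "p"), ("s", "t"), ("s", "k"), ("s", "m"), ("s", "n"),
    ("s", "t", "ɹ"), ("s", "p", "ɹ"), ("s", "k", "ɹ"), ("s", "p", "l"),
}


def place_stress(segments):
    """segments: list of (ipa, stress); stress is None for consonants,
    0 for unstressed vowels, 1 for primary and 2 for secondary stress."""
    marks = {1: "ˈ", 2: "ˌ"}
    out = []
    run_start = 0  # index in `out` where the current consonant run begins
    for ipa, stress in segments:
        if stress in marks:
            run = tuple(out[run_start:])
            if run_start == 0:
                # Word-initial: the whole run is the onset by definition.
                onset_len = len(run)
            else:
                onset_len = 0
                for k in range(len(run), 0, -1):
                    if run[-k:] in LEGAL_ONSETS or (k == 1 and run[-1] != "ŋ"):
                        onset_len = k
                        break
            out.insert(len(out) - onset_len, marks[stress])
        out.append(ipa)
        if stress is not None:
            run_start = len(out)
    return "".join(out)
```

Given the segments of recorded with primary stress on /ɔ/, this returns ɹɪˈkɔɹdɪd, the same placement as in the transcription quoted in point 8. A cluster such as /tl/ is absent from the table, so only the /l/ would be grouped with the stressed vowel.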
Regarding 8: The lack of distinction between unstressed /ɪ/ and /ə/ is known as the weak vowel merger. According to the Wikipedia article, it's very common in General American (and I think it's probably common in Canadian English as well). Unfortunately, many supposedly US transcriptions on Wikipedia show the distinction... or maybe it's not that unfortunate, since some American accents do have the distinction (for instance, Southern American English, according to the article). Old-fashioned RP had the distinction, but I am not sure if modern RP really does; at the very least, the unstressed /ɪ/ is more centralized than in old-fashioned RP. You can hear the old-fashioned vowels in some TV shows. I think one of the recent Miss Marples had a rather good old-fashioned RP accent. — Eru·tuon 22:23, 10 October 2016 (UTC)
BTW I think the Wikipedia article is wrong in that many GA speakers with the merger still distinguish Rosa's from roses. Benwing2 (talk) 00:12, 11 October 2016 (UTC)
That might be true. I don't think I distinguish the two vowels in most cases, but feel like Rosa's and roses are different, though only slightly so... then again, perhaps it's only self-deception, like my feeling that my cot and caught are different. — Eru·tuon 02:38, 11 October 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Angr I'm implementing your Kenyon and Knott rules for ən -> n̩ after alveolar and not before a vowel. Which of [tdszʃʒθðlnɹ] do they count as alveolar? All of them are coronals. Does this specifically refer to [tdszln] or do [sz] count as dental? (The only clear motivation I can see for making this alveolar/non-alveolar distinction is in /tn̩/, where the /t/ is glottalized instead of flapped; but this applies only to /t/, and applies also when a vowel follows the /n/.) Benwing2 (talk) 22:35, 15 October 2016 (UTC)

@Angr Also, by analogy with the above, I am assuming that /ɝ/ should become /ɜɹ/ before a vowel, e.g. furry /ˈfɜɹi/, even though /ɜ/ doesn't otherwise occur. Benwing2 (talk) 22:42, 15 October 2016 (UTC)
Furthermore, what about /t͡ʃ/ vs. /tʃ/, /d͡ʒ/ vs. /dʒ/? Appendix:English pronunciation says the tie bars should be used, but none of the three example words chat, teach, nature actually use them (of the three example words joy, agile, age, the second and third use tie bars but the first doesn't). BTW I think this is something that a bot could reasonably clean up; similarly /r/ vs. /ɹ/. Benwing2 (talk) 22:53, 15 October 2016 (UTC)
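As a rough illustration of the kind of bot cleanup mentioned above, here is a minimal sketch. It assumes the convention stated in Appendix:English pronunciation (tie bars on affricates) plus /ɹ/ for the r-phoneme; the function name `normalize_ipa` is hypothetical. Note one known limitation: plain string replacement cannot tell a true affricate from /t/ + /ʃ/ meeting across an unmarked syllable boundary, so a real bot would need to check syllabification first.

```python
TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, the IPA tie bar


def normalize_ipa(ipa):
    """Add missing tie bars to tʃ and dʒ, and rewrite r as ɹ.

    Already-tied affricates are untouched, because the tie bar sits between
    the two letters and the bare sequence no longer matches.
    """
    ipa = ipa.replace("tʃ", "t" + TIE_BAR + "ʃ")
    ipa = ipa.replace("dʒ", "d" + TIE_BAR + "ʒ")
    return ipa.replace("r", "ɹ")


print(normalize_ipa("/tʃæt/"))
print(normalize_ipa("/ˈfɜri/"))
```

Running the function twice gives the same result as running it once, which makes it safe for a bot that may revisit entries.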
Also, /ʍ/ vs. /hw/? CMU uses /hw/ but this could easily be changed. Benwing2 (talk) 22:56, 15 October 2016 (UTC)
How about sewer /ˈsuɚ/ vs. /ˈsuəɹ/, and similarly for seer, rapier, steward and other such words with vowel + /ɚ/? Current pronunciations are inconsistent. Benwing2 (talk) 23:30, 15 October 2016 (UTC)
@Angr Here is a sample of words with their current IPA pronunciations per the code I've written: User:Benwing2/cmudict-sample If you could look over these words (there are 250 of them) and let me know if you see any issues, I'd be grateful. Keep in mind there are some inevitable issues stemming from the algorithm used to determine syllable boundaries, and other issues that reflect potential errors in the source corpus. Benwing2 (talk) 23:37, 15 October 2016 (UTC)
(1) K&K don't use /l̩/ or /n̩/ after /ɹ/. I'd forgotten about that until I saw /ˈtʃæɹl̩/ on your test page. But they write /ˈbæɹəl/ and /ˈbæɹən/ with shwas, and that feels right to my intuitions as well. They use /n̩/ after /t d s z/ but /ən/ after /ʃ ʒ tʃ dʒ θ ð l n ɹ/, and I largely agree (though I'm not averse to Ethan /ˈiθn̩/ and heathen /ˈhiðn̩/ either). (2) I'm inclined to write /fɝi/ for furry since /ɝ/ is always stressed and doesn't lose its r-coloring before a vowel. That's different from ephemeral /iˈfɛməɹəl/ where I feel like the first shwa really isn't r-colored. (3) I have no opinion on the use of tie bars on affricates. I don't usually use them myself for English, but I usually do for other languages; I can't justify why. I certainly don't object to them. (4) I generally prefer /hw/, not least because it makes it easier to combine merging and contrasting accents by writing /(h)w/. (5) I'd use /ɚ/ after a vowel. —Aɴɢʀ (talk) 07:23, 16 October 2016 (UTC)
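The rule in point (1), taken together with the "not before a vowel" condition mentioned earlier in the thread, can be sketched roughly as below. This is only an illustrative sketch, not the actual conversion code: the vowel inventory `VOWELS` and the function name `apply_syllabic_n` are assumptions, and affricates are assumed to be written without tie bars so that the character before /ə/ is /ʃ/ or /ʒ/ rather than /t/ or /d/.

```python
import re

# Assumed (incomplete) vowel inventory, used only to detect a following vowel.
VOWELS = "aeiouæɑɒɔəɛɪʊʌɜɝɚ"
# Consonants after which /ən/ becomes syllabic /n̩/, per point (1) above.
TRIGGERS = "tdsz"

# /ən/ directly after a trigger consonant and not directly before a vowel.
_SYLLABIC_N = re.compile(rf"(?<=[{TRIGGERS}])ən(?![{VOWELS}])")


def apply_syllabic_n(ipa):
    """Rewrite /ən/ as /n̩/ (n + U+0329) in the qualifying contexts."""
    return _SYLLABIC_N.sub("n\u0329", ipa)


print(apply_syllabic_n("ˈbʌtən"))  # qualifies: after /t/, word-final
print(apply_syllabic_n("ˈbæɹən"))  # unchanged: /ɹ/ is not a trigger
```

Adding /θ ð/ to `TRIGGERS` would give the Ethan and heathen variants mentioned above.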

Possible future vote about deleting all programming language symbols

When there are fewer votes in the list, I'm thinking of creating a vote with the proposal "deleting all programming language symbols", to see if there's any consensus for the idea... I was hoping to be able to create new entries for APL symbols and so on, but a few RFDs are being created to delete some symbols. Even if they pass, this leaves us with other symbols kept (for now) and others deleted. I'm curious to see if there are people who would support nuking all the symbols. If not, are we sure where to draw the line? The argument "these symbols are not used in any natural language" could also be used against math symbols and chemical formulae, if someone wishes to do so. --Daniel Carrero (talk) 22:29, 8 October 2016 (UTC)

Mathematical and chemical symbols are used to communicate concepts to human beings. Computer languages are used to control the actions of computers. Big difference. Chuck Entz (talk) 22:51, 8 October 2016 (UTC)
Discussions show whether there is consensus. Don't create a vote just to test things. We have too many votes going on already, as has been pointed out before. Equinox 22:53, 8 October 2016 (UTC)
Is the consensus deleting all programming language symbols? --Daniel Carrero (talk) 22:58, 8 October 2016 (UTC)
I had conversations in JavaScript with a Russian friend when explaining something in English was just too cumbersome. I'm pretty much on the fence on this topic. Can I hear some arguments for why we should limit ourselves to communication with other humans (and pets (not rocks))? Korn [kʰũːɘ̃n] (talk) 23:03, 8 October 2016 (UTC)
@Daniel Carrero: You're missing Equinox's point. For the most part, you shouldn't be starting votes that have a low chance of passing just to prove that they lack consensus. Instead, votes should put the official seal on consensus that has been hammered out in discussion but has a large impact or is not 100% clear, and therefore needs to go to a vote. This would help reduce both the number of active votes and the ill will toward them a great deal. —Μετάknowledgediscuss/deeds 05:17, 9 October 2016 (UTC)
Sure, Metaknowledge. I'm not even saying I have to create a new vote about it... We can discuss it in a friendly way, to determine if a vote is necessary and see what decisions we can make just in this conversation. Specifically, about the votes that I've been creating... Is there ill will toward them? Do people agree I've been doing something wrong? How many of the current votes could be avoided and should not have been created? --Daniel Carrero (talk) 12:14, 9 October 2016 (UTC)
I contest the idea that programming languages are exclusively a means of human-to-computer communication. Maybe machine code is, but higher-level programming languages are designed and written for mutual understandability among coders, not just for ease of use. Crom daba (talk) 03:56, 9 October 2016 (UTC)
Maybe not exclusively, but that's their primary function. In the same vein, color-coding of electrical wires is used to make it easier for mutual understandability of their work among electricians, but I wouldn't expect a dictionary to tell me which color is positive and which color is ground. Chuck Entz (talk) 04:25, 9 October 2016 (UTC)
I suspect that source code is read by humans more often than it is read by computer - certainly for compiled languages. I would welcome all programming language entries in this wiki (my early COBOL entries were mostly deleted). SemperBlotto (talk) 06:54, 9 October 2016 (UTC)
Anything that isn't used in human language shouldn't be in the main namespace. [[http://]] isn't a word in any language, for example. Renard Migrant (talk) 12:39, 9 October 2016 (UTC)
Allow me to disagree with you: I'm under the impression that "http://" is a SOP of http and ://. I created the latter, some time ago. --Daniel Carrero (talk) 12:43, 9 October 2016 (UTC)
Thanks for the example. I'd delete :// unless it could be shown it was used in human language somehow. Renard Migrant (talk) 12:49, 9 October 2016 (UTC)
HTTP is an English initialism used in sentences. http within a URI is not English as such because of its context. A similar case: sealed is an English word, but the keyword sealed on a Java class is not English as such because of its context. Equinox 12:49, 9 October 2016 (UTC)
We define "@" as an e-mail delimiter. In my opinion, it seems right to keep both "@" and "://" for the same reasons. Does anyone disagree? --Daniel Carrero (talk) 12:52, 9 October 2016 (UTC)
One could equally argue that it seems right to delete them both for the same reason! Describing something as a delimiter is hardly defining it, anyway: "this is a delimiter" does not give us semantic information as a dictionary should. Equinox 12:55, 9 October 2016 (UTC)
If we formally decide to delete the e-mail sense from @, I assume it's very likely that anons will think we are missing a sense by mistake and will want to re-add it. My point is that the e-mail delimiter is a big part of what the "@" is. You said now: "'this is a delimiter' does not give us semantic information", but I'm not sure what to make of it. This is not any random delimiter, it separates the e-mail username from the domain, which is a meaningful explanation. Concerning human language unrelated to computers, we have entries for punctuation marks: , . ! ( ) etc. -- and the space, the plainest delimiter of all: ] [. --Daniel Carrero (talk) 13:14, 9 October 2016 (UTC)
I take your point. With the @ in john.doe@examplemail.com, you could argue the @ isn't part of any language in this example, but it also seems silly to delete it. A bit like 2+2=4, + and = aren't necessarily part of any language here. You could argue the meaning they convey is not linguistic and yet it seems ludicrous to delete them. Renard Migrant (talk) 21:03, 10 October 2016 (UTC)
Well, "@" as a general symbol of the Internet has entered the popular consciousness in a way that :// certainly hasn't. Equinox 17:23, 11 October 2016 (UTC)
As said in the RFD discussion, :// is actually a SOP of : (delimiter used after ftp, http, smtp...) and //, which already has a networking sense. --Daniel Carrero (talk) 17:30, 11 October 2016 (UTC)
Then pretend I said "in a way that : and // certainly haven't". Equinox 17:32, 11 October 2016 (UTC)
Sure. --Daniel Carrero (talk) 18:04, 11 October 2016 (UTC)

Looking for German speakers to add test cases to Module:de-IPA/testcases

I'm thinking of creating a module to generate German pronunciation from spelling, sometimes with the help of pronunciation respellings (e.g. Phonem probably has to be respelled something like Phoném to indicate the unexpected stress). I don't actually know whether this is reasonably possible for German but I assume it probably is in most cases. (Note that there's already a module of sorts in Module:de-IPA that purports to do this, but it's really horrible.)

I'm looking for knowledgeable German speakers to add test cases to Module:de-IPA/testcases. This is module code but don't be alarmed; adding test cases is very easy, just follow the examples.

The only special symbols so far I've created are:

  1. acute accent (e.g. á é í ó ú ä́ ö́ ǘ) for unexpected primary stress; expected stress should be on the first syllable except for certain prefixes like ge-, be-, ver-, etc.
  2. grave accent (e.g. à è ì ò ù ä̀ ö̀ ǜ) for secondary stress, unless it's somehow predictable
  3. slash to separate compounds joined together (e.g. Buch/stabe)
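To make the three symbols above concrete, here is a minimal sketch of how such a respelling could be decoded. It is only an illustration, not the module itself: the function name `parse_respelling` and the returned tuple layout are hypothetical, and the sketch assumes every letter (including ä, ö, ü) composes to a single code point once the stress accents are removed.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute: unexpected primary stress
GRAVE = "\u0300"  # combining grave: secondary stress


def parse_respelling(respelling):
    """Split a respelling at slashes and strip the stress accents.

    Returns one (plain_text, primary_index, secondary_indices) tuple per
    compound part. Indices point at the stressed letter in plain_text;
    primary_index is None when no acute accent is present.
    """
    parts = []
    for part in respelling.split("/"):
        letters = []      # one entry per letter, umlaut marks kept attached
        primary = None
        secondary = []
        for ch in unicodedata.normalize("NFD", part):
            if ch == ACUTE and letters:
                primary = len(letters) - 1
            elif ch == GRAVE and letters:
                secondary.append(len(letters) - 1)
            elif unicodedata.combining(ch) and letters:
                letters[-1] += ch  # e.g. the diaeresis of ä stays on the a
            else:
                letters.append(ch)
        plain = unicodedata.normalize("NFC", "".join(letters))
        parts.append((plain, primary, secondary))
    return parts


print(parse_respelling("Phoném"))
print(parse_respelling("Buch/stabe"))
```

Decomposing to NFD first means the same code handles precomposed input such as ǘ and input typed as a base letter plus combining marks.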

Note that there's already a module that does this quite well for French (Module:fr-pron), despite the vagaries of French spelling; see the test cases in Module:fr-pron/testcases. There's similarly a Russian pronunciation module Module:ru-pron, with test cases Module:ru-pron/testcases. Both of these I rewrote entirely based on early versions (respectively by User:Kc kennylau and User:Wyang). Benwing2 (talk) 23:09, 8 October 2016 (UTC)

This really should be preceded by a thorough discussion of our German pronunciation practice, which I consider in dire need of overhaul, which is usually met with roaring silence. Korn [kʰũːɘ̃n] (talk) 00:09, 9 October 2016 (UTC)
Fine with me. I didn't realize that there was a problem. What do you think we should use instead? Benwing2 (talk) 02:37, 9 October 2016 (UTC)
Not a native speaker but I disagree with some test cases:
The final "r" being completely silent. It's light and I think /ɐ̯/ should be used for the sound, Uhr is a correct test case.
Qualität has no secondary stress.
Reichstag probably needs a respelling to make it obvious that -tag is long in this compound word. It worked for Auswahl, though. --Anatoli T. (обсудить/вклад) 05:16, 9 October 2016 (UTC)
  • Here are the things we need to discuss and find consensus on. I cannot answer most questions because they're southern issues.
  1. Which pronunciations do we put down? We need at least three national standards. (Germany has at least two standards, though.) Maybe more? And in which form? What do we define as standard? For a start, my definition is: "Standard are those features which are not avoided by speakers in the most formal setting." This includes for example ⟨Chemie⟩ /ʃeːmi/, ⟨wichtig⟩ /vɪxtɪk/. And I do not think the artificial language of newsreaders is a good representation of the varying different standards. That language mostly excludes /ʃeːmi, vɪxtɪk/.
  2. Where I'm from (north) and live (Berlin area), there is no /ə/ phoneme, unstressed vowels get deleted or are phonetically fully identical with /ɪ/ as [ɪ~ɘ], or even [ɪː] with northern accent (which is not considered standard). The merger itself is absolutely standard, I wouldn't be surprised if a good deal of northerners would normally parse [ə] as /-ər/. Is this a negligible phænomenon of a specific (big) region or do we need to address it? Is there actually a different situation elsewhere?
  3. Unstressed vowels can surface as [ɛ] and [a] in Austria commonly, /ə/ is [e] in some parts of the south. Will Austrian dialects be represented with ⟨ə, ɐ⟩, ⟨ɛ, a⟩, both or a mixture, what about [e]? Korn [kʰũːɘ̃n] (talk) 08:02, 9 October 2016 (UTC)
  4. Switzerland is rhotic, Austria and Bavaria are facultatively rhotic, some parts of Germany are traditionally rhotic. Where and how do we represent what?
  5. Even German German is traditionally rhotic in some places where it is no longer general. /stark/ is [ʃtaʁk], [ʃtark] or [staːk]. Which to include? What about [ʃtaχk]?
  6. Austro-Bavarian does not palatalise fricatives after liquids, so that /furxt/ is [fʊrxt] instead of e.g. [fʊɪçt], which is the local standard where I live. How standard is it? Do we represent it?
  7. Same question for /xs/ which is [ks] in Germany and [xs] in Austro-Bavarian.
  8. Also /x/ in general. Switzerland and at least parts of Austria only have one phone [x~χ], the north has three [ç, x, χ], Berlin only knows [ç, χ]. What to include?
  9. We need to discuss the shown vowel qualities of both lax high vowels and /a/, which might be tense high vowels and [ä~ɑ] respectively in the south. (Wikipedia says that [ɑ] is the realisation for Austrian standard, it should be the same for Switzerland.) /a/ is [a~ä] in Germany, which to pick?
  10. Further, the northern third of Germans has phonemic backness: /a, ar, aː/ are [a, aː, ɑː]. Ignore or include? But ⟨Maß⟩ and ⟨Mars⟩ are identifiable minimal pairs here, but as I'm told not in other regions.
  11. Is /aɐ̯/ actually used? Do we favour /aɐ̯/ or /aː/?
  12. The last consensus was not to show aspiration. I think this is a mistake.
  13. I would consider /pf-/ [pf-] and the presence of an /ɛː/ in most words to be nonstandard for here. I.e. if someone native to my area would use it, I would consider it so foreign to be wrong. /(p)f/ is easily represented. Do we double e.g. /ʃpɛːt/, /ʃpeːt/?
  14. What to use for /v/ which is not truly [v] in many regions?
  15. What to do about fortis/lenis? As far as I'm aware, in the south, lenis consonants are unaspirated voiceless [t, p, k] with tensed but unvibrating vocal cords, while fortis consonants are longer and more forcefully pronounced. I'm not a friend of /d̥, p̊, g̊/ for dogmatic reasons, but this is the least we can do.
  16. Consonant length instead of ambisyllabic consonants in the south in general. Having ⟨Katze⟩ as /kat.tse/ [kˑɑt̚tsV] is perfectly normal for Austria and Switzerland. If its absence is markedly foreign, we should include it. Cf. audio file at Mitte.
  17. Fortis-lenis levelling is said to be absent in Austria/Switzerland. (I only know it for Switzerland.)
  18. /r/ is [ʕ~ʁ~ʀ~ɾ~r~ɹ], which to use where? I'm strictly opposed to using [ʁ] in broad transcription.
  19. Beyond automatic additions, which regional non-standard forms are worth adding/allowing? Northern standard realises /v/ as [w] in several instances, e.g. /ʃvyːl/ is invariably [ʃʷɥ͓yːl], /tsvo/ as [tswo]. Is there any point in prohibiting any pronunciation at all?
  20. Do we include a superdialectal broad form or one for each? I.e. /bitːər/ + [pitːər] (Austria), [bɪtʰɐ] (Germany) or /bitːər/ [pitːər] (Austria) + /bɪtər/ [bɪtʰɐ] (Germany)? Korn [kʰũːɘ̃n] (talk) 09:34, 9 October 2016 (UTC)
That's what I can think of off the top of my head. If we start adding stuff automatically, we might as well do the whole shebang and do it right. Korn [kʰũːɘ̃n] (talk) 08:02, 9 October 2016 (UTC)
ps.: A lot of motivation behind this, aspiration for example, is that I assume we want to represent a native pronunciation that people can emulate, and mix and match simply makes you sound like a foreigner. It's one thing to speak German with Bavarian features if you're a Bavarian, but if you're a Slovene who comes to Berlin and tries to adopt the local language while erroneously using non-regional features, people will not interpret it as Bavarian but as Slovene. It's like mixing Geordie and Texas. Sure, it's English, but if a dictionary wouldn't tell me the difference, I'd consider it shoddy work. Korn [kʰũːɘ̃n] (talk) 08:12, 9 October 2016 (UTC)
Lots of what you're trying to represent is dialectal features. Fundamentally, though, we for the most part avoid doing this because it's a huge can of worms, as you've demonstrated. For British English, for example, we pick the most standard form (RP) and represent it, and don't even try to represent all the numerous dialects. Similarly for German, that would mean choosing the standard as usually found in most dictionaries, which is similar AFAIK to what we already have. As for things like aspiration, I agree we shouldn't show it because we're trying to represent a broad phonemic representation rather than phonetic detail that is more likely to be dialect-specific. We don't represent aspiration in English, for example. In general we should follow the lead of other dictionaries. If you want to show some dialectal renderings in addition to the standard, that is fine but we don't need to force ourselves to do that by default. Similarly, for example, we show "General American" pronunciation for American English even though that differs significantly from e.g. Southern or New York English. For French what we actually show is something like a 100-year-old Parisian standard that doesn't very accurately represent anyone's speech any more but is what is conventionally found in dictionaries. Benwing2 (talk) 15:53, 9 October 2016 (UTC)
Small thing first: We absolutely and commonly show aspiration for English. cat, water, take, and I don't think aspiration was ever contested for English. We just don't have many pages with narrow transcriptions.
And yes, this is about dialects because there is no non-dialectal German. The standard is not a thing, there are the standards and these need to be represented and we need to discuss how. To give the broadest æquivalent, we're normally only showing the German version of GenAm (Germany). We need to discuss whether we show it the right way and we need to discuss how we want to represent the æquivalent to RP (Austria) and GenAus (Switzerland). And if we don't show them, we're simply not that good a dictionary. The standard 'found in most dictionaries' is some form of representation of the standard of non-southern Germany, simply because that is the largest area by far. It has little relevance to people not living in that area and people in Austria often don't have an opinion of it either - not that this is relevant to us.
While lack of aspiration is non-standard in most of Germany, presence of aspiration might be non-standard in Austria and Switzerland. Those things are features speakers using either standard would want to get right in order not to sound less capable. I repeat that a mixture of standards shows a lack in proficiency. It is for that reason that I think the features demarking the standards need to be represented, if we want to produce something useful. And this is where my questions aim: 1. We need to stop being so stupidly German-centric. 2. We need to find out which features make an actual difference between the different standards. 3. We need to find out how to represent them. Korn [kʰũːɘ̃n] (talk) 17:03, 9 October 2016 (UTC)
Very well then, I am fine with "only showing the German version of GenAm". That's my exact point. We simply don't have the resources to consistently represent a large spectrum of dialectal variants. That's why you're getting the "silence". Are you volunteering to personally do all the work to get this done? If not, who's gonna do it? Remember that "the perfect is the enemy of the good". Benwing2 (talk) 20:08, 9 October 2016 (UTC)
I feel that we should stay broadly consistent with the Wikipedia page on Standard German phonology. As far as deviations from this (ideal) standard go, perhaps we could include pronunciations specific for certain cities that would implicitly represent the accent of a wider region. Crom daba (talk) 20:51, 9 October 2016 (UTC)
  • So you would be fine with not including RP for English? I'm not sure that's the right spirit. And we don't need to represent a 'large spectrum of dialectal variants'. There's four German speaking countries, amongst which there are 3-5 different standards. I think we won't break a leg by adding 2 broad ones with 2-3 narrow versions each. And adding German phonology is one of the things I wanted to do on Wiktionary, but every time I tried to make the entries a little less misleadingly generalising, @Kolmiel pops up, reverts my edits and barks at me to get consensus first. Which I can't keep because people can't be arsed to have a discussion. Catch-22, thanks very much. As for the Wikipedia article, according to Kolmiel, about half of its information is currently verboten. Also, who's going to do the work is perfectly irrelevant for making rules on what's to be included and what not. Certainly nobody's gonna do anything if it keeps getting reverted, innit. Korn [kʰũːɘ̃n] (talk) 22:58, 9 October 2016 (UTC)
I think our pronunciation practice is fine. Keeping it simple and intelligible to at least a mentionable minority of users. What is very wrong is to say that the "northern German standard" has no relevance to southern Germany, Austria, Switzerland. This is nonsense. Bavarian and even Austrian radio news are now commonly read in a distinctly northern accent, with -ig pronounced -ich and everything. Swiss television has even hired what must be German readers to speak over their reports. Apart from that, I guess Korn will have his anyway and everything will be messed up. Do what you want. I don't care that much. Kolmiel (talk) 23:08, 9 October 2016 (UTC)
It's not right to argue about dialectal differences before the basic stuff is implemented. Regional standards can be implemented later using the same module using phonetic respellings, something like "|phon=wichtik|reg=at" (just an example), if you want to use the Austrian pronunciation of wichtig. Get your priorities right, people and help Ben fix the module! --Anatoli T. (обсудить/вклад) 06:55, 10 October 2016 (UTC)
Regional standards cannot be implemented later, since, which is my point, this is implementing a regional standard. It's simply the biggest and the one exerting most influence because of its prevalence. Starting with one first is nothing I am opposed to. What I'm opposed to is not labeling it and having it come to pass without prior thorough discussion of its ambiguities. And, as in any discussion, excluding information - not by not entering it but by prohibiting its entering - without a proper case made for it. Korn [kʰũːɘ̃n] (talk) 12:17, 10 October 2016 (UTC)
Korn, I don't object to labeling the German standard as "German standard" or whatever. But you seem determined to gum up the process to the point that nothing gets done. How about, as Anatoli suggests, we try to implement this German standard rather than just arguing? It's easy to change the module at any point to e.g. use r instead of ʁ, and changing the test cases isn't hard either. We've done it plenty of times in Russian, for example. Benwing2 (talk) 13:50, 10 October 2016 (UTC)
@Korn: Has there been a similar system developed for German to the enPR/AHD? If there is or if you create one, we could have {{de-IPA}} go from the lemma to dePR. Then for any dialect whose transcription you'd like to add, we can have a function to create that dialect from the dePR representation. I think that before anything can be done in terms of broad dialectal coverage, someone needs to make some big tables of correspondences since your lists of changes are a bit overwhelming and difficult to understand. —JohnC5 14:39, 10 October 2016 (UTC)
The creation of exactly that kind of thing for German is what I wanted to initiate! I wanted people to discuss which variants to include and which values to put for the variables. I was assuming that Wiktionary had some person who'd know more about southern German than I do (which isn't that much) and that this would be a thing done with in no time. Korn [kʰũːɘ̃n] (talk) 17:46, 10 October 2016 (UTC)
Evidently you will have to be the one to do it, if you want it done; no one else appears to have the experience or interest. However, I really don't think this is necessary to get done before helping me create the test cases I've requested above. Benwing2 (talk) 18:28, 10 October 2016 (UTC)
You miss the point. I would have done this a long time ago, but people kept removing my edits. The point is not so much that I want people to do anything, the point is more that I want some consensus that these things can be done. ps.: And of course it is relevant to decide what the expected results of test cases should be before test cases are added. I have no idea what you want your module to put out and I'm surprised, for example, that the module expects ⟨Quatsch⟩ to be [kvatʃ] instead of [kfatʃ]. Korn [kʰũːɘ̃n] (talk) 19:19, 10 October 2016 (UTC)
I don't know why people are removing your edits; they would seem fine to me. As for Quatsch, I simply copied the pronunciation found on that page; it should be fixed there if it's wrong. Benwing2 (talk) 19:31, 10 October 2016 (UTC)
And here the circle closes. Because this is exactly the kind of discussion I wanted to have. This isn't wrong, this is one of two options ([kfatʃ]/[kʋatʃ]). And we must decide whether the expected result of the test case should be one, the other, or both. We can not not decide. Do you see my point now? We can't put any test cases without deciding one way or another what we want tested. Korn [kʰũːɘ̃n] (talk) 21:14, 10 October 2016 (UTC)
@Korn You think you're getting your point across but you're not; instead of helping, you seem to be sabotaging, involuntarily. You can use test cases for STANDARD German (Northern, Germany-centric, Bühnendeutsch, whatever). Any deviations or variants could use phonetic respellings and regional labels. --Anatoli T. (обсудить/вклад) 21:35, 10 October 2016 (UTC)
I am obviously not getting my point across because not a single person has yet understood what I'm saying. Bear with me one more time, I'll try to make it absolutely clear. To repeat: 1. There is more than one standard. There is not the standard. 2. I am fine with starting out with one of the standards, and have it be the northern one. 3. EVEN THEN the northern standard sometimes has multiple valid realisations. MULTIPLE. Not 'one and some deviations', multiple. equally. valid. standard. realisations. of. equal. standardness. To give another English analogy: If I were to say: 'Received Pronunciation realises ⟨ol⟩ in one of two ways, [əʊl] and [ɔʊl], which one do we use?' - And all I'm hearing instead of a reply is: 'Nobody cares about this, just put the RECEIVED PRONUNCIATION.' If you don't want to pick one form, what do you want to do? Put a picture of a dancing elephant instead of letters? Korn [kʰũːɘ̃n] (talk) 21:50, 10 October 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Just choose ONE standard, choose a label for it. You do realise that showing multiple realisations may be difficult, right? Consider this simple example of two standard pronunciations of the same word

  • ähneln (IPA follows)
  • ähneln (phonetic respelling: ehneln) (IPA follows)

Benwing will come up with some tricks, I'm sure.--Anatoli T. (обсудить/вклад) 22:01, 10 October 2016 (UTC)

Do you notice how 'just pick one' is the first actual reaction to the point I get AT ALL? How this is actually some form of guideline or method that allows one to actually enter test cases with any form of consensus? How this is a jolly lot more than refusing the very question? Korn [kʰũːɘ̃n] (talk) 22:07, 10 October 2016 (UTC)
It seems like you're downplaying the pre-eminence of the Northern standard (known elsewhere as Standard German), it's the "artificial language of newsreaders" that dictionaries, learning materials and radio broadcasts use. Dialects may be used in the most formal of settings, but as evidenced by your 20 points, they are not standardised. Crom daba (talk) 22:13, 10 October 2016 (UTC)
Newsreaders are using [ʃtaʁk], [ʃtark] and [ʃtaːk] with equal standing, depending on nothing else but personal inclination. So are newsreaders speaking a non-standardised dialect now? Should we do away with the whole concept of German pronunciation because it isn't regulated to the last iota and replace it with dancing elephants? Or should we just be productive for a fraction of a second, pick one, or give a guideline for the editor to pick, and get on with the module? Which this discussion does not seem capable of at all. This issue is not nearly as big as to merit this discussion in the slightest. This is ludicrously blown out of proportion. I would have just gone and entered something by now, but, and this is the critical point, there's this history of something being removed if I cannot point at some form of consensus. Which apparently just cannot be achieved because people for some reason keep thinking I'm discussing in order to enter medieval Cimbrian as standard German. Korn [kʰũːɘ̃n] (talk) 22:29, 10 October 2016 (UTC)
Do you think this situation is unique for German? The standard German phonology is described many times and is used in many places. Dictionary publishers won't get stuck on variations. If I were like you, we would still be arguing, if the Russian часы́ ‎(časý) should also include the pronunciation [t͡ɕɐˈsɨ] as many native speakers, including educated ones or even newsreaders, would choose to pronounce, especially from the South or from Ukraine. The standard and the most common Russian pronunciation of the word is [t͡ɕɪˈsɨ]. A suggestion: just use [ʃtaʁk] for stark but be consistent about it! BTW, you even have a chance to choose your favourite accent, being an active German editor and thus - one of the few dictionary makers. We'll have to trust your judgement but someone may dispute your decision (which is also OK!) but we need to have something happening. --Anatoli T. (обсудить/вклад) 23:49, 10 October 2016 (UTC)
Korn, we all do want to get on with the module. I'm still not sure which changes of yours are getting reverted. I think the number of actual issues involved in rendering standard Northern German will be small, and in doubt, err on the side of being more conservative. Hence use [ʃtaʁk] not [ʃtaːk], and maybe [kv-] instead of [kf-]. But overall these are minor issues, not the major issues that are involved in trying to handle multiple dialects. In many cases I think people will defer to your judgment -- similarly, I've largely deferred to Anatoli's judgment because he's a native Russian speaker and has very good linguistic intuitions. Occasionally we have had discussions over how much detail to render or how to handle things like palatalized affricates, but these have not been grave. Benwing2 (talk) 00:05, 11 October 2016 (UTC)
Yes. Yes. These are minor issues. Which is exactly why it's so frustrating that it takes a page-long conversation before somebody says something productive like 'just pick one' or 'in doubt use the more conservative one'. This is exactly the kind of thing I asked about and you (plural) could have said: 'I get that there's pluricentricity, but foreigners abroad are taught German German, so I think it's best if we start with the biggest variety of that. Put something you feel adæquate for now and if there's variation in there, just use the more conservative one', right after I posted my 20 points and we would have been done with this two days ago. Anatoli, with all due respect, which I do have for you, all of you, the reason that dictionary publishers don't get stuck on variations is probably that they just decide for one instead of making a hubbub about the raising of the question. And that in all other publications, pages don't get torn out by random passers-by who disagree with that choice. I think we're all somewhat annoyed by this kerfuffle, I hope this isn't dampening anyone's good spirits. Korn [kʰũːɘ̃n] (talk) 07:49, 11 October 2016 (UTC)

Which constructed languages belong in mainspace?

The status quo: CFI allows for Esperanto, Ido, Interlingua, Interlingue (Occidental), Lojban, Novial, and Volapük to be in mainspace; all other constructed languages must have their entries in appendices.

The problem: I suspect some of them have so little material that nearly all our entries would fail RFV. (In fact, Klingon is not allowed in mainspace, but its durably archived corpus probably rivals that of Lojban, for example.)

Proposed solution: Constructed languages with small corpora shouldn't be in mainspace. Esperanto without a doubt, and probably Ido and Volapük as well should stay. The others should be moved to Appendix space like Toki Pona and Quenya. I would like to get some feedback before I start a vote on this. —Μετάknowledgediscuss/deeds 21:12, 9 October 2016 (UTC)

I agree that Esperanto, Ido, and Volapük should probably stay. For the other languages, I share the concern that the corpora may be too small. I'm not very familiar with Interlingua, Interlingue, or Novial, but for Lojban, very common words like .i and la appear to be citeable from Usenet, but I can't seem to cite an ordinary word like muvdu. —Mr. Granger (talkcontribs) 15:47, 10 October 2016 (UTC)
It wouldn't bother me to have only a dozen or so entries for one language, if that's all that can be cited. The problem is creation of entries based on dictionaries or whatever other criteria when the words don't meet CFI. A bit like users who enter a load of -phobia entries, I see these as about the same. Renard Migrant (talk) 15:58, 10 October 2016 (UTC)
So should we have a policy that words from these languages can be deleted on sight if no citations are provided? DTLHS (talk) 16:04, 10 October 2016 (UTC)
My own view is that we shouldn't have languages in appendices at all. Either they should be in mainspace or nowhere at all. —CodeCat 17:08, 10 October 2016 (UTC)
Why? —Μετάknowledgediscuss/deeds 19:55, 10 October 2016 (UTC)
We have a dozen or more natural languages from which only 1-3 words meet CFI, so there is no inherent problem in having languages which have only a few entries after the razor of CFI has been used to shave them, as Renard says. But I agree with Metaknowledge that it makes little sense to allow some sparsely-attested artificial languages in the mainspace but not others. - -sche (discuss) 04:25, 20 October 2016 (UTC)
Comment, because Klingon was mentioned: in previous discussions, it has been noted that Wiktionary cannot include many words in Klingon, Dothraki, and some other languages (Quenya?) without running into legal/copyright problems. - -sche (discuss) 04:25, 20 October 2016 (UTC)

Why we don't need durable citations

It's generally asserted that durable citations are required to pass RfV. I think this requirement is needlessly bureaucratic and should be abolished. While it might be preferable to have durable citations, I don't think it's imperative. Here's why:

  1. In today's world, a greater percentage of content is on not-necessarily-durable websites than in print media or on durable websites.
  2. Compare to books. We don't link to every book we have, and we not infrequently cite rare books of which Joe Avg is unlikely to ever obtain a copy.
  3. Some things don't stay up forever, but many things do. Instead of assuming that something will eventually disappear, we could assume that it won't (FWIW, Wikipedia makes the latter assumption; it doesn't require durable references)
  4. If the quote that uses the word in one or more sentences is already on Wiktionary, do we necessarily need a link to anything anyway? Purplebackpack89 05:14, 10 October 2016 (UTC)
We should expand what is considered as "durable", but not reject it. We still need quotes that we can check up on later, and won't simply disappear (as many of Wikipedia's links do). I can think of times when someone misread a source and only later, by looking at the same source, could we determine what it actually said for the purpose of citing it. —Μετάknowledgediscuss/deeds 05:28, 10 October 2016 (UTC)
We can't just accept any words from random websites or else there's some danger that we will get flooded with things that would otherwise go on AP:LOP. You said elsewhere: "[Acela Republican is] used a lot on the Internet, and, since it's a word that postdates the decline of print media, that oughta be good enough." but by your own count, there are 744 Google hits for that term, which I consider a small number. Only 744 hits seems indicative of a minor trend in using that term. Do you have any specific criteria in mind? I wonder if all English words that are used (not just mentioned) on more than, say, 10,000 websites are likely to appear in books anyway. --Daniel Carrero (talk) 05:34, 10 October 2016 (UTC)
744 may seem like a small number compared to 10,000, but it's a big number compared to 3, which is the floor for number of durable citations we need. If something appeared in 20 or 30 non-durable websites, I think we ought to consider having it. As for your concern about protologisms, those tend to be weeded out more by the at-least-a-year rule than by the durability rule. Purplebackpack89 05:38, 10 October 2016 (UTC)
Sorry, I understand your point but I don't find it very convincing. You may wish to create a vote with the proposal "include words that have 700+ uses on the internet" and see if many people support it. But in my opinion 1 durable citation is worth at least 10,000 non-durable citations... Actually, you may ignore my arbitrary math, I just meant that non-durable citations are really worthless (again, IMO) except maybe in very large numbers. --Daniel Carrero (talk) 07:20, 10 October 2016 (UTC)
10,000 to 1? Really? Why do you think non-durable citations are so worthless? Is it reliability? There are plenty of websites that most would consider "reliable" that aren't durable. By contrast, some of the durable things we have aren't necessarily all that reliable. Purplebackpack89 17:53, 10 October 2016 (UTC)
Because if something is written in 3 books (especially 3 books easily accessible through Google Books), chances are the attested word is "set in stone" -- we won't have to worry about it and we won't have any trouble verifying it and reading the page for further context that may not be available in the quotation. If something is attested from a site that might disappear tomorrow, chances are we are going to have a headache when checking the quotation and attesting the term. But feel free to convince me otherwise if you want. --Daniel Carrero (talk) 18:05, 10 October 2016 (UTC)
But if something is common enough as to have 700+ uses on the internet, surely citing would be as easy as choosing another new source from the 700+ were a previous source to disappear? —suzukaze (tc) 20:28, 10 October 2016 (UTC)
Sure, but that introduces an endless maintenance need that maybe can be avoided for most words. If a word is used so often on the internet that Wiktionary arguably needs to include it, what are the chances that the word already exists, too, in books and other durably archived sources? Do you know any specific words that Wiktionary might include by citing websites that can't be included by citing durably archived sources? --Daniel Carrero (talk) 20:34, 10 October 2016 (UTC)
Minority languages with obscure publications, conlangs, internet slang no one in their right mind would publish, etc. This is the phrase "to speak Teochew" in the Teochew dialect of Min Nan Chinese. It only has four pages of Google results. —suzukaze (tc) 20:37, 10 October 2016 (UTC)
These are good examples, thanks. They are about the phrasebook, though. I believe there's no doubt that these sentences can be composed, so attestation is less important for them than for most single words. We could have a new policy like "allowing entries for translations of all accepted English phrasebook entries, in all languages, even if said translations have 0 Google hits." --Daniel Carrero (talk) 20:48, 10 October 2016 (UTC)
BTW, the "hot word" policy is relevant in a lot of cases. Equinox 15:48, 10 October 2016 (UTC)
I think there are loads of revisions we could potentially make here. For one, in actual fact WT:CFI doesn't say anything about copying citations up into entries. It's just assumed that that's what we do. But as someone once put it, nothing is specified, so nothing happens. CFI doesn't say you have to copy up the citations, so you don't have to. Of course best practice is to copy them up because then everyone can see them, but with hundreds of items tagged with rfv at a time, I'm certainly happy for anything that's definitely citable to go without the citations being copied up (citations yes, citations copied up, no). Renard Migrant (talk) 15:56, 10 October 2016 (UTC)
Nobody's really ever wanted to define durably, and I think we should. As someone said, what about taking screenshots of websites and adding them to Commons? Commons is likely to last as long as this wiki is, why shouldn't that count? Renard Migrant (talk) 16:05, 10 October 2016 (UTC)
Commons itself might last, but the image theoretically might be deleted. But I do think that's actually a good idea. The more realistic issues are that images can be tampered with, and screenshots especially (in fact Google Chrome allows you to actually change the content of any web page right in your browser, maybe other browsers have this feature as well). And if we start including anything from the internet, we'd have to come up with more rules to limit typos and misspellings and other nonsense. And the three different authors rule would be difficult to follow if the authors are anonymous. --WikiTiki89 16:16, 10 October 2016 (UTC)
Maybe we can upload screenshots to enwikt, but not to Commons. I believe our current quotations and possible future screenshots are fair use, which Commons does not accept. If the website has a CC license or is in public domain, Commons might accept it. --Daniel Carrero (talk) 17:38, 10 October 2016 (UTC)
I see those options as essentially equivalent. --WikiTiki89 18:03, 10 October 2016 (UTC)
Why even go that far? If we have the quote of it being used in a sentence and we put that on Wiktionary, shouldn't that be good enough? Purplebackpack89 17:53, 10 October 2016 (UTC)
Copying errors. --WikiTiki89 18:03, 10 October 2016 (UTC)
Because it's not durably archived. Someone might want to check the original for any of several reasons, and it may not be there. You take a screenshot and there's something to check. Renard Migrant (talk) 19:43, 10 October 2016 (UTC)
IMO, copying errors is too narrow a reason to dictate our entire attestation policy. Purplebackpack89 20:17, 10 October 2016 (UTC)
What about having multiple editors verify the text of a quote before accepting it? Even if it changes we have the affirmation of multiple people of what the text used to say. —suzukaze (tc) 20:29, 10 October 2016 (UTC)
That does not sound too good. Just imagine if all our current quotations required to be checked by other people. It's hard enough to "finish Wiktionary" and ideally attest all senses as it is. If there are any copying errors, they are going to be found eventually, if the website does not vanish first. --Daniel Carrero (talk) 20:51, 10 October 2016 (UTC)
I suspect the reason why we have this policy is to avoid drowning in Internet ephemera, it's not a perfect filter, but removing it would necessitate creating more policies regarding exactly what kinds of words we want and how to define them, which would increase rules lawyering and decrease the quality of the dictionary. Crom daba (talk) 21:14, 10 October 2016 (UTC)
It stops people coining words on their own social media accounts and then citing those accounts as sources. Renard Migrant (talk) 11:06, 11 October 2016 (UTC)
People are already able to do that through Usenet, aren't they? --Daniel Carrero (talk) 11:33, 11 October 2016 (UTC)
I think we're kidding ourselves into thinking that requiring durable citations keeps only bad words out and only good words in. Purplebackpack89 18:27, 11 October 2016 (UTC)
Does anyone actually think that, though? Renard Migrant (talk) 20:56, 11 October 2016 (UTC)

Words needing citations from the internet

After some thought, I decided to support getting citations from the internet. English already has a social-media-ey place called Usenet from which we are allowed to get citations of internet slang and a number of random neologisms, but there are not a lot of Portuguese-speaking folks on the Usenet, so our coverage of modern Portuguese terms may be not as good. Maybe other languages too, I don't know.

Here are some Portuguese words that seem to be common enough on the internet but I was unable to find 3 citations for them on Google Books.

  • cospobre -- a very poorly done cosplay
  • flopar -- to fail
  • nerfar -- to nerf (video game sense)
  • Olindar -- to spend time in Olinda, Pernambuco
  • qnd -- abbreviation of "quando" (when)
  • shippar -- to ship (fictional character relationship sense)
  • SQN -- abbreviation of "só que não" ("only not"), added at the end of a sentence
  • trisal -- polyamorous relationship consisting of three people
  • upar -- to upload


  • exactly 1 trillion and a half emoticons can be attested from websites, if nobody minds

--Daniel Carrero (talk) 17:54, 11 October 2016 (UTC)

  • Usenet is considered durably archived because multiple sites, financially stable, with a long history, actually have the archives. It is not a precedent for proprietary social networking sites, which may disappear or become inaccessible when the owner does or when the owner is in a tight financial bind. It would be better to look for what would get the multiple institutions with Usenet archives to add other classes of text data. DCDuring TALK 18:05, 11 October 2016 (UTC)
  • Apparently, the Library of Congress is/was preserving an archive of Twitter: [6]. The OED occasionally cites tweets. Equinox 18:07, 11 October 2016 (UTC)
    The US Library of Congress ("LoC") was to be the recipient of a donation by Twitter of a few years of public Twitter postings. BYU also thought they would have that corpus among their offerings. BYU no longer mentions Twitter and I haven't found any discussion on the LoC site of the Twitter data. The discussion seems to have bogged down. They did not seem too eager to support unlimited access. DCDuring TALK 00:18, 12 October 2016 (UTC)
    Then so should we.
  • Another thought from a probability perspective: if a word is used 500 times on the internet, what are the odds that 498 or more of those citations will disappear from the Internet in 10 years' time? Purplebackpack89 18:27, 11 October 2016 (UTC)
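To put a rough number on the odds asked about above, here is a minimal sketch. It assumes each page survives independently with the same probability, which is a simplification (mirrors and copy-pasted text tend to disappear together), and the 5% survival rate is a made-up figure, not a measured one.

```python
from math import comb

def prob_at_least_k_survive(n: int, k: int, p_survive: float) -> float:
    """Probability that at least k of n independent links are still online."""
    # Add up the binomial probabilities of having fewer than k survivors,
    # then take the complement.
    p_fewer = sum(
        comb(n, i) * p_survive**i * (1 - p_survive) ** (n - i)
        for i in range(k)
    )
    return 1.0 - p_fewer

# Hypothetical numbers: 500 uses, each with only a 5% chance of lasting 10 years.
print(prob_at_least_k_survive(500, 3, 0.05))
```

Under these assumptions the result is very close to 1: even with heavy link rot, three surviving pages out of 500 is the overwhelmingly likely outcome. The estimate is only as good as the independence assumption.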
Maybe I'm short-sighted, but if a word is used 700 times on the internet, do we have any method to sieve out the automatic copy paste to make sure it's not used 2x350 times? Korn [kʰũːɘ̃n] (talk) 19:17, 11 October 2016 (UTC)
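One simple way to sieve out verbatim copies is to normalise each hit and compare fingerprints. This is only a sketch: the function names and the sample sentences are invented for illustration, and it catches exact reposts only, not paraphrases or partial quotations.

```python
import hashlib
import re

def fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing punctuation and spaces."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def count_distinct_uses(snippets: list[str]) -> int:
    """Number of snippets left once exact copies are merged."""
    return len({fingerprint(s) for s in snippets})

# Invented sample hits: the second is a repost of the first.
hits = [
    "That cosplay was a total cospobre.",
    "that cosplay was a total COSPOBRE!",
    "Never seen such a cospobre before.",
]
print(count_distinct_uses(hits))
```

Here the three hits collapse to two distinct uses. A raw hit count of 700 would be reduced the same way before anyone starts choosing quotations.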
This discussion was rooted in the belief that, to satisfy attestation, somebody would still have to find three different quotes spanning a year and all that... Purplebackpack89 20:00, 11 October 2016 (UTC)
Yes, and upload a screenshot (a small cropped version, I hope) of the website to enwikt to ensure that we have the source content if the website vanishes. Should we have an additional rule of only attesting entries from websites if they have a lot of Google results in the first place? Just by creating an entry, we fill the Wiktionary mirrors with the same entry, thereby increasing the number of Google results to some extent. --Daniel Carrero (talk) 20:12, 11 October 2016 (UTC)
What does 'exactly 1 trillion and a half' mean? 1,000,000,000,000.5 or 1,500,000,000,000? And really exactly? Renard Migrant (talk) 20:56, 11 October 2016 (UTC)
Please don't take it literally. What I meant was: "An arbitrarily large number of emoticons can probably be attested if we accept citations from random websites." But, I thought it was clear that 'exactly 1 trillion and a half' means 1,500,000,000,000: that is "1 trillion and a half trillion". --Daniel Carrero (talk) 02:12, 12 October 2016 (UTC)
The screenshot idea is problematic. Screenshots are easy to manipulate, you'd need a complicated scheme to find them when you want to ('cause you can't put them in the entry), it'd take a lot of time so people probably won't do it, you can't use screen readers (except maybe if we use PDFs), etc. Instead I suggest using w:Wayback Machine (see also w:Help:Using the Wayback Machine). The archiving can be done with a click and some JavaScript, and it's even bottable. We can limit sites that disallow archiving. Another way to limit problematic sites is to restrict citations to sites among the top 100,000 (maybe too big) most popular in a certain country, as measured by w:Alexa Internet rank or something else. I agree that checking search engine hit counts is a good idea. —Enosh (talk) 08:50, 12 October 2016 (UTC)
Also w:Wikipedia:Link rot, this problem feels pretty well covered. —Enosh (talk) 08:59, 12 October 2016 (UTC)
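For the "bottable" part, a bot could ask the Wayback Machine whether a cited page already has a snapshot. The sketch below assumes the Internet Archive's availability endpoint and the JSON shape as I understand them from its public documentation; the helper names are invented, the sample response is hand-written, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_query(page_url, timestamp=None):
    """Build the query URL a bot would fetch for one cited page."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD, the preferred snapshot date
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def closest_snapshot(response_text):
    """Return the archived URL from a response body, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

# Hand-written sample in the assumed response shape.
sample = '{"archived_snapshots": {"closest": {"available": true, "url": "http://web.archive.org/web/2016/http://example.com/"}}}'
print(availability_query("example.com", "20161012"))
print(closest_snapshot(sample))
```

A citation whose lookup returns None would be a candidate for archiving now, or for review if the original page is already gone.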
Re "we can limit sites that disallow archiving": this changes over time. The archives of some of my own past domains have been hidden on Archive.org when a cybersquatter has taken over and applied a more restrictive robots.txt. (The same is true of Google's Usenet archive, where you can apply to have your old posts hidden; but I think we have assumed Usenet to be archived by more people than just Google.) Equinox 13:21, 12 October 2016 (UTC)
w:Wikipedia:Link rot basically suggests: "Don't delete a citation just because the link is now broken! Search for copies of the citation instead! You can even use internet archives!" Is this something we can implement on Wiktionary? Basically, we would not need screenshots, we would only trust that all current citations are correct, but if a link is broken, we can fix it ourselves or we can open an RFV to verify an existing citation that is a broken link. We may want to use a separate request page for that, like WT:Requests for citation check or something. We could have the rule that if a citation can't be double-checked at a later date, it is invalidated. (like when a new robots.txt disables old, archived pages)
As a related subject, should we be able to accept citations from movies, video games, and music for attestation purposes? I would be happy to accept citations from Brazilian movies because I'm under the impression that sometimes the characters use dialectal/regional speech that may be difficult to find in books. When something is written on a video game or movie (as opposed to said out loud), we could upload a screenshot, I suppose. The fact that a screenshot is easy to manipulate is not a huge issue, is it? Text citations are the easiest thing to manipulate, and we can't disallow text citations based on that. We will just have to be able to keep double-checking citations when we want. If a certain video game disappears forever from all computer systems (which sounds unlikely to me), we may even choose to invalidate citations and delete screenshots taken from it. --Daniel Carrero (talk) 13:54, 12 October 2016 (UTC)
We do have a handful of citations from games (e.g. deathmatch), films (hubba hubba), and songs (bootylicious). In some cases these are the simplest way to find a word (e.g. hip-hop slang) or the earliest practical citation. I don't think it's much harder to get hold of these media than a book. They seem pretty "durable". Equinox 13:57, 12 October 2016 (UTC)
Maybe WT:CFI should say it explicitly? "We accept citations from books (visit Google Books!), Usenet, video games, music, movies..." --Daniel Carrero (talk) 14:02, 12 October 2016 (UTC) Nevermind, it does. Except video games and songs, though. --Daniel Carrero (talk) 14:05, 12 October 2016 (UTC)
@Equinox: In fact, much easier many times--you can just search for the song and stream it in seconds. I think that adding some kind of mass media real world usage is superior to hypothetical but illustrative examples (e.g.) —Justin (koavf)TCM 14:02, 12 October 2016 (UTC)
There is, of course, the issue that songs mostly don't come with lyrics (unless perhaps on the album sleeve), so it's hard to prove that a particular word and spelling is what the song contains. But that probably needs a separate discussion: this indentation is getting crazy. Equinox 16:23, 12 October 2016 (UTC)
I agree with Daniel's above suggestion to allow citations from anywhere, trust current citations as correct, and fix them if challenged and/or as needed. Purplebackpack89 15:37, 12 October 2016 (UTC)

Proposed CFI change

If we want to accept citations from the internet for attestation purposes, I believe the right place to edit would be WT:CFI#Attestation, as described below.

Currently, that section contains these items (I'm not copying the vote references):

  1. clearly widespread use, or
  2. use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages).

We could add a new item and move a portion of text below the list, this way:

  1. clearly widespread use, or
  2. use in permanently recorded media, or
  3. use on the internet, in sources that remain publicly accessible over time; if a link is broken, try searching for the same content in archives and other sources, otherwise the current citation is invalid.

For attestation purposes, all citations must convey meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages).

Maybe we can use WT:RFV to check for review of existing citations, but that page is already too large and unwieldy. It's more than twice the size of WT:RFD, or WT:RFM, or WT:RFDO, or WT:RFC. For this reason, I think it's better to create a new page to check for existing citations. Maybe it should be called... I don't know. Maybe WT:Requests for citation review (WT:RFCR) or something else. --Daniel Carrero (talk) 00:33, 13 October 2016 (UTC)

I support expanding CFI to include Internet citations in some form or other. This would make it ten times easier to cite slang terms, especially ones like savage that have common meanings that would dominate a Google Books search (note that that example is missing a slang sense because I haven't been able to easily find citations). Andrew Sheedy (talk) 01:42, 13 October 2016 (UTC)
I support in principle, but we need to spend a lot of time and effort to construct a viable policy. The above is totally insufficient and I oppose it. --WikiTiki89 15:21, 13 October 2016 (UTC)
Do you see any specific problems with the currently proposed text? --Daniel Carrero (talk) 15:38, 13 October 2016 (UTC)
It's not the text, it's that we need to figure out a system, and the system described in your proposed text is insufficient to weed out garbage and does not specify any limits on how frequently or infrequently internet citations need to be re-checked. It contains no protections against sites that are frequently edited. The problem is that this is actually a hugely significant change and so it requires a great deal of careful thought and discussion before we arrive at a workable system. --WikiTiki89 15:52, 13 October 2016 (UTC)
Sure. I hope this is a good start. I think it would be a bad idea to have any time limits to re-check entries, people can do it whenever they want and use the proposed WT:Requests for citation review when necessary. Additionally, maybe bots can crawl all the links and search for 404 and other errors, and tag entries automatically. Also, if we consistently fill the accessdate= parameter of web quotations, then years later we may search for the earliest accessed entries to see if they are still OK. If a certain hosting service is disabled, like the old Geocities, we should be able to find all citations that use it, to see if we can fix them or should invalidate them.
Should we have protections against sites that are frequently edited? If we use wikis for citations, we may link to the edit histories. If there are no edit histories, we may search in the internet archives or invalidate that citation if all else fails. We should probably ban citations from Wiktionary discussions themselves, to prevent a circular logic: if we allowed citations from our discussions, then basically only 2 citations from external sources would be needed at all times, because we would easily fabricate the third, which I believe would be unfortunate.
I think we should only accept citations when it's clear who is the author, not only to credit them, but because I'm afraid those Tumblr articles with "Source: Tumblr" and nothing more are deeply unprofessional. This would probably weed out many random clickbait articles like "20 Signs You Spend Too Much Time Building Dictionaries". In fact, we probably should ban all clickbaits and advertisements altogether, because I don't want the word "delicious" to have a citation reading "You should eat a delicious Big Mac™." This would be bad in citations from magazines and newspapers, too, unless maybe if they are from ages ago and the word is used in a meaningful way. --Daniel Carrero (talk) 16:31, 13 October 2016 (UTC)
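The bot idea above (crawl the links, flag errors, and use accessdate= to find the oldest checks) could be sketched roughly as follows. Everything here is hypothetical: the record fields, the URLs and the status codes are invented, and a real bot would collect the HTTP statuses itself instead of receiving them in a dictionary.

```python
from datetime import date

def needs_review(status_code: int) -> bool:
    """Treat 4xx and 5xx responses as broken links; leave 2xx and 3xx alone."""
    return status_code >= 400

def review_queue(citations, statuses):
    """Citations whose links look broken, oldest accessdate first."""
    broken = [c for c in citations if needs_review(statuses[c["url"]])]
    return sorted(broken, key=lambda c: c["accessdate"])

# Invented citation records and crawl results.
citations = [
    {"entry": "flopar", "url": "http://example.org/a", "accessdate": date(2016, 10, 11)},
    {"entry": "trisal", "url": "http://example.org/b", "accessdate": date(2014, 3, 2)},
    {"entry": "upar", "url": "http://example.org/c", "accessdate": date(2015, 7, 9)},
]
statuses = {"http://example.org/a": 200, "http://example.org/b": 404, "http://example.org/c": 410}

for c in review_queue(citations, statuses):
    print(c["entry"], c["accessdate"])
```

With this data the queue lists trisal and then upar. Entries in the queue would be tagged for the proposed citation review page, not removed automatically.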
What's to prevent people requesting re-checks of the citations every other day? That's one reason we need limits. As for editable pages, with or without histories, there will frequently be errors (not just typos, but misuses of words, and other such things) that are later corrected, so the fact that we are able to point to some particular revision does not make that the right revision to point to. In print media, there is a lot more proofreading going on, which is why we don't have to worry about it so much, but on the internet it becomes a problem. And even just having the ability to edit pages after posting makes it less likely that authors will proofread themselves before posting. --WikiTiki89 17:15, 13 October 2016 (UTC)
But RFV, RFD, RFDO, RFM and RFC don't have a rule like "don't create new requests every day!", and we don't seem to have that problem. I double-checked the introduction of the aforementioned request pages now. Wiktionary:Voting policy contains "No topic should have a new vote more than once a day (24 hr period)."
WT:RFV includes: "Those who would seek attestation after the term or sense is nominated will appreciate your doing at least a cursory check for such attestation before nominating it"... Based on it, we may require people to check the internet archives by themselves before making a new request for citation-checking.
I take your point: "the fact that we are able to point to some particular revision does not make that the right revision to point to". Wikis or not, basically all pages on the internet are editable by their authors. Maybe we should have greater quality standards and ignore one-off uses of words... If a typo like "esaclator" appears even in a book, we won't create a new entry esaclator because of it. We are able to tell the difference between mistakes and actual words, right? Or maybe not? --Daniel Carrero (talk) 17:46, 13 October 2016 (UTC)
RFV doesn't need a rule at the moment because once something is attested it cannot become unattested. RFD/RFDO, however, does have a rule that you can't nominate something that has already passed RFD (without a substantially new reason or there having been a change of practice or policy since the last RFD). At RFC, you wouldn't nominate an entry that has just been cleaned up because it is already clean. With this new internet stuff, however, something can be attested one day and unattested the next, but it would be too time-consuming to re-check the same word every day. As for editable pages, if someone spells selfie as selfy, we might use it to attest selfy as an alternative spelling, but then the author comes back the next day and corrects the spelling to selfie because it was really just a mistake. How are we to know? I would say we need some sort of standards such as only using content that is professionally proofread or something like that. --WikiTiki89 18:19, 13 October 2016 (UTC)
I'm not sure I agree with attesting entries only from content that is professionally proofread. I assume we would want to use sentences with random orthography and abbreviations like h8 = hate to build up our database of internet slang and text messaging slang. That's one kind of thing we are already able to find in Usenet. Usenet is not professionally proofread.
For non-internet slang cases, yes, we may want to implement a number of quality standards. We might have a specific rule saying that a nonstandard spelling that was later fixed becomes invalid as a citation.
I'm still not sure that we need a time limit. I could even suggest having an explicit time limit of 30 days or something if people really want it, even though I don't see the need.
But let's see how it would work out with an example: Suppose we visit http://www.snopes.com/luck/chain.asp (Snopes is professionally proofread, I believe) and get one citation from that page. In the first large paragraph, the first sentence is: The practice of circulating letters to other parties beyond their original recipients has existed for centuries, so pinpointing the exact origin of chain letters is problematic.. We could attest pinpoint using that sentence.
Suppose that tomorrow I visit the page again, and witness a very unfortunate turn of events: the owners of Snopes decided to close the website forever. I believe I would have to search for archives to see if we can keep the citation. If I don't find any suitable archives, I may create a citation review request.
If Snopes is not closed and the citation still exists in the original page, or if I already found a usable archive, there's no need to request a citation review. In which circumstances would someone be able to create a new request over and over for the same entry every day? --Daniel Carrero (talk) 19:03, 13 October 2016 (UTC)
The way it's worded right now, absolutely not. In principle should we include more things common on the Internet but uncitable by our current CFI, yes we should. I would favor replacing 'clearly in widespread use' which I suspect is getting at the same thing, as other things that are in clear widespread use would pass the three citations rule anyway. Let's rewrite that line. Renard Migrant (talk) 19:12, 13 October 2016 (UTC)
Usenet is not modifiable, so it's less of an issue. I never said "time limit", I just said "limit", which can be any sort of limiting factor. Professionally proofread is an example of something we can use to limit garbage, that doesn't mean it's the only thing we can do. Another thing discussed was increasing the citation requirements to 700, but that is probably beyond our research capacity. Right now we're in the brainstorming phase, so please don't make any concrete proposals any time soon. --WikiTiki89 19:29, 13 October 2016 (UTC)
It's fine, I said "time limit" because I understood it as some specific limit to avoid new requests every day. Sure, I would probably support the general idea of having some limits. In WT:RFV#Acela Republican (to be archived at Talk:Acela Republican), Purplebackpack89 said "I got 744 [Google Web hits]". I think the idea proposed was that a word with 700+ hits deserves to be included somehow, but even if we accepted it, it's not necessary to add the 700 citations here.
By "please don't make any concrete proposals", you mean I shouldn't create a vote, right? Because, in my opinion, a vote right now would be pointless, but I believe we could still discuss what could be the CFI text to be edited, even if partially. I don't have time to do it right now, though. --Daniel Carrero (talk) 19:43, 13 October 2016 (UTC)
No, I mean we don't need to be looking at specific CFI text yet. Not until we have a thorough understanding of how the new system is going to work. How do you verify that the 700+ hits are of the correct sense of the word? And that they actually are uses of the word? You need to check them all. --WikiTiki89 19:50, 13 October 2016 (UTC)
I don't see a huge difference between saying: 1) "Let's have a rule saying that professionally proofread text is required." (no CFI text proposed) and 2) "Let's add in CFI: Professionally proofread text is required." (CFI text proposed).
I take your point that Usenet is not editable.
About the 700+ hits thing, you are going to have to ask that to PB89. --Daniel Carrero (talk) 19:57, 13 October 2016 (UTC)
The difference is that it's an idea. It's not a whole proposal, it's only a part of the bigger picture. And it's also not necessarily a good idea, it's just an idea. It needs to be discussed first. Also, proposing the actual CFI change draws too much attention to the language, which distracts from the content itself. --WikiTiki89 20:02, 13 October 2016 (UTC)
Ok, I understand. --Daniel Carrero (talk) 20:05, 13 October 2016 (UTC)

Requiring six citations from the internet

Would it help to require six citations instead of the usual three when citing from the Internet? In other words, we'd count them as only having half the value of a citation from a published work (and thus you could use 1 quote from a book and 4 from the Internet to meet the requirements). That might decrease the possibility of typos and spelling errors being used in citations. Andrew Sheedy (talk) 01:23, 14 October 2016 (UTC)
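The arithmetic behind this suggestion can be sketched as follows. This is only an illustration of the half-weight idea, not actual CFI policy; the source labels "print" and "internet", the names WEIGHT_HALVES and REQUIRED_HALVES, and the function name are all made up for the example.

```python
# Hypothetical sketch of the proposed weighting; not actual Wiktionary policy.
# Weights are counted in half-citation units so the arithmetic stays in integers.
WEIGHT_HALVES = {"print": 2, "internet": 1}  # print = 1 citation, internet = 1/2
REQUIRED_HALVES = 6  # equivalent to the usual three full citations


def meets_requirement(sources):
    """Return True if the weighted citations add up to at least three."""
    total = sum(WEIGHT_HALVES[kind] for kind in sources)
    return total >= REQUIRED_HALVES


# One book plus four internet quotes: 1 + 4 * 1/2 = 3, so this passes.
print(meets_requirement(["print"] + ["internet"] * 4))  # True
# Six internet quotes alone also pass; five do not.
print(meets_requirement(["internet"] * 6))  # True
print(meets_requirement(["internet"] * 5))  # False
```

Under these assumed weights, any mix of sources passes as long as twice the number of print quotes plus the number of internet quotes is at least six.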

Maybe, that seems worth discussing. Some of the limits discussed above still should apply, IMO. If someone writes "selfy" and later fixes it, (in a wiki or otherwise), as mentioned above, I don't think it would serve as a good citation for selfy, even if we require 6 citations.
I think CFI or a separate page should list in a comprehensive way what are the known durably archived sources, like this:
  • Books
  • Usenet
  • Video games (which games? all of them?)
  • Songs
  • Google Scholar (I guess?)
  • etc.
--Daniel Carrero (talk) 14:03, 14 October 2016 (UTC)

A quote from the man who created the Dead Media Project:

  • 2001, Bruce Sterling, Digital Decay[7]:
    Originally delivered as the keynote address for Preserving the Immaterial: A Conference on Variable Media at the Solomon R. Guggenheim Museum on March 30, 2001
    Bits have no archival medium. We haven't invented one yet. If you print something on acid-free paper with stable ink, and you put it in a dry dark closet, you can read it in two hundred years. We have no way to archive bits that we know will be readable in even fifty years. Tape demagnetizes. CDs delaminate. Networks go down.

-- DCDuring TALK 15:52, 16 October 2016 (UTC)

  • Are most websites reasonably expected to fade away in a few decades? (some defunct hosting providers like Geocities and hpg.com.br come to mind, because naturally when they were disabled, websites hosted by them were disabled too) Archive.org looks reliable enough, ...there's always the threat that changes in a website's robots.txt will delete the archives of that specific website within Archive.org, but we could simply reject any citations if they can't be found on the archives anymore.
    I don't know what will happen in 200 years, but if by any chance the copyright laws remain the same, apparently all that remains of today's internet will be in the public domain, and future internet archivers should have more freedom to keep it if they want. (correct me if I'm wrong) --Daniel Carrero (talk) 20:51, 18 October 2016 (UTC)

Suggested rules

As discussed above, these are some of the suggested rules. I hope I didn't forget anything important:

  • Requiring 6 citations from the internet, instead of 3. (I'd probably oppose that as unnecessary. But it's okay if people want it.)
  • A given citation needs to be publicly available in order to count. It can either link to the original page or to live archives. If no archives can be found anymore, the citation does not count anymore.
  • We should probably have a separate page like WT:Requests for citation check if a given citation can't be found either on the original page or the internet archives anymore, to request people to keep searching before considering a citation invalid.
  • If we are citing a text that was later edited to remove the cited word, then our citation is invalid because the author was probably fixing a mistake. This includes revisions of text in wikis.
  • Only allowing citations if the author is known, either by the real name or by nickname. This is intended to weed out random memes and articles with unknown authorship. This should also weed out those fake quotations like "Wiktionary is awesome --Albert Einstein", in which the quoted author never actually said it. For wikis, we can probably use "contributors" and link to the history or something.
  • New idea: Disallow any clickbait websites that require you to click "next >>" 14 times in order to view a full article. Reason: For a bit of quality, please. Cracked.com is fine, in my opinion. (Sorry, that's subjective and probably needs discussion. We may want to disallow Cracked.com if this rule is implemented.)
  • Disallow any citations from ads. We don't want the entry delicious with a citation like "Eat a delicious Big Mac."


  • With the popularity of the internet, there are terms that can't be found just in books and other durably-archived sources, so allowing citations from the internet would allow our coverage to be more complete. This probably includes some internet slang, text messaging slang and emoticons. I listed some Portuguese words above.

--Daniel Carrero (talk) 23:16, 20 October 2016 (UTC)

How many votes?

Planned, running, and recent votes
(see also: timeline)
Ends    Title                                             Status/Votes
Oct 27  Request categories                                no consensus
Oct 28  Definitions — non-lemma                           3  2  1
Oct 29  No headings nested inside templates or tags       5  4  1
Oct 29  Renaming transliteration                          13 (10 people)
Nov 4   No triple-braced template parameters in entries   3  4  5
Nov 9   Description                                       6  3  0
Nov 17  Matched-pair entries — policy page                6  1  0
Nov 17  CFI and idiomaticity clarification                7  0  0
Nov 26  Redirect fullwidth and halfwidth characters       2  0  0
Jan 9   Removing label proscribed from entries            1  4  0
(=10) [Wiktionary:Table of votes] (=88)

Please let me know how many votes is a good maximum number of votes created by the same person at a given time.

Of the 12 "planned, running and active votes", I created 8. Actually, only 9 of those are actually active (because they already started and didn't end yet), of which I created 6... which is a little more than 1 vote per week.

"A little more than 1 vote per week" has been my actual rule of thumb for some time. --Daniel Carrero (talk) 19:34, 10 October 2016 (UTC)

I think it's also important to make sure votes are well-conceived. I would say that a lot of failed votes is a sign that something is wrong. A single failed vote isn't necessarily a problem but too many failed votes is just a waste of time on everyone's part. Benwing2 (talk) 19:38, 10 October 2016 (UTC)
Of my votes created in 2016, 18 fully passed, 6 at least partially passed and 14 failed. --Daniel Carrero (talk) 19:55, 10 October 2016 (UTC)
  • Of clean, well-crafted votes that don't require midstream revision or additional provisos not strictly speaking part of the proposal, I'm sure that we could do one a week. If the votes were open for five weeks, we could be reasonably sure that the realistic potential population of voters would have a chance to consider the proposals, even if they only came by once a month. Some care should be taken to avoid having too many votes at times when contributors may not be available, eg, beginning of semester, summer vacation. DCDuring TALK 00:07, 12 October 2016 (UTC)
    Sure! I've been trying to learn from my mistakes, for example nowadays when I want to edit WT:EL, I prefer to propose a change to small portions of text at a time, because a vote trying to review whole large sections is usually very difficult to pass.
    Unfortunately, even after multiple people supported the creation of a vote, sometimes people bring up new problems after the vote has started. Usually, these votes would just fail, (but most of my 2016 votes have passed) which may result in better votes in the future for the same things. I think it's OK when people edit ongoing votes to fix minor grammar mistakes, though.
    It seems that in practice, most BP discussions that are not about syllable categories are unlikely to get new answers after a week or so. By the second week, sometimes I feel the urge to think: "Well, consensus or not, this discussion is as good as it will ever be. It's over." I've been waiting some more time just in case, but when nobody comes, I secretly think: "I knew it!" --Daniel Carrero (talk) 15:37, 13 October 2016 (UTC)
    In general, others are not as interested in a proposal that you make as you are, usually because the proposal solves no problem that they have. Also many problems that they do have may not get addressed until they create some technical problem, at which point they are addressed (not necessarily solved, sometimes replaced by worse problems or inconveniences) without a vote.
I don't think it's OK that "minor" grammatical mistakes are fixed on the fly. They should not occur. Each such "fix" may change the proposal substantively in unintended ways. Presumably many who read the proposal once would have to reread it to determine whether the change was in fact "minor". The result will be that few indeed will read the proposal until the proposal text becomes stable. That is why we have BP discussions. (Other fora are not acceptable substitutes IMO.) Having votes that are uninteresting, eg, trivial, and imperfectly drafted will lead to simplified voting heuristics, such as "Always vote no" or more selective versions. DCDuring TALK 21:22, 13 October 2016 (UTC)
  • Most of my votes are to edit policies rather than solve technical problems. Maybe you could count proposals such as "install a new extension", "implement a new namespace shortcut" or "create a new user rights group" as technical problems, which clearly require votes. Most tech problems don't require votes.
    I think it's OK if a policy-edit vote is uninteresting. When I create BP discussions and suggest creating votes, I believe I have never heard this specific complaint: "Don't create it, it's uninteresting!" But I understand that it may be a reason for a certain vote to have a noticeably low turnout.
    If someone attempts to edit a vote that already started with the purpose of fixing a grammar mistake, I suggest you revert them if you want, but if multiple people prefer the revised version of that specific vote anyway, please cater to them. Grammar mistakes happen. If it's so serious, we can withdraw the vote and try again. --Daniel Carrero (talk) 19:19, 14 October 2016 (UTC)

Adverb, prepositional phrase, adjective, ...?

Are things like at ease and à gauche best categorized as adjectives, adverbs or prepositional phrases? There are tons of things in CAT:English prepositional phrases and a few in CAT:French prepositional phrases; many more putative prepositional phrases are found in CAT:French adverbs, for example. Where is the boundary to be drawn? Benwing2 (talk) 19:36, 10 October 2016 (UTC)

I don't know a thing about the traditions of French grammar or lexicography.
Putting prepositional phrases into the word classes adverb and adjective is not traditional, but neither is treating prepositional phrase the way we do. Most dictionaries have them as run-in entries where they do not require their own word class. Grammarians would say that prepositional phrases can be used as adverbs or adjectives, but would not put them in the corresponding word classes.
In principle, every English prepositional phrase is just that. Recategorization would just be a matter of replacing the categorizing inflection-line templates. Merging the Adverb and Adjective PoS headers would require rewording definitions one at a time. It's not the kind of thing that most of our contributors are capable of or interested in. One would probably get disagreement as to the desirability of even the category change. DCDuring TALK 09:28, 11 October 2016 (UTC)
I like prepositional phrase as it covers both. Renard Migrant (talk) 11:04, 11 October 2016 (UTC)
I would keep in mind that things that look like they're used as adjectives may not necessarily be adjectives. This is stronger in French with its postposed adjectives than in English with preposed adjectives. Compare, for example, "this house here" to "cette maison ici". "ici" looks exactly like an adjective in this position, but is it? Care should be taken when judging prepositional phrases to be adjectives this way. —CodeCat 17:49, 12 October 2016 (UTC)

Formatting of cognates at Reconstruction:Proto-Celtic/kumbā

A year ago, there was Wiktionary:Beer parlour/2015/September#Formatting proposal: always put cognates in a separate paragraph, which has majority support. I've been implementing this in entries ever since, but now User:Victar has started reverting me. I pointed him to the prior discussion, but he dismissed it, claiming that as he created the page he's entitled to choose how he wants to format it. This isn't true of course; there's no ownership of pages, anyone can edit anything, and decisions are made by consensus. Since this matter has no consensus and there's only two parties involved, there's two ways out: edit war into eternity or form a wider consensus. I'm choosing the latter option. So I'm asking now, how should the cognates be formatted: in their own paragraph or not? —CodeCat 17:37, 12 October 2016 (UTC)

Your proposal was simply that, a proposal. It was not ratified as formatting guideline, and as such, it shouldn't be enforced with the same blind vigor. If we're simply talking about a matter of personal preference, which it is, then I have the right to have my own, and yes, I think especially as the creator of the entry. I strongly believe that creating separate lines for cognates unnecessarily pushes down the whole of the content. --Victar (talk) 17:51, 12 October 2016 (UTC)
Indeed, in that discussion it seems that a large portion of people expressed the opinion that having cognates in a separate paragraph should be an option, especially when the Etymology section is big, but not a requirement. If I had contributed to the discussion, that's probably what I would have said, too. —Aɴɢʀ (talk) 18:48, 12 October 2016 (UTC)

Edit protect tchýně

Can someone please protect this page? There's a bunch of Czechs who seem to think it's ok to ignore Wiktionary's descriptivist approach and repeatedly insert all kinds of POV appeals to authority. There's also some warring on the talk page because one of them posted in Czech, which is inappropriate for a discussion on the English Wiktionary. —CodeCat 19:12, 12 October 2016 (UTC)

I don't have the time to check that entry right now, but I see some kind of edit war ongoing. Please someone review that. I added "autoconfirmed" protection. Is it OK or does it need admin-level protection? Also, someone pretty please give CodeCat's admin rights back, thanks. --Daniel Carrero (talk) 19:21, 12 October 2016 (UTC)
It needs admin protection, they just did another edit. —CodeCat 19:25, 12 October 2016 (UTC)
Weird. That person who edited the entry now is not even autoconfirmed. --Daniel Carrero (talk) 19:28, 12 October 2016 (UTC)
@Daniel Carrero Well can you do it please? They're still at it. —CodeCat 22:47, 12 October 2016 (UTC)
  Done --Daniel Carrero (talk) 22:50, 12 October 2016 (UTC)
Thank you! —CodeCat 22:53, 12 October 2016 (UTC)
@Daniel Carrero or anyone else: can you also protect dceřinná společnost and vyjímka? It's the same issue, they've just started messing with other pages instead. —CodeCat 23:24, 12 October 2016 (UTC)
For future reference: note in particular the edit history of these three entries. They consist almost entirely, from the moment Dan Polansky created them, of some editor changing it to "misspelling" and/or adding "incorrect" POV notes, and then various more experienced editors putting it back. —CodeCat 23:27, 12 October 2016 (UTC)
  Done. Again, I'm just trusting you on these ones, because I don't speak Czech. If others want to review my page protections, I welcome them. --Daniel Carrero (talk) 23:30, 12 October 2016 (UTC)
I don't speak Czech either, but their edits and arguments seem like just an appeal to authority to me, ignoring Wiktionary's descriptive nature. I trust Dan's judgement more than any of theirs, since he's a native speaker and a knowledgeable Wiktionary editor. Another Czech editor has now provided further statistics to show that tchýně is widely used. —CodeCat 23:33, 12 October 2016 (UTC)
I understand. Yes, I see your point and I think you're right. I see that now you asked Dan Polansky to weigh in on Talk:tchýně, which is good. --Daniel Carrero (talk) 23:35, 12 October 2016 (UTC)

Well, it would have been polite, if you (plural):

  1. pinged us in this talk, so we could react here
  2. pointed us to relevant descriptive pages explaining what is considered "descriptive"
  3. would not consider and (indirectly) call us less experienced, considering the fact that some of us have been in the wikiverse longer than you
  4. started a discussion instead of blindly reverting without even stating your reasons, which obviously led to further reverting that a clear explanation would have avoided
  5. followed the process for reaching a consensus

Very impolite behavior of you to de facto here-newcomers, shame on you, this is not how users should be treated.

You have a chance to remedy at least the second and fourth point now though...

@CodeCat: I don't think it is OK to ignore any rule as long as I know it. So it is heavily unfair what you have written in the second sentence of your opening post in this section, because none of you bothered to point us to such rule (as I've mentioned above). Not even speaking that none of you followed the consensus making process, so as we say in Czech "sweep in front of your doorstep first".

Anyway, this whole situation is obviously one big misunderstanding, mostly because of a lack of communication (a fortiori proper communication) from the local folks towards us. I obviously cannot speak on behalf of the other Czechs involved, but I'm pretty sure they would, like me, prefer discussion to dragging the rope there and back.

So it would be constructive if you (plural) at first explained and described reasons why you (plural) keep putting "alternative spelling" instead of "misspelling" in those entries. (And no, simple wordcount statistics is not a reason.) Instead you should for the beginning at least clearly describe these two terms, so we could move forward.

PS: Please also mind the "lost-in-translation" factor, which may be a big stakeholder here.

Danny B. 01:43, 13 October 2016 (UTC)

Actually simple word count is a reason. Descriptive linguistics is all about describing language as it is used, not as people think it should be used.--Prosfilaes (talk) 12:33, 13 October 2016 (UTC)
Maybe we don't understand what is meant by alternative spelling. For me, an alternative is a situation where I can use either form in written text, e.g. encyclopaedia or encyclopedia. But that is not the case here: when I use tchýně in Czech text, it will be considered a mistake. When I use it in school, I get a worse mark.
We provided sources from university, from linguistic-oriented blogs, from newspaper etc, but because this error is very common, is this more than sources? OK, if i aply the same for inglish, i shuld writ as i hear and it wil be only altenative speling. JAn Dudík (talk) 18:19, 13 October 2016 (UTC)
That's what the usage note is for. Also, I added the tag "proscribed", perhaps no one will object to that? --WikiTiki89 18:21, 13 October 2016 (UTC)
On the talk page, I suggested using {{nonstandard spelling of}}, but no one's responded to that suggestion yet. —Aɴɢʀ (talk) 18:42, 13 October 2016 (UTC)
I object to both proscribed and nonstandard; for more, see Talk:tchýně. google:"tchýně" shows how incredibly widespread this form is. The usage note in tchýně covers the matter, upholding descriptivist standards while at the same time accurately reporting the absence from Pravidla ("Rules") for those who find that fact relevant. --Dan Polansky (talk) 13:05, 15 October 2016 (UTC)
If you only count the Czech speakers involved, it is largely me against multiple others. This is probably because Czechs are brought up in a prescriptivist language culture. Many Czechs seem to think that if a spelling is absent from a regulatory list of approved forms, then it is "incorrect". That is one reason why the idea that consensus should only be made by the natives of a particular language leads to poor results. Instead, my position is that consensus is to be sought among all eligible English Wiktionary editors, whether they know any Czech or not, since even English, Dutch and Portuguese editors with no knowledge of Czech can distinguish prescriptivism from descriptivism. --Dan Polansky (talk) 13:15, 15 October 2016 (UTC)
@Dan Polansky: What do you think "proscribed" means? It means that many Czechs think it's wrong. --WikiTiki89 15:46, 17 October 2016 (UTC)
Wiktionary:Glossary#P does not tell me what "proscribed" means. As far as I am concerned, "proscribed" label could be removed from Wiktionary; it is suggestive of prescriptivism. Is there any English dictionary that uses the label? --Dan Polansky (talk) 17:30, 17 October 2016 (UTC)
What about going to Wikipedia and labeling W:Homosexuality article with "proscribed" box, meaning "many Americans think it is wrong"? Makes sense? --Dan Polansky (talk) 17:32, 17 October 2016 (UTC)
We are describing the fact that prescriptivism exists in Czech. --WikiTiki89 18:05, 17 October 2016 (UTC)
Agreed? Homosexuality should get a nice red box "proscribed" in Wikipedia to indicate that anti-homosexualism exists in the U.S.? --Dan Polansky (talk) 18:13, 17 October 2016 (UTC)
That's not Wikipedia's style. Anti-homosexualism is discussed in the entry (or should be, I haven't checked). --WikiTiki89 18:18, 17 October 2016 (UTC)
I don't see why it should be our style. In tchýně, I have used the usage note to indicate the term is not on the approved word list; the information is there. That is analogous to Wikipedia having no prominent box "proscribed".
Let me quote Ruakh from Template talk:proscribed: 'The problem with labeling a sense as "proscribed" is that, as Metaknowledge implies, it makes it sound like we are proscribing it. I prefer to write "sometimes proscribed" or "often proscribed", which I think makes it a bit clearer that we're talking about other people's proscriptions. And probably "condemned" or "criticized" would be better than "proscribed". —RuakhTALK 19:42, 13 September 2012 (UTC)'
I agree with Ruakh: it sounds like we are proscribing it. And we the dictionary are not proscribing anything. Given the current circumstances, I support deprecation of label "proscribed". --Dan Polansky (talk) 18:21, 17 October 2016 (UTC)
I agree with Ruakh as well, and if you had asked me to change it to "often proscribed", I would have done so. But I don't think it should be removed. Tags like this are already our style, it's no different from a "(dated)" tag. Why doesn't Wikipedia need to put a big red box at the top of w:Floppy disk saying "Dated"? --WikiTiki89 18:40, 17 October 2016 (UTC)
It is different from dated in that there is nothing prescriptivist about dated or archaic. Put differently, label dated is not an imperative in disguise. The whole disagreement is not about provision of information since, again, I stated in the usage note that the spelling is absent from the mighty uberlist, but rather about the prominence and tone of the labeling. In any case, "often proscribed" would do a bit to alleviate my anti-prescriptivist concern.
Furthermore, since German zumindestens is actually being proscribed by language teachers, should it be marked as "often proscribed"? Should all vulgar terms also be labeled "often proscribed", since they indeed are often proscribed? And since ain't is often proscribed, shall it be so labeled? You have to clarify how far you intend to spread the badge of shame that is "proscribed". --Dan Polansky (talk) 18:52, 17 October 2016 (UTC)
I think labels such as "colloquial", "vulgar", and "slang" already imply that it is proscribed. There is no reason not to have both the tag and the usage note. For some dated terms as well, we tag them as dated and also include a usage note specifying when it was used. --WikiTiki89 18:58, 17 October 2016 (UTC)
Assuming the above, which I don't but anyway: then why not use the definition line "informal form of" or "colloquial form of" and be done with it? --Dan Polansky (talk) 19:05, 17 October 2016 (UTC)
Do those actually apply to this word? --WikiTiki89 19:35, 17 October 2016 (UTC)
They seem to: tchýně is how people very often pronounce the word, whereas tchyně is on the uberlist of the continental regulators. "tchýně" is how people very often write the word when not subjected to the rigor of zealous copyeditors; this is probably so because Czech has a rather phonetic spelling and "tchýně" matches the pronunciation. The other Czechs have to be "on guard" when writing lest they commit an "error".
Now, I cannot prove that people very often pronounce the word as "tchýně". I can only demonstrate that conspicuously many "tchýně" make it into printed works in Google books and to the world wide web, compared to "tchyně". --Dan Polansky (talk) 19:48, 17 October 2016 (UTC)
But anyway, "often proscribed" is so much better than "proscribed". Until we get "proscribed" removed entirely from the dictionary and become fully descriptivist again, it is okay, I guess. --Dan Polansky (talk) 20:00, 17 October 2016 (UTC)
I don't think the word "colloquial" can apply to alternative spellings. And to me, an "informal" spelling is something like thru. --WikiTiki89 20:07, 17 October 2016 (UTC)
──────────────────────────────────────────────────────────────────────────────────────────────────── Elsewhere, I proposed to use informal and discontinue colloquial, so I am certainly okay with informal. "thru" seems much more informal to me than "tchýně", but I am no native English speaker. Interestingly, M-W's thru[8] does not seem to say "informal" or "non-standard". --Dan Polansky (talk) 20:15, 17 October 2016 (UTC)
I clicked on the link to this article given in the entry you linked to, which has some interesting information. Based on that, I assume that they don't mark it as informal because thru was technically attested before through. However, at the end of the article it says: "All that said, thru is still considered an informal variant of through, despite its history and the AP's limited approval." --WikiTiki89 20:37, 17 October 2016 (UTC)
Very interesting link; thank you. I wonder why they do not mark "thru" informal straight away. Be it as it may, they do not mark "thru" non-standard or "often proscribed", right? Does any dictionary do that for thru? And then, what do schoolteachers think of thru?
I would love to follow the model of the modern Anglo-American lexicography and mark "tchýně" as "informal form of" or the like. --Dan Polansky (talk) 20:45, 17 October 2016 (UTC)
But I'm not sure that thru has an equivalent usage pattern to tchýně. Most people when they write thru recognize that it "should" be spelled through but choose to ignore that fact for whatever reason (space limitations, laziness, stylistic considerations, etc.). I would think that most people who use tchýně do it either without knowing that tchyně is the more widely accepted spelling, or simply by mistake without thinking about it, or perhaps because they are insistent on spelling words phonetically. Am I correct about that? --WikiTiki89 20:54, 17 October 2016 (UTC)
I don't see any meaningful distinction above. Again, "thru" does not seem any less informal than "tchýně"; indeed, the article you linked above mentioned that, when drive-thru was proposed to be placed on signs, at "an editor's conference in 2014, there was an audible gasp in the room when this was mentioned [...]: the decline of English in action!". If thru signifies the decline of English to many and still can be labeled informal rather than often proscribed, I don't see any evidence to suggest that tchýně should be "often proscribed". It is better than "proscribed" but I disagree with it, find it prescriptivist, and hope it will be gone. If I could edit tchýně, I would remove "often proscribed", thereby returning the entry to the status quo ante. I am thinking about trying my luck and getting rid of "proscribed" altogether from Wiktionary, but do not see too good chances: descriptivism is very fragile and I actually found it surprising that it was upheld so well until now. --Dan Polansky (talk) 21:13, 17 October 2016 (UTC)
Are there any entries at all that need "proscribed" as a label before the definition? I agree with Dan Polansky's reasoning that this label is an imperative in disguise, and it indicates that we, the English Wiktionary, are proscribing a word or a sense.
For example, in my experience as native speaker of Portuguese from São Paulo, Brazil, basically everyone writes mozzarella as mussarela, (there are pizzerias everywhere selling pizza de mussarela, and markets sell slices of queijo mussarela by the kilogram). But one source online states that both Aurélio and Houaiss (two important dictionaries) use muçarela. This is because of a prescriptivist rule stating that words borrowed from other languages and then adapted to our orthography always use "ç", not "ss" in the middle of a word. I wouldn't want to add "proscribed" at the beginning of mussarela, because it would seem like we are proscribing it. --Daniel Carrero (talk) 12:47, 18 October 2016 (UTC)

More ise/ize

canonize/canonise is a typical example of -ise/-ize definitions being assigned to one entry and not duplicated over both. Given that one entry has to be chosen, by whatever method, to host the definitions, the treatment should otherwise be as symmetrical as possible. Presently this is not the case. While canonise is labelled "UK" or "British", no label is anywhere attached to canonize, giving the impression that canonise is a deviation from the norm. What is the best way to redress this -- given, as I say, that it is accepted that only one of the entries, in this case canonize, can host the definitions? Mihia (talk) 20:10, 12 October 2016 (UTC)

In actual practice, spellings like canonize are hardly used in British English, except by non-natives and American companies such as GM and Ford I suspect. DonnanZ (talk) 16:08, 13 October 2016 (UTC)
I agree, and I am aware of that, but the question is how best to indicate this in the entry for, say, canonize. In practical layout terms, where does the "US" label go? Mihia (talk) 17:43, 13 October 2016 (UTC)
Some British publishers do use the -ize spellings, notably the Oxford University Press, which is why British spelling using the -ize variants is known as Oxford spelling. It shouldn't be labeled simply as "US" (the way center or color should be) but should be labeled {{lb|en|US|Oxford}} or the like. —Aɴɢʀ (talk) 17:49, 13 October 2016 (UTC)
OK, but where should the label go? Mihia (talk) 19:19, 13 October 2016 (UTC)
I think having usage notes on every -ise/-ize verb would get pretty tedious. Can this be managed through context labels alone? Renard Migrant (talk) 19:28, 13 October 2016 (UTC)
I think so. I'd say the actual definition should be at only one entry, and the other entry should be marked as an alternative spelling, with the appropriate context labels. The problem is which entry to make the primary one. In the past, some have suggested using Google Books Ngrams to see which is more common across English as a whole (i.e. without specifying en-US or en-GB); in this case, that would be canonize. But I don't know whether there's a consensus to use that method. —Aɴɢʀ (talk) 19:48, 13 October 2016 (UTC)
Which entry to make the primary one may be another problem, but it is not one that I am concerned about here. All I am concerned about in this thread is practically how to label the "primary" entry (the one hosting the definitions), to show that it is US, UK, or whatever. Mihia (talk) 20:57, 13 October 2016 (UTC)
-ize spellings can be labeled {{lb|en|US|Oxford}} and -ise spellings can be labeled {{lb|en|non-Oxford}}. The form called the alternative spelling can use a from= parameter instead of {{lb}}, like this. —Aɴɢʀ (talk) 21:23, 13 October 2016 (UTC)
I see ... you have just put the label against every definition at canonize. I do not personally feel that this is a very satisfactory solution (in fact, I had more or less discounted it, as I suppose I should have made clear). I suppose it may be tolerable with just three definitions, but if you look at an entry like color/colour (let's assume it is agreed to merge the definitions, and not get into that debate specifically here), it would be tremendously tedious to have to repeat the national labels against every definition, along with the various other labels on top. I was looking for a neater solution. Mihia (talk) 22:12, 13 October 2016 (UTC)
Just noticed (don't know why I didn't notice before) that "colour" and "color" actually put the national labels just once, next to the headword. Perhaps that is the way to go ... Mihia (talk) 22:16, 13 October 2016 (UTC)
That's {{term-label}}. —CodeCat 22:18, 13 October 2016 (UTC)
OK, thanks, I changed it to use "term-label". I think that is better, unless anyone has any other suggestions about how to handle this ... Mihia (talk) 22:30, 13 October 2016 (UTC)
I wasn't aware of {{term-label}}. That is a good idea. —Aɴɢʀ (talk) 19:51, 14 October 2016 (UTC)

Proposal: Redirect many single-character entries

Sometimes, when I propose a new thing, it's something that I've been thinking for years. This is one of those times.

Proposal #1: Redirect many separate single character codepoints that basically mean the same thing. I believe it would be useful to create an exact, comprehensive list of the group of characters affected, if possible. I mean hard redirects, the ones that use #REDIRECT.

Proposal #2: For every redirected character, add {{character info/new}} in the main entry with the codepoints of all redirected characters. (or maybe another template if there are too many redirects for a single entry)

Here's a partial list. This assumes that all characters (including emojis below) are attestable. These are all the examples I could remember for now. Please add more if you remember any, discuss if you want, etc. (there are no Han characters in this list, apparently we prefer soft redirects for the traditional/simplified variants at least)

  1. redirect (when possible) all specialized Roman numeral characters
  2. redirect all fullwidth and halfwidth letters and symbols (I started a discussion recently about it here)
  3. redirect all subscript and superscript characters
  4. redirect all small caps characters (except when they have a separate meaning in IPA or something, like )
  5. redirect all combining characters when possible (already voted and approved here in 2011)
    • (combining acute accent) → ´
  6. redirect single-character digraphs (already voted and approved here in 2011)
  7. redirect single-characters that stand for multiple punctuation marks
    • (double exclamation) → !!
  8. redirect some specific characters for units of measurement
    • ºC (or maybe soft redirect into both º and C)
    • µ ("micro-" sign) → μ (small mu) (it seems the "micro-" sign is used a lot in our entries, though)
    • K (Kelvin) → K (the software already redirects this one automatically, it appears)
  9. some random spaces like "EM SPACE" and "THREE-PER-EM SPACE" appear to be impossible to use in page titles, but they at least should probably have character boxes in Unsupported titles/Space, in my opinion
  10. redirect Arabic presentation forms, I guess (I don't even speak Arabic, ignore if blatantly wrong)
    • , ـب, ـبـ, بـ (EDIT: I don't know how to write this, but ب should be the main entry, I believe)
  11. concerning w:Hangul Compatibility Jamo and w:Hangul Jamo (Unicode block), redirect one to the other, I suppose
    • , (we are already redirecting from normal to compatibility entries? shouldn't it be the other way around?)
  12. it may be just me, but I'm not too happy with the small katakana word boxes -- I would redirect them and add {{character info/new}} to the full word entries
  13. redirect all pieces and components of single characters
    • , , , , , (parentheses pieces) → ( ) (although these could redirect to either ( or ) I guess -- the ( ) feels more like the "main entry" to me)
  14. redirect all "ornament" versions of other characters
    • , ( ) (see comment in the item above)
  15. redirect all vertical writing versions of other characters
    • , (vertical parentheses) → ( ) (see comment two items above)
  16. redirect all fancy typography
    • (heart-shaped exclamation mark) → !
  17. redirect all emojis that basically are the same thing
    • (hourglass with flowing sand) → (hourglass)
  18. specifically, redirect emojis that mean the same thing but differ in color
  19. specifically, redirect emojis that are basically the same expression of emotion
    • 😭 (LOUDLY crying) → 😢 (crying)
    • 😙 (kissing face with smiling eyes) → 😗 (kissing face)
    • see the huge list from Unicode (link) for yourself, let me know if there are any problems with this one
  20. redirect characters and emojis that mean the same thing but are arbitrarily rotated or inverted with no additional meaning
    • (reversed empty set) → Ø (unless the reversed one has any actual, separate meaning)
    • (reversed not sign) → ¬ (unless the reversed one has any actual, separate meaning)
    • (white shogi piece)

Also, I think we should SOFT redirect these, because they are basically SOP and consist of multiple entries:

  1. number + period
  2. number + comma
  3. number + parentheses
  4. letter + circle (...which sounds controversial, because the circle is just a typography thing; well, some like do have a separate, attestable meaning)
    • B + (alternative idea: redirect B as a single character)
  5. number + circle
    • 1 + (alternative idea: redirect 1 as a single character)
  6. random fractions... my point is: in my opinion, we don't need entries for random fractions

--Daniel Carrero (talk) 20:48, 12 October 2016 (UTC)
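
Several of the groups listed above (fullwidth forms, superscripts, the micro and Kelvin signs, the double exclamation mark, parenthesized letters) have what Unicode calls a canonical or compatibility decomposition. As a rough way to draw up a candidate list, and not as a statement of how any redirect would actually be created, here is a minimal Python sketch. The function name `redirect_target` is made up for illustration; only the standard `unicodedata` module is used.

```python
import unicodedata
from typing import Optional


def redirect_target(title: str) -> Optional[str]:
    """Return the NFKC-normalized form of a title, or None if it is unchanged.

    Hypothetical helper: a non-None result only suggests a *candidate*
    redirect; it says nothing about whether the character has its own meaning.
    """
    normalized = unicodedata.normalize("NFKC", title)
    return normalized if normalized != title else None


# Code points written as escapes so the example survives copy-and-paste.
examples = {
    "\uff21": "fullwidth A",
    "\u203c": "double exclamation mark",
    "\u00b5": "micro sign",
    "\u212a": "Kelvin sign",
    "\u00b3": "superscript three",
    "\U0001f110": "parenthesized capital A",
    "\U0001f62d": "loudly crying face",
}

for char, name in examples.items():
    print(f"U+{ord(char):04X} {name}: {redirect_target(char)!r}")
```

Emojis have no decomposition, so items 17 to 19 would still need a hand-made list. Superscripts that carry a separate meaning would also be flagged by this approach, so its output could only ever be a starting point for review.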

Support these sorts of redirects for anything where there is no distinction made when writing by hand as opposed to using Unicode (meaning that I'm hesitant to support #10, though I don't oppose it either). I'm not sure I support #19, at least for common emojis, due to the fact that some have slightly different connotations even if they are very similar. It would be very helpful, BTW, to include the list of all variations of an emoji in all entries for them, just as that page you link to does, since the way they are perceived can change depending on what version is used. Andrew Sheedy (talk) 01:58, 13 October 2016 (UTC)
Apparently, the separate "start", "end", etc. Arabic letter varieties only exist for compatibility purposes, for this reason I'm under the impression that these redirects would be fine. (correct me if I'm wrong)
If people want to keep a separate entry for each emoji, that could probably work, too. But there are just too many for the same things, in my opinion. --Daniel Carrero (talk) 18:32, 13 October 2016 (UTC)
I oppose the systematic creation of redirects. --WikiTiki89 15:23, 13 October 2016 (UTC)

I'm thinking of creating a vote for a new page called WT:Single-character redirects and starting with only the ones about Latin script letters and punctuation, then adding the others later if other people agree. We can link CFI to the new page. The specific redirecting rules that were already voted and approved in Wiktionary:Votes/2011-06/Redirecting combining characters and Wiktionary:Votes/2011-07/Redirecting single-character digraphs can be kept in the new page. --Daniel Carrero (talk) 18:32, 13 October 2016 (UTC)

One thing I wish to add: If we find 3 citations with "!!", are we citing the codepoint "DOUBLE EXCLAMATION MARK" ("") or two exclamation marks together ("!!")? I agree with Andrew Sheedy's remark about "these sorts of redirects for anything where there is no distinction made when writing by hand as opposed to using Unicode". When the distinction only exists in the text encoding as opposed to being an actual linguistic distinction, I believe it's a good idea to create redirects.
In my opinion, having a separate entry for (fullwidth A) is like having another entry for italic A, other for boldface A, other for sans-serif A, etc.
In my opinion, having a separate entry for 🄐 does not make a lot of sense either, because it's basically a SOP of A and ( ). 🄐 is an entry whose meaning can be perfectly understood from the sum of its parts, and it appears to currently exist only because Unicode has a codepoint for it. --Daniel Carrero (talk) 19:03, 14 October 2016 (UTC)
should be treated like ligatures such as . In fact, I don't even think we should have entries for them at all. --WikiTiki89 21:03, 14 October 2016 (UTC)
If my proposal of creating certain redirects as described above passes, then the single-character can be redirected to !!, but if the latter should not exist, then the redirect could point to !. Per WT:CFI#Repetions, we are able to have a lot of entries with repeated letters like suuure, so for consistency I believe it makes sense to create a few entries for repeated punctuation marks like !! and !!!. --Daniel Carrero (talk) 21:51, 14 October 2016 (UTC)
Minor comment: if you'd like to save smallcaps that mean something specific in phonetic transcription (#4), that logic immediately axes the redirecting of superscript letters (#3), which are regularly used in phonetic transcription for "overshort" sounds, such as the release of an affricate or a diphthong. (I'm also not sure how many smallcaps that aren't used in phonetic transcription there are, but it's not many.) --Tropylium (talk) 01:47, 16 October 2016 (UTC) 
Thank you for the comment. Yes, I support keeping separate entries for all superscript letters that have a separate meaning. Apparently, some superscript numbers like ¹ and ² already have separate meanings and thus merit separate entries under that logic.
But, I would suggest converting the entry ³ into a redirect. The 1st definition is "superscript three", which is not a semantic, meaningful definition, it just describes the typography of the glyph. The 2nd definition is "cubed", whose meaning is taken from 3 + Appendix:Superscript. Any superscript number (or letter) is cubed, so it's not a special meaning of "3". In my opinion, having an entry ³ meaning cubed is like having an entry ×3 meaning "times 3". --Daniel Carrero (talk) 03:22, 16 October 2016 (UTC)

To suggest implementing the point #2 of this proposal, I created Wiktionary:Votes/2016-10/Redirect fullwidth and halfwidth characters. --Daniel Carrero (talk) 13:39, 21 October 2016 (UTC)

Terms attributable to a particular source

Does anyone object to adding this into the category hierarchy? Perhaps someone can think of a better wording? I mean it to house "___ terms coined by _____" categories ( so far only Latvian has such categories, but I'm thinking of making a {{coined by}} template to automate such a categorization ) together with "___ terms coined in the Simpsons/the Economist/Usenet" or whatever else we feel the need to categorize. Thoughts? Crom daba (talk) 13:41, 14 October 2016 (UTC)

Excellent idea! That said, I would limit the categories for individual coiners to a predefined list of people and add the rest to a catch-all category, otherwise we will end up with a lot of terms coined by _____ categories with only one or two terms. — Ungoliant (falai) 12:07, 18 October 2016 (UTC)

Correct use of templates

This edit has the desired visible effect. Please advise whether it is the correct use of templates, or whether the result ought to be accomplished in some other way. Mihia (talk) 21:05, 14 October 2016 (UTC)

I think {{term-label}} is meant to be used after the headword. For alternative forms, we actually have a dedicated template {{alter}}. --WikiTiki89 21:09, 14 October 2016 (UTC)
Thanks, {{alter}} does not specially understand the parameter "non-Oxford" or do the links or the brackets. I can almost replicate the "term-label" results like this, but I'm not sure if this may be more of an abuse of templates than just using "term-label". Also, the brackets come out in italics which they strictly shouldn't. Mihia (talk) 10:45, 15 October 2016 (UTC)
We could create Module:en:Dialects and create abbreviations for {{alter}} to recognize, the way Module:hy:Dialects and Module:grc:Dialects do for those languages. Otherwise, I'd just use {{q|non-Oxford British English}}; that way the parentheses aren't italicized but the contents are. —Aɴɢʀ (talk) 11:02, 15 October 2016 (UTC)
OK, thanks, I plan to use canonise/canonize as a model for definition-merged -ise/-ize entries, in terms of layout and labelling, so if there's anything you or anyone else finds unsatisfactory about the way those entries are currently set out, please say. Mihia (talk) 12:46, 15 October 2016 (UTC)
@Angr: I was recently surprised by the fact that there isn't a module for English dialects yet! I think I'll start it. — Eru·tuon 02:46, 16 October 2016 (UTC)
@Erutuon: That's fine, but wouldn't it be better to have {{alter}} call on the same dialect list that {{lb}} and {{alternative form of|from=}} already call on, i.e. Module:labels/data/regional? It seems redundant to have two lists. —Aɴɢʀ (talk) 07:27, 16 October 2016 (UTC)
@Angr: The same could be said about Module:grc:Dialects: I have added some Greek dialect labels from there to Module:labels/data/regional. I agree that it's redundant, but I'm not sure how to solve the problem. — Eru·tuon 18:09, 16 October 2016 (UTC)
@Erutuon: Couldn't {{alter}} be rewritten to use Module:labels/data/regional instead of, or in addition to, Module:XXX:Dialects? —Aɴɢʀ (talk) 09:16, 18 October 2016 (UTC)

@Mihia: I added Oxford to Module:en:Dialects, but wasn't sure what Wikipedia article to link to for non-Oxford. If you'd like to add that label, go ahead. — Eru·tuon 03:02, 16 October 2016 (UTC)

@Erutuon:. Thanks. I believe that "non-Oxford" should link to https://en.wikipedia.org/wiki/Oxford_spelling. I don't understand in practice how what you have done affects the entries canonise or canonize. I would be grateful if you could explain exactly what needs to change in those articles to take advantage of it, or make the edit yourself if you prefer. Mihia (talk) 20:39, 16 October 2016 (UTC)
@Mihia: The labels in Module:en:Dialects can be used in the template {{alter}}, by adding a blank parameter after the word, and then adding the label in the next parameter (or multiple labels, one per parameter). I did that at canonize. — Eru·tuon 20:49, 16 October 2016 (UTC)
@Erutuon: I see, thanks. Would it be possible to change the text to read Non-Oxford British English, to be consistent with other templates, and also to put this label in brackets? In other words, for the appearance of the whole thing to be like the version here? Mihia (talk) 21:01, 16 October 2016 (UTC)
@Mihia: Yes. Go ahead and edit the display part of the data['Non-Oxford'] label in the en:Dialects module. That will change the link text for the label. — Eru·tuon 21:04, 16 October 2016 (UTC)
Ooops, I'm not sure about the brackets part. That falls into the realm of the {{alter}} template and Alternative forms module. I think brackets are not used because for Ancient Greek and other languages that are transliterated, this would result in two consecutive parentheses – for instance, ὄνῠμᾰ ‎(ónuma) (Aeolic, Doric) – which is ugly. There may be a solution, but it would be more complicated... — Eru·tuon 21:09, 16 October 2016 (UTC)
As Erutuon has pointed out, there is another long-standing request for brackets at Template talk:alter. Anyone fancy doing this? It is beyond my skill level. Mihia (talk) 19:33, 18 October 2016 (UTC)
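
For readers following the parameter layout described above (terms first, then a blank parameter, then one label per parameter), here is a minimal Python sketch. It is not the actual code of {{alter}} or of the Alternative forms module, which are written in Lua. The `LABELS` table, the function names and the `brackets` switch are all assumptions made for illustration, and the real template's separators and link formatting are not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical label data, loosely modelled on the data['Non-Oxford'] entry
# mentioned in the thread; the real module's fields may differ.
LABELS: Dict[str, Dict[str, str]] = {
    "Non-Oxford": {"display": "Non-Oxford British English"},
}


def split_terms_and_labels(args: List[str]) -> Tuple[List[str], List[str]]:
    """Split positional arguments at the first blank one: terms, then labels."""
    if "" in args:
        cut = args.index("")
        return args[:cut], args[cut + 1:]
    return list(args), []


def render(args: List[str], brackets: bool = False) -> str:
    """Join terms and label display texts; brackets are the requested option."""
    terms, labels = split_terms_and_labels(args)
    shown = [LABELS.get(label, {}).get("display", label) for label in labels]
    line = ", ".join(terms)
    if shown:
        joined = ", ".join(shown)
        line += f" ({joined})" if brackets else f" {joined}"
    return line


print(render(["canonise", "", "Non-Oxford"], brackets=True))
```

Keeping the display text in the data table means a wording change such as "Non-Oxford British English" touches only the data, while the brackets question stays a rendering decision, which matches the split between the two that Erutuon describes.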

Italics in Project-Link Templates

Some time ago, User:DCDuring suggested that an |i= parameter be added to these templates so links to taxonomic names could be italicized according to standard practice for such names. That was never done, but he started adding it to the wikitext in entries in anticipation of someone getting around to doing it eventually. This wasn't a problem, though, since templates ignore undefined parameters. Then User:CodeCat decided to convert these templates to use a Lua module. Even that would be fine, since Lua adds useful capabilities that can be exploited later. She also incorporated Module:parameters, which lets you specify which parameters can be used in a template. That's where things went off the rails. Aside from maybe half a dozen unrelated errors, all of the 462 entries currently in Category:Pages with module errors are due to the previously-ignored |i= parameter.

CodeCat had said that the luafied versions would work exactly like the un-luafied versions to start with. I don't know about you, but 450+ module errors seems like a big difference to me. It's not that Module:parameters is inherently evil, but in this case I would suggest it's been misused. Sure, I found a small number of typos that needed to be fixed, but that could have been done without breaking over 450 entries.

As I see it, we have four main options, in order of ease of implementation:

  1. Do nothing. Not recommended, because this has replaced links to Wikipedia, Wikispecies and Commons with alarming red error messages in close to 460 entries, and it's hard to spot real errors among the three pages of these errors.
  2. Change the templates to have Module:parameters ignore the |i= parameter
  3. Implement the |i= parameter. That's what I would recommend, because the codes governing taxonomic names explicitly say that at least genera and species should be italicized, and the system overhead is negligible.
  4. Remove the |i= parameter from all 450+ entries. Why? They were added in good faith and could serve a useful purpose.
  5. Pork. Sorry, I just wanted to see if anyone was paying attention.

What would everyone prefer? Chuck Entz (talk) 01:14, 15 October 2016 (UTC)
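
To make the difference between the options concrete, here is a minimal Python sketch of strict parameter checking. It is not the code of Module:parameters, which is written in Lua; the names `process_params`, `allowed` and `ignored` are invented for illustration.

```python
from typing import Dict, FrozenSet


class UnknownParameterError(ValueError):
    """Raised when a template call uses a parameter that was not declared."""


def process_params(
    args: Dict[str, str],
    allowed: FrozenSet[str],
    ignored: FrozenSet[str] = frozenset(),
) -> Dict[str, str]:
    """Return the declared parameters; skip ignored ones; reject the rest."""
    result = {}
    for name, value in args.items():
        if name in ignored:
            continue  # option 2: tolerate the parameter without using it
        if name not in allowed:
            raise UnknownParameterError(f"unknown parameter: {name!r}")
        result[name] = value
    return result


call = {"1": "Argentina (plant)", "i": "1"}

# Strict checking without any provision for i= fails (the current situation).
try:
    process_params(call, allowed=frozenset({"1"}))
except UnknownParameterError as err:
    print("error:", err)

# Option 2: ignore i=.  Option 3 would instead add "i" to the allowed set.
print(process_params(call, allowed=frozenset({"1"}), ignored=frozenset({"i"})))
```

Under this sketch, option 2 is a single extra entry in the ignore set, while option 3 also needs code downstream that acts on the value.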

Option 3 sounds like the best to me too. If there's some kind of obstacle to option 3, then option 2 seems like an acceptable temporary fix. —Mr. Granger (talkcontribs) 01:25, 15 October 2016 (UTC)
  • I for one am willing to go for the pork option, if pork is a misspelling of fork. That would mean substituting a non-Lua template for {{pedia}}, {{specieslite}}, and {{comcatlite}} that supported italics in those entries that needed it, genus and subgeneric entries. At some future date we could add the ability to de-italicize portions of the taxa that should not be italicized like "subsp.", "subg.", "var.", etc. I am fairly sure that CodeCat will never work on that feature, having bigger fish to fry. I don't like my fish fried anyway; it's bad for my heart. DCDuring TALK 02:02, 15 October 2016 (UTC)

I did option 2 (it's a trivial line of code). I tried doing option 3, but I think it's better that someone who knows what he/she's doing handle it properly. Crom daba (talk) 02:50, 15 October 2016 (UTC)
@Crom daba: I looked at the code for Module:wikipedia, and then at Module:links, and finally at Module:script utilities. Those are the modules that it uses to create links to Wikipedia. Apparently italics and bold are not supported (see § tag_text), so there is no way to make the links to Wikipedia articles on species names be italicized without removing boldface. That's quite irritating... — Eru·tuon 02:05, 16 October 2016 (UTC)
Those modules are of "our" own creation. The lack of functionality is a self-inflicted wound. DCDuring TALK 15:29, 16 October 2016 (UTC)
Thanks to @CodeCat:, the change has been made. A look at [[Argentina#External links]] will show another class of interproject links that make a fork of the templates worth considering. A WP entry like w:Argentina (plant) or a commons link like [[commons:Category:Argentina (Rosaceae)]] has mixed character formatting. They should be Argentina (plant) and Argentina (Rosaceae) respectively. In general the taxonomic authorities prescribe that only a genus and subgeneric names and epithets should appear in italics in text.
They also prescribe that such items should appear in ordinary typeface when embedded in italic running text, so templates that force italics or regular text force the appearance of such taxa to deviate from the prescription. We might not care about prescriptions, but these prescriptions are usually followed in scholarly works and often in popular science and nature books. DCDuring TALK 18:11, 18 October 2016 (UTC)
I created a function to apply correct italics to species, subspecies, and variety names on Wikipedia (in w:Module:eFloras, which is used in a reference template). A similar thing could be created here – though the parameter that triggers it would have to have a name besides |i= (|genus=, |subspecies=, etc. or something else?). It could detect the abbreviations subsp., ssp., var., and f., and words in parentheses, all of which should not be italicized, and then apply italics to everything but them. Either that, or we manually enter link text with correct italics in every case. Thus, it would take Cupressus arizonica var. glabra and display it as Cupressus arizonica var. glabra, and the previously mentioned Argentina (plant) would display correctly as Argentina (plant). The parameter |i= could still be used in cases where the whole title should be italicized. — Eru·tuon 19:55, 18 October 2016 (UTC)
Cool. That's exactly the kind of thing I was hoping for. Some questions remain in my mind:
  1. Is it worthwhile to provide for piped alternative formatting for whatever cases the logic you have implemented on WP doesn't cover? Have you found any exceptions at WP?
  2. We might discover other taxon elements that should not be italicized, eg, "morph.", perhaps "×". Would this be updateable by altering data in a Module?
  3. Should this be implemented here within the project-link system or with separate templates and/or modules for the big-box and inline versions of the templates for pedia, species, and commons?
None of these are likely insurmountable and all but the last may be ignorable. DCDuring TALK 21:04, 18 October 2016 (UTC)
@DCDuring: I haven't encountered any exceptions in the logic that I used in w:Module:eFloras, but it's unlikely there will be any, because the floras and lists that the eFloras template creates references to are all plants, and there is a fair amount of regularity in botanical names. For instance, all plant families end in -aceae, so the module searches for that and makes sure it's not italicized. The logic has to be different here on Wiktionary, because more than just plant names are involved, and the automatic italicization would have to be explicitly turned on in links to species, subspecies, variety, etc. pages and not turned on for links to family pages, which aren't ever italicized.
It would be easy to add or remove elements to the module, if we discover any more that should not be italicized. There should just be a list of testcases somewhere (with one example each of genus, subspecies, form, etc.) that we can look at to check that the code is doing what it's supposed to.
I'm not familiar with the structure of these interwiki link templates and modules, but I think I'll start work on a module that does the simple task of this automatic italicization, which can then be used in whatever interwiki link modules require it. — Eru·tuon 23:14, 18 October 2016 (UTC)
I greatly appreciate your undertaking this. DCDuring TALK 23:22, 18 October 2016 (UTC)
Module:italics is now complete and seems to work. It has an array of things that shouldn't be italicized, and it's pretty easy to add another one if you think of any. The documentation page has a set of testcases that show how the module handles some of the un-italicized elements that we talked about. If there are no problems, someone can add this function to the interwiki link modules; not sure what the parameter that turns it on should be called, though. — Eru·tuon 01:19, 19 October 2016 (UTC)
My choice: "taxi=". DCDuring TALK 11:09, 19 October 2016 (UTC)
That might be fine, though I wonder: would there be any examples of page names with parentheses that should be italicized but are not taxonomical? I was thinking maybe the titles of plays, but I couldn't find any that are linked to. — Eru·tuon 15:53, 19 October 2016 (UTC)
Examples: plays like Antigone (Sophocles play), which are named after characters and therefore have a disambiguator. Or Richard II (play), which is named after a real person. These currently don't have Wiktionary entries, or they are not mentioned in definitions (see Antigone; there's no entry on Richard II) yet, but perhaps they will be in the future, and then they would have to be italicized in the same way as genus or species names. — Eru·tuon 16:57, 19 October 2016 (UTC)
For a broader reference for the parameter name how about "seli", for "selective italics"? DCDuring TALK 00:22, 20 October 2016 (UTC)
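
The selective italics discussed in this thread can be sketched as follows. This is a minimal Python illustration and not the code of Module:italics or w:Module:eFloras, both of which are Lua modules. The function name `selective_italics` is invented, and the exclusion list contains only the abbreviations named above (subsp., ssp., var., f.) plus the rule for words in parentheses. Output uses wikitext italic markup (two apostrophes).

```python
from itertools import groupby

# Rank abbreviations that stay upright; taken from the thread, easy to extend.
NOT_ITALICIZED = {"subsp.", "ssp.", "var.", "f."}


def selective_italics(title: str) -> str:
    """Wrap everything in '' except rank abbreviations and parenthesized words."""
    flagged = []
    in_parens = False
    for token in title.split():
        if token.startswith("("):
            in_parens = True
        italic = not in_parens and token not in NOT_ITALICIZED
        flagged.append((italic, token))
        if token.endswith(")"):
            in_parens = False

    # Merge neighbouring tokens with the same flag so markup is not repeated.
    runs = []
    for italic, group in groupby(flagged, key=lambda pair: pair[0]):
        text = " ".join(token for _, token in group)
        runs.append(f"''{text}''" if italic else text)
    return " ".join(runs)


print(selective_italics("Cupressus arizonica var. glabra"))
print(selective_italics("Argentina (plant)"))
print(selective_italics("Antigone (Sophocles play)"))
```

As noted in the thread, a function like this would have to be switched on explicitly by a parameter, because it would otherwise italicize family names and other titles that should stay upright.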

Use of Google hit counts

Frequently I see people quoting Google hit counts -- the numbers that come up at the top of the first page of results -- as if they were an exact measure of how many times a word or phrase is used on "the Internet". Unless and until someone can give a clear explanation of how these numbers are generated, they need to be treated with the greatest caution or scepticism. A common behaviour is that a largeish number (say 10,000) comes up, but a much smaller number (say 200) appear to actually exist in the sense that a web page containing that exact word or phrase can be retrieved. There is a variable cutoff point at which Google's "page through the results" runs out, even for search terms that obviously will genuinely have very large numbers of results, but the way this works is entirely opaque. It is not hard to find egregious examples. For example, a search for "a it of" (in quotes) supposedly yields 2,510,000 results, of which 98 are actually retrievable. A search for "goodgrief" supposedly yields 5,690,000 results, of which virtually every one of the retrievable set is actually "good grief" and not "goodgrief". Mihia (talk) 03:22, 16 October 2016 (UTC)

Indeed, what I usually do is click several times "to the right", on results page 10, then 15, etc., to see whether a much smaller number appears. For google:"goodgrief", clicking to the right ultimately lands me on Page 16, where it says "Page 16 of about 146 results". I have to admit that we have to be careful with these numbers.
Where possible, it is better to use Google Ngram Viewer to get frequency numbers. This works not only for English but also for Spanish, German, Russian, Italian, French and Hebrew.
Czech is not in Google Ngram Viewer, so either we have to make do with the Google numbers, or use an academic corpus of Czech. --Dan Polansky (talk) 05:29, 16 October 2016 (UTC)
  • Number of hits also doesn't tell you anything about meaning. Something with 500 hits may have 490 hits as a username and 9 as a brand name. Renard Migrant (talk) 12:27, 18 October 2016 (UTC)
I agree with the above sentiment that result counts do not convey meaning, so they are only valuable to the extent that common sense is applied in considering them. Some notes about the result count number: the number is an estimate unless the rc=1 parameter is used, and even then it will only be exact up to one million results. Also, the reason that results beyond page 16 or so do not display is that Google only returns 1000 results for any search; the fact that they aren't displayed does not mean they do not exist. In cases where there are even fewer than 1000 despite a large number, it is likely at least in part due to the condensing process that results are put through after the ranking, which combines and removes some "duplicate" results. - TheDaveRoss 13:10, 18 October 2016 (UTC)

Nonstandard spellings

FYI, I have started Wiktionary:Requests_for_deletion/Others#Template:nonstandard_spelling_of. I think the template should only exist if we have a reasonably clear idea of what we mean by "nonstandard"; I don't have such a clear idea. --Dan Polansky (talk) 06:35, 16 October 2016 (UTC)

How to categorise zero-derivations of English verbs from nouns, etc.?

A zero derivation is one that does not lead to any changes in the word; the part of speech is just switched over. English uses zero derivation a lot, so it's especially important there. However, we don't currently categorise or otherwise mark such derivations. In fact, the majority of such cases are lacking an etymology altogether. I'm wondering how we can best handle zero derivations more explicitly. When it comes to suffixes, we categorise by suffix, so the nature of the derivation is generally clear from the nature of the suffix (and if there are multiple homographic suffixes, we can disambiguate with id=). It's clear that -ify creates verbs, for example, while -age makes nouns. For zero derivations this is less clear, so we should probably add a part-of-speech qualifier to the category name. So embrace would get a second etymology which would categorise into something like Category:English noun zero derivations. We'd presumably create a template for the occasion too. —CodeCat 17:26, 17 October 2016 (UTC)

Deleting user talk pages

I propose prohibiting [admins from] deleting user talk pages, especially, their own one unless [the user proves] it is very necessary or a link to the archive is clearly shown on their talk page.

Talk pages are the best and fastest way to study a user, to see what they are up to or have been up to, what their areas of expertise are, how communicative they are, what usergroups they belong to etc. Also, talk pages usually do not contain garbage but rather discussions that may be useful for others to read so that they do not ask the same question. --Dixtosa (talk) 17:35, 19 October 2016 (UTC)

  • Support: Non-admins should be allowed to see the history of all user's talk pages, except for graffiti and outing edits. Purplebackpack89 19:34, 19 October 2016 (UTC)
  • Support --Daniel Carrero (talk) 19:36, 19 October 2016 (UTC)
  • I would even go so far as to say that the pages should not be emptied either. All the discussions should be either on the page or archived on a subpage. This would ensure that all the content is searchable. --WikiTiki89 19:40, 19 October 2016 (UTC)
  • Support. I've expressed support for this before, at Wiktionary:Information desk/Archive 2014/January-June#Geequinox. Archiving talk pages seems unobjectionable, and I'm even okay with emptying them, but I think deleting them should be avoided. —Mr. Granger (talkcontribs) 19:46, 19 October 2016 (UTC)
There is currently a page deletion reason "userspace page deleted on user's request". Would the use of this be restricted? Would users have to provide evidence that the page "should" be deleted at some kind of forum? Equinox 19:47, 19 October 2016 (UTC)
This deletion reason is for userspace sandboxes, notes, etc. Deleting them is fine, in my opinion. It does not apply to talk pages. --Daniel Carrero (talk) 19:49, 19 October 2016 (UTC)
There is a difference between a userspace page and a user-talk-space page. --WikiTiki89 19:49, 19 October 2016 (UTC)
I think I have deleted user talk pages when marked with this reason by users. I think others have too. So needs to be clarified. Equinox 19:50, 19 October 2016 (UTC)
Ok, I should have said that there should be such a distinction. --WikiTiki89 19:51, 19 October 2016 (UTC)
  • Oppose Admins need to have the authority to delete whatever they think needs deleting. SemperBlotto (talk) 19:53, 19 October 2016 (UTC)
    Except the main page. --Daniel Carrero (talk) 19:54, 19 October 2016 (UTC)
Oppose. A contributor’s user[talk?]space is his castle, and as long as it’s not harming the project or other users, I don’t see a problem with allowing him to bulldoze and rebuild one of the castle’s towers. Concerning some of your points:
  • Talk pages can be used to study users: which is why users who don’t want to be probed should be given the option of getting rid of their talk page.
  • Talk pages contain useful content: if a discussion is important for Wiktionary, it should take place or be archived in a public page, not in someone’s talk page.
Ungoliant (falai) 20:09, 19 October 2016 (UTC)
I've been archiving my talk page for years, under the impression that it's what everybody does, and that people would want to read it. If it turns out we can delete our talk pages, I'm thinking of maybe deleting mine. --Daniel Carrero (talk) 20:26, 19 October 2016 (UTC)
@Ungoliant MMDCCLXIV I think it's generally held around here, including by you, that users must consent to allowing their actions to be "probed" as a condition of participating. It's less "bulldozing and rebuilding the castle's towers"; that would be just blanking the page. Deleting the page and its history is more akin to bulldozing the castle's towers while demanding that all record that the towers ever existed be burned. Purplebackpack89 03:23, 20 October 2016 (UTC)
Actions in public pages must be kept for probing. — Ungoliant (falai) 11:04, 20 October 2016 (UTC)
Talk pages are public pages. When they aren't deleted, anyone can read them. Purplebackpack89 14:01, 20 October 2016 (UTC)
By public, I mean they are not inherently connected to an individual user. — Ungoliant (falai) 15:11, 20 October 2016 (UTC)
A lot of discussions start on someone's talk page and only very few of them reach the "public pages".
Overall, I think it is far more important for Wiktionary that each user can have some idea about any other user and can access useful information than that users have the right to have their own talk page deleted for some lame reason. --Giorgi Eufshi (talk) 16:27, 20 October 2016 (UTC)
  • Support. I find that talk pages are helpful for figuring out who to talk to about what. Andrew Sheedy (talk) 01:50, 20 October 2016 (UTC)
  • Support: Let us preserve discussions at least in page histories. And I find it especially troublesome when admins delete talk pages of banned users. Let transparency reign supreme. --Dan Polansky (talk) 13:23, 22 October 2016 (UTC)


If this proposal passes, what to do when a user deletes their own talk page? Would another person restore it and perhaps archive it for them? --Daniel Carrero (talk) 14:51, 22 October 2016 (UTC)

restore and admin-protect it. --Dixtosa (talk) 15:07, 22 October 2016 (UTC)
And how do non-admins post to an admin-protected talk page? Chuck Entz (talk) 00:46, 23 October 2016 (UTC)
I have seen this option in protection summaries "move=only admins". I thought there would be "delete=only stewards". Dixtosa (talk) 05:49, 23 October 2016 (UTC)
The system gives us only two types of protection: against editing and against moving. We have a choice of what level of user we can protect against, but those are the only two actions. I don't know about stewards, but bureaucrats have no special powers beyond the ability to add or remove privileges- the ability to block, protect and delete comes from being an admin, not a bureaucrat. I'm sure stewards have all of the above on any wiki they go to and globally, but I don't know if they can set protections against admins- I think they would have to remove a given admin's ability to delete anything rather than being able to protect a given item from deletion by all admins as a class. As I understand it, it took action by the developers to institute admin-proof protections when there was a dispute at de-WP a while back. Chuck Entz (talk) 08:10, 23 October 2016 (UTC)
By delete, do you mean delete, or do you mean erase the contents of? I don't think there's a big problem in the latter case, as long as the page history is still accessible. Andrew Sheedy (talk) 00:36, 23 October 2016 (UTC)
I don't see the problem. How is this different from any other case of an admin abusing his deletion power to delete something out of process? --WikiTiki89 13:39, 24 October 2016 (UTC)

Durability of CFI for Google groups

Hi everyone,

I have an opinion but I am not sure it is valid enough so I wanted some input from fellow Wiktionarians to check my reasoning. This is my problem: as far as I know to attest a word it suffices to use a citation from a Usenet group - the rationale is that Google archives it "durably". And I was thinking if that is so, then this also applies to words that appear in Google Groups but not necessarily in the Usenet. I suppose Google archives Google Groups also "durably" so the word is also durably archived and then it should satisfy that part of our strict Criteria for Inclusion. Does anyone have an opposing opinion? My intention is to include a word here and propose it as a FWOTD candidate but I'm not sure it passes the "durability" part of CFI. Hope to hear from you soon, cheers all, --biblbroksдискашн 19:41, 19 October 2016 (UTC)

I'm not sure that it's specifically Google's archives that make Usenet durably archived, but rather because Usenet is archived independently by various organizations, and Google just happens to provide a convenient searchable interface. This is much the same as Google Books, the books aren't durably archived by Google, but by libraries; Google just gives us convenient searchable access. --WikiTiki89 19:45, 19 October 2016 (UTC)
Then wouldn't the wording "[...] this naturally favors media such as Usenet groups, which are durably archived by Google.[..]" (at Wiktionary:ATTEST) be somewhat incomplete? IMO it should read similar to "... which is durably archived by Google and various other organizations." Otherwise it gives the impression that Google's archives are the sole contributor to the "durability" part of the criterion. I am not sure if this is of much importance, but I remember some discussions a few years ago about this "durably"/"permanently recorded media" stuff. Or was it something like that? OTOH, I much more vividly remember a dispute over a Request for Verification with a contributor whose argument was that Google searches weren't a proper way to base a word's attestation. At that time I was busier with winning the "RFV contest" of one particular phrase than deciphering the true meaning of CFI even to myself, let alone to that editor. If I had understood the durability more properly it might have helped. The editor eventually received a perma-ban. --biblbroksдискашн 20:32, 19 October 2016 (UTC)
The best search term for Wiktionary discussions about this is "durably archived".
We value Google for its convenient online access. The print corpora they have (Books, News, Scholar) are durably archived because the recorded documents are of print media which are physically archived somewhere, not necessarily convenient of access, eg, Otago Daily Times is probably in multiple NZ libraries, but few, possibly none, elsewhere. An analogous situation exists with respect to Usenet: multiple archived copies, but only Google offers convenient access. DCDuring TALK 00:52, 20 October 2016 (UTC)
@biblbroks: You are right: the CFI text is misleading. We are still in the process of cleaning up CFI. The whole durably-archived business is something of a gray area. --Dan Polansky (talk) 13:17, 22 October 2016 (UTC)

Speaking of deleting user pages...

Should the deletion reason "Userspace page deleted on user's request" be changed to "Userspace page deleted on owner's request" (or similar)? It just struck me, seeing I'm-so-meta delete DTLHS's tracking page that Wonderfool had completed, that it might sound as though any user could request deletion of any other user's page. Equinox 21:25, 20 October 2016 (UTC)

Probably. --WikiTiki89 21:30, 20 October 2016 (UTC)
I changed it now. I don't think we could actually use "deleted on user's request" as a reason to let one person delete another user's page, but having even excess clarity shouldn't hurt. --Daniel Carrero (talk) 21:52, 20 October 2016 (UTC)
Thanks. (If they are the page owner, why don't they have the right to delete it on demand?) Equinox 00:35, 22 October 2016 (UTC)
Do you mean that maybe a non-admin should have a "Delete" button for their own userpages? There's a Wikipedia policy saying that they wouldn't want to implement that, because a person in bad faith could move pages to their userspace and then delete them. I think there were more reasons, too, that I don't remember right now. --Daniel Carrero (talk) 01:22, 22 October 2016 (UTC)
I was really talking about your anti-deletion comment in "Deleting user talk pages" above. But only as a speculative aside. Equinox 01:47, 22 October 2016 (UTC)
IMO, people should be completely free to delete their user pages that are not talk pages. At one point, I tried to delete my own talk page on that wiki called explain xkcd, but someone reverted it and said that it is not allowed. I'd prefer if you and other people never deleted their user talk pages, but I'd feel weird reverting or archiving others' talk pages if they want them deleted, and it's certainly not a blockable offense, so even if we implemented the rule that nobody can delete their own talk pages, I'm not sure if we would just ignore you or other people if they don't want to comply. --Daniel Carrero (talk) 02:14, 22 October 2016 (UTC)

Editing the introduction of WT:EL - Pronunciation

Mainly to address some complaints in Wiktionary:Votes/pl-2016-07/Pronunciation 2, I'd like to edit the first sentence of WT:EL#Pronunciation.

This is a minor edit, and I'm thinking this won't need a vote. @Dan Polansky, it seems you are usually the first person to defend having votes to edit policies, not counting myself. Do you think that this needs a vote?

Current text:

Ideally, every entry should have a pronunciation section, with the phonetic transcription and an audio file. Note that pronunciations may vary widely between dialects.

  • The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]).

Proposed text:

The pronunciation section includes the IPA transcription, audio pronunciations, rhymes, hyphenations and homophones.

  • The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]). Note that pronunciations may vary widely between dialects.


  • Removing "Ideally, every entry should have a pronunciation section, with the phonetic transcription and an audio file.".
  • Adding "The pronunciation section includes the IPA transcription, audio pronunciations, rhymes, hyphenations and homophones."
  • Moving "Note that pronunciations may vary widely between dialects." to the end of the first bullet point.


  • Arguably, the "every entry should have ..." part is useless clutter. We don't want that statement in every section ("ideally, every entry should have an etymology section" and such). (as pointed out in the vote)
  • The "every entry should have ..." part is false. There are some languages that shouldn't have a pronunciation section (e.g. sign languages, which have a Production section instead, and many ancient languages whose pronunciation is unknown). (as pointed out in the vote, too)
  • Arguably, the part "Note that pronunciations may vary widely between dialects." fits better the text that is explaining the IPA transcriptions, instead of the introduction.

--Daniel Carrero (talk) 21:56, 20 October 2016 (UTC)

I oppose explicit mention of IPA. Also, I don't see the point of saying "Note that pronunciations may vary widely between dialects." Why not just drop it? --WikiTiki89 22:00, 20 October 2016 (UTC)
Sure, we can change "IPA transcription" to just "transcription". I agree with your second point, too: we can just drop the "Note that pronunciations may vary widely between dialects." I always like when we're able to remove any statements from WT:EL that are comments rather than regulations. --Daniel Carrero (talk) 22:08, 20 October 2016 (UTC)
@Daniel Carrero: The meta-principle states that "Any substantial or contested changes require a VOTE". The proposed change is not very substantial and is so far uncontested. While I prefer to always have a vote, a vote does not seem required by the meta-principle in this case. But if you want to use this BP discussion as a basis for changing WT:ELE, you should wait a couple of days before you change WT:ELE to allow other people to provide input. --Dan Polansky (talk) 14:57, 21 October 2016 (UTC)
OK, sounds good to me. If nobody objects, I'll do the change without a vote, then, after waiting some time. --Daniel Carrero (talk) 01:39, 22 October 2016 (UTC)

Taking Wikitiki89's first message into consideration (which I support), the exact proposed text is going to be this:

The pronunciation section includes the transcriptions, audio pronunciations, rhymes, hyphenations and homophones.

  • The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]).

--Daniel Carrero (talk) 01:59, 24 October 2016 (UTC)
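The ordering described in the bullet point (region or accent, then the name of the transcription system, then a colon, then the transcription) can be illustrated with a small sketch. This is only an illustration of the described layout, not how Wiktionary templates are actually implemented; the function name `pronunciation_line` and its parameters are hypothetical, and the sample transcription is a placeholder.

```python
from typing import Optional


def pronunciation_line(
    system: str,
    transcription: str,
    region: Optional[str] = None,
    phonemic: bool = True,
) -> str:
    """Build one pronunciation line in the order the proposed text describes.

    Hypothetical helper: region/accent first (only if given), then the
    transcription system name, a colon, and the transcription itself.
    """
    # Phonemic transcriptions go between slashes, phonetic ones between brackets.
    opener, closer = ("/", "/") if phonemic else ("[", "]")
    # The accent is written with the {{a|...}} template mentioned in the text.
    prefix = f"{{{{a|{region}}}}} " if region else ""
    return f"{prefix}{system}: {opener}{transcription}{closer}"


if __name__ == "__main__":
    print(pronunciation_line("IPA", "ˈwɔtɚ", region="GA"))
    print(pronunciation_line("IPA", "ˈwɔɾɚ", region="GA", phonemic=False))
```

The region is optional in the sketch because the text says it comes first only "if there is regional variation".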

ASCII vs. Unicode apostrophes in French entries

User:Angr edited d'où to use Unicode apostrophes instead of plain ASCII apostrophes in various places. What's the desired behavior here? Our entries are named using ASCII apostrophes so I think we should stick with ASCII apostrophes. Benwing2 (talk) 23:30, 20 October 2016 (UTC)

I don't see any harm in it. We use all kinds of things like macrons and accents in headwords that we don't include in the entry name- this is just an extension of that. It may even be beneficial in cases where the search engine doesn't recognize the plain character and the fancy character as the same thing: it means that spellings with both the plain and fancy character are in the entry for the search engine to find. Chuck Entz (talk) 02:26, 21 October 2016 (UTC)
It would be ideal if Unicode apostrophes replaced ASCII ones across all French entries for consistency. I don't much like the idea of some entries having one type and some having the other. Andrew Sheedy (talk) 02:37, 21 October 2016 (UTC)
I always change them back to straight apostrophes, as these are the ones that appear on keyboards. They sometimes get corrected in Microsoft Word to curly ones but that's it. Renard Migrant (talk) 12:49, 21 October 2016 (UTC)
I've been under the impression for years that our usual practice is to use typewriter apostrophes in entry titles and curly apostrophes in headword line displays, not just for French but for all languages that use apostrophes. I understand the rationale behind using typewriter apostrophes in entry titles, but curly apostrophes look prettier, so I prefer using them for display whenever possible. What should definitely always be the case, though, is for there to be a hard redirect from the curly version to the typewriter version, e.g. d’où → d'où, because French Wiktionary always uses curly apostrophes, and the only way to make sure our entries link correctly to theirs is for en:d'où to link to fr:d'où, which hard-redirects to fr:d’où, which links to en:d’où, which hard-redirects back to en:d'où. —Aɴɢʀ (talk) 13:19, 21 October 2016 (UTC)
’ in entries seems to be incredibly rare, making me think there is no such unofficial policy. Renard Migrant (talk) 17:28, 22 October 2016 (UTC)
Personally, I would be quite pleased if a bot went through and replaced all ASCII apostrophes with Unicode ones (aside from actual page names). I agree with Angr's reasoning. Andrew Sheedy (talk) 00:33, 23 October 2016 (UTC)
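For readers unsure what the two characters are: the "ASCII" or typewriter apostrophe is U+0027 and the "Unicode" or curly apostrophe is U+2019. The practice Angr describes (typewriter apostrophe in the entry title, curly apostrophe for display) could be sketched roughly as below. The function names are hypothetical and this is not a description of any existing bot or module.

```python
TYPEWRITER_APOSTROPHE = "\u0027"  # ' as found on keyboards
CURLY_APOSTROPHE = "\u2019"       # ’ as used by French Wiktionary


def to_entry_title(text: str) -> str:
    """Hypothetical helper: normalise to the form used in entry titles."""
    return text.replace(CURLY_APOSTROPHE, TYPEWRITER_APOSTROPHE)


def to_display_form(title: str) -> str:
    """Hypothetical helper: produce the curly form for headword display."""
    return title.replace(TYPEWRITER_APOSTROPHE, CURLY_APOSTROPHE)


if __name__ == "__main__":
    curly = "d\u2019où"
    print(to_entry_title(curly))      # title form with U+0027
    print(to_display_form("d'où"))    # display form with U+2019
```

A blanket replacement like `to_display_form` would also touch apostrophes that are not part of a word (for example wiki bold and italic markup), so a real bot run would need to be more selective than this sketch.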

Vote: Removing label proscribed from entries

FYI, I created Wiktionary:Votes/2016-10/Removing label proscribed from entries. Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 19:43, 21 October 2016 (UTC)

template:it-conj-ire / morire

Morrebbero does not seem to be showing up in the conjugation as an alternative to morirebbero. (3rd person plural conditional.)— Pingkudimmi 03:29, 22 October 2016 (UTC)

Removing: "No topic should have a new vote more than once a day (24 hr period)."

I'd like to make Wiktionary:Voting policy a full-fledged policy at some point, instead of just a think tank. One thing I'd like to remove is this:

"No topic should have a new vote more than once a day (24 hr period)."


What does this rule even mean? If a person creates a vote about Greek romanization today, can't I create another vote about Gothic romanization on the same day, because it's the same "topic"? Does anyone want that? If not, let's just remove that rule.

--Daniel Carrero (talk) 13:07, 22 October 2016 (UTC)

You're right about the need for change. The period for no additional votes on the same topic (broadly construed) should be 30 days. DCDuring TALK 14:06, 22 October 2016 (UTC)
Or to simplify these determinations, we should limit the number of proposals a single individual could make to, say, one a month or one a week. As we already have a policy against sock puppets, that way, at the very least, someone with many proposals would at least have to find stooges to introduce additional votes. DCDuring TALK 14:10, 22 October 2016 (UTC)
I'm currently the only one above a certain number of votes created per month. Personally, I don't think I would like to find stooges to create votes for me, thank you -- sometimes I do the opposite, creating votes for proposals that were introduced by other people. I wish more people created votes for things that need to be voted. Anyway, what you talked about could be seen as variations of the rule that I proposed to remove, but that would be a long shot. It's easier just to remove it and then discuss and introduce whatever other proposals concerning how many votes may be created, which is something that we discussed recently. --Daniel Carrero (talk) 14:21, 22 October 2016 (UTC)
Yes, I certainly see the advantage to you of getting inertia on your side. DCDuring TALK 14:31, 22 October 2016 (UTC)
What is the advantage to me of getting inertia on my side? --Daniel Carrero (talk) 14:45, 22 October 2016 (UTC)
I agree that it should be removed. But I also think that when a single person creates too many votes in a week, that is not so good and should be avoided. --Dan Polansky (talk) 14:19, 22 October 2016 (UTC)
Strong discouragement has proven insufficient to avoid the creation of too many votes by an individual. DCDuring TALK 14:31, 22 October 2016 (UTC)
By contrast, I saw Daniel Carrero respond to disagreement by reducing the number of created votes a lot. My rule of thumb is that there should never be more than 10 votes running and listed, and that is where we have been lately. In any case, "at most one new vote a day" is too lenient if applied to a single person since that would be 30 votes a month. The wording should go since it does not do anything useful anyway. --Dan Polansky (talk) 14:55, 22 October 2016 (UTC)

Procedural note: I would like to remove "No topic should have a new vote more than once a day (24 hr period)." from Wiktionary:Voting policy without a vote. Rationale: This is a think tank policy and as such, I believe there's some leniency in editing it with abandon. By contrast, the whole section "Voting eligibility" was voted and approved at Wiktionary:Votes/2010-04/Voting policy and I would probably oppose changing any regulations in that section without a vote. --Daniel Carrero (talk) 15:03, 22 October 2016 (UTC)

Votes created on the whims of a single editor

Instead of imposing any kind of limits, why don't we simply have a rule that there needs to be a discussion in which at least a few editors express their desire to have a vote on the topic? That way no vote can be created merely on the whims of a single editor. --WikiTiki89 13:41, 24 October 2016 (UTC)

At the moment, we have this related but more lenient rule in Wiktionary:Voting policy: "Votes should not usually be called for on Wiktionary:Votes. They should be the result of prior discussion located elsewhere (the Beer Parlour)."
As you may remember, I oppose this: "why don't we simply have a rule that there needs to be a discussion in which at least a few editors express their desire to have a vote on the topic".
Personally, I have this political position: The right to create new votes is a fundamental right, and anyone must be free to create new votes if they want, with or without prior discussion. (except maybe this: people who aren't eligible to vote probably should not be able to create votes -- but we may want to cross that bridge when we come to it) In practice, probably most voted proposals require discussions as you can't just propose a new thing without everyone being on the same page, but formally requiring discussions for all votes would, in some cases, be a bureaucratic hindrance, or maybe a more serious problem.
  • If a new vote is most certainly going to pass, no new discussion must be created to check if a vote is needed. Chances are, the proposal was already discussed before at some point, and the old discussions may be linked in the vote.
  • @Dan Polansky created this vote about the "def" template a few months ago. From what I understood, he did it because if the vote did not exist, people would probably be adding {{def}} to new entries without the vote. Sometimes, passing votes are needed to show consensus for a new thing and failed votes are needed to show lack of consensus for another thing. We don't need to create discussions to prove that there is lack of consensus for a new thing; rather, the proposers of the new thing need to show consensus on their side.
  • Votes to remove clutter from policies (like this) or formalize what we already do (like this) don't need, or barely need a discussion to check if the vote can be created.
If a new discussion is created proposing a new vote but nobody or few people bother to answer, that is good enough to me and we can create the vote, rather than insisting for people to reply. If a new non-discussed vote is created but which should have been discussed, the vote is probably going to fail anyway, and with comments about what exactly is wrong with the vote -- which probably takes about the same effort as replying and pointing out problems at a pre-vote discussion. Sometimes, even when a vote was discussed a lot before its creation, people point out new problems while the vote is ongoing, so pre-vote discussions are not a guarantee of creating perfect votes.
In all cases, even when no new discussion is required to check if a vote can be created, a new BP discussion may be created alongside the new vote, to inform people that the vote exists.
As long as we are fundamentally able to create votes if we want to, I'm fine with having restrictions like this: a maximum number of votes created per person during a certain period of time. --Daniel Carrero (talk) 17:06, 24 October 2016 (UTC)
Why is creating votes without prior discussion a fundamental right? If you're the only one who wants to vote on something, how do you expect it to even pass? We have the right to not have our time wasted by bad votes. What's so hard about starting a discussion and asking about whether we need a vote? --WikiTiki89 17:28, 24 October 2016 (UTC)
I said: "probably most voted proposals require discussions". Do you believe that 100% of votes created require pre-vote discussions? Where's the disagreement? What are the bad votes you are talking about? --Daniel Carrero (talk) 17:40, 24 October 2016 (UTC)
Suppose we say that the right to start a BP discussion on an issue is a fundamental right. If there is no support for something in a BP discussion, there isn't much point in going to the trouble of starting a vote. bd2412 T 17:43, 24 October 2016 (UTC)
Yes, I agree with you, but on most votes, not all votes. In my message from 17:06, 24 October 2016 (UTC), I listed some cases that in my opinion, should be exceptions. Do you think that 100% of votes created require pre-vote discussions? --Daniel Carrero (talk) 20:20, 24 October 2016 (UTC)
Exceptions would be procedural votes like bot and adminship approvals. The ones you listed above should not be exceptions in my opinion. --WikiTiki89 20:32, 24 October 2016 (UTC)
I agree that Wikitiki's are good exceptions and that Daniel's are not. DCDuring TALK 20:57, 24 October 2016 (UTC)
That's OK. Should we create a vote with a proposal like "Votes may only be created as a result of a discussion in which at least a few people supported creating a vote, except for nominations of bots, administrators, and bureaucrats.", to be added at Wiktionary:Voting policy? --Daniel Carrero (talk) 23:15, 24 October 2016 (UTC)
"At least a few" needs to be more specific. DCDuring TALK 23:27, 24 October 2016 (UTC)
Maybe 3 people? @DCDuring, would you like to create a vote with the proposal: "Votes may only be created as a result of a discussion in which at least 3 people supported creating a vote, except for nominations of bots, administrators, and bureaucrats."? Or maybe another number of people?
I may create the vote if people want, even though I'd vote oppose. I'm okay with either requiring 3 people to start this vote (for consistency?) or just creating it at once. Please, do whatever you want. The idea is not mine, I'm just offering to help with implementing others' ideas, which is something I like to do often. But I would also oppose implementing this requirement somehow without a vote, because it's a serious limitation on the ability to create further votes. Not to mention that it sounds like a bad idea to me, because it's needlessly bureaucratic and I don't see what problem it fixes, but I guess I can live in a system that I disagree with if it will make others happy. --Daniel Carrero (talk) 04:52, 25 October 2016 (UTC)
Votes are usually attended by more people over a longer period of time than BP discussions which always show the same dozen names and do not usually survive the end of a month. Votes are a way to create consensus for things which people just don't care about, to force a hand. Binding such a method which might be the last escape from an utter lack of input to the presence of input seems like introducing a bug to our system. Korn [kʰũːɘ̃n] (talk) 13:07, 25 October 2016 (UTC)
I agree with Korn. --Daniel Carrero (talk) 17:32, 25 October 2016 (UTC)
@Daniel Carrero re: "would you like to create a vote with the proposal...?" No. I'd like to see if there are others who agree and think it's worth a vote. I think that, by itself, it is not worth a separate vote. If we added some other "common-sense" reforms to our voting process, like quora, there might be something worth having folks stop adding and improving content and instead evaluate the proposal and its elements, considering how the elements work together. I'd also like to save my proposal-of-the-week for something better that might come along. DCDuring TALK 17:50, 25 October 2016 (UTC)
@DCDuring: you mentioned quora. Do you mean, requiring a minimum number of participants in an ongoing vote, in order to successfully close the vote? --Daniel Carrero (talk) 18:11, 25 October 2016 (UTC)

Old Provençal or Old Occitan?

OK, I already brought this up but it bears repeating in light of the dubious category CAT:Catalan terms derived from Old Provençal. The terms in this category are largely inherited terms and express the completely wrong notion that Catalan derives from Old Provençal. The intent was clearly to derive Catalan from Old Occitan, but even then I think this is wrong. This brings up two issues:

  1. Can we please rename Old Provençal to Old Occitan?
  2. What's the "old" language that Catalan derives from? We don't seem to have "Proto-Gallo-Romance".

Benwing2 (talk) 17:04, 22 October 2016 (UTC)

We treat Old Provençal as a synonymous name for Old Occitan, and Provençal as a synonymous name of Occitan. And the two related categories are... Old Provençal and Occitan. Yeah. There was a decision on this many years ago, I dunno, 2010, and Old Provençal won out. There is a Category:Old Catalan language and I've seen references to Gallo-Romance rather than Proto-Gallo-Romance. Like, some say that the Oaths of Strasbourg are written in Gallo-Romance rather than Old French (but that's another debate). Renard Migrant (talk) 17:25, 22 October 2016 (UTC)
As a user I would find it highly preferable if the ancestor of X, and X alone, was Old X and not Old Y. Korn [kʰũːɘ̃n] (talk) 17:45, 22 October 2016 (UTC)
How is that supposed to work when Old X is the ancestor of multiple modern languages? Old Irish is the ancestor of Irish, Scottish Gaelic, and Manx. Old English is the ancestor of English and Scots. Old Norse is the ancestor of some 10 languages, not one of which is called "Norse". —Aɴɢʀ (talk) 08:51, 23 October 2016 (UTC)
But Provençal is no-way, no-how the same as Occitan. Provençal is a dialect of Occitan, as are Languedocien, Auvernhat, Gascon, etc. As for Old Provençal vs. Old Occitan, Wikipedia and AFAIK all modern scholarly sources use "Old Occitan" for the basic reason that the language is ancestral to all of the modern Occitan varieties (except maybe Gascon), and is not specifically an old version of Provençal. IMO we need to change the terminology. Anyone else agree? Benwing2 (talk) 18:35, 22 October 2016 (UTC)
I agree. —CodeCat 01:10, 23 October 2016 (UTC)
I support renaming pro to Old Occitan. —Aɴɢʀ (talk) 08:51, 23 October 2016 (UTC)
Oh yes, consensus can change and I support it for broadly the same reasons (though I'm not massively clued up on Occitan vs. Provençal). Renard Migrant (talk) 11:27, 23 October 2016 (UTC)
I too agree. Leasnam (talk) 17:15, 25 October 2016 (UTC)
Old Provençal was the historic name, and was still slightly more common as of 2008 (and in general Provençal was several times more common than Occitan). Wikipedia also says Provençal was the older name, but it lemmatizes Occitan and Old Occitan, saying "in the English-speaking world, the term Provençal has historically also been used to refer to all of Occitan, but is now mainly understood to refer to the variety spoken in Provence." Perusing Google Books, it does seem like "Old Occitan" is more common in the most recent books (2010-2016). To add clarity as to the scope of pro and add consistency between the names of oc and pro, I'd support renaming it to "Old Occitan", though my feelings on the matter are not strong. Keep the old name as an alt name, obviously. - -sche (discuss) 21:02, 25 October 2016 (UTC)

AWB access

I have used AWB on English Wikipedia, and there are a couple of tasks here on Wiktionary that I really don't want to do by hand: adding syllable breaks in the sequence /iə/ when it's not a diphthong and correcting some {{R:Smyth}} references. — Eru·tuon 01:34, 23 October 2016 (UTC)

Done. Benwing2 (talk) 01:46, 23 October 2016 (UTC)
I used search, but I still don't know what AWB is. Since @Erutuon thinks it would help with syllable breaks, and hence counting syllables, I'd like to know what it is. Where can I look? Bcent1234 (talk) 13:42, 25 October 2016 (UTC)
@Bcent1234: AWB stands for AutoWikiBrowser, and it's described at Wiktionary:AutoWikiBrowser. Actually, it can't exactly help with counting syllables. Module:syllables can count syllables, when someone adds it to Module:IPA. I was using AWB to add syllable breaks to /iə/, which is listed as an English diphthong in Module:syllables, because it is a diphthong in New Zealand English, but is a disyllabic sequence in most other dialects (and in most of the existing IPA transcriptions on Wiktionary). Anyway, this was a complicated explanation. If a syllable break isn't added, words containing the disyllabic sequence /iə/ would be counted as having at least one less syllable than they actually have. In short, AWB can't exactly help with counting syllables, but it can help to modify IPA transcriptions so that the syllable-counting module will work correctly. — Eru·tuon 16:59, 25 October 2016 (UTC)
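
The undercounting described above can be illustrated with a minimal sketch. This is not the logic of Module:syllables; the vowel and diphthong lists and the function name `count_syllables` are assumptions made up for the illustration. It counts one nucleus per vowel or listed diphthong, and a "." syllable break keeps /i/ and /ə/ from being read as one diphthong.

```python
import re

# Hypothetical, heavily simplified inventories for illustration only;
# they do not reproduce the data in Module:syllables.
DIPHTHONGS = ["iə", "aɪ", "eɪ", "ɔɪ", "aʊ", "əʊ"]
VOWELS = "iɪeɛæaɑɒɔoʊuʌəɜ"


def count_syllables(ipa: str) -> int:
    """Count syllable nuclei, treating each listed diphthong as one nucleus."""
    # Drop slashes, stress marks and length marks; keep "." syllable breaks.
    text = re.sub(r"[/ˈˌː]", "", ipa)
    count = 0
    # Splitting on "." means a diphthong can never span a syllable break.
    for chunk in text.split("."):
        i = 0
        while i < len(chunk):
            if chunk[i:i + 2] in DIPHTHONGS:
                count += 1  # two vowel symbols, one nucleus
                i += 2
            elif chunk[i] in VOWELS:
                count += 1  # single vowel nucleus
                i += 1
            else:
                i += 1      # consonant or other symbol
    return count


print(count_syllables("/ˈeɪ.li.ən/"))  # break between i and ə: 3
print(count_syllables("/ˈeɪ.liən/"))   # no break, iə read as a diphthong: 2
```

With the break added, the transcription is counted as three syllables; without it, the same word comes out one syllable short, which is the case the AWB run is meant to fix.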

Comment about the "Request categories" vote

Wiktionary:Votes/2016-07/Request categories is going to end in 4 days. Current results: 8 supports, 4 opposes, 3 abstentions. Total: 15 people voting.

I believe that we should simply close the vote at 23:59, 27 October 2016 (UTC) as scheduled, instead of postponing the vote, because it was already created as a 2-month vote and the current turnout is pretty good. With 15 people voting, it's unlikely that a lot more people would vote even if we postponed it. That said, ongoing votes that are very close to 66.6% support are noticeably unpredictable. Currently, the vote would pass, but 1 new "oppose" could cause it to fail. --Daniel Carrero (talk) 02:16, 24 October 2016 (UTC)
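
The arithmetic behind "1 new oppose could cause it to fail" can be checked with a short sketch. It assumes that abstentions are left out of the ratio and that a vote passes when support reaches at least two thirds of the support and oppose votes combined; the names `THRESHOLD`, `support_ratio` and `passes` are made up for the illustration.

```python
from fractions import Fraction

# Assumed pass mark: two thirds of (support + oppose), abstentions excluded.
THRESHOLD = Fraction(2, 3)


def support_ratio(support: int, oppose: int) -> Fraction:
    """Exact share of support among the non-abstaining votes."""
    return Fraction(support, support + oppose)


def passes(support: int, oppose: int) -> bool:
    """True if the support share meets the assumed threshold."""
    return support_ratio(support, oppose) >= THRESHOLD


print(support_ratio(8, 4), passes(8, 4))  # 2/3 True
print(support_ratio(8, 5), passes(8, 5))  # 8/13 False
```

Exact fractions are used instead of floats because 8 out of 12 sits exactly on the boundary, where rounding could flip the result.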

Joconde or La Joconde? Seine or La Seine? Cap or Le Cap?

La Joconde is the French name of the Mona Lisa. Formerly this name sat under Joconde, but the headword displayed la Joconde (lowercase, although the French Wikipedia article capitalizes La in La Joconde even in the middle of a sentence). So far I've adopted three different solutions for similar instances:

  1. I hard-redirected Joconde to La Joconde.
  2. I left Seine as-is.
  3. I changed Cap to use {{only in|Le Cap|lang=fr}} (a hard redirect wasn't possible because there was also an English defn of this term).

What's the correct way of handling these cases? I don't like the current solution of having the headword disagree with the page name. Benwing2 (talk) 04:49, 24 October 2016 (UTC)

The way you don't like is the way the Irish editors have agreed to list country names in Irish, which almost always have the definite article. For example, the entry name for the Irish word for "France" is Frainc, but the headword line says An Fhrainc. —Aɴɢʀ (talk) 13:03, 24 October 2016 (UTC)
Le Touquet (for which we have no entry) is actually the name of the place Le Touquet, so « aller à Paris » but « aller au Touquet » (for non-French speakers, see au). I'm not sure about Le Cap, as I hadn't heard of it. Having the headword disagree with the page name is not always wrong, and it's done in English entries as well. As for La Joconde, I have no idea and I'd have to research it; instinctively the article isn't part of its name, but perhaps on researching it it will turn out to be. Renard Migrant (talk) 21:13, 24 October 2016 (UTC)
We have The Hague and there are probably more similar names. (I've added "Le Touquet" by the way.) SemperBlotto (talk) 01:56, 25 October 2016 (UTC)
p.s. A Google ngrams search of "Mona Lisa,the Mona Lisa,The Mona Lisa" shows it to be used with the definite article about half the time. SemperBlotto (talk) 01:59, 25 October 2016 (UTC)
(e/c) In English we are equally inconsistent:
  • We have The Hague, where Hague has a "see also The Hague".
  • But we have the Gambia under Gambia, with headword "Gambia" and The Gambia a hard redirect (whereas Wikipedia has the country under w:The Gambia).
  • Yet we have the Netherlands under Netherlands with headword "the Netherlands" (not "Netherlands").
  • Finally, for river names, we have e.g. Rio Grande with headword "Rio Grande" and no mention anywhere of the fact that it is normally "the Rio Grande"; similarly for Thames.
Arguably the different treatment of rivers stems from the fact that most rivers in English are preceded by "the" whereas cities, states and countries usually aren't. The English example suggests we ought to have the Seine in French under Seine with headword "Seine" and similarly for other French rivers, and maybe the same for countries, since rivers and countries in French normally take le/la/les. Benwing2 (talk) 02:25, 25 October 2016 (UTC)
The article is not an inseparable part of these terms though. You could say "The second Rio Grande estuary". Here, the article modifies "estuary", and there's also an adjective in between. —CodeCat 17:09, 25 October 2016 (UTC)
This test isn't probative. You can say "the second Hague tribunal" even though we generally agree that "The Hague" is the name of the city. Benwing2 (talk) 20:17, 25 October 2016 (UTC)
Then The Hague is different, and is actually an inseparable unit, unlike Rio Grande. —CodeCat 20:28, 25 October 2016 (UTC)
I don't understand what you're saying. Are they different simply because someone says they're different? They both behave syntactically the same. Benwing2 (talk) 20:45, 25 October 2016 (UTC)
But they're not the same, and this is one instance where they aren't. You might say "The Rio Grande conference" but not "The Hague conference"; you actually say "The The Hague conference". The article is an inseparable part of The Hague, it's not actually syntactically an article. —CodeCat 20:47, 25 October 2016 (UTC)
You don't say "The The Hague conference". That would be rather strange. Do a Google search on "a Hague" and "the second Hague" and "the only Hague" and you'll see what I mean. Benwing2 (talk) 22:17, 25 October 2016 (UTC)
"the the hague", with quotes, gets almost 10000 hits on Google. —CodeCat 22:22, 25 October 2016 (UTC)
And "the the rio grande" gets 258,000. What does that prove? Benwing2 (talk) 23:12, 25 October 2016 (UTC)

Proposal: don't show brackets around transliterations

Currently, transliterations are shown with brackets around them. I propose to remove the brackets, which gives a cleaner look with less visual clutter. —CodeCat 22:39, 24 October 2016 (UTC)

They are? Where? абелево кольцо- where are the brackets? DTLHS (talk) 02:06, 25 October 2016 (UTC)
They are written between parentheses, though. The entry bracket asserts that parentheses count as brackets. --Daniel Carrero (talk) 02:11, 25 October 2016 (UTC)
current display (clunky):

Cognate with ... Ancient Greek γένεσις ‎(génesis) (English genesis)


with a comma instead of parentheses:

Cognate with ... Ancient Greek γένεσις, génesis (English genesis)

no brackets in headword:

γένεσις génesis f ‎(genitive γενέσεως); third declension

I think in some cases it would be very nice to have the option to display it without brackets or parentheses. That would be helpful when you want to put a non-Latin-script term in a parenthesis, or put another parenthesis directly after a transliterated term. For an example of the latter, here's something I'm editing right now, the etymology section of gēns. Looks clunky having two parentheses right after each other. Much better, I think, if the transliteration is separated from the original by a comma.
But I think in most cases it's fine to have brackets: for instance, if the Greek term in the example is just in a list with items separated by commas, and doesn't have another word parenthesized after it. — Eru·tuon 05:14, 25 October 2016 (UTC)
Also, I would support removing parentheses from around the transliteration in headwords. I think it looks fine if the original script is separated from the transliteration by a bullet. — Eru·tuon 05:23, 25 October 2016 (UTC)
To me the proposal is both less clear and harder to read. I'm strictly against it. Korn [kʰũːɘ̃n] (talk) 12:58, 25 October 2016 (UTC)
I prefer keeping the parens, they make it clear that the "main" word is written in its native script and that the transliteration is secondary information. --Daniel Carrero (talk) 14:16, 25 October 2016 (UTC)
I also prefer keeping the parens, except perhaps in the headword line. "γένεσις • ‎génesis f" doesn't look that bad to me, but in running text the parens really need to be there. —Aɴɢʀ (talk) 14:54, 25 October 2016 (UTC)
Agreed with Angr. Benwing2 (talk) 16:13, 25 October 2016 (UTC)
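
For comparison, the three display styles discussed in this thread can be put side by side with a minimal sketch. The function `format_term` and its `style` values are hypothetical and do not correspond to any existing template or module parameter.

```python
def format_term(term: str, translit: str, style: str = "parens") -> str:
    """Join a native-script term and its transliteration in one of three styles."""
    if style == "parens":   # current display
        return f"{term} ({translit})"
    if style == "comma":    # suggested for running text before another parenthesis
        return f"{term}, {translit}"
    if style == "bullet":   # suggested for headword lines only
        return f"{term} • {translit}"
    raise ValueError(f"unknown style: {style!r}")


for style in ("parens", "comma", "bullet"):
    # Append the gloss the way the etymology example above does.
    print(format_term("γένεσις", "génesis", style) + " (English genesis)")
```

The "parens" line shows the doubled parentheses that were called clunky, while the other two lines show the alternatives that were suggested for specific contexts rather than everywhere.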

Suggestion: Edit the abbreviation policy

May I create a vote to edit what we have to say about definitions of abbreviations in WT:EL#Abbreviations, a subsection of WT:EL#Definitions? I intend to do it eventually if people are OK with it; I'm not in a hurry.

Current text:

The “definitions” of entries that are abbreviations should be the expanded forms of the abbreviations. Where there is more than one expansion of the abbreviation, ideally these should be listed alphabetically to prevent the expanded forms being duplicated. The case used in the expanded form should be the usual one — do not capitalise words in the expanded form of an abbreviation that is made up of capital letters unless that is how the expanded form is usually written.

Where the expanded forms are entries that appear (or should appear) in Wiktionary, wikify them. Expanded forms that are encyclopedic entries should also be wikified and linked to the appropriate Wikipedia entry. When the expanded form does not merit an entry of its own, either in Wiktionary or Wikipedia material, wikify its component words and give a gloss (italicised, in parentheses) after the expansion explaining what the term means (see SNAFU for an example).

See PC for an example entry.

Proposed text:

For abbreviations, acronyms, and initialisms (Examples: PC, SNAFU), the definitions usually use templates linking to their expanded forms. For example, one of the senses in the entry PC may be "Initialism of personal computer." Do not capitalise words in the expanded form unless that is how the expanded form is usually written (in the previous example, don't write "Personal Computer"). Where the expanded form is an entry that exists (or should exist) in Wiktionary, link to it. Otherwise, if an appropriate Wikipedia article exists, you may link to it. When the expanded form does not merit either a Wiktionary entry or a Wikipedia article, link its component words. You may expand the definition with a gloss if appropriate.

Rationale and changes:

  • Concerning the 1st sentence of the original text:
    • Replacing "abbreviations" by "abbreviations, acronyms, and initialisms", a more complete list.
    • Removing the quotation marks around "definitions". They are actual definitions.
    • Mentioning that these entries usually use templates.
    • Removing "should be the expanded forms of the abbreviations". In a few entries, like LOLWUT, the sense may be a non-gloss definition and the abbreviation may be in the etymology. In most entries, the definition is "abbreviation of X Y Z".
    • It may be unnecessary, but I'm adding an explanation of what an "expanded form" is.
  • Concerning the 2nd sentence of the original text:
    • Removing it completely. I don't think we should usually bother listing the senses alphabetically, or should we?
  • Concerning the 3rd sentence of the original text:
    • Full rewrite, with an example added of what is meant by incorrectly capitalizing words.
  • Other changes:
    • Rewriting the difference between linking to Wiktionary entries and Wikipedia articles.
    • Making it clearer that Wiktionary has entries and Wikipedia has articles.
    • Removing the explanation that glosses are "italicised, in parentheses", because the template is going to deal with formatting, and people may see other styles if they edit their personal CSS pages.
    • Mentioning the abbreviation examples (PC and SNAFU), together in the same line. The original text had them in separate lines.
    • Minor edits, including word replacements like "appear [...] in Wiktionary" -> "exists [...] in Wiktionary" and "wikify" -> "link".
    • Rewriting clutter to make the text shorter without removing any rules, unless otherwise stated.

Would you change anything? Please let me know. --Daniel Carrero (talk) 04:41, 25 October 2016 (UTC)

Why don't {{m}} and {{l}} support etymology-only languages?

Most of the errors in CAT:E are because Lombardic was made an etymology-only language, and the etymology or descendants sections use {{m}} or {{l}}. Why don't these support etymology-only languages? {{inh}}, {{der}}, {{bor}} and {{cog}} all do. Benwing2 (talk) 21:17, 26 October 2016 (UTC)

There are no entries for an etymology only language, so how could you link to them? DTLHS (talk) 21:18, 26 October 2016 (UTC)
Because in theory it would be no different from using the parent language code. With {{cog}} et al. there is a difference, in that the name displayed is that of the etymology-only language rather than of the parent language. I think I suggested to User:CodeCat before that for consistency {{m}} and {{l}} should accept etymology-only languages, but CodeCat disagreed. --WikiTiki89 21:27, 26 October 2016 (UTC)
Example. Renard Migrant (talk) 22:58, 26 October 2016 (UTC)
That's what you're "supposed" to do. --WikiTiki89 23:03, 26 October 2016 (UTC)
Perhaps in such situations it makes sense for it to work, but I'm not sure. I like the idea that codes are unique, and that you can't use multiple codes interchangeably for the same thing. —CodeCat 00:02, 27 October 2016 (UTC)
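
The two behaviours being weighed here can be sketched as a small lookup. The codes, names and tables below are made-up placeholders rather than real Wiktionary language data, and `resolve_link_language` and `display_name` are hypothetical helpers. The sketch only shows the choice: either a linking template rejects an etymology-only code, or it links under the parent language while the etymology templates keep displaying the variety's own name.

```python
# Placeholder data for illustration; not real Wiktionary language codes.
FULL_LANGUAGES = {"aaa": "Examplish"}
# Etymology-only code -> (display name, parent language code)
ETYMOLOGY_ONLY = {"aaa-var": ("Variant Examplish", "aaa")}


def resolve_link_language(code: str, allow_etymology_only: bool) -> str:
    """Return the language code whose entries a link should point to."""
    if code in FULL_LANGUAGES:
        return code
    if code in ETYMOLOGY_ONLY:
        if allow_etymology_only:
            # Link under the parent, since the variety has no entries of its own.
            return ETYMOLOGY_ONLY[code][1]
        raise ValueError(f"{code!r} is etymology-only and cannot be linked")
    raise KeyError(f"unknown language code: {code!r}")


def display_name(code: str) -> str:
    """Name shown to readers, e.g. by a cognate-style template."""
    if code in ETYMOLOGY_ONLY:
        return ETYMOLOGY_ONLY[code][0]
    return FULL_LANGUAGES[code]


print(display_name("aaa-var"))                 # Variant Examplish
print(resolve_link_language("aaa-var", True))  # aaa
```

With `allow_etymology_only=False` the same call raises an error, which corresponds to the strict behaviour that keeps each code tied to exactly one use.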