
Category for all lemmas again

Previous discussion: Wiktionary:Beer parlour/2014/January#A category for all words or lemmas in a language

The previous discussion seemed to have general support, so I would like to make this change, but there are a few details I'd like to ask about first. We can either have just a category for all lemmas, but nothing else changes, or we could split off all "form" categories into their separate tree and have another category for non-lemmas (which may not be all that useful in the end?). A third option would be to have a category for lemmas alongside a category for all terms in a language regardless of lemma status. However, this last option could also be achieved by mentally merging the lemma and non-lemma categories, so this does not have much added value over the second option. —CodeCat 11:34, 1 July 2014 (UTC)

I prefer the idea of having a per-language category with all words (not just lemmata/headwords), rather like the way the Official Scrabble Words is presented. When I've used Index:English in the past — of course, it's years out of date now — I've wished it had all words and word forms. Equinox 13:25, 6 July 2014 (UTC)
I think we should have both: one category for all words, and another category for all lemmata. Fr.Wikt and De.Wikt already have categories for all words in each language. Both categories would have many uses. A category of all words would be useful for scrabble players, and for finding entries in the event that we needed to (a) make some change to all words in a certain language, or (b) examine all words in a certain language to see which of them met a certain criterion (e.g. used an acute accent, if we decided that they were all actually supposed to use a macron). (On De.Wikt I used to use the "all words" categories to look for words I didn't recognize, check Google Books and other dictionaries for them, and 'RFV' them if necessary.) A category of all lemmata would be useful for finding words to alliterate, and would also probably be more useful for any other practical purpose, for highly inflected languages where inflected forms would otherwise swamp the lemmata. Both categories would allow Wiktionary to be used like a paper dictionary, where all words can be seen in alphabetical order regardless of POS. - -sche (discuss) 16:22, 6 July 2014 (UTC)
Would a category for all lemmas, and another for all non-lemmas also be ok? That way, you could still look through all words, by searching through both categories. —CodeCat 16:34, 6 July 2014 (UTC)
No, I think there are advantages to having a category that already contains all words, vs having to merge two categories oneself. And I don't actually see a benefit to having a category for all non-lemmata at all, besides that it might provide a more up-to-date count of "form[-of] definitions" than WT:STATS does.
It's also worth noting that a category for all words will be simpler on a philosophical level, and presumably also on a technical level, to implement than a category for all lemmata, because for the "lemmata only" category we will have to wrestle with questions like: are entries using {{alternative spelling of}} lemmata? Are entries using {{standard spelling of}} lemmata? What scalable way is there to know which category to use for entries that only contain {{head|foo}} with no POS set? What scalable way is there to know which category to use for entries like messages (q.v.)? Etc, etc. Whereas, anything with {{head|en}} can go into the "all words" category. - -sche (discuss) 16:58, 6 July 2014 (UTC)
It's more that if we have a category for lemmas and all words, then every lemma in every language will have two more categories added to it. If we split them, it will only be one. As for the question of what is a lemma, I think it's relatively simple: if it would probably be listed as a lemma in a paper dictionary, we would do the same. My intention was to create separate category trees for lemmas and non-lemmas, Category:English lemmas and Category:English non-lemma forms. The former would contain Category:English nouns, Category:English verbs etc, while the latter would have Category:English plurals, Category:English verb forms and so on. I would consider an alternative spelling a lemma, because it is the lemma form of a word, and would presumably be found in a paper dictionary with a "see (other lemma)" notice. —CodeCat 17:31, 6 July 2014 (UTC)

FWIW, De.Wikt and Fr.Wikt both use their equivalents of Category:English language as their "all words in English" categories. We could either follow that model, or come up with a separate category, like Category:English words. - -sche (discuss) 16:58, 6 July 2014 (UTC)

I just came across this discussion from a link from NFE (July 23's entry). That page says "As {{head}} needs its second parameter to be specified for this to work, this should always be specified if possible. All remaining entries that are still missing this parameter are placed in Category:head tracking/no pos and will need fixing.". Is there consensus to that effect? I don't see any on this page, at least, nor at [[WT:RFM#Split subcategories of Category:English parts of speech between Category:English lemmas and Category:English non-lemma forms]] (current link), and the January BP discussion linked to above has, as its last point, CodeCat saying we should have more input before going ahead with it: so here's mine. It doesn't make sense to me: form-of entries (many of them, anyway) have always gotten along just fine with {{head|langcode}} as their headword line and a categorizing template as the definition line, obviating the need for the headword to categorize also. If we want terms to be directly in the non-lemma category, besides being in the e.g. past-tense-form category, then the e.g. past-tense-form-of template can so categorize: it makes more sense to require some dozens of template edits now than some thousands of entry edits now followed by making everyone work extra on each entry. Moreover, it's unwise to require users to write duplicate information in each entry. Pinging CodeCat and kc_kennylau, who've discussed this (see the January BP link above).​—msh210 (talk) 19:17, 29 July 2014 (UTC)

I should clarify that I support the categorization, and I support the use of {{head}} to so categorize. I object only to requiring a second parameter in {{head}}.​—msh210 (talk) 19:18, 29 July 2014 (UTC)
The way it works right now is fine. I don't think it is ever right to omit the second parameter, but I don't think there should be an error if it is. A cleanup category is enough. --WikiTiki89 19:22, 29 July 2014 (UTC)
Well for starters, certainly not all form-of templates categorise and there have been many difficulties with the ones that do. {{plural of}} was cumbersome to use, for example, as the category it added was often inappropriate, so we removed the category from it some time ago. Of course some of them still categorise, but who is going to remember which ones do and which ones don't? I'm not. So I think the headword line should always add a category, just to remove any ambiguity as to whether a category is needed. I support the exact opposite of what you do: the form-of templates should categorise less, not more. I see no reason why requiring the second parameter on {{head}} would be a problem. —CodeCat 19:25, 29 July 2014 (UTC)

A plea for more scrupulous patrolling

Rather sloppy content has been slipping through RC patrol recently. I have found some through second-hand monitoring pages like Special:UncategorizedPages and Special:Shortpages. Apparently User:SemperBlotto has been inactive lately, which means that someone else has to do what he has been doing. I urge all sysops and patrollers to visit Special:RecentChanges more often.

On request, I can grant rollback and patroller rights to trusted regulars. Keφr 06:56, 2 July 2014 (UTC)

Rollback and patroller rights AFAIK have in the past been done at WT:WL, requiring two admins' input, not one.​—msh210 (talk) 05:55, 6 July 2014 (UTC)
Given the lack of interest, this question is kind of academic anyway, but: Wiktionary:Beer_parlour/2013/October#Purplebackpack89 Rollback request. And for some (if not most) users listed at Special:ListUsers/rollbacker the rollback or patroller right has been granted without any process at all (just because Stephen sees someone undo a lot of edits). Of course, for me an autopatrolled flag (which is granted at WT:WL with input from two admins) is a prerequisite here. And given that I am announcing this in public, and it can be undone in case someone disagrees with my judgement, I think it should not pose a problem. Keφr 06:49, 6 July 2014 (UTC)
Sounds good to me.​—msh210 (talk) 07:40, 6 July 2014 (UTC)
@Kephir Please make me a patroller. I don't promise anything, but becoming the patroller will create the temptation for me to actually patrol. Let the patroller flag be removed from me as soon as anyone disagrees. --Dan Polansky (talk) 09:00, 6 July 2014 (UTC)
Granted. Keφr 09:09, 6 July 2014 (UTC)
Wait, Dan doesn't have the mop? If he doesn't, he should. Purplebackpack89 15:02, 6 July 2014 (UTC)
He shouldn’t be an administrator if he still can’t deal with editors peacefully. I don’t think that he’s merited patroller rights either. --Æ&Œ (talk) 18:59, 6 July 2014 (UTC)
To be clear, I am hardly a big fan of Dan's, but given his, shall I say, very critical attitude to other people's editing, I doubt he is going to abuse the "mark as patrolled" button too much. About the rollback button, I am less sure. Keφr 20:08, 6 July 2014 (UTC)
AEOE, if dealing with editors peacefully is a criterion for adminship, there are some admins who should have their mops taken away. Purplebackpack89 22:49, 6 July 2014 (UTC)
Too bad SemperBlotto drove away all the new users that could have picked up the slack :P Kaldari (talk) 08:38, 13 July 2014 (UTC)

I'm willing to help too, by becoming a patroller, please. I have some experience with this task, with more than 50,000 reviews on the French Wiktionary and around 3,000 edits here. JackPotte (talk) 17:56, 12 August 2014 (UTC)

Converting WT:Information desk to monthly pages

Moved from Wiktionary:Grease pit/2014/July#Converting WT:Information desk to monthly pages

Can we do this now? The last time this was proposed there was some contention that new users might be confused by the monthly pages system and post things to the wrong page. However, I cannot recall a single such incident, so this seems to be a non-issue. Shall we switch WT:ID to the monthly page system as well? The benefits are quite obvious.

Keφr 21:14, 1 July 2014 (UTC)

There are plenty of examples of people (and in some cases the "+ (add section)" button itself—see [1]) getting confused and mistakenly posting to the main page rather than the monthly subpages, e.g. [2] and [3]. (There are also examples of people posting to the wrong monthly subpage.) However, I no longer feel that this is much of a problem. - -sche (discuss) 22:44, 1 July 2014 (UTC)
Is this even worth asking? The page doesn't get very big and shows no signs of growth AFAICT. DCDuring TALK 23:31, 1 July 2014 (UTC)
Yes it does. --WikiTiki89 23:35, 1 July 2014 (UTC)
I think all these were submitted while MediaWiki:Common.js was broken. So assuming it will not break too often, we are rather safe. Keφr 05:13, 2 July 2014 (UTC)
This is a WT:BP question now, since we know we are technically capable of it. --WikiTiki89 22:49, 1 July 2014 (UTC)
Yes, I see. The method used for BP would work for ID, but not for request pages without further complications. DCDuring TALK 00:28, 2 July 2014 (UTC)

It looks like -sche is for it, DCDuring has been convinced(?), and Wikitiki89 seems kind of supportive. One more supporter and, if no one objects, I will go with it. Keφr 13:32, 5 July 2014 (UTC)

  • I oppose this. The benefit that I can see is no need to archive the page anymore, but the page is low-profile enough that archiving is not really a problem. The subpaging seems less intuitive than having a single page. --Dan Polansky (talk) 15:52, 5 July 2014 (UTC)
    • I think one of the reasons it is "low-profile" is that nobody wants to visit it, because it is so annoyingly large. (See WT89's diff above.) Keφr 16:01, 5 July 2014 (UTC)
      • I don't think so; the information desk is a rather unimportant page, especially compared to Beer parlour, so it gets low traffic; nothing to do with the size. As an aside, you said "I knew I could count on you." in the edit summary. If you want to say such things, be enough of a man and put them in the discussion, or, better yet, drop that juvenile behavior. --Dan Polansky (talk) 16:37, 5 July 2014 (UTC)
  • How about automating the current archiving method by (docs)? There will need to be slight (and probably good, I'd say) changes, though; the month headings need to go, and archiving will be done section by section, not all sections in a period at once. For an example, see ArchiverBot working on [4]. I can volunteer to run it, if there is interest. Whym (talk) 10:53, 6 July 2014 (UTC)
    • Not workable in my opinion. We have rather few bots, and for all I know, there is no one who can afford to run a bot full-time. And even if, they would probably prefer it to handle mainspace tasks. Also, I never liked Wikipedia-style archives. With monthly pages, you know that if you started a thread in one place, it stays there, unless expressly moved. Keφr 13:34, 6 July 2014 (UTC)
      • Just to clarify, I am an operator of the archive bot for two other wikis. It costs almost nothing to me to add one wiki. Whym (talk) 16:05, 6 July 2014 (UTC)
        • And who will replace you if you stop running it? (Which is another problem here. High rotation and little staff.) Keφr 16:43, 6 July 2014 (UTC)
          • (Just responding to "if you stop running it" for the record, not objecting to the other concerns Kephir and -sche noted) My bot uses Tools Lab. [5] Co-maintainers are welcomed. It could also be useful for archiving user talk pages. Whym (talk) 03:09, 7 July 2014 (UTC)

{{look}} Asking User:Æ&Œ, User:Equinox, User:-sche, User:Angr, User:Stephen G. Brown for further input. (Anyone else is also welcome.) Keφr 16:43, 6 July 2014 (UTC)

  • I have no strong opinion on the issue one way or the other. —Aɴɢʀ (talk) 19:13, 6 July 2014 (UTC)
  • I also have no strong feeling about monthly subpages. In the past, I opposed converting the Information Desk to subpages, out of concern for teh noobs, but as evidenced by my comment above, I no longer feel that people posting to the main page rather than the subpages is much of a problem, given how easy it is to move threads. The suggestion that a bot could archive threads on an individual basis is interesting, but the number of pages on which that might conceivably be useful is small (BP, GP, ID, ?TR?), and I think the benefit Kephir notes (of knowing that if you started a discussion on the July subpage, that's where it's staying) outweighs the small potential benefits of per-thread archiving. - -sche (discuss) 19:29, 6 July 2014 (UTC)
  • I don’t have a strong feeling about it. It gets very little traffic, so I don’t think it matters either way. —Stephen (Talk) 03:29, 7 July 2014 (UTC)
    • In the last archived batch, ID had 16 threads per month on average. Which seems rather typical, and is not that small in my opinion. The Etymology Scriptorium often has fewer topics.
    Anyway, what we have here seems to be three "welllll, sure, if you want to" (WT89, -sche, DCD), one oppose (DP), and two strong lacks of opinions (Angr, Stephen). I am going to convert it now. Revert me if you give a shit. Keφr 09:38, 9 July 2014 (UTC)

New Word of the Day feed

Featured Feeds for Word of the Day are now available: rss, atom. If you have a suggestion to better format the feed, I'd like to help implementing. Otherwise, enjoy. :) Whym (talk) 15:30, 2 July 2014 (UTC)

I just set up a FWOTD feed, when should I expect it to appear? Also, it would be nice if the feed item contained the actual word for its title. I already know how to set that up, but it requires running a bot over WOTD/FWOTD pages, which I am too lazy to do right now (basically, the same way we solved the problem with context templates). Otherwise, wooooooo! Keφr 16:00, 2 July 2014 (UTC)
Feed names need to be added on the server side; see gerrit:136316. Should FWOTD be added for all Wiktionaries or only for English Wiktionary? Whym (talk) 11:19, 3 July 2014 (UTC)
I have no knowledge of other projects having a FWOTD. Keφr 11:21, 3 July 2014 (UTC)
Ok, I have made the request in bugzilla:67563. Whym (talk) 11:02, 6 July 2014 (UTC)
And it has been resolved: [6][7] Whym (talk) 09:32, 11 July 2014 (UTC)

Recent "Tbot" entries

I've been finding a few entries here and there that are tagged with the {{tbot entry}} template that date to 2013 and 2014. They had redlinked categories, and I created a few of those categories using the {{tbotcatboiler}} template before I realized that these were for new entries.

Not that I have anything against the type of entries Tbot used to create, but if we're going to be doing this sort of thing again, we should change the documentation so we're not listing someone who's no longer here as the contact, and talking about how things are different now that it's 2007. Chuck Entz (talk) 07:44, 6 July 2014 (UTC)

If these entries are not by Tbot… where do they come from? Keφr 13:36, 6 July 2014 (UTC)
See this. One user making a few. I'd ask User:Liuscomaes. --Type56op9 (talk) 00:37, 9 July 2014 (UTC)



Is the {{deprecated}} banner still appropriate on that template? The template seems to be heavily used and no replacement is proposed. — Automatik (talk) 15:48, 7 July 2014 (UTC)

The replacement is to use a real part-of-speech header like "noun" or "verb". —CodeCat 16:06, 7 July 2014 (UTC)
What would be the replacement for IANAL? Keφr 16:08, 7 July 2014 (UTC)
Did you even look at the entry? :) —CodeCat 16:09, 7 July 2014 (UTC)
Oh, me stupid. Previous time I checked, the header was "Acronym". But truth is, even "Phrase" does not seem very fitting. Keφr 16:13, 7 July 2014 (UTC)
Well, in any case, the replacement is whatever header you would use for the fully spelled out form. So if "I am not a lawyer" is a phrase, then so is this. If not, then this needs to be changed, but I don't know what into. —CodeCat 16:15, 7 July 2014 (UTC)
And for the categorisation? {{en-noun|-}} doesn't seem to be correct for Mbps, nor does {{en-noun}}, because there is no inflection for this word. — Automatik (talk) 17:12, 7 July 2014 (UTC)
{{en-plural noun}}? (Which I still think to be a stretch.) Keφr 17:15, 7 July 2014 (UTC)
I don't think so, because we can say 1 Mbps. — Automatik (talk) 17:19, 7 July 2014 (UTC)
I guess it's safe to say that it stands for both "megabit per second" and "megabits per second". --WikiTiki89 17:33, 7 July 2014 (UTC)
{{en-noun|Mbps}}? Keφr 17:35, 7 July 2014 (UTC)
Thank you, I used it. — Automatik (talk) 14:43, 8 July 2014 (UTC)
Realistically I don't think this template will ever be orphaned, because it always needs human intervention. That is, a bot can't tell if it's a 'noun', a 'verb' (etc.), so a human editor is always needed. Meanwhile the template is still being used in new entries. But in principle 'noun', 'verb' (etc.) offers more information to the user, while things like 'acronym' should be in the etymology, as 'acronym' explains how the word was formed in the first place. Renard Migrant (talk) 10:13, 17 July 2014 (UTC)
We could make an abuse filter for it. —CodeCat 11:08, 17 July 2014 (UTC)

"Definitions" header in Chinese entries

Apparently, people have been adding this header to Chinese entries instead of part-of-speech headers. But I recall that there was no support for this in the previous discussion. Why is this being done anyway? These entries should be fixed. —CodeCat 11:58, 8 July 2014 (UTC)

Can we just have a real vote on it? Otherwise people are just going to keep going back and forth. DTLHS (talk) 21:53, 8 July 2014 (UTC)
Because the validation of a language-specific header does not require consensus by vote (Wiktionary:Entry layout explained/POS headers#Other headers in use). It only needs the agreement between editors who regularly deal with such entries. The "definitions" header is no different from the "Han character" header in use in the hundreds of thousands of Chinese character entries (e.g. ). Wyang (talk) 03:23, 9 July 2014 (UTC)
Inventing a new part of speech header for languages where it's appropriate (I've done this too, I added the "Relative" POS for Xhosa and Zulu) is not a problem. It's a very different story when you're introducing a new header to remove part-of-speech information altogether. That is my objection here. —CodeCat 11:33, 11 July 2014 (UTC)
Regardless of bureaucracy, what exactly is the reason for replacing POS headers with ===Definitions===? --WikiTiki89 13:54, 11 July 2014 (UTC)
I'm with CodeCat and DTLHS here. Why not just split the meanings by part of speech like we do for literally every other language. If there's a case to be made for not doing this, set it out in a vote where we can all see it. Renard Migrant (talk) 10:43, 15 July 2014 (UTC)
  • Re: what is the reason, there's the simple fact that 1) Chinese doesn't inflect at all, so there's no useful information provided by the POS header other than the POS itself, which can easily enough be included inline; and 2) many Chinese terms have basically the same meanings applied in different POS ways. Take , for example. We've got 13 senses listed under 5 different POS headers. The headers really only serve to break up the page in ways that are unintuitive for Chinese. ‑‑ Eiríkr Útlendi │ Tala við mig 18:52, 15 July 2014 (UTC)
    Thank you for explaining your reasoning. Here's what I think: The POS headers are useful because they make it easier to find the definition you are looking for. Most of the time when you are looking up a word, you already have a good sense of its POS because of how it was used in a sentence, and so you can use the headers to narrow down your search for the definition. It would be very redundant to list "(noun)" before every noun sense, etc. BUT I think it may be a good idea to remove the requirement for "inflection lines" after each POS header, since they serve no purpose other than to duplicate the same information over and over. --WikiTiki89 18:59, 15 July 2014 (UTC)
Some of the reasons were also mentioned here: Template_talk:zh-pron#Why_does_this_categorise_in_part-of-speech_categories.3F. The choice of PoS is often arbitrary, based on the translation into English, dictionaries either mix up PoS or ignore it. By any system, listed PoS's do not sufficiently represent the actual usage. --Anatoli T. (обсудить/вклад) 05:28, 17 July 2014 (UTC)
I oppose "Definitions" header in Chinese entries, now as before. I already posted this, albeit to what is now ranked by someone as "off-topic", below. --Dan Polansky (talk) 18:05, 23 July 2014 (UTC)
I oppose "Definitions" header in Chinese entries, now as before. Nothing to do with "bureaucracy"; it has to do with civilized methods of government with which people of certain backgrounds are obviously not acquainted. --Dan Polansky (talk) 11:04, 13 July 2014 (UTC)
You obviously consider yourself "civilised" if you allow yourself to make racist remarks. --Anatoli (обсудить/вклад) 11:21, 13 July 2014 (UTC)
Not that I commend DP's needlessly inflammatory remark, but there was nothing racist about it. (Unless you were referring to something else.) Keφr 11:58, 13 July 2014 (UTC)
It wouldn't be a racist remark at all if he just makes fun of something I do, but it became a blatant racist insult when he specifically elevated it to the level of "people of certain backgrounds". Yes, Dan, my ancestors, me as well as all of my comrades are obviously barbarians not living up to your lofty expectations. Could you then kindly explain why the sentence "Other headers in use may be added to this table regardless of the warning above note to modify this policy page without a vote; the appearance of a header in this table is not strict policy" has been sitting on that policy page since 2007? Could you also explain in what way is this different from the "Han character" header currently in use in the hundreds of thousands of character entries (e.g. )? Wyang (talk) 12:06, 14 July 2014 (UTC)
The point is that Dan was referring to the political system of the country you're from, not your race. The remark presumably wouldn't apply to someone of the same ancestry who grew up in a different country. It was an offensive cheap shot, but not a racist offensive cheap shot. Chuck Entz (talk) 12:36, 14 July 2014 (UTC)
"Background" does not mean "race" or "ethnicity". --WikiTiki89 14:43, 14 July 2014 (UTC)
"Background" can mean "origin". I consider this comment racist. I also hate it when people accuse me of the deeds of the Russian government, as if I were responsible for them or had anything to do with them. Wyang is not responsible for the Chinese government, if it's considered to be "uncivilised". Should people who grew up in Communist countries, including Czechoslovakia, also be labelled "uncivilised"? I don't expect DP to explain, let alone apologise, but we should call a spade a spade. --Anatoli (обсудить/вклад) 22:54, 14 July 2014 (UTC)
How is the comment "people of Chinese background are obviously not acquainted with civilised methods of government" not racist? Do you really think he was just insulting the Chinese government? Wyang (talk) 23:22, 14 July 2014 (UTC)
Well, I cannot peek into Polansky's head to know for sure, so for the time being I assume yes, my reasoning being just like Chuck's. This is how I read it: as an insult of culture, not of inherited traits or whatever. Either way, I think we agree that the remark was grossly inappropriate. Just please do not label it "racist" so that you can be even more offended by it. Keφr 08:22, 15 July 2014 (UTC)
Dan thinks everyone outside America and England lives on trees. Read his comments about the "Anglo-American civilization" here. --Vahag (talk) 08:36, 15 July 2014 (UTC)

Abbreviated Authorities in Webster

I have recently discovered the Abbreviated Authorities in Webster Table, and, noticing that a few of the early entries have been linked to Wikipedia, I have been adding a few such links myself. It's interesting, though there are occasionally mismatches of dates (should the Wikipedia date be moved in?). But it's a bit inconvenient for navigation. I feel that the table should be divided by initial letter. If this seems to be generally agreed upon, is it something I would need to do myself, or is it something that should be done by a coding whizz? —ReidAA (talk) 08:08, 9 July 2014 (UTC)

Excellent! That table could be quite useful in resolving some of the {{rfquotek}} entries.
I started manually splitting the table by initial letter. It is not hard. It just requires copying the wikitable formatting surrounding the "W" or "Y" headers and inserting it in the appropriate place in the undivided table.
What might be a great help would be adding links to Wikisource, Google Books, or Project Gutenberg versions of some of the specific works. As an example I did so for Hawking and Hunting. To make sure that the work is useful we should extract from the XML dump a list of how often each authority is used within {{rfquotek}}. DCDuring TALK 10:12, 9 July 2014 (UTC)
The table is now initialised. I've done a bit more wiki-referencing some of the authors. --Catsidhe (verba, facta) 12:23, 9 July 2014 (UTC)
Thanks. A dump run would help us see which authorities were actually in use, so, for now, we may as well just pursue what is interesting. DCDuring TALK 14:01, 9 July 2014 (UTC)
As we would want to use this to source citations, the best forms of a work to link to would be those that allowed search at once of the entire range of the authority in question. Wikisource often breaks the work into chapters, which is unsatisfactory for search, though arguably good for linking. It is not so handy to have to download the work to search it. DCDuring TALK 14:22, 9 July 2014 (UTC)

Proper nouns

I just came across Bible and Qur'an, which are labelled proper nouns. But at the same time, these have plurals and can take an indefinite article. I just read through w:Proper noun, which suggests that real proper nouns (or proper names) can't take indefinite articles nor have plurals. If they do, then they're not proper nouns, but refer to a class of things rather than a unique entity. The article uses "Toyota" as an example that can be either: the company itself as a proper noun, or a car made by the company as a common noun. In this sense "Bible" is a common noun because it's a book that many copies can exist of. It doesn't act grammatically the same as other book or story titles, whether old or modern. Compare for example "Odyssey", which takes a definite article like "Bible", but doesn't normally have an indefinite article: a Bible versus a copy of the Odyssey, not *an Odyssey. So I wonder what kind of criteria we should apply to proper nouns on Wiktionary, and whether we shouldn't consider relabelling some. —CodeCat 18:33, 11 July 2014 (UTC)

Are given names not proper nouns? They can be pluralised: "All the Jameses in the room raised their heads.", and they do not seem to have a distinct meaning in the plural. Keφr 18:44, 11 July 2014 (UTC)
(e/c) In particular, we currently label personal names as proper nouns, while simultaneously admitting (in many though not yet all entries) that they pluralize. Ditto country names (Germany : Germanies, Germanys, America : Americas, France : Frances). - -sche (discuss) 18:46, 11 July 2014 (UTC)
These words can be both common and proper nouns. Compare the following sentences:
  1. The Bible says to honor one's parents.
  2. Jack read the Bible.
  3. Jack put the Bible he had just bought under his pillow.
In the first sentence, "the Bible" is indisputably a proper noun, while in the third, it is indisputably a common noun; in the second, however, it can be interpreted either way. --WikiTiki89 19:01, 11 July 2014 (UTC)
In cases such as Bible and Qur'an, I think we should include both POS sections, which is what Bible already does. — Ungoliant (falai) 18:54, 11 July 2014 (UTC)
Yes, in the case of books, I think including both sections (Proper noun, and Noun) is best. In the case of personal names, on the other hand, I think including two sections would be unjustifiable; as Kephir notes, the singulars and plurals have the same sense (differing only in number): "one Richard" means one person named Richard, "two Richards" means two people named Richard. Whether that means it would be better to relabel all personal names plain nouns, or live with pluralized proper nouns, I don't know. - -sche (discuss) 19:07, 11 July 2014 (UTC)
"One Richard" is a common noun. "Richard" by itself is a proper noun. However, I think it would be overkill to create common noun sections for every name. --WikiTiki89 19:14, 11 July 2014 (UTC)
We could also just call them all nouns, couldn't we? We could keep the category if needed, but just use the normal "Noun" header. —CodeCat 19:34, 11 July 2014 (UTC)
(@Wikitiki) I don't necessarily disagree that "Richard" can be a proper noun, but I note that whatever parts of speech "Richard" can have, "Richards" can also have. The very reason that given names' definition-lines are italicized is that they are in most uses non-gloss; "and then Richard arrived" means "and then a person named Richard arrived", not *"and then a male given name arrived". An exception would be a hypothetical use like *"not long after the first scribe began to spell the adjective which had been hart as hard, the change spread to instances of the word in compounds, and with that, Richard had arrived", where "Richard" really would be a proper noun meaning "a male given name" — but NB Richards could (equally hypothetically) be used the very same way, e.g. *"and when 'd'-final words began to pluralize with '-s' rather than '-es', Richards arose". - -sche (discuss) 19:36, 11 July 2014 (UTC)
Mentioning a word is an entirely different story. I was not referring to that at all. I also disagree that the plural exists as a proper noun (except in cases where a group of people who are all named "Richard" are collectively named "Richards"; e.g. Richards are coming for dinner, where "Richards" refers to a specific group of people). --WikiTiki89 19:48, 11 July 2014 (UTC)
FWIW, here's how de.Wikt handles it: common names are common nouns, e.g. de:Angela's POS is "noun - first name" and de:Fritz has one POS "noun - first name", another "noun - last name", and a third, labelled "noun", which covers in one section the slang uses that our entry on Fritz split into a "noun" and a "proper noun" section. When a name is defined as referring to only one specific person, e.g. de:Archimedes, it is labelled "noun - proper noun" (but contrast de:Platon). - -sche (discuss) 19:36, 11 July 2014 (UTC)
[e/c] A basic distinction is between proper names (of specific entities, eg, "The White House", "Mack the Knife", "Germany", "The Federal Republic of Germany", "Deutschland", my late dog "Hayek" [short for his full name "Friedrich Augustus von Hayek"]) and proper nouns. CGEL (Huddleston and Pullum) hold that "Proper nouns, by contrast, are word-level units belonging to the category noun. Clinton and Zealand are proper nouns, but New Zealand is not." and "Proper nouns are nouns which are specialised to the function of heading proper names. There may be homonymy between a proper noun and a common noun, often resulting from historical reanalysis in one or other direction." Their examples are sandwich and Sandwich and rosemary and Rosemary.
Our L3 header "Proper noun" is applied both to terms that serve as names of specific entities and to "nouns which are specialised to the function of heading proper names". Even a term such as White House, which is often considered the name of a specific entity, ie, a proper name, can be shown to have attestable plural uses. Are the uses of White House to be taken as nicknames for specific entities such as the Roosevelt White House or the Franklin Delano Roosevelt White House?
Whether in a given case we have under the L3 header "proper noun" a proper name or a proper noun (in the CGEL sense), there is no reason not to show plurals, if attestable. Showing a word like Bible as both a common noun and a proper noun seems fine as the common noun meanings are not entirely predictable from any of the meanings of the proper noun and are attestable, but both common and proper noun meanings are likely to be pluralizable, some attestably so. DCDuring TALK 20:09, 11 July 2014 (UTC)

Reading all of the discussion here, I wonder if things would benefit from an approach like the German Wiktionary, with all of them treated as nouns. Our header structure is different, so it would not fit in exactly the same way. So how about relegating proper name-ness to the actual definition line? For given names, we already have a template to do the job, and for others, the definition already implies properness in most cases. So there's nothing that the header "Proper noun" really adds beyond what the definition already tells the user. It would also allow us to list plurals without problems, while labelling the real proper names as uncountable, and we could also merge Noun and Proper noun sections together in entries when the distinction is not so clear anyway (like in Bible). Furthermore, we need to distinguish nouns that are used without the definite article (such as names) from those that are used with it. There is nothing in the current Bible entry that indicates this to the user. —CodeCat 20:59, 11 July 2014 (UTC)

Uncountability, as we use it, is not the same as not having a plural, though many use {{en-noun}} as if that were true, either through lack of understanding of uncountability, not reading the {{en-noun}} documentation, or being defeated by it. The problem would seem to be that we use "uncountable" in reference to mass nouns, to specific entities, and to nouns whose plural form is the same as the singular form. If we could find attestation for expressions like "too much/little White House" (which we probably can), that would show White House to be uncountable in the sense of mass noun.
Nothing in a template should per se prevent us from making a decision to show plurals for things that appear under the proper noun header. We would just have to revise {{en-proper noun}} and search for instances where the plural shown by the template ("tail") did not conform to usage ("dog").
OTOH, none of the OneLook dictionaries call (the) White House a proper noun. (Most call it a noun; some seem to dispense with PoS labels.) We could either take that as an indication that we have bitten off more than we can chew or that we are making an un-lemming-like advance over other dictionaries.
Use with the is usually grammatical information (eg, no the in attributive use; the used to emphasize that a named entity was the famous one of those bearing the name), but may also be sense-level information (examples to follow).
It seems to me that we are still some distance away from having a sufficient shared appreciation of the issues involved in altering the thousands of English proper noun L3 headers, let alone those in other languages. DCDuring TALK 22:10, 11 July 2014 (UTC)
  • I dissent from Wikitiki and CodeCat on this. It is possible for a proper noun to have both singular and plural forms. You can have one James or a lot of Jameses, one Henderson or a lot of Hendersons. I also don't understand where CodeCat is coming from with her Wikipedia argument: I read the article last night, and I came out of it thinking the opposite. Purplebackpack89 17:45, 23 July 2014 (UTC)
    @Purplebackpack89 See the section w:Proper noun#Capitalized common nouns derived from proper nouns. --WikiTiki89 17:51, 23 July 2014 (UTC)
    Jameses is the plural form of a common noun. It's very easy to see this just by back-forming the countable singular. A James is not the same thing as James, and there is certainly a big difference between saying you don't look like James and you don't look like a James. Furthermore, the statement James is a James is true, which illustrates that a single specific person called James is a member of the class Jameses (people who have the name James). Compare this to a car is a vehicle which has the same semantic structure. —CodeCat 18:35, 23 July 2014 (UTC)
    The problem, though, is that "Jameses" can be definite or indefinite. "a James" (indefinite) might be common, but "the James" is proper. Purplebackpack89 18:41, 23 July 2014 (UTC)
    "The James" is still a common noun, unless it is turned into a name/nickname. For example: Here "The James" is a common noun: "The James I met yesterday was taller than the James I met the day before." But here "The James" is a proper noun: "There are five people named 'James' at school, but only one of them—the biggest and baddest one—we call 'The James'; Everyone is afraid of The James." --WikiTiki89 18:49, 23 July 2014 (UTC)
  • Famous examples: Thackeray authored The Four Georges, The Newcomes, and The Virginians. A less famous example is The Four Jameses, an anthology of Canada's four worst poets, all named James.
  • Another example is in the absolutely fantastic dialog in Douglas Adams's Mostly Harmless, where one character misinterprets "The King" as "the King", because he did not recognize Elvis.
  • I recently created Mighty Mouse. One of the citations has the plural. I was wondering at the time how to enter the plural, which is Mighty Mouses. (What do people use for the plural of that Apple mouse from way back, nicknamed the "Mighty Mouse"?) Choor monster (talk) 21:47, 31 July 2014 (UTC)
    • Further comment: Regarding consensus on what is a "proper noun", I noted that the examples given at Template:en-proper noun include "Wiktionarian", which is obviously not a proper noun. The link treats it as a common noun. Choor monster (talk) 14:59, 1 August 2014 (UTC)
      You're right. Those examples have been there since the template was created in 2006 and no one has bothered to correct them. I'm going to follow suit and also not bother. --WikiTiki89 15:05, 1 August 2014 (UTC)
      I've changed the example from "Wiktionarian" to "Alex", which is better, though still suboptimal. Side note, it annoys me that {{en-noun|foobar|ies}} behaves differently than {{en-proper noun|foobar|ies}}: vide foobary (plural foobar or ies) vs foobary (plural foobar or ies). - -sche (discuss) 18:56, 1 August 2014 (UTC)
      I could make them behave the same. The hard part is ensuring that all the entries that rely on the old behaviour are updated. —CodeCat 19:21, 1 August 2014 (UTC)

I've now updated {{en-proper noun}} to have the same parameters as {{en-noun}}. It also categorises differently: uncountable proper nouns aren't categorised specially, but countable ones are placed in Category:English countable proper nouns. This category probably needs cleaning up. —CodeCat 13:19, 3 August 2014 (UTC)

Thanks. I notice that when a plural is specified, as on Alex, Template:en-proper noun now says "usually uncountable, plural ___" — something Template:en-noun does not do when the same parameters are supplied to it. I don't think this new wording is correct; I don't think there's anything unusual about Alex’s countability. Alex is not often counted, but this is different from being usually not countable. Something like anger or dark matter is usually uncountable: it's an emotion / type of substance (respectively), and even though one may find various instances/forms of it, they are just that, specific examples/variants of the one underlying emotion / type of substance. Specific people named Alex are not variants of one underlying person or such; there's nothing about "Alex" which is uncountable per se, it's just that people more often have occasion to refer to one Alex (one person, at a time) than to several Alexes. (Wasn't there recently a discussion about the meaning of "uncountable"/"countable"? Postscript: oh, yes, just a bit earlier in this thread.) - -sche (discuss) 15:37, 3 August 2014 (UTC)
It is in principle possible for a proper name or proper noun to be used uncountably. I am sure that it would not be attestable for very many names, but after failing with Mary, I succeeded with Marilyn. From a biography of Edith Head: "If you do, there will be too much Marilyn showing." I suppose we can try to dismiss such cases as metonymy. DCDuring TALK 15:51, 3 August 2014 (UTC)
too much Winston Churchill would be attestable. DCDuring TALK 15:53, 3 August 2014 (UTC)
I think this is again a problem of mixing "countable" and "having a plural". Everything that has a plural is countable, but not everything that has no plural is uncountable, apparently. So when templates receive a parameter that tells them to omit the plural, they should not show "uncountable" as they do now, but "no plural". Something like "usually uncountable" could become something like "plural (rare) ..." —CodeCat 15:56, 3 August 2014 (UTC)
(@DCDuring, after edit conflict) You seem to acknowledge that uncountable use of names is less common (attested for fewer names) than countable use is. That being the case, I'd prefer a wording like "countable and uncountable; plural __" to wording like "usually uncountable, plural __" (emphasis mine). I don't want to discount the uncountable uses [if attested], I just don't want to privilege them over, or discount, the countable uses. - -sche (discuss) 15:59, 3 August 2014 (UTC)
In the case of proper nouns, I can imagine uncountability of an apparent proper noun in some realistic cases (perhaps too much Ebolavirus? (the Translingual proper name used in English)). I find it hard to believe that there could be any proper noun in English that was ever used uncountably in more than a small minority of cases.
I would very much like us to simply show the plural of proper nouns without committing ourselves to the existence of the plural of the lemma or its possible uncountability. Why would we want to use one minute of contributor time in attesting or debating plurals, let alone uncountability, of individual proper nouns? To me the situation is reminiscent of the uncommon inflected forms that occupy some Latin inflection tables. DCDuring TALK 16:48, 3 August 2014 (UTC)
Yes, to be clear, my top choice is for Template:en-proper noun to behave like Template:en-noun in this regard (that was in theory, but not in fact, what the recent edits to it were to do), and just display the provided plural. - -sche (discuss) 17:10, 3 August 2014 (UTC)
Well, if it were to do exactly the same, it would also display a plural by default. —CodeCat 17:12, 3 August 2014 (UTC)
Automatic pluralization can be suppressed (for now, although depending on what percentage of our proper noun entries are names, it might be worth enabling in the future). I'd just like the template not to insert novel claims that terms are uncountable whenever plurals are added (it's oxymoronic). - -sche (discuss) 17:18, 3 August 2014 (UTC)
Indeed. I would very much like to make sure that we don't transfer to proper nouns practices that can be tedious (though sometimes meaningful), even for common nouns, such as RfDing or RfVing proper noun sections or definitions solely for their use in the plural or their uncountability. For proper nouns and even proper names plurals are virtually always possible. Uncountability, too, is significantly rarer, but also, I think, always possible.
Do we need to do anything special to record this as a desirable practice? Does such a recording have any force or would we need a vote? I would like to have a link to this archived discussion in Wiktionary:English proper nouns (very incomplete, awaiting EncycloPetey) or its talk page. DCDuring TALK 17:56, 3 August 2014 (UTC)
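A minimal Python sketch of the headword behaviour asked for above: a supplied plural is simply displayed, and a missing one yields "no plural" rather than a claim of uncountability. The function name `plural_label` and the `mass_use` flag are hypothetical; this is an illustration of the proposal, not the code of {{en-proper noun}}.

```python
from typing import Optional

def plural_label(plural: Optional[str], mass_use: bool = False) -> str:
    """Plural part of a hypothetical proper-noun headword line.

    plural   -- plural form given to the template, or None when none is given
    mass_use -- hypothetical flag for senses with attested mass (uncountable) use
    """
    if plural is None:
        # No plural supplied: say only that, without asserting uncountability.
        return "no plural"
    if mass_use:
        return f"countable and uncountable; plural {plural}"
    # A supplied plural is displayed as is, with no "usually uncountable" prefix.
    return f"plural {plural}"
```

For example, `plural_label("Alexes")` gives "plural Alexes", while a name with attested mass use (such as Marilyn above) could be shown as "countable and uncountable; plural Marilyns".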
  • Would it make sense to have one or more parameters in {{en-proper noun}} that reflected what meaningful class a given proper noun was in, eg, personal name (or given name, or surname, or both), organization name (presumably a proper name of a specific individual), proper individual name, eg, Marilyn Monroe, toponyms, demonyms, language names, and possibly others? Each class could generate the appropriate inflection line, plural or no plural and more specific category membership. Personal names unless of a specific individual would show plurals by default, organization names would not, etc, with the possibility of overriding any of the defaults. DCDuring TALK 20:21, 3 August 2014 (UTC)
    My approach has been to go through the given-, family- and place-name categories and add plurals after checking that they are attested. I would be wary of adding more "class" parameters, in part because it was only when I looked at {{en-proper noun}}'s code recently that I realized that it already had parameters for given and family names — I have never seen anyone use them, and I presume that's what would happen if we added more class parameters. (There would also be problems if, as frequently happens, a string is a placename and a given name and a family name.) Isn't it sufficient to add plurals to the "proper nouns" (or subcategories/types of proper nouns) which pluralize, and leave the other proper nouns as they are, pluralless? - -sche (discuss) 22:08, 3 August 2014 (UTC)
    It certainly is simpler in terms of decision making. We would gain some information by having smaller, more homogeneous categories. It would make it easier to implement any changes in policy that applied to the classes or subsets of the classes. It was just a thought that occurred to me, knowing that you were spending time visiting the entries anyway. DCDuring TALK 22:36, 3 August 2014 (UTC)
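DCDuring's class-parameter idea can be sketched as a lookup of per-class defaults with an explicit override. Everything here is hypothetical: the class names, the use of `-` in the override to mean "no plural", and the deliberately naive -s/-es pluralizer.

```python
from typing import Optional

# Hypothetical name classes and whether each shows a plural by default.
PLURAL_BY_DEFAULT = {
    "given name": True,
    "surname": True,
    "organization": False,
    "individual": False,
}

def default_plural(term: str, name_class: str,
                   override: Optional[str] = None) -> Optional[str]:
    """Return the plural to display for a proper noun, or None for no plural."""
    if override is not None:
        # An explicit value always wins; "-" stands for "no plural" (hypothetical).
        return None if override == "-" else override
    if not PLURAL_BY_DEFAULT.get(name_class, False):
        return None  # unknown classes and single entities show no plural
    # Naive English default: -es after sibilant endings, otherwise -s.
    if term.endswith(("s", "x", "z", "ch", "sh")):
        return term + "es"
    return term + "s"
```

With these defaults, `default_plural("James", "given name")` gives "Jameses" and `default_plural("White House", "organization")` gives None. As -sche notes, one string can be a placename, a given name and a family name at once, so a real scheme would need the class per sense rather than per entry.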

Example sentences in ELE, linking of words and delinking of transliterations

What's the history of the rule behind Wiktionary:ELE#Example_sentences? Who said we can't link individual words? Now that transliteration is (unintentionally) wikified in {{usex}} (see Wiktionary:Grease_pit/2014/May#Transliteration_linked_to_individual_parts_in_usexes_when_hyperlinked), my request to delink it is brushed aside: we shouldn't link words anyway. Can we change this rule ("not contain wikilinks") for words used in usage examples? Do we really need a vote for that? Can somebody help delink usex transliterations, as in this revision or this revision? --Anatoli (обсудить/вклад) 00:28, 14 July 2014 (UTC)

Someone needs to edit Module:usex to delink transliterations- removing links by hand is a waste of time. DTLHS (talk) 00:39, 14 July 2014 (UTC)
Yes, I agree (thanks for agreeing to fix!) but the rule itself doesn't reflect the reality and I think it's not helpful. A lot of Russian usexes are linked (not my edits but I don't see it as a problem; in fact, it may be quite useful for learners to link to lemmas or some difficult words) and most Chinese usexes are linked, which is very useful for languages with no straightforward word boundaries. Anyway, editors should be free to choose if they want to wikify individual words in usexes. --Anatoli (обсудить/вклад) 00:58, 14 July 2014 (UTC)
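A minimal sketch of the delinking step, written in Python for illustration only (Module:usex itself is Lua, where the equivalent would be a pattern substitution). It keeps the displayed text of `[[target|display]]` and the target of `[[target]]`; the function name is hypothetical, and where the module would apply it is an assumption.

```python
import re

# [[target|display]] -> display, [[target]] -> target
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")

def delink(text: str) -> str:
    """Strip wikilinks from a transliteration, keeping the visible text."""
    return WIKILINK.sub(r"\1", text)
```

For example, a transliteration generated as `[[ja]] [[znat'|znáju]]` would become plain `ja znáju`, while links in the native-script example sentence itself could stay as editors choose.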

Gender templates for French inflected forms

{{fr-adj-form}} has been edited so that it no longer accepts gender. I understand that adjectives do not inherently have their own gender but agree in gender with what they describe. I also understand that the 'definition' already says 'feminine singular of' or 'masculine plural of'... but I still think we should encourage having the gender in the headword wherever possible.

My proposal is to enable gender in {{fr-adj-form}} and to add back the gender to French adjective forms wherever possible. This is very doable by bot, for example \{\{fr\-adj\-form\}\}\n\n# \{\{feminine of\|([\ -9\;-\\\^-z\}-ퟻ]+)(\||\}) is a regex that finds all the uses of {{fr-adj-form}} with no gender, followed by {{feminine of}} on the following line (with a single blank line in between). Renard Migrant (talk) 16:49, 14 July 2014 (UTC)
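The regex above can be tried with Python's `re` module before any bot run; the sketch below only reports matches and makes no edits, and the function name is hypothetical. One adjustment: as quoted, the last range in the character class starts at `}`, so a lemma followed directly by `}}` is captured with a stray `}`; starting that range at `~` instead keeps the capture to the lemma alone.

```python
import re
from typing import List

# The pattern from the proposal, with the last range changed from }-ퟻ to ~-ퟻ
# so that the closing brace is not absorbed into the captured lemma.
PATTERN = re.compile(
    r"\{\{fr\-adj\-form\}\}\n\n# \{\{feminine of\|([\ -9\;-\\\^-z~-ퟻ]+)(\||\})"
)

def lemmas_missing_gender(wikitext: str) -> List[str]:
    """Lemmas named in {{feminine of}} lines that follow a genderless {{fr-adj-form}}."""
    return [m.group(1) for m in PATTERN.finditer(wikitext)]
```

Running this over a sample of entries first shows what a bot would touch without committing to either direction (adding gender back, or removing it as suggested below).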

Why should the gender information be in two places, both on the headword line and in the definition? - -sche (discuss) 18:28, 14 July 2014 (UTC)
I oppose this for the reason -sche gave, and the reasons you yourself gave too. —CodeCat 18:29, 14 July 2014 (UTC)
I find it quicker to understand with the gender in the head word. I say quicker, probably by a few tenths of a second. Renard Migrant (talk) 21:57, 14 July 2014 (UTC)

How about going the other way then? Actively removing the gender from the headword template? That's even easier to do! Renard Migrant (talk) 11:03, 15 July 2014 (UTC)

Software update: <ref> without <references/> no longer shows an error or categorizes

As was announced on Wikipedia but oddly not over here yet, "With the deployment of 1.24wmf12 on July 10, missing reference markup will no longer show an error; the reference list will show below the content [...] without adding a category, so there's no way to find and fix the affected pages." See this WP thread (permalink) and this WP thread (permalink) for discussion, and diff for an example of the phenomenon. Note that our abuse filter still (correctly) discourages adding <ref> without <references/>. - -sche (discuss) 17:18, 14 July 2014 (UTC)

If a page has multiple language sections, and un-<references/>ed ref tags are added to one of the upper language sections, the references appear in the last language section. This has the potential to be especially confusing for people who use Tabbed Languages. - -sche (discuss) 18:26, 14 July 2014 (UTC)
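Since such pages are no longer categorized, one way to find them is to scan page wikitext (for example from a dump) section by section. A rough sketch, assuming language sections start at level-2 headings like `==French==` and ignoring edge cases such as refs inside comments or `<nowiki>`; the function name is hypothetical.

```python
import re
from typing import List

L2_HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)  # e.g. ==French==
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)            # <ref>, <ref name=...>
REFERENCES_TAG = re.compile(r"<references\s*/?>", re.IGNORECASE)

def sections_missing_references(wikitext: str) -> List[str]:
    """Name each language section that uses <ref> but has no <references/> of its own."""
    starts = [m.start() for m in L2_HEADING.finditer(wikitext)]
    missing = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(wikitext)
        section = wikitext[start:end]
        if REF_TAG.search(section) and not REFERENCES_TAG.search(section):
            # First line of the section is the heading, e.g. "==French==".
            missing.append(section.splitlines()[0].strip("= \t"))
    return missing
```

Checking per language section, not per page, also catches the case described above where an upper section's refs would silently render in the last section.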


In a discussion above, DCDuring noted something that (I think) implied we're not using the term "uncountable" the way we should. But I'm not quite sure what this means, as to me uncountable just means having no plural. Is this not what it means, and what does it mean in that case? I came across a few categories named "singulare tantum", is that the term we should be using instead of "uncountable"? —CodeCat 18:34, 14 July 2014 (UTC)

Uncountable does not mean having no plural, it means that quantities of the noun are not measured in discrete amounts. Theoretically, a noun could be countable, but not have a plural if, for example, there is only one in existence and no one ever speaks of any others. For an uncountable noun, it is impossible to say that "there is only one in existence". Proper nouns can be countable but not have a plural: there is only one William Shakespeare (barring metaphorical usage, or others who happen to have the same name), but William Shakespeare is most certainly countable. --WikiTiki89 18:57, 14 July 2014 (UTC)
What I understand, then, is that uncountable words have no plural for semantic reasons (it makes no sense to speak of a plurality) while the remainder have none only because it is simply rarely used or not at all. —CodeCat 19:04, 14 July 2014 (UTC)
Well I think that proper nouns such as William Shakespeare also don't have a plural for semantic reasons, but it's a different semantic reason. --WikiTiki89 19:23, 14 July 2014 (UTC)
"Paint" is uncountable when you talk about "some paint" but countable when you talk about "three different red paints". If something has no plural but is singular, I tend to use the "plural not attested": {{en-noun|!}}. Equinox 19:30, 14 July 2014 (UTC)
Yes, but the "paint"s in your two examples are different senses. --WikiTiki89 19:34, 14 July 2014 (UTC)
I think CodeCat is thinking about the inflection line for common nouns and {{en-noun}}, ie, not about definition-level countability/uncountability distinctions.
The prevailing pattern of usage of "uncountability" by English native contributors at Wiktionary coincides with the mass noun concept. However, many uses of various early incarnations of {{en-noun}} used features of the template intended to mark uncountability (mass noun) to suppress the display of plurals, for whatever reason the contributor felt justified that suppression, eg, user didn't-know-how/couldn't-be-bothered to get plural ending in "es" or a truly irregular form to display, user didn't think noun had or should have a plural form, plural form was not attested. If you combine that with the changes to {{en-noun}} wrought by contributors with an imperfect understanding of the concept, you can understand why we have not made much progress in rectifying this. I hope we can come up with some scheme so that our inflection-line displays can be made correct without thousands of hours of tedium and are not too misleading in the interim. I doubt that bots can be relied on, however, except perhaps for narrowly circumscribed cases.
At the sense level we use "labels" or "contexts" to distinguish. Usually nothing prevents understanding if someone uses a countable noun uncountably or an uncountable noun (mass noun) countably, but we have invested a great deal of effort in attempting to distinguish countable from uncountable senses, which effort is worth preserving. The task of marking each English noun sense as countable or uncountable (or both) is quite incomplete.
At the inflection-line level, we do not usually get data to support our claims that a given common noun is always or never countable or that countability or uncountability is the prevailing usage, relying mostly on native-speaker intuition, as most other dictionaries do not expend resources on this matter. DCDuring TALK 20:59, 14 July 2014 (UTC)
Would it be ok then if we adopt the practice of showing "no plural", "singular only" or the like in the headword line, and leave countable/uncountable information to the individual senses? That way the headword line is agnostic about countability, which makes sense if this can be different for different senses anyway. It would also mean changing the categorisation of many nouns, emptying out "uncountable nouns" categories in most cases and substituting it with something else. Possibilities might be Category:English singular-only nouns or Category:English nouns with no plural. We may want to revise the use of "plurale tantum" as well. —CodeCat 21:09, 14 July 2014 (UTC)
CodeCat said "as to me uncountable just means having no plural". Oh come on I find it hard to believe you're not better educated than that. There's such a thing as countable singular use, e.g. "I have a grain" is a countable singular use of grain. "I have some grain" is uncountable use of grain. Some countable nouns will be attested in the singular but not the plural. Renard Migrant (talk) 22:04, 14 July 2014 (UTC)
WT:AGF says we should take CodeCat's word for it. DCDuring TALK 22:15, 14 July 2014 (UTC)
No, it only says that we should assume CodeCat's intentions were in good faith. --WikiTiki89 15:00, 15 July 2014 (UTC)
@CodeCat If eliminating inflection-line information would make things simpler for you, who am I to stand in your way? Why don't we eliminate the display of regular plurals (ending in "s", "es", and "ies") too? Oh, wait, users might value the information.
The logic of our entry display is that inflection-line information is assumed to carry over to definition lines unless there is something contrary indicated on the definition line. Thus exceptional plurals are sometimes displayed at definition lines in addition to the inflection line, and sometimes only at definition lines. It is a major change to depart from that formulation for one attribute of one PoS in one language, especially where the language is the wiki's host language.
So, before we start changing modules and templates of wide use, I would like to understand an implementation plan that preserves the correct information now in the inflection lines and transfers it to the definition lines for each type of headword line, whether implemented using {{en-noun}} or {{head}} directly or by other means. A dump-processing run that took a census of the options used in {{en-noun}} would be useful for that. We must have at least a dated one to support the major changes you previously made to {{en-noun}}.
It would be nice if the changes were carried out with more care and knowledge than the changes made to {{en-noun}}. DCDuring TALK 22:15, 14 July 2014 (UTC)
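A rough sketch of such a census over page texts already extracted from a dump (the extraction step is not shown). It assumes no templates or piped links are nested inside {{en-noun}} calls, records named parameters by name only, keeps a hypothetical set of short codes verbatim (such as `-` and `!`, the latter described in this discussion as "plural not attested"), and collapses any other positional value to `<form>`.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# {{en-noun}} or {{en-noun|...}}; assumes nothing is nested inside the call.
EN_NOUN = re.compile(r"\{\{en-noun((?:\|[^{}]*)?)\}\}")
# Hypothetical set of short codes worth counting verbatim.
OPTION_TOKENS = {"-", "~", "!", "?", "s", "es", "ies"}

def signature(args: str) -> Tuple[str, ...]:
    """Reduce '|arg1|arg2|name=val' to a tuple describing which options were used."""
    sig = []
    for raw in args.split("|")[1:]:
        arg = raw.strip()
        if "=" in arg:
            sig.append(arg.split("=", 1)[0].strip() + "=")  # named parameter: keep the name
        elif arg in OPTION_TOKENS:
            sig.append(arg)
        else:
            sig.append("<form>")  # an explicitly spelled-out plural or stem
    return tuple(sig)

def en_noun_census(pages: Iterable[str]) -> Counter:
    """Count {{en-noun}} option signatures across page texts."""
    counts: Counter = Counter()
    for text in pages:
        for m in EN_NOUN.finditer(text):
            counts[signature(m.group(1))] += 1
    return counts
```

`en_noun_census(texts).most_common()` then lists the parameter patterns from most to least frequent, which is the kind of dated snapshot asked for above.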

One way to recognize the difference is that singular countable nouns typically require a determiner, while singular uncountable nouns can get by without one. At Simple English Wiktionary, this is dealt with at the sense level (e.g., alarm). I can't see how it would make sense to deal with it at any other level.--Brett (talk) 23:30, 14 August 2014 (UTC)

@Brett Yes, but we do not yet have that information at the sense level in many, many of our polysemic English noun sections. In even more cases there is only one noun sense, so the information on the inflection line is adequate. Further, many definitions, whether usually countable or usually uncountable, will show some use of the other kind, but without much other semantic difference, so we would be essentially duplicating the definition in order to more clearly distinguish uncountable and countable use. All of this makes reform a little difficult, especially if it is solely undertaken as a template simplification and "tidying" problem. DCDuring TALK 00:43, 15 August 2014 (UTC)
Understood. In cases where both countable and uncountable uses exist for a given sense, we can simply say as much: rather than duplicating the entry, just have a tag for "countable and uncountable", which, again, is what is done at Simple English.--Brett (talk) 00:56, 15 August 2014 (UTC)
And we may not be able to come to agreement on exactly how it should be done. DCDuring TALK 01:48, 15 August 2014 (UTC)

Context Label: Reflexive

I have recently edited the module code for the context labels so as to have it automatically send entries marked with the label "reflexive" into a category named "-LANGUAGE NAME- reflexive verbs". I did this in an attempt to have the Macedonian reflexive verbs compiled into a list, since I didn't see any other way to do this than adding "[[Category:Macedonian reflexive verbs]]" under each entry, which didn't seem like an ideal solution - I wanted something automatic, just like the automatic system that works for intransitive and transitive verbs. I also thought that if I merely wrote "[[Category:Macedonian reflexive verbs]]", it might end up erased in the future, whereas some automatic mechanism would last longer. However, things have gone awry.

Apparently, the context label "reflexive" has been used for various entries in various languages to mark reflexive pronouns as well. It has also been used to denote reflexive senses of verbs which are not truly reflexive and thus don't belong in a reflexive verb list. Now, I suppose these things need mending, so I have come here to announce what has happened in hope that someone will be able to restore things the way they were before the change (and possibly advise me as to how to solve the problem I had with the Macedonian reflexive verbs, i.e. how to have them automatically go to a list of reflexive verbs). Martin123xyz (talk) 14:59, 15 July 2014 (UTC)

As far as I understand, you could have a label 'reflexive verb' that displays reflexive but categorizes in reflexive verbs. I don't know about other languages (much) but in French, almost all transitive verbs can be used reflexively, and almost no verbs are always reflexive, so you could talk about reflexive usage but not reflexive verbs (because they're not inherently reflexive, just they can be used that way). Renard Migrant (talk) 15:02, 15 July 2014 (UTC)
I noted before you made this change that calling verbs where one or more senses are used reflexively "reflexive verbs" is silly. Just look at Category:English reflexive verbs now. Almost none of them are actually reflexive, they just happen to have a sense that is used reflexively. The same applies to Category:English transitive verbs and Category:English intransitive verbs as well, which also had categorisation added recently for some reason. And Category:English countable nouns and Category:English uncountable nouns are a similar problem, which prompted me to start the discussion above. —CodeCat 15:03, 15 July 2014 (UTC)
That's the argument for categories with names like Category:English nouns with countable senses. Even then, it would seem even better to just not categorize at all. Renard Migrant (talk) 15:19, 15 July 2014 (UTC)
Probably, yes. Most of the time these labels are only used when it's not clear from the definition, or to contrast with other definitions. So paradoxically, the nouns labelled "countable" are primarily those which are also labelled "uncountable". —CodeCat 15:21, 15 July 2014 (UTC)
I know that many think it pointless to have a reflexive verb category, but in Macedonian there are some verbs that are always reflexive, i.e. whose reflexive form is inherent. For example, "се кае" means "to regret", but "кае" doesn't mean anything. Also, there are many cases where a reflexive form of a verb is unrelated in meaning to the basic one. Thus, "дере" means "to skin" whereas "се дере" means "to scream". I think that these verbs deserve a separate category. Finally, many of the reflexive verbs in Macedonian have one-word equivalents in English - in those cases, English doesn't convey reflexivity explicitly. For example, Macedonian "се движи" and "движи" both correspond to English "move", but they have different meanings - the former means to be in motion whereas the latter means to cause something to be in motion. I think that in these cases too, the reflexive verb deserves its own category.
It's not as though I created separate entries for all reflexive forms in Macedonian and then declared them unique verbs. For example, I haven't created an entry "се допира" beside "допира", because I don't feel that there is anything special about the reflexive form - it is marked explicitly in English too. Namely, the difference is that between "to touch oneself" and "to touch". This is because "се допира" is a true reflexive verb, and I am not really focusing on the true reflexive verbs. I am more interested in a separate category for the autocausative, anticausative and inherent ones. The true reflexive, reciprocal, and universal passive verbs are predictable and, as you pointed out, derivable from any transitive verb. I really don't know why all of these are under the umbrella term "reflexive verbs"...
Anyway, I have a potential solution. I would create a new context label in the module code, called "mkreflexive", which would send entries to "Macedonian reflexive verbs", and I would mark Macedonian reflexive verbs with it. Meanwhile, I would set the display to simply "reflexive", which is what users actually need to see. Then, only I (and possibly someone who chooses to continue my work in the future) would use this label, and there would be no categories for reflexive verbs for other languages and no problems with reflexive pronouns or pseudo-reflexive verbs. However, could the problem I have caused already be fixed, i.e. could all the unnecessary (and defective) categories be undone? Martin123xyz (talk) 15:24, 15 July 2014 (UTC)
{{fr-verb}} covers this by allowing type=reflexive. There are some such verbs; s'agir differs in usage from agir, for example. Renard Migrant (talk) 15:28, 15 July 2014 (UTC)
I don't agree with creating such a label. A better solution would be to let the inflection table add the category. —CodeCat 15:31, 15 July 2014 (UTC)
How would I let the inflection table add the category? I use the same inflection table for reflexive verbs, except that I use the parameter "ref" to have it add the reflexive marker "се" where appropriate. 16:08, 15 July 2014 (UTC)
You (or someone) can edit the template and have the "ref" parameter trigger a category. --WikiTiki89 16:48, 15 July 2014 (UTC)
Could you tell me how to have the parameter trigger a category? I have no idea where to code that, as I've never even defined the "ref" parameter anywhere. I just automatically used it in an if-statement and it worked. Martin123xyz (talk) 16:55, 15 July 2014 (UTC)
Exactly the same way. You use it in an if-statement, and have the true-clause add a category: {{#if:{{{ref|}}}|[[Category:WHATEVER]]|}}. You can add that anywhere really, but the end is the best place I think. --WikiTiki89 17:06, 15 July 2014 (UTC)
How very simple - thank you. I didn't think it could just work like that. I'll see to it soon enough. Martin123xyz (talk) 17:08, 15 July 2014 (UTC)
I prefer refl in general to avoid confusion with reference. Renard Migrant (talk) 20:58, 17 July 2014 (UTC)

Script or language: let us reduce ambiguity and prevent confusion!

On pages such as and in many translations lists, spellings of one word in one language are given in multiple scripts. These scripts are indicated by names that sometimes coincide with language names, such as "Latin" and "Hebrew". That easily creates ambiguity and, with it, confusion, at least for me. I have changed such references several times by adding the word "script" where a script is meant, but those contributions have been reverted. I do insist that in tables where language (groups) may branch into several languages, and where languages may branch into several scripts, it is difficult for the eye to make out whether the final branch concerns a language or a script.

I propose adding the word "script" to all occurrences of script names near language names. I do not know how to do it, but there seem to be scripts (of a different kind, this time) that can help us do this in a rather automated way.

Please help a language enthusiast, and his colleagues!

(I am trying to find my way, and just found out that this page is divided into monthly parts. This is the second place where I have added my plea, because I put the first one in a monthly part somewhere in 2013.) Redav (talk) 20:27, 17 July 2014 (UTC)

Support. Before I started using targeted translations, I used to come across some really strange Latin translations, only to find out it was Latin-script Ladino. It was my fault for skimming through the translations too quickly, but it won’t hurt to add script to the lines. — Ungoliant (falai) 20:35, 17 July 2014 (UTC)
(e/c) I don't see where the confusion can come from, since Latin is not a sub-language of Serbo-Croatian. In some places, we do use the word "Roman" instead, but this does not solve the problem in the general case, since some multi-scriptal languages use scripts like Hebrew and Arabic, which are also languages. --WikiTiki89 20:38, 17 July 2014 (UTC)
"Roman" is a misnomer anyway. The script is called Latin; "roman" is one variety of the Latin script, the other being called "italic". —Aɴɢʀ (talk) 20:44, 17 July 2014 (UTC)
By that logic, romanizations would be de-italicizations. "Roman" has both meanings. --WikiTiki89 20:50, 17 July 2014 (UTC)
Would they even be de-italicizations, if they were presented (as they sometimes are) in italics? Would they not then be italicizations? Why, this puts a whole new spin on the debate over whether or not to italicize Cyrillic! lol
As WikiTiki says, "Roman" has both meanings. - -sche (discuss) 20:55, 17 July 2014 (UTC)
For some languages, such as Cree, script names are (in my experience) not provided at all. One sees simply Cree: ᒪᐢᑲᐧ (maskay) / maskwa. Providing script names, especially with "script" spelled out, would be quite unwieldy:
- -sche (discuss) 20:55, 17 July 2014 (UTC)

For Beer parlour people who work more in discussions than on translations or in the main namespace, it IS confusing to have "Latin" and "Hebrew" mean both the script and the language (the same goes for the script tags Roman and Cyrillic). If you used User:Conrad.Irwin/editor.js a lot, you'd notice that the name conflict comes up quite frequently: a translation into Hebrew, Aramaic, Serbo-Croatian or Latin appears where it's not expected, whether through this tool, a bot or human error. I'm not suggesting any specific solution, just letting you know that I have also experienced these problems firsthand and am very interested in a resolution. --Anatoli T. (обсудить/вклад) 00:49, 18 July 2014 (UTC)

You're right that it causes bugs in some of our tools, but I don't think it's confusing to people (at least when everything is formatted correctly). --WikiTiki89 13:13, 18 July 2014 (UTC)

Thanks for your input so far! I read:

  • 1) that at least two more people got or get confused by the ambiguity of names that may either indicate a language or a script;
  • 2) that someone does not see where the confusion could come from; that remark referred to the one particular example I mentioned, and (obviously) to readers who know beforehand that Latin is not a sub-language of Serbo-Croat; luckily I knew, otherwise I might indeed have been misled by the information, and that is exactly my point!;
  • 3) that someone thinks adding the word "script" to script names is unwieldy; I would have thought so too if I had not yet (admittedly only recently) discovered that the actual work may be done by bots.

For substantiation's sake, here I give you more examples where confusion looms.

On I read in the translations list:



If I were totally ignorant about these languages and scripts, I would not be able to make out whether Devanagari was a sub-language, whether Campidanese was an orthography name (some languages have several orthographies) or even a script name, or whether Arebica was a sub-language.

On I encounter:

        • Tibetan
          • Written Tibetan: (nga, I)
            • Modern Tibetan [Lhasa]: /ŋa˩˨/


      • Kiranti
        • Eastern Kiranti = Rai
          • Rodong: /ka-ŋa/
          • Limbu: ᤀᤅ (aṅa) /əŋa/
          • Waling: /aŋ-ka/

Again, I personally know that written and spoken languages exist alongside each other, I know that // are used to enclose IPA-script indicating pronunciation. But if "Waling" had not been mentioned and if the pronunciation in Limbu had been the same as in Rodong, I could have believed that Rodong and Limbu are just different scripts for the same language.

On I see:

The unsuspecting reader might be led to think that Egyptian Arabic is meant as a script name (since the spelling differs from the vocalized spelling after "Arabic"), and that Maltese is yet another spelling or orthography. (By the way, there are people arguing that Maltese is a dialect rather than a language.) The Aramaic, Hebrew, and Syriac cases were already mentioned by the user writing above.

I do acknowledge a difference is (often) made between language and script indications by putting a bullet in front of a language name and leaving it out in front of a script name. But the difference in meaning between bulleting and non-bulleting does not seem to be easily understandable, and in my case, I did not notice it until I was already confused.

To return to my proposal:

  • I think sharing and finding information on e.g. languages and scripts is very valuable.
  • I think the information given about language and script names deserves to come across clearly and easily. I and several of you have given examples of things that may (and do) cause confusion to me, to several of you, and to other unsuspecting readers.
  • I get the impression that bots could do the hard part of the job more or less automatically, and that the "only" work a human might have to do is adding " script" to all script names. I volunteer (if I get the access needed, which does not seem the case now).
  • I read of no objections (other than the alleged unwieldiness of adding the word "script"); there are no remarks that, e.g., adding the word "script" confuses readers or makes reading more difficult.
  • I propose adding the word "script" behind a script name, but am open to other suggestions.
  • To the remark "For Beer parlour people who work more in discussions than on translations or in the main namespace" I would like to say: I would like to be adding translations, and I have done that a few times already. But, at least to me, Wiktionary is not always intuitive to handle: I seem to have managed to scramble a translations box when adding a translation; I added the word "script" several times but saw it reverted (which is fine with me if the problem I observe is solved in a different way), and at the same time learnt that bots make changes as well; I added a few other bits and pieces but did not feel helped by an editing environment that offers no layout guidance at hand (I have visited multiple pages to find examples and tried to mimic them). I am not certain that my input will not be overridden by bots, because I cannot see any indication warning me that I am editing in a place where a bot might revert my work. But this is a different topic already, isn't it? Redav (talk) 13:19, 2 August 2014 (UTC)
    Re "someone thinks adding the word "script" to script names is unwieldy; I would have thought so too if I had not yet (admittedly only recently) discovered that the actual work may be done by bots": I don't think it was meant that the work would be difficult, but that the word script adds unnecessary text to the screen. --WikiTiki89 20:56, 2 August 2014 (UTC)
    • You may be right. I reacted to -sche's utterance saying: "Providing script names ... would be quite unwieldy." rather than: "Reading script names ... would be quite unwieldy." Personally I am not convinced that reading the extra word "script" would look unwieldy to me: it is only the relatively few languages that are / were written in two or several scripts that would show the extra word "script" in front of the various spellings of their words.Redav (talk) 15:21, 8 August 2014 (UTC)

I have just come to realize that leaving out any script name (so even the part that would come before the word "script") would solve the problem as well. We would then simply have e.g.:

in the list of translations for "water". Or would that (re)create other problems that were meant to be solved by indicating script names?Redav (talk) 15:21, 8 August 2014 (UTC)

That is actually a brilliant idea! I would support it. --WikiTiki89 15:25, 8 August 2014 (UTC)
Support it as well. —CodeCat 15:27, 8 August 2014 (UTC)
Wikitiki is right, I was saying that spelling out script names like "Canadian Aboriginal Syllabic script" would make the relevant lines in the translations tables undesirably long. At present, for languages that use CAS, script names are IME not provided at all. Dropping script names in other places is a fine solution in my opinion. As far as I know, there are no cases where both script forms and dialect forms are nested on separate lines, because that would be untenably confusing even in the current arrangement. (See for example how the Serbo-Croatian translations of euro and chemistry are provided.) - -sche (discuss) 19:17, 9 August 2014 (UTC)


This bot, which belongs to User:JackPotte, has been active on Wiktionary in the past, but in December I noticed that it has no bot flag, so I followed our current procedure as I understand it, and blocked the account. Since a protest of the block has now been posted on the talk page, I thought it would be a good idea to expedite things by bringing it up here. I also want to know if I should have dealt with the matter differently, and if I should handle bot accounts differently in the future.

I should mention that, although most of the edits have been interwikis, a run was performed in March of 2013 that created a large number of entries for Geological era names, at least some of which (if memory serves) ended up in rfv. I don't have any objection to those entries as a whole, and they may very well be a one-time exception to the bot's normal interwiki tasks, but I thought they were worth mentioning, just to be complete. Chuck Entz (talk) 21:58, 19 July 2014 (UTC)

Hello, just to clarify that my March edit summaries pointed to their BP permission. JackPotte (talk) 22:48, 19 July 2014 (UTC)
I think JackBot failed either two or three bot votes. Renard Migrant (talk) 21:20, 23 July 2014 (UTC)
Precisely and objectively I had already proposed here two bot jobs which had been judged unnecessary by a minority:
  1. Wiktionary:Votes/bt-2009-12/User:JackBot
  2. Wiktionary:Votes/bt-2010-11/User:JackBot2
But they could have been useful here too, as they have been on 21 other wikis, as you can see, and they are unrelated to the test for which I was later indefinitely blocked without a message (which is not what the current recommendations advise, as I have already shown in the dedicated template).
Moreover, I used to publish my scripts on the bot subpages and on GitHub, if you want to form your own idea of the whole context apart from that. JackPotte (talk) 19:30, 24 July 2014 (UTC)

After three weeks, nobody has handled Category:Requests for unblock, and the blocker is simply ignoring my follow-up on his page. Is the Wiktionary policy something that is not applied? JackPotte (talk) 17:24, 9 August 2014 (UTC)

  • The above-mentioned votes (1, 2) were interpreted at the time as showing "no consensus" for having JackBot perform the tasks which were the subjects of those votes. But firstly I think some of the votes/subvotes might have been interpreted as passing had they been held under present [unwritten] "rules"/sentiments (as they show 2-to-1 support), and secondly we have (as someone observed in another thread) come since that time to realize how useful it is to have multiple interwiki bots. Is there anyone who objects to JackBot adding interwikis? Is there anyone who would like to vocally support JackBot adding interwikis? I am willing to unblock the bot if there is support for that. - -sche (discuss) 01:26, 12 August 2014 (UTC)
Lmaltier (talkcontribs) has seen my work on fr.wikt for six years; I would be grateful if he could testify here... JackPotte (talk) 09:59, 12 August 2014 (UTC)
You ask me to witness. I must say that JackBot sometimes makes changes on pages created by others, without any prior discussion, and despite my explicit opposition. If the flag is given, it should be strictly limited to tasks explicitly approved by the community on a case-by-case basis. Lmaltier (talk) 19:07, 3 September 2014 (UTC)
I support JackBot adding interwikis, as long as they are added correctly, in the right order, etc. I also think we should have User:Ruakh's input as the operator of one of the interwiki bots. --WikiTiki89 13:56, 12 August 2014 (UTC)
I've slowly come to that conclusion myself. It's hard to remember the exact reasons for the block 9 months later, but at the time I was aware of the deletion of some of the geological entries, but had forgotten about any discussions here re: permission to do a bot run. When I saw that he had been refused the bot flag in the past, I apparently thought the block would be the proper way to prompt him to come here to discuss it per WT:BOT. I overlooked the fact that the bot was only doing interwikis, a gray area that we intentionally leave open in our bot enforcement practice. I brought the matter here because I thought there was a possibility that I had made a mistake, but I wasn't sure. I would support unblocking as long as the bot sticks to interwikis. Chuck Entz (talk) 14:15, 12 August 2014 (UTC)
I do not think that interwikis are a gray area that we leave open. You were absolutely correct to block the unapproved bot and direct its owner to seek the bot flag. (And similarly for the geological-entries case. Even if the run had consensus, it should not have been performed without the flag.) —RuakhTALK 20:11, 14 August 2014 (UTC)
  • Incidentally, I do support having more interwiki-bots. There was a time when we decided that we didn't need more than one, because the Interwicket/Rukhabot dump-based approach covered them all — and in mainspace that's true — but since then we've eased up on that, realizing that there's little harm in having more-traditional/Wikipedia-style interwiki-bots as well. The two are not in competition. And the more-traditional/Wikipedia-style interwiki-bots can cover categories and appendices, which the Interwicket/Rukhabot dump-based approach cannot (or at least, currently does not). —RuakhTALK 20:18, 14 August 2014 (UTC)
    @Ruakh I have created Wiktionary:Votes/bt-2014-08/User:JackBot for bot status. Please instruct me if there are some deficiencies with the vote, such as details on how the interwiki is to proceed. --Dan Polansky (talk) 20:24, 14 August 2014 (UTC)

Russian pronunciation - standard, alternative, regional, dated or simply individual

User:Wikitiki89 has been persistently adding alternative Russian pronunciations, which I consider not only non-standard but individual and rare, possibly limited to immigrants. He has been very persistent in his edits, so any reversal just results in an edit war. I have no problem with having alternative non-standard forms, but Russian is much more phonetic than he claims it to be: if you pronounce a word irregularly, you can spell it that way. There are notable (well-documented) exceptions, which also follow certain rules or patterns, but there is some limit to irregularities. I tried to compromise by creating alternative non-standard forms, but he insists on adding these irregular pronunciations to the regular entries. In particular, he claims that these words are alternatively pronounced:

  1. капюшо́н (kapjušón) as капишо́н (kapišón)
  2. двою́родный (dvojúrodnyj) as двою́рный (dvojúrnyj)
  3. во́доросль (vódoroslʹ) as во́дросль (vódroslʹ)
  4. не́который (nékotoryj) as не́кторый (néktoryj)
  5. сейча́с (sejčás) as щас (ščas) (I'm OK with this one but still the casual pronunciation should belong to the alternative forms, since it exists)

Another claim was that бюрокра́тия (bjurokrátija) can also be pronounced as бирокра́тия (birokrátija), which I find quite ridiculous. He is also using кверх нога́ми (kverx nogámi) as the first translation for upside down; "кверх нога́ми" sounds very rustic and illiterate to me (even if this form can be found on the web), while вверх нога́ми (vverx nogámi) is the common and standard form. These alternative forms do exist, but they are not as common, and these pronunciations are neither standard nor common. In any case, the alternative pronunciations, IMO, belong to the alternative forms. I am creating a request on, since I don't know how to handle this situation. The English Wiktionary doesn't have enough native Russian speakers, so I'm not sure this argument can be resolved. On the Russian Wiktionary such edits would ultimately be reverted. I don't claim to be the ultimate source for the Russian language, but some of Wikitiki89's Russian edits surprise me. Sorry, I don't mean to insult him. My goal is accuracy. --Anatoli T. (обсудить/вклад) 00:43, 21 July 2014 (UTC)

Let it be known that (1) Anatoli's Russian and my Russian are from different regions, (2) I was raised in a highly educated environment and could not possibly have picked up any "illiterate" Russian, and (3) I have been willing to discuss each of the above cases individually with Anatoli and don't see a need for BP discussion. --WikiTiki89 01:44, 21 July 2014 (UTC)
1) My family's accents are a mixture of south Russian, Ukrainian and Siberian accents. Thanks to my education and self-discipline with regard to the language, I speak standard, common Russian, not southern or Ukrainian Russian; I have travelled a lot in Russia, read many books and watched a lot of movies, videos, etc. My Russian is not regional at all, and I can tell when Russian is regional or non-standard. And since I lived in Russia until I was 30, and speak Russian with my family and friends and communicate with Russians in Russia, I have been exposed to various accents. I'm sure I can judge what is right and what is wrong in Russian to a high degree, but, as I said, I by no means consider myself an ultimate source. Having said this, I humbly consider my Russian significantly better than his. 2) It is quite commendable for a long-time emigrant who left Russia at a young age to preserve the language, but there are still small problems, which show in the edits, and I don't think we should allow misleading info. 3) The discussions so far have not been very fruitful, and edit-warring has happened on a number of entries. As an interim solution, I suggest sourcing irregular pronunciations with something other than plain Google searches. As I said, I don't oppose any non-standard form entries, which I have also created. --Anatoli T. (обсудить/вклад) 02:05, 21 July 2014 (UTC)
And how should I do that? With links to YouTube videos? --WikiTiki89 02:17, 21 July 2014 (UTC)
Not sure yet. Maybe Youtube, if the pronunciation is clear and the speakers are native speakers. --Anatoli T. (обсудить/вклад) 02:23, 21 July 2014 (UTC)

I support moving non-standard pronunciations to the entries with non-standard spellings. --Vahag (talk) 08:57, 21 July 2014 (UTC)

This would resolve some disagreements, since Russian pronunciation is much more regular and most irregularities are documented. It doesn't matter that much whether a spelling is used more often than a pronunciation or the other way around; we just need to create those non-standard forms. Irregular spellings are quite often used to render irregular pronunciations, and their existence can usually be easily established. --Anatoli T. (обсудить/вклад) 09:11, 21 July 2014 (UTC)
On the other hand, most people who use the colloquial pronunciations, use only the formal spellings. --WikiTiki89 11:55, 21 July 2014 (UTC)
Yes, but even if a person reads "what's up" as "wassup", it doesn't mean that "what's up" should have the same pronunciation. It's better to separate regular and irregular pronunciations and spellings, especially when a form is definitely a different (older, colloquial, regional) form of another one, like капюшо́н (kapjušón) and капишо́н (kapišón). --Anatoli T. (обсудить/вклад) 22:52, 21 July 2014 (UTC)
"What's up" and "wassup" is a bad example, because it is colloquial either way and so people will write it exactly as they say it. This is more like environment being pronounced like enviorment (which is citeable in google books:"enviorment"). Most people who say enviorment, still write environment, which is why it makes sense to have the pronunciation right there. --WikiTiki89 23:32, 21 July 2014 (UTC)
Well, it depends on the case and if you can make this type of judgement. If you consider "wassup" a bad example, then "капишон" is worse. It's a dated form, not an alternative pronunciation, since "пю" is never read as "пи" as you insisted. --Anatoli T. (обсудить/вклад) 23:51, 21 July 2014 (UTC)
I said that "wassup" is a bad example, because "what's up" is also colloquial. "Капюшон" is not colloquial, so that reason does not apply. Compare it to my example of environment. --WikiTiki89 23:58, 21 July 2014 (UTC)
I think you misunderstand. "wassup" (colloquial) should have its own pronunciation, so should "капишон" (dated) and "двоюрный" (irregular) but the regular forms shouldn't include them. "водросль", "некторый" may be considered similar to the "enviorment" case. --Anatoli T. (обсудить/вклад) 00:14, 22 July 2014 (UTC)
I did not misunderstand you. You misunderstood me. All I am saying is that "wassup" is a bad analogy because "what's up" itself is also colloquial, while "enviorment" is a much closer analogy. Would you say that the pronunciation /ɪnˈvaɪɚmɪnt/ doesn't belong at environment? --WikiTiki89 00:25, 22 July 2014 (UTC)
It does, I have already said so, so does "сечас" belong to "сейчас", "пожалуста" to "пожалуйста", also "водросль", "некторый", even if pronunciations are less common. --Anatoli T. (обсудить/вклад) 00:33, 22 July 2014 (UTC)
Am I missing something or are you agreeing with me now? --WikiTiki89 00:39, 22 July 2014 (UTC)
What I'm saying is, one needs to judge whether a pronunciation is indeed alternative or it should belong to a different spelling. /ɪnˈvaɪɚmɪnt/ and /ɪnˈvaɪɚnmɪnt/ can belong to "environment" entry. Same with some Russian words I mentioned above, e.g. сейчас as /sʲɪˈt͡ɕæs/ (=сечас). However, "капюшон" and "капишон" should definitely have separate pronunciations, like "wassup" and "what's up". --Anatoli T. (обсудить/вклад) 00:48, 22 July 2014 (UTC)
Ok, so you agree with me for "водросль" and "некторый", but not for "капишон". I'm willing to concede "капишон" for now until I get some more data on it. I have already found a few YouTube examples of the pronunciations "водросли" and "проволка" used with the spellings "водоросли" and "проволока". --WikiTiki89 01:10, 22 July 2014 (UTC)
Yes, "водросль" and "некторый" are OK, even if I don't think they are common, I found that people were surprised like me with these accents but these can be considered alternative pronunciations with a drop of vowel, which does happen. So, I'm conceding on these. I put "двоюрный" into the same bucket as "капишон", although they differ in etymology. Note, even if you find pronunciation "капишон", it still belongs to this different spelling. Just making sure you agree on the distinction. --Anatoli T. (обсудить/вклад) 01:19, 22 July 2014 (UTC)
It depends. For example, if I find a video titled "мой классный капюшон" where it is clearly pronounced "капишон", then that is (one piece of) evidence that the pronunciation does belong at "капюшон". I also think there is some confirmation bias going on. When I listen to someone say "капюшон", I hear "капишон"; and I'm sure that if you listen to someone say "капишон", you will hear "капюшон". These vowels are very close and in a short unstressed syllable, they are hard to distinguish. --WikiTiki89 01:27, 22 July 2014 (UTC)
I know what you mean. There are ways, as I suggested one can use a tool such as Audacity where you can listen in a very slow speed (you can adjust the speed). The audio should be available as an MP3 or OGG file, for example. Yes, your example with "мой классный капюшон" would work. As an example, I used Audacity to determine Chinese tones and prove my point that Chinese tones are pronounced even in quick speech. "Hard" is not impossible with technologies. --Anatoli T. (обсудить/вклад) 01:42, 22 July 2014 (UTC)
As a longtime student and user of the Russian language, I consider the Russian entries on English Wiktionary to be intended for native English speakers who are interested in a Russian word or who are studying Russian. As such, I see no value in putting these anomalous pronunciations here, and I think American students of Russian will take away the wrong thing from them. Such pronunciations belong in the Russian Wiktionary for the enjoyment of a native Russian audience. This reminds me of w:Charles Robert Jenkins, an American defector to North Korea. Jenkins got a job teaching English at a North Korean university, since the North Koreans wanted to learn English well enough to pass as South Korean. However, Jenkins was from North Carolina and spoke with a strong southern accent. Once the Koreans learned his English pronunciation was very odd, he was fired from his job. When people study a foreign language, they usually want to learn the best standard pronunciation. —Stephen (Talk) 03:31, 22 July 2014 (UTC)
I have nothing against properly indicating which pronunciations are standard and which are colloquial, but there is no reason to suppress information. --WikiTiki89 04:00, 22 July 2014 (UTC)
We’re not suppressing information, it’s a matter of putting the information where it belongs. This information belongs on the Russian Wiktionary. There are three major accent areas in spoken Russian ... if we wanted to see nonstandard pronunciations here, it would be far more preferable to show the pronunciations of the other two major accents, northern (with оканье among other features) and southern (with аканье/яканье among other features). But even this is really not useful to indicate on every page, and would be likely to cause confusion and damage. The northern and southern Russian accents should be described and explained with sufficient examples on Appendix pages. But the idiosyncratic pronunciations you are adding are not so useful or interesting and I would not include them on English Wiktionary at all. —Stephen (Talk) 05:55, 22 July 2014 (UTC)
I will not comment on the specifics of this discussion, as I’m very little familiar with Russian, but I support the inclusion of regional, nonstandard and colloquial pronunciations in the English dictionary. They should be tagged as such, of course. — Ungoliant (falai) 21:48, 23 July 2014 (UTC)
It's about the specifics of various pronunciations. It's not so much about whether we include regional, nonstandard and colloquial pronunciations, but whether they are frequent enough for inclusion (not individual, or used only by limited overseas communities) and whether they belong to the same spelling as the standard pronunciation. Yes, labelling is important, and we do include variants. Major variations, such as northern "okanye" and southern "h" for "g", could be considered as well, if they are needed. --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)

User:Wikitiki89 and proper nouns

User:Wikitiki89 has been going around changing things from proper noun to common noun. In particular, he has been doing it with political factions such as Libertarian and Democrat, plus the California separatist group known as the Osos. I believe that he is in error, and I have reverted him pending discussion here. Purplebackpack89 17:34, 23 July 2014 (UTC)

Democrat, Libertarian and Oso are proper nouns
  1. Purplebackpack89 17:34, 23 July 2014 (UTC)
Democrat, Libertarian and Oso are common nouns
  1. Sure, Democrat, Libertarian and Oso are common nouns. Just like many of the items at Category:English words suffixed with -ian. Just like Frenchman, Popperian, or Clintonite. --Dan Polansky (talk) 17:58, 23 July 2014 (UTC)
Please take a look at our POS for Englishman, American, Frenchman, and many more, none of which I have ever edited. --WikiTiki89 17:37, 23 July 2014 (UTC)
But you have mass-edited a number of pages in the last hour or so after I mentioned Democrat was tagged as a proper noun, and have edit-warred with me to keep them common nouns. You should stop changing pages until this discussion is over or you've linked me to another beer parlor discussion that supports your POV. Purplebackpack89 17:43, 23 July 2014 (UTC)
I do not need your permission to make changes that we have had a consensus on for a long time. --WikiTiki89 17:45, 23 July 2014 (UTC)
If you claim such consensus has existed for a long time, the least you can do is provide a link to that discussion (and the discussion from earlier this month is a) still going, and b) not at consensus at the moment). And if the last discussion with consensus was indeed a long time ago, then it may not hold now and it is perfectly acceptable to revisit it. Particularly if the discussion was about some subset of nouns that are different from this subset. Purplebackpack89 17:50, 23 July 2014 (UTC)
Take your pick. --WikiTiki89 17:55, 23 July 2014 (UTC)
Wikitiki89 is correct; these are common nouns, like Briton and Nazi. - -sche (discuss) 18:01, 23 July 2014 (UTC)
I agree, and so do the professionally edited dictionaries in which I just checked Frenchman (Chambers, Merriam-Webster, OED). The test of "properness" of a noun is not, of course, just whether it has a capital letter! Equinox 18:56, 23 July 2014 (UTC)
So a user has been going round making correct edits. Why are we discussing this? Are there so few correct edits nowadays we need to have threads to discuss them in? Renard Migrant (talk) 21:19, 23 July 2014 (UTC)
Democrat isn't the faction (as you put it); Democratic Party is the faction. Perhaps that's what's causing the confusion here. Renard Migrant (talk) 21:26, 23 July 2014 (UTC)

Can we get 'particularly useful translation target' into CFI?

From Wiktionary:Requests for deletion#emergency physician (later: Talk:emergency physician) two users want to keep it outside of CFI as a translation target. I worry about translation targets as a bit of a slippery slope issue. Do we want an entry in English for everything that can be expressed as a single word in one other language? No. Because then we'd end up with he had had in his possession a bunchberry plant (I'm not kidding, see xłp̓x̣ʷłtłpłłs). Is there any way to regulate this? There's a further issue: translation is necessarily subjective, so what one person might translate with a two-word noun, I might translate with a slightly different two-word noun. It's tricky.

As a completely separate issue, I've noticed that entries de facto don't need to meet CFI. They just need to not get nominated for deletion or get nominated and pass with a consensus even if they don't meet CFI. I suppose that's why serious efforts to amend CFI into something usable have failed. It's easier to just keep on ignoring it. Renard Migrant (talk) 21:16, 23 July 2014 (UTC)

You're setting up a strawman. No one has ever been proposing to have he had had in his possession a bunchberry plant only because there is a single entry like xłp̓x̣ʷłtłpłłs. If we were after a formal strict set of criteria for translation targets, we would take care to handle these sorts of languages. --Dan Polansky (talk) 21:36, 23 July 2014 (UTC)
I'm not setting up a strawman. I'm saying we would need criteria and you seem to be agreeing. Renard Migrant (talk) 21:45, 23 July 2014 (UTC)
I agree; this practice should be codified. My first suggestion is that it should be used for lexemes, not individual forms, with distinct meaning (i.e., let’s not add I will do because farei, haré etc. exist nor translations of the “sentence-words” of polysynthetic languages). — Ungoliant (falai) 21:40, 23 July 2014 (UTC)
Thirded. We should codify "hot words" too. The problem always is that there are so many issues and so few people willing to tackle them. And often people get distracted before we reach anything conclusive. Keφr 22:10, 23 July 2014 (UTC)
  • Support: Purplebackpack89 22:12, 23 July 2014 (UTC)
  • Support as well. I would also support clarifying CFI in general to make it less opaque and more friendly to people not familiar with Wiktionary. Ideally, it should be written in such a way that someone who has spent only a day using Wiktionary (as a reader, not a contributor) should be able to understand enough of it to not do anything really bad. —CodeCat 22:16, 23 July 2014 (UTC)
  • Support as well. Also, I think we need to include Wiktionary:Lemming_principle#Lemming_test. What about back-translations from English (for lexemes only, as per Ungoliant's comment above)? Terms such as па́лец ноги́ (pálec nogí) and 足の指 (ashi no yubi), etc. have passed RFD, both non-idiomatic translations of toe, literally "finger of the foot". Such terms do make their way into various dictionaries: since "toe" exists in English, what's the word for it in language X? --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)
    • To note: the closing comment at Talk:палец ноги and Talk:足の指 are "Kept: no consensus to delete either entry" and "Kept: no consensus to delete" respectively. Keeping due to "no consensus" is a rather weak outcome in my opinion, and I always had the impression that these "no consensus" entries are more open to renomination than those with clear consensus to keep. This is hardly "passing RFD". Keφr 23:17, 23 July 2014 (UTC)
I know. "No consensus" is not a strong case for closing RFD. Still, they are kept for now. With proper formatting (a soft redirect?) and labelling, they may be a bit more palatable. They are not idiomatic by definition and if they are only there to point users to how an English term is translated, there may be some room for them here. --Anatoli T. (обсудить/вклад) 06:59, 24 July 2014 (UTC)

Looking to get AWB privileges

Hello. I'm relatively new to Wiktionary but I've been active on Wikipedia for a long time. I've been working on Old French verbs and there are a bunch of changes I'd like to make that are too painful to do without an automated regex tool like AWB -- basically, to change the templates used for conjugating a number of verbs. Could someone add me to the list of registered AWB users? Thanks.

Benwing (talk) 05:32, 24 July 2014 (UTC)

I have added you to Wiktionary:AutoWikiBrowser/CheckPage#Approved_users. —Stephen (Talk) 06:59, 24 July 2014 (UTC)
Awesome, thank you. Benwing (talk) 08:50, 24 July 2014 (UTC)

Are phrases lemmas?

Entries marked as {{head|xx|phrase}} are currently listed in the main lemmas category. Are they really lemmas? I think they should be in a Phrases subcategory under the main lemmas category. --Panda10 (talk) 12:33, 24 July 2014 (UTC)

They are lemmas because they are not a form of another lemma. —CodeCat 12:40, 24 July 2014 (UTC)
I agree, though it is sometimes hard to identify the lemma properly, as such multi-word entries, especially with verbs, are essentially defective, or at least have a dramatically different distribution of use across inflected forms. DCDuring TALK 13:51, 24 July 2014 (UTC)
The problem is really that we're not using the term "phrase" properly on Wiktionary. In many cases, it seems that "sentence" is the more appropriate term. See w:Sentence (linguistics). —CodeCat 14:05, 24 July 2014 (UTC)
I'm not sure about the "They are lemmas because they are not a form of another lemma" argument. Phrasebook entries such as I don't understand or this morning clutter up the lemma category and will contribute to an inaccurate count of lemmas. --Panda10 (talk) 13:05, 25 July 2014 (UTC)

Including sum-of-parts terms

One of the reasons against including sum-of-parts terms is that they are counter-productive in defining the term by picking and choosing some of the senses of the component parts, thus under-emphasizing the other senses. Listing all possible combinations would cause too much duplication of information, which is bad for a number of reasons; for example, adding or modifying a sense of the component parts would require also adding or modifying one or more senses of the whole term as well. When we include sum-of-parts terms, we often try to make them sound more idiomatic by making the definition more specific than it needs to be.

On the other hand, there are many reasons to include some sum-of-parts terms:

  • They are defined in other dictionaries and/or people are likely to look them up: random number
  • They have useful translations into other languages: last year
  • They happen to be spelled as one word or have alternative spellings as one word: coal mine, unhelpful (un- + helpful)
  • They have unusual etymologies, pronunciations, or other useful information: (can't think of any at the moment, but I know they exist)
  • They are non-obvious in the encoding direction, even if they are obvious in the decoding direction: and so on and so forth

We have provisions, some of which are controversial, for keeping some of the types of words listed above, but not for all. We also have endless RFD debates about keeping words "outside of CFI".

I think a compromise is needed and I propose allowing the inclusion of some sum-of-parts terms that we decide would be useful to include, but without real definitions, similar to what we already do for translation targets. This can apply to terms included through WT:COALMINE, as well as simple cases of prefixes and suffixes, where a full definition has very little benefit over linking to the component parts. Here are some examples I created: User:Wikitiki89/coal mine, User:Wikitiki89/unhelpful, User:Wikitiki89/and so on and so forth.

--WikiTiki89 15:35, 24 July 2014 (UTC)

The problem I see with your example for "coal mine" is that it requires prior knowledge of the term to understand how to interpret the parts of the term. There is nothing in your entry that specifies that it's the sense "excavation" that is meant, rather than "explosive device". This is exactly why we need a full definition for it and other similar entries. If a term were truly SOP, then it could validly be interpreted and used as any possible combination of its parts' meanings. But the reality is very different: such terms usually have much more restricted uses. —CodeCat 15:48, 24 July 2014 (UTC)
Another more general issue is that we seem to treat "idiomatic" and "SOP" as antonyms where they often are not. and so on and so forth is definitely idiomatic, even if it may also be interpretable as a sum of parts. Idiomatic phrases often translate into idioms in other languages, but we are sorely lacking translations for such terms thanks to our overly strict focus on deleting SOP terms. —CodeCat 15:50, 24 July 2014 (UTC)
But that's the thing about SOP, a "coal mine" could be an explosive device made of coal (also, as I said, we will only do this "where a full definition has very little benefit over linking to the component parts"). As to your second point, that is why I did not use the word "idiomatic" here. --WikiTiki89 15:54, 24 July 2014 (UTC)
I think we need to consider whether a term is a term of art in a specified field. For example, genuine issue of material fact is SoP to one who knows which senses of each term are intended, but is also a set phrase used in the law, and one that can not be substituted for other phrases. I think that if a general dictionary has a phrase, we should have it, and if a specialized dictionary (legal, medical, engineering, slang, etc.) has a term, then we should have it with the appropriate context label. Context labels go a long way towards eliminating the problem of "picking and choosing some of the senses of the component parts" because they indicate that when this phrase is used in this field it only refers to the specified senses of the words included. bd2412 T 15:59, 24 July 2014 (UTC)
That's the idea here: we will have the phrase, but we will link it to the component parts. Note that we can consider "material fact" to be one part rather than two, and possibly likewise for "genuine issue" if it is in fact a set phrase outside of this term. --WikiTiki89 16:03, 24 July 2014 (UTC)
For that example, I'm not aware of "genuine issue" being used outside the complete phrase. Our definition of genuine actually doesn't really capture the meaning used here (an actual controversy between the parties, rather than the facade of a controversy designed to test the law). It is sense 11 of issue. However, I generally think that a veneer definition requiring readers to look at two or three different entries to figure out the complete meaning of a term would be a needless inconvenience. bd2412 T 16:28, 24 July 2014 (UTC)
It's less of an inconvenience than the inconvenience of finding incomplete information presented as if it were complete. --WikiTiki89 16:32, 24 July 2014 (UTC)
That is where I think a context tag helps. If you are talking to a geologist or a civil engineer or a utility company about a coal mine, then there is only one relevant meaning, and the information presented is complete within that context. We could, for all of SoP definitions that are set phrases within a particular context, have an &lit sense, so that we can inform readers that when used other than in the sense of industry or geology, "coal mine" can mean any combination of coal and mine. bd2412 T 17:04, 24 July 2014 (UTC)
There is only one relevant definition of the word "mine" when talking to a geologist or civil engineer; this has nothing to do with the preceding word "coal". --WikiTiki89 17:22, 24 July 2014 (UTC)
Coal mine is a bad example for this point, since it only exists due to coalmine. If "coalmine" didn't exist, I would agree to deleting "coal mine" as readily as "copper mine" or "uranium mine". However, this principle is directly applicable to random number, which in the context of mathematics will never mean a "slapdash and seemingly directionless performance of a dance routine within a larger show". bd2412 T 13:24, 25 July 2014 (UTC)
That's one of my points. Since we are only including coal mine because of coalmine, it does not need a real definition, so we can just link to its parts. --WikiTiki89 13:29, 25 July 2014 (UTC)
Are we still going to have a complete definition at coalmine? I wouldn't object to coalmine being an "alternative spelling of" template and coal mine being bare links, but I don't think coalmine can be used to describe a military mine that runs on coal, so something would be getting lost in the sequence there. bd2412 T 15:16, 25 July 2014 (UTC)
I think there's a bit of a slippery slope here. Your test page decomposes "unhelpful" into [[un-]] + [[helpful]], but there's nothing stopping it from being decomposed into [[un-]] + [[help]] + [[-ful]]. Is [[electricity]] then SOP too, as [[electric]] + [[-ity]]? Is [[nothing]] just [[no]] + [[thing]]? Are full definitions only for monomorphemic words? —Aɴɢʀ (talk) 16:32, 24 July 2014 (UTC)
Most multimorphemic words are not simply SOP of their morphemes. Out of your examples, only [[nothing]] can actually be defined as just [[no]] + [[thing]], but then it is for us to decide whether it is beneficial to do so in each specific case. --WikiTiki89 16:37, 24 July 2014 (UTC)
I disagree that "nothing" is the only one that is SOP of its morphemes, but either way, I think it would create far too much work for us to decide on a case-by-case basis which polymorphemic words are SOP of their morphemes and which aren't. It's hard enough for us to decide that for multi-word expressions as it is. —Aɴɢʀ (talk) 17:20, 24 July 2014 (UTC)
There isn't much deciding to do. If the definition at the term is clearly equal to the component definitions, then you can replace it with a reference to each component. If someone later decides that that definition is inadequate, he could replace that with an adequate definition. No huge RFD discussions are even required. --WikiTiki89 17:26, 24 July 2014 (UTC)
@Wikitiki89 It can be used with that meaning, yes, but Wiktionary concerns itself only with attestable meanings. So the question we should be asking is: is it used with that meaning? Does coal mine ever mean "explosive device made of coal"? I would be very surprised if it did, precisely because its main sense "excavation for mining coal" is so much more common and using it in any other sense would cause confusion. So in reality, "coal mine" is much more restricted in meaning than its parts allow, which makes it idiomatic and hence includable per CFI. —CodeCat 17:17, 24 July 2014 (UTC)
It's probably possible to attest that meaning. --WikiTiki89 17:22, 24 July 2014 (UTC)
"Probably" isn't good enough for an RFV, though. If our current entry was like your proposal, I could validly RFV all senses that arise from the possible combinations of meanings of its parts. And many of them would likely fail, which would then mean we would have to put in a more specific, limiting definition. —CodeCat 10:17, 25 July 2014 (UTC)
If you want me to find citations, I will. Anyway, something like "see coal, mine" (however we choose to format it) does not imply that all combinations exist, so it is not necessary to narrow it down. --WikiTiki89 10:57, 25 July 2014 (UTC)

My feeling about a set phrase is a bit like the US Supreme Court judge's feeling about hard-core pornography: I can't define it, but I know it when I see it. Some are little more than common collocations, but when they are common enough, especially within a particular field or arena, then to me they start to ‘feel’ like single concepts and not two concepts stuck together. This is unscientific but I'm just trying to explain my process. The CFI tests are good ways to check if something is a set phrase, but sometimes a term can fail all of them and still demand coverage (at least to my mind). DCDuring's ‘lemming test’ is, I think, valuable because it gives us a rationale without having to explain exactly why something should be kept. The weird thing is that when I first joined Wiktionary, I was a firm deletionist. I thought that entries like fried egg and Egyptian pyramid were a waste of time. But over the years I have slowly done almost a complete 180. My feelings in general now are that if there is a significant minority of people who see value in an entry, then we lose nothing by keeping it. Ƿidsiþ 16:58, 24 July 2014 (UTC)

The whole point of my proposal here is to allow us to keep these set phrases, without duplicating their wide range of definitions from the component parts. --WikiTiki89 17:02, 24 July 2014 (UTC)
I don't object to it on principle in some cases, though not necessarily routinely. There is also the issue that if a multi-word term has more than one meaning, we would presumably want to split the two senses so as to show quotation evidence for each one, and then you would have to write some kind of meaningful definition. Ƿidsiþ 17:14, 24 July 2014 (UTC)

Actually, the important word in your proposal is term. If they are terms of the language, if they belong to its vocabulary, they should be includable, SOP or not. Lmaltier (talk) 17:42, 9 August 2014 (UTC)


Misspellings are recognised as lemmas by {{head}}, but that doesn't quite seem right. They have their own parts of speech of course, so they should probably use the normal POS categories and templates like {{en-noun}}. But I imagine some might object to this because they are supposedly not "proper". Recently I created rediculous, which is the spelling I normally use, and which is quite easily CFI-attestable. But I opted to call it an alternative spelling, because it didn't seem right to label a spelling I use normally a "mistake". So I have been wondering whether labelling things as "misspellings" does not go against the descriptivist philosophy of Wiktionary. What we really mean is that these spellings are commonly proscribed, but they are probably not considered misspellings by the people that use them. So what do other editors think of this situation? Should we categorise them simply as misspellings, or should we give the proper POS? And should we continue to label them as "misspellings" or change the wording to something more descriptive?

As a side note, the template {{misspelling of}} originally said common misspelling, but I removed this because it looked silly for entries like animalike. —CodeCat 10:25, 25 July 2014 (UTC)

Rediculous is certainly a misspelling. I think the criteria for that should have something to do with whether most people who use it would admit that it is a misspelling if shown the correct spelling. --WikiTiki89 11:02, 25 July 2014 (UTC)
Well that doesn't include me, because I think the spelling "rediculous" makes more sense. It better reflects how it's pronounced, and that's probably what all the other people think too. —CodeCat 11:04, 25 July 2014 (UTC)
I realize that it does not include you, but I do think that it includes most people. I also think that the main influence of this spelling is not the pronunciation, but the abundance of word initial re- compared to the relative rarity of ri-. --WikiTiki89 11:12, 25 July 2014 (UTC)
That's bizarre: "littel" (little) would make more sense for pronunciation, but everybody knows that's not how English spelling works. Which other words do you respell for this reason? Equinox 12:07, 25 July 2014 (UTC)
The difference is that it was not a conscious effort to change the spelling based on some reasoning. I just wasn't acutely aware of how other people spelled it, and I spelled it the way I figured it would make the most sense. It's only after I found out how people write it that I figured, my way is fine too. —CodeCat 12:11, 25 July 2014 (UTC)
As to the question of whether something we agree is a misspelling should count as a lemma, I would think the answer to that is simply NO.
Perhaps we also need to review items in our English alternative spellings categories to root out miscategorized entries. We do not serve users well by misleadingly characterizing common misspellings as alternates. After all, we are supposed to have only common misspellings. AFAICT rediculous is not even a "common" misspelling. It occurs 3 times in BNC/COCA combined vs nearly 8,700 occurrences of ridiculous. Results are similar in Google Books and Google N-gram. DCDuring TALK 11:17, 25 July 2014 (UTC)
Why only common misspellings? Why not just any that are attestable per CFI? And why should misspellings not be lemmas? They have plurals and other inflections like any other lemma might have. —CodeCat 11:34, 25 July 2014 (UTC)
It has been our practice to do so because the number of attestable misspellings of common words probably exceeds by far the number uf axepted [spelins. DCDuring TALK 12:02, 25 July 2014 (UTC)
I do hope that reasoning distinguishes between accidental mistakes, deliberate respellings, and deliberate and consistent spelling variants that are intended as normal use. We should definitely have the latter no matter how common, per descriptivism. For the former two, I think a criterion for commonness is ok. —CodeCat 12:07, 25 July 2014 (UTC)
No it does not and should not. We are documenting the set of conventions called language. DCDuring TALK 12:12, 25 July 2014 (UTC)
That would make sense if everyone followed the same conventions, but clearly they don't. If labelling something a misspelling is a matter of one group disagreeing with another group about the spelling, then why can we not label things like color as misspellings? My point is just that: Wiktionary cannot and should not decide what is a misspelling, and clearly misspelling-ness is not strictly defined, as there are varying opinions about it. So what I ask for is clear criteria, which are verifiable, that can be used to decide when the label "misspelling" should be used. If Wiktionary is descriptive (which it is), then a label like "misspelling" should describe some objective verifiable reality, not subjective opinion. —CodeCat 12:17, 25 July 2014 (UTC)
I completely agree with CodeCat here. There is a difference between a misspelling most likely caused by the writer's clumsy typing alone (e.g. typign), a misspelling caused by the writer most likely not knowing how to spell the word (e.g. independance), and a misspelling most likely caused by the writer's intentional choice to use a variant in order to achieve a literary effect like showing snarkiness or dialect (e.g. rediculous, "gawn to the sto'"). The only typo we should include is teh, because its commonness has turned it into a word intentionally used in jest. The second kind we should include if they are common enough that a reader would want them defined, so we can inform the reader in our definition that this is not the correct spelling. The third kind we should include if they are attested, because their specialized use makes them subtly different words in terms of the definition itself. bd2412 T 12:52, 25 July 2014 (UTC)
Yes but CodeCat isn't saying that exactly, he's saying he (or she, not sure) continues to use ‘rediculous’ because he thinks it's ‘more logical’ and therefore it shouldn't be called a misspelling. Ƿidsiþ 13:00, 25 July 2014 (UTC)
I mean to agree with CodeCat's comment immediately preceding my response. But his earlier point is also valid. Isn't that why we have thru and tho? bd2412 T 13:19, 25 July 2014 (UTC)
My objection concerning "rediculous" specifically is that it didn't seem like a misspelling to me, just an uncommon alternative spelling. The "misspelling" part lies only in the proscription against it. This is why I consider "misspelling of" to be equivalent to "(proscribed) alternative/rare spelling of". Whether something is a misspelling is subjective, but widespread proscription against a certain spelling is objective and can be verified at least in theory. Proscription can wane as forms become more accepted, and people will no longer consider them wrong. So I think we should replace "misspelling of" with something else that makes that more clear. Something like "proscribed spelling of" - this fits with how "(dated)" + "alternative spelling of" gives "dated spelling of" and similarly for other usage labels. —CodeCat 13:41, 25 July 2014 (UTC)
I would consider something an "alternative spelling" if a significant number of people believe that it is the correct spelling, even if others proscribe it. --WikiTiki89 13:43, 25 July 2014 (UTC)
Then what about {{rare spelling of}}? —CodeCat 13:48, 25 July 2014 (UTC)
I would consider that an equivalent of {{cx|rare}} {{alternative spelling of}}. If a spelling is considered by almost everyone to be a misspelling, then we should label it as such. --WikiTiki89 13:53, 25 July 2014 (UTC)
Do you think there is a difference between {{context|proscribed}} {{alternative spelling of}} and {{misspelling of}}? —CodeCat 14:18, 25 July 2014 (UTC)
Yes, something can be proscribed by some people and accepted as correct by others. --WikiTiki89 14:24, 25 July 2014 (UTC)
Does that mean that to you, a misspelling is accepted by nobody? —CodeCat 15:01, 25 July 2014 (UTC)
By no significant group at least. Note that I'm saying what the intrinsic criteria are, even if it may be impossible for us to determine whether this is the case or not. --WikiTiki89 15:17, 25 July 2014 (UTC)
I just noticed an entry that uses "misspelling" as the second parameter of {{head}}, i.e. uses "misspelling" as if it were a part of speech: [[aqui]]. I do not recall noticing this before. My initial reaction is that such entries should declare their actual part-of-speech, which in [[aqui]]'s case is "adverb". But I can also see how that would "pollute" the part-of-speech (and "lemma") categories with non-words (to whatever extent we use "misspelling" to describe things that are actually misspellings/mistakes, as opposed to intentional alternative spellings), and so I can see an argument for continuing to not put them into the POS categories.
Regarding the wording of the template: I think the idea behind including the word "common" was that it would emphasize and enforce our exclusion of rare misspellings. In practice, however, rare misspellings were included using the template anyway, so removing the word was probably good.
Similar to BD, I distinguish three categories of nonstandard spellings: (1) typos or typo-like misspellings, which are distinguished by (among other things) not being used consistently throughout a work, and which are not includable, (2) misspellings, or mistaken spellings, and (3) intentional deviations from standard spelling, which we handle through templates like {{alternative spelling of}} and {{eye dialect of}}. (Re "teh": in my opinion, teh is includable because it has come to be used intentionally, and so it does not constitute an exception to the exclusion of typos.) - -sche (discuss) 02:51, 26 July 2014 (UTC)

Use of babel templates from other wikis

I was going to create Category:User eml, but saw that it's based on a language code we don't recognize (it was split into egl and rgn). That led me to wonder why we had Category:User eml-3 and Category:User eml-N. It turns out that there are a couple of user pages that have {{#babel:it| which means they're using the Italian Wiki's babel system, which apparently recognizes some language codes we don't, and that this prompts User:Babel AutoCreate to re-create categories that we had deleted.

Is this ok, and, if not, what should we do about it? Chuck Entz (talk) 19:19, 25 July 2014 (UTC)

The script was actually blocked twice for creating categories like this, once by someone who thought it was a bot and once by someone who seemed to think it was a live user. As I noted when I unblocked it, the solution that's most obvious to me is to salt the categories we don't want by protecting them such that only admins can re-create them. Alternatively, we could allow people to specify fluency even in things we don't consider languages, and specially categorise the categories, e.g. we could allow Category:User eml and put it in Category:User egl and Category:User rgn. - -sche (discuss) 20:15, 25 July 2014 (UTC)
We also ended up with Category:User simple, Category:Romany language, Category:Traditional Chinese language, Category:British English language and Category:Simplified Chinese language thanks to this script. The categories don't exist, but they do have entries in them. —CodeCat 20:46, 25 July 2014 (UTC)
All of those except the first one were due to mistaken hard-coded categories, which it was simple to fix (e.g. 'Romani' was misspelt, I corrected it). We could continue to delete and "salt" those categories even if we decided to allow categories for retired language codes. - -sche (discuss) 02:58, 26 July 2014 (UTC)

Proposed compromise votes on romanizations

Since the various recent votes on romanizations have failed to achieve a consensus, I have drafted two compromise votes incorporating some ideas that had some traction in the various discussions. These are Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit and Wiktionary:Votes/pl-2014-07/Redirecting attested romanizations. Cheers! bd2412 T 02:50, 27 July 2014 (UTC)

Middle Vietnamese

Thanks to Mxn, we have two 'Middle Vietnamese' entries, and . (trên and trời also mention Middle Vietnamese in their etymologies.) However, because Middle Vietnamese doesn't have its own language code, they just use the code and templates of Vietnamese. Is Middle Vietnamese different enough from modern Vietnamese that it should have its own code and templates, or should the two entries be switched to use 'Vietnamese' headers? - -sche (discuss) 18:04, 29 July 2014 (UTC)

See also Wiktionary:Grease pit/2014/April#Middle Vietnamese. – Minh Nguyễn (talk, contribs) 10:06, 13 August 2014 (UTC)
Aha, thanks for the link. I've commented there. - -sche (discuss) 21:12, 14 August 2014 (UTC)

Inclusion of Dothraki

Thanks to one user, Wiktionary currently includes in the tables and subpages of Appendix:Dothraki more than 100 words from Dothraki, an artificial language created a few years ago for the television series Game of Thrones. According to Wikipedia's sources, Dothraki only contains a few thousand words, so Wiktionary is including a substantial part of it. notes that "All extant words in the Dothraki language are copyright of HBO, as is the text and audio of the language documents provided to HBO by the LCS." notes that "Living Language, in conjunction with HBO Global Marketing, is publishing a guide on Dothraki this October! You can preorder the book [...]". Is Wiktionary violating HBO's copyright and competing with or harming Living Language by publishing for free information which they intend to sell? Even if the answer to the previous question is 'no', should Wiktionary be including Dothraki? - -sche (discuss) 18:08, 29 July 2014 (UTC) (added last sentence making the more basic question explicit - -sche (discuss) 18:26, 29 July 2014 (UTC))

I don't know who wrote that wiki but "All extant words in the Dothraki language are copyright of HBO" seems like nonsense. You can't copyright a word, can you? Equinox 18:12, 29 July 2014 (UTC)
The Loglan-Lojban dispute seems to suggest that you can copyright (a collection of) words, provided you're the one who created them in the first place. - -sche (discuss) 18:16, 29 July 2014 (UTC)
(e/c) You can copyright a language. I oppose the inclusion of Dothraki, since as far as I know it has no community of speakers. --WikiTiki89 18:17, 29 July 2014 (UTC)
Some Dothraki words existed before the series was created. They are present in George Martin’s books. — Ungoliant (falai) 18:15, 29 July 2014 (UTC)
...which are copyright GRRM. - -sche (discuss) 18:16, 29 July 2014 (UTC)
Putting on the intellectual property attorney hat. Yes, the language as a whole is subject to copyright because it is a substantial element of a creative work. We can as a matter of fair use present "Dothraki" words that seep into general usage (comparable to the Klingon Qapla'). We most definitely cannot make an appendix listing a large number of such words that do not meet this criterion. I recommend speedy deletion. bd2412 T 19:10, 29 July 2014 (UTC)
Speaking of Klingon, Wiktionary has an Appendix:Klingon which seems to present the same issues as Appendix:Dothraki. Should it be deleted too? - -sche (discuss) 19:49, 29 July 2014 (UTC)
Let me put it this way - Anderson v. Stallone and Castle Rock Entertainment, Inc. v. Carol Publishing Group stand for the principle that if you take a large number of original elements of a work covered by copyright, even if those elements are rearranged in some way, then you are liable for copyright infringement. Copyright presently runs for the life of the author plus seventy years (running from the end of the calendar year in which they died). Any invented language that is part of a copyrighted work whose author has not been dead since 1943 is likely to be legally barred to us. There are exceptions, for example where an author releases their work into the public domain, or failed to abide by technicalities that were in force up until the 1970s, but there's no reason to think any of those are applicable here. Therefore, Appendix:Klingon is nearly as problematic (although it does have the benefits from a copyright defendant's viewpoint of being much older, so that it has had more time to seep into the culture, and has both a larger number of "authors" diluting the claims of ownership, and a longer history of copyright owners failing to prosecute infringing conduct). I would get rid of both of these, or severely cut the Klingon appendix down to words for which we have entries, and the chart of personal pronouns, which is de minimis. Also, before anyone asks, yes Appendix:Na'vi should go too. bd2412 T 20:15, 29 July 2014 (UTC)
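The "dead since 1943" cutoff in the comment above follows from the life-plus-seventy arithmetic. The sketch below only checks that arithmetic; it assumes a plain life+70 term ending on 31 December of the 70th year and ignores every exception mentioned (public-domain releases, pre-1970s formalities) as well as any special rules for corporate authorship. The function names are hypothetical.

```python
def free_from_year(death_year: int, term_years: int = 70) -> int:
    """First calendar year in which a work is free under a simple life+N term.

    The term runs to the end of the calendar year death_year + term_years,
    so the work becomes free on 1 January of the following year.
    """
    return death_year + term_years + 1


def latest_free_death_year(current_year: int, term_years: int = 70) -> int:
    """Latest year of death for which a work is already free in current_year."""
    return current_year - term_years - 1


# In 2014, only authors who died in 1943 or earlier are out of copyright.
print(latest_free_death_year(2014))  # 1943
print(free_from_year(1943))          # 2014
```

Working from the end of the calendar year, rather than the exact date of death, is why the cutoff is 1943 and not 1944.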
I have deleted Appendix:Dothraki, Appendix:Na'vi, Appendix:Goa'uld, Appendix:Unas and Appendix:Noxilo. I have truncated Appendix:Klingon, and modified the short lists of Appendix:Eloi, Appendix:Lapine, Appendix:Mandalorian, Appendix:Láadan, Appendix:Toki Pona and Appendix:Bolak, per the suggestion that short lists are de minimis and OK. Of the other languages listed in Template:artistic languages and Category:Appendix-only constructed languages, I tentatively make the following assumptions: Appendix:Communicationssprache was intended for widespread distribution and its creator died in 1843, so it seems to be includable; similarly, Appendix:Mundolinco seems includable; Appendix:Quenya and Appendix:Sindarin and Appendix:Black Speech are probably under copyright and should probably be deleted like Dothraki or truncated like Klingon; Category:Lingua Franca Nova language and Category:Neo language I am not sure what to do with. - -sche (discuss) 21:33, 29 July 2014 (UTC)
There are some differences between constructed languages made as adjuncts to works of fiction and those made as proposed languages for actual speaking. First, the latter type is not part of the sort of creative endeavor that would be diminished in value by republication of the definitions; and second, the latter type is intended for general use by people other than the author, and inclusion in a dictionary is a predictable kind of intended use. Of course, when it comes to Lingua Franca Nova, we could just ask C. George Boeree for permission, but I do note that the Lingua Franca Nova website has copyright notices on every page. bd2412 T 02:23, 30 July 2014 (UTC)
Which of those two does Klingon fit under? It was originally an adjunct to a work of fiction, but has grown enough that it has a comparatively large number of speakers. --WikiTiki89 12:17, 30 July 2014 (UTC)
The Klingon language was initially created as part of a larger whole creative work, and for the purely commercial purpose of making that work more marketable for potential purchasers. It remains under copyright for that purpose for as long as the original work remains under copyright for that purpose. I had thought that the language was created, at least in some part, for the original Star Trek series in the 60s, but apparently it was not constructed in detail until the making of Star Trek III in 1984. In any case, the key fact is that it was not created for the purpose of serving as a language to be used in real life, but as part of the story, as a harsh and guttural language that filled out the Klingon characters and gave them an extra sense of menace. As long as the original works remain under copyright, the language remains under copyright, and is only susceptible to fair use. Note that the original work for purposes of most of the Klingon language is The Klingon Dictionary by Marc Okrand (under a copyright owned by CBS Television Studios), which is written from an in-universe perspective, as if it related facts about an actual Klingon civilization, and is therefore clearly part of the creative work. bd2412 T 13:27, 30 July 2014 (UTC)
But then the question is what do they do with their copyright? Clearly they allow things such as w:Klingon Language Institute. --WikiTiki89 15:47, 30 July 2014 (UTC)
I have truncated Appendix:Quenya to 20 words (or 30 counting inflected forms), and Appendix:Sindarin and Appendix:Black Speech to 26 words each. Neo already only included its 10 number words. At this point, the largest appendices (not counting those for languages like Communicationssprache which seem to be unproblematic) are for Appendix:Lapine, with 36 words, and LFN, with 140+. I am trimming LFN now. - -sche (discuss) 18:06, 30 July 2014 (UTC)

Multiple accounts


It has come to my attention that BaicanXXX - who has been confirmed (by the administrators of Romanian Wikipedia) to have a sock puppet called WernescU - has created yet another account called BanescuBaican. He currently has three active accounts here.

I've tried to discuss this issue with a Wiktionary administrator, but I haven't received any feedback.

Aren't multiple accounts prohibited? Unsure if this is the right place to discuss this; I just wanted to give the heads-up.

Best regards, --Robbie SWE (talk) 11:19, 30 July 2014 (UTC)

There is no policy against them (and no policy for them either, WT:BOT notwithstanding), but this question, as many others, is ultimately left to administrators' discretion. If the accounts have clearly distinct purposes, are not constantly switched, the number of accounts is reasonably small (say, less than five) and the user is in relatively good standing with the community, I see no problem with that. Is the user doing anything wrong here? Keφr 11:52, 30 July 2014 (UTC)
Well that's the problem; the account owner switches quite freely between accounts, usually every time I correct his translations and/or question his contributions. Baican has been blocked twice in the Romanian Wiktionary because of insubordination and using/adding words that don't exist. He was also blocked in the Romanian Wikipedia because he harassed other users with whom he didn't agree. I've had issues with this user before and that's the reason why I monitor his contributions around here. --Robbie SWE (talk) 12:09, 30 July 2014 (UTC)
I see. Any juicy diffs? I want to see the whole context. What also worries me is that the user is apparently either unable, or simply refuses to write in English. Not that I like the language, but I know no Romanian at all, so I would rather prefer English. (Also, a link to the previous discussion). Keφr 15:48, 30 July 2014 (UTC)
Off the top of my head, his translation for Commonwealth of Nations. I've corrected the Romanian entry, because it was simply not accurate. As I pointed out earlier, he usually provides verbatim translations (think Google Translate) and although I understand why they can at times be useful, they can just as well be confusing. I'm not saying that all his contributions are bad, but I've had a hard time reasoning with him when he hasn't followed the rules. Trying to discuss anything - be it in English, Romanian or any other language - usually turns into a feisty discussion. --Robbie SWE (talk) 17:18, 30 July 2014 (UTC)
Did I mention something about "context"? One bad edit does not a context make. And in Commonwealth of Nations you made a slight mistake of your own. Where can I read about those blocks? Or could, if I knew Romanian? Keφr 18:08, 30 July 2014 (UTC)
I see that I made a mistake of my own and I apologise for that. I was trying to keep up with his edits and became sloppy - not trying to make excuses for myself, just give an explanation :-) Unfortunately the blocks and the discussions leading to the blocks are indeed in Romanian. --Robbie SWE (talk) 19:30, 30 July 2014 (UTC)
  • Wiktionary at present has no sockpuppetry policy. But if it ever gets one, multiple accounts should not be prohibited. There are only three reasons multiple accounts (or one account and an IP) are problematic:
  1. Block evasion (using one account while another is blocked)
  2. Double voting (voting with both accounts)
  3. Handedness (one account)

An account that doesn't do any of those things is perfectly acceptable. Purplebackpack89 17:53, 30 July 2014 (UTC)

One of our block reasons is "abusing multiple accounts". However, it seems that you are correct that no policy page actually describes our policy on it. --WikiTiki89 18:00, 30 July 2014 (UTC)
This editor is somewhat like Gtroy; they create a prodigious quantity of edits which contain enough errors that they need to be checked, and it wears out the admins and other users who take on the task of checking them (for which reason Dick Laurent at one point just blocked the editor for 6 months). And now, like Gtroy, the user is creating multiple accounts. Given its name, the new account does not exactly hide its relationship to the others, but it isn't linked from them, either. If the user creates too many accounts, it has the effect of making scrutiny of their edits difficult (without us even speculating on whether or not that is the user's intent), which is disruptive. If the number of accounts gets any higher, let me know; I am willing to block the 'extras'. (For reference, the accounts mentioned so far are BAICAN_XXX, last edit 26 July, WernescU, last edit 12 July, BanescuBAICAN, last edit 30 July.) - -sche (discuss) 18:29, 30 July 2014 (UTC)
Thank you for taking interest. And I agree. Also, note that WernescU has the autopatrolled flag. I think it should be revoked, given the situation. Keφr 18:53, 30 July 2014 (UTC)
Okay, I've de-autopatrolled him. (I wonder if de-autopatrol is attestable...) —Aɴɢʀ (talk) 19:29, 30 July 2014 (UTC)

Contributing the contents of bilingual dictionaries?


Hope I'm asking in the proper place.

I work as a consultant for the Dzongkha Development Commission (DDC), the national language authority of Bhutan. After talking to the head of the DDC and others concerned, and explaining what Wiktionary and the CC BY-SA license are, it seems we can contribute the data we have for English-Dzongkha, Dzongkha-English, Dzongkha-Dzongkha and Tibetan-Dzongkha Dictionaries.

What is the best way to proceed with this? We have the data for these dictionaries in a MySQL database as well as in XDXF (XML) and StarDict formats.


CFynn (talk) 08:57, 31 July 2014 (UTC)

If you are able to convert the data automatically into the format that Wiktionary uses for its entries, then you could probably create the entries with a program. It would probably still be necessary to tag those entries with some kind of template notice so that it's clear the entry was not created by hand by a user, and may therefore need to be checked. —CodeCat 11:07, 31 July 2014 (UTC)
Please contact me on my talk page or via email if you are not sure about something. Wyang (talk) 12:26, 31 July 2014 (UTC)
A basic Dzongkha noun entry would look like this; e.g. if I wanted to create an entry for ལྡུམ་ར (ldum ra) from a dictionary, I would make it like this:


# [[garden]]
--Anatoli T. (обсудить/вклад) 13:27, 31 July 2014 (UTC)
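For the automated route suggested above (converting the data and creating entries with a program), a conversion step might look roughly like the sketch below. It assumes a simplified XDXF-like layout in which each `<ar>` article holds a `<k>` headword followed by semicolon-separated English glosses, that every entry is a noun, and that `{{head|dz|noun}}` is an acceptable headword line; the sample data and function names are hypothetical, and the real DDC export, the part of speech of each word, and any "needs checking" notice would have to be settled before running anything like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in a simplified XDXF-like layout: each <ar> article
# holds a <k> headword followed by plain-text glosses separated by ";".
SAMPLE = """<xdxf lang_from="DZO" lang_to="ENG">
  <lexicon>
    <ar><k>ལྡུམ་ར</k>garden</ar>
  </lexicon>
</xdxf>"""


def article_to_wikitext(definitions, pos="Noun"):
    """Build a minimal entry skeleton (assumed layout, not a Wiktionary standard)."""
    lines = ["==Dzongkha==", "", f"==={pos}===", f"{{{{head|dz|{pos.lower()}}}}}", ""]
    lines += [f"# [[{d}]]" for d in definitions]
    return "\n".join(lines)


def convert(xml_text):
    """Map each headword to the wikitext of its draft entry."""
    root = ET.fromstring(xml_text)
    entries = {}
    for ar in root.iter("ar"):
        k = ar.find("k")
        if k is None or not (k.text or "").strip():
            continue  # skip articles without a usable headword
        # The gloss text follows </k>, so ElementTree stores it in k.tail.
        raw = (k.tail or "").strip()
        defs = [d.strip() for d in raw.split(";") if d.strip()]
        entries[k.text.strip()] = article_to_wikitext(defs)
    return entries


for headword, text in convert(SAMPLE).items():
    print(headword)
    print(text)
```

Generating a dictionary of headword to wikitext, rather than writing pages directly, keeps the conversion separate from the upload step, so the drafts can be reviewed (or tagged as machine-created) before a bot touches the wiki.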

Names categories

Right now we have two different category trees for names, Category:English names and Category:en:Names. I would like to merge the latter into the former, as I suggested at WT:RFM before. But someone suggested bringing it up here because there might be more comments. The point that was brought up in a past discussion was that there is something intuitively different about names that are "English" and names that simply occur in English usage. The problem is partly the category name: "English names" suggests that the names themselves are "English" but we really intend it to mean that it's simply a name used in English at some point, like "English nouns" contains nouns used in English at some point. People interpret "English name" differently from "English noun". But someone rightly pointed out that café is not "English" by many people's standards, yet we have no problem calling it an English noun.

Of course there's also the problem that bearers of names can move around the world and take their name with them. And people in other countries have names which then need to be adapted to English. This is very different from how loanwords are adopted into languages. Loanwords are adopted and used by speakers of another language, simply by choice or for convenience. But names are generally granted by speakers of a certain language and adapted to other languages when people need to refer to someone with that name. On the other hand, names can be "loaned" in the sense that they are adopted by speakers of a language, like common names such as John, Lisa and Hans. So there's a distinction between adapting a foreign-given name into English for the purpose of referring to a person with that name, and English speakers adopting that name as an actual loanword by giving that name to their children.

Then there's the issue of Category:en:Place names. Place names probably don't suffer as much from the above problems as personal names do, because they are simply the names used in English for a certain geographical object. They may be borrowed, or they may be adapted to the language, pretty much like normal words. There's much less dispute that Kilimanjaro is the English name for that specific mountain. So even if we don't find a solution for Category:en:Names, would it at least be ok to migrate Category:en:Place names and its subcategories to become subcategories of Category:English names, and renamed to something like Category:English place names, Category:English names of countries etc.? —CodeCat 11:51, 31 July 2014 (UTC)

Aren't ALL proper nouns used in personal names essentially Translingual, with the characteristic "English" (used simply as a concrete example of a language) being part of their etymology? That there are instances of names sharing a common root (Jean, Sean, Juan etc) or common derivation from something in the real world (Eagle and Adler as surnames) may create an illusion that names are more language-specific than they are in our times. Whatever lasting limits on full translinguality exist are probably differences of script (construed narrowly enough to include language-specific diacritics).
Similarly for toponyms. The vast majority, though not the most common ones, seem essentially Translingual, though some are rarely used outside of the local language. DCDuring TALK 13:26, 31 July 2014 (UTC)
I think this is one of the reasons that many dictionaries don't include proper nouns at all. --WikiTiki89 13:54, 31 July 2014 (UTC)
I think DCDuring makes a very good point. But there are reasons why we should continue to treat names in a language-specific way. A major one is inflection; names inflect differently in different languages and putting all of that in a single translingual section would not be feasible. Of course there's also pronunciation, which can vary quite widely even with the same spelling. Just compare English Jean /dʒiːn/ and French Jean /ʒɑ̃/, not to mention the difference in gender between the two. Inflection and pronunciation are a verifiability concern because it's quite likely that some names have never appeared in a certain language at all and therefore have never been pronounced or inflected in it before, so then any information we add in the name's entry would be pure speculation.
So maybe what we need to do is examine the categories more closely. Category:English given names may not really describe the reality, and maybe what we really want is something along the lines of "Given names adapted into/used in English". That category would of course be entirely separate from "Given names originating in English", which would be etymological.
I don't see that it's the same with toponyms. The big difference is that toponyms are not repeatedly assigned to things, they're names for single definite objects, much like nouns are (albeit that regular nouns name classes, not individuals). Of course because most toponyms are of foreign origin, many of them are borrowed more or less verbatim, but there's certainly no guarantee of that, which is why there is such a thing as an exonym. —CodeCat 14:04, 31 July 2014 (UTC)
Ahem... not repeatedly assigned? --WikiTiki89 14:14, 31 July 2014 (UTC)
re: "Given names originating in English". The word "originating" should be replaced by "derived from". Presumably membership would be dependent on the content of the Etymology section for the word. Derivation reflects the world as it existed with a relatively limited amount of peaceful migration of literate people into places occupied by other literate people, Rome being a great exception.
re: toponyms. Right. Full toponyms (eg, Springfield, Illinois) are proper names, names of specific entities. But why are they not Translingual? Pronunciation?
re: pronunciation differences justifying multiple L2 sections for exact homonyms. I suppose that means we should, in principle have different L2s for taxonomic names for the same reason. You should look at the discussion of that question. DCDuring TALK 14:28, 31 July 2014 (UTC)
Let's take a look at some of the ways by which a name can make its way into English-language texts or utterances. How should each case be categorized?
  1. A word or set of words of native Germanic origin came to be used in Old English as a name, and is still used (perhaps in changed form) in the modern English period. Example: Harold. This is the most clearly, purely English kind of name.
  2. A word of native Germanic origin came to be used in the modern English period as a name. Examples: Winter, from winter (think of Harlow Winter Kate Madden); Sky, from sky (think of Sky Ferreira).
  3. A name came to be used in a foreign language and was borrowed into Middle or Old (not modern) English and then passed into modern English. Example: Vernon.
  4. A name came to be used in a foreign language and was borrowed into modern English. Example: Pierre. (If Pierre happens to have been borrowed before 1500, there are many other examples.)
  5. A word of foreign origin was borrowed as a word into Middle or Old (not modern) English, and came to be used as a name in modern English. Example: Joy (and probably also River).
  6. A word of foreign origin was borrowed as a word into modern English, and then came to be used as a name. Example: Karma (currently a red/orange link, but attested).
  7. Non-English roots were combined in a language other than English (say, Mongolian) to form a name which is only ever found in English texts when Mongolian (non-English-speaking) individuals who bear the name are being mentioned. Examples: Tsakhiagiin (name of the current president of Mongolia, and potentially other Mongolians), Toivo (name of a former Prime Minister of Finland, and of various other Finns). This is the most clearly non-English kind of name. (Does it make a difference that Tsakhiagiin is a transliteration of Цахиагийн, while Toivo is not a transliteration?)
  8. The same situation as above, but the name was anglicized. Possible example: Khosrau or Chosroes (referring to inter alia the Persian ruler whose name, if simply transliterated, would have been something more like Husrō(y)). (If that is a poor example, I am sure others exist.)
  9. As above, but the name has come to be used for new people (i.e. the name is given by English-speaking parents to their babies). Example: Virgil (from Latin Vergilius).
  10. Assimilation of all or part of a non-English-speaking culture into an English-speaking culture results in some of that culture's names being used by English-speakers (regardless of whether their forebears were also English-speakers, or were part of the non-English-speaking culture). Examples include many Irish names and some Lakota/Dakota names. (In the Tea Room, I speculated that there may be more [European- and African-American] English monoglots named "Winona"/"Wenona" than there are Sioux-speaking [Lakota and Dakota tribes-]people with that name.)
- -sche (discuss) 22:04, 31 July 2014 (UTC)
I would say that all except 7 and 8 are English names. If we apply the use-mention distinction here, then a word is used as a name if it's used to name a new person, whereas referring to an already-named person is a mention. This is distinct from using the names as simple words in a text. Evaluating names in this way may be a useful criterion? Of course there is still the ambiguity of expatriates naming their child from their own culture/language, or bilingual parents. For example if a Moroccan couple move to the Netherlands and name their child who was born there Abdul, is it then a Dutch name? What if one of the parents is native Dutch? —CodeCat 22:15, 31 July 2014 (UTC)
See Wiktionary:About given names and surnames#The language statement of a name. The etymology of a name says very little about its language statement, since most Europeans and Americans bear given names that are borrowed from other languages: Hebrew, Latin and Ancient Greek mostly, but also from modern languages. Karen and Michelle are definitely English names by now. There are good statistics of given name use available in many modern languages. Foreigners' names are translingual as long as they do not change their spelling. There is no need to define a purely Finnish given name in English. Transliterations like Abdul do need a definition; and if you check statistics, there might be enough Dutch-speaking Muslims to make it a genuine Dutch name. We do have a difference in categorization already: transliterations are in the topic Category:Transliteration of personal names. Notice that places are defined as topics: "A city in England", "Any of a great number of cities in the US", not as "A place name". Unlike given names and surnames, foreign places do need an English section too.--Makaokalani (talk) 11:36, 1 August 2014 (UTC)