This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
The discussion about Tagalog/Filipino quickly died out above. So I'm posting this link to avoid this vote being forgotten and made obsolete. Mglovesfun (talk) 11:38, 1 October 2011 (UTC)[reply]

Romanizations of words in languages including Gothic

In light of the comments on Wiktionary:Votes/pl-2011-08/Romanization of languages in ancient scripts, I have created two new votes:

Please give feedback before the votes start. Please vote after the votes start. - -sche (discuss) 01:36, 3 October 2011 (UTC)[reply]

Where do I give feedback? Well I'll say it here for now:
I think this is a horrible idea. I don't see why we should use another alphabet to write Gothic when there is a perfectly good one which was created specifically for Gothic. One of the reasons the Unicode Consortium adds characters from "unused" alphabets is so people like us can write words in ancient languages in their own script, instead of using a transliteration. Maybe you could add a heading for transliterated forms in ELE, and make the transliterated form redirect to the actual entry.
In the rationale it's written "Modern readers will most likely want to look up words in their Romanized form; these readers will not necessarily know or be able to input the words' original-script forms." To be honest, I think this applies more to modern languages such as Arabic or Russian. What kind of person would look for a word in Gothic? Someone interested in the Gothic language; such a person is quite likely to know the Gothic alphabet. But what kind of person would look for a word in Arabic or Russian? Could be anyone; could be some guy who heard the word on TV, saw it in a newspaper, or whatever. None of these are expected to know the Cyrillic or Arabic alphabets.
I'm not saying that you should add entries transliterated from Cyrillic or Arabic; I'm just trying to show that adding transliterated Gothic entries is a worse idea. Ungoliant MMDCCLXIV 15:50, 6 October 2011 (UTC)[reply]
To be clear, we're not proposing to move the content to the romanisations: entries in the Gothic alphabet (like 𐌵𐌹𐌽𐍉) will still exist and define that Gothic word, but romanisations (like qino) will exist as soft-redirects, similar to pinyin and romaji entries. This seems to be almost or exactly what you're proposing regarding redirects. The two major reasons which our Gothic contributors have given for allowing romanisations are: that users might know the Gothic alphabet but be unable to type it, and that Gothic texts and secondary sources (dictionaries, etc) are often published in romanised form (and we should have entries for the forms as they are in fact published, which means: both in the Gothic alphabet and in romanised form). - -sche (discuss) 21:51, 6 October 2011 (UTC)[reply]
Ok. I misunderstood that. Ungoliant MMDCCLXIV 23:09, 6 October 2011 (UTC)[reply]
Actually, that's not why the Unicode Consortium added Gothic. For one, I think most Unicode members with an opinion would encourage the continued use of transliteration; see Don't Proliferate; Transliterate!. If you look at the historical record, approaching Unicode 3.1, Unicode had a problem. The concept of being 16-bit didn't last long; it was obvious that Unicode would need to expand, and there was a theoretical expansion area added in Unicode 2.0. But most programs only supported 16-bit Unicode, so characters that were added outside the base 16-bits wouldn't be accessible to many users; so nobody wanted their characters to be added outside that base 16-bits. But nobody had incentive to fix their programs until there were characters in that section. So they found some scripts that were completely useless, that were wanted by non-scholars because scripts are cool, and encoded them outside the 16-bit limits. Stuff like Old Italic, the Deseret alphabet and Gothic.--Prosfilaes 20:30, 7 October 2011 (UTC)[reply]
"What is the scope of Unicode?
A: Unicode covers all the characters for all the writing systems of the world, modern and ancient."
http://www.unicode.org/faq/basic_q.html
These scripts aren't completely useless. Epigraphers, medievalists, classicists, and bible scholars find them important. Consider the Medieval Unicode Font Initiative (www.mufi.info), why would they recommend such characters if they didn't think they're useful?
Ungoliant MMDCCLXIV 23:15, 7 October 2011 (UTC)[reply]
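
Editorial illustration of the 16-bit point raised in this thread: the Gothic block sits above U+FFFF, so each Gothic letter needs two 16-bit code units (a surrogate pair) in UTF-16, while the romanisation fits in one unit per letter. The sketch below is not part of the original discussion; the helper names `utf16_units` and `outside_bmp` are made up for this example, and the word used is 𐌵𐌹𐌽𐍉 (qino) from the comments above.

```python
# Minimal sketch: compare the Gothic-script form and its romanisation
# in terms of 16-bit code units. Helper names are hypothetical.

# 𐌵𐌹𐌽𐍉, written with explicit code points so the source stays ASCII-safe.
GOTHIC_QINO = "\U00010335\U00010339\U0001033D\U00010349"
ROMAN_QINO = "qino"


def utf16_units(text):
    """Return the list of 16-bit code units of text encoded as UTF-16."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def outside_bmp(text):
    """Return the characters of text that lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


for label, word in (("Gothic", GOTHIC_QINO), ("Roman", ROMAN_QINO)):
    units = utf16_units(word)
    print(label, "characters:", len(word),
          "UTF-16 units:", len(units),
          "outside 16-bit range:", len(outside_bmp(word)))
```

Software that assumed one 16-bit unit per character would treat the Gothic word as eight units rather than four letters, which is the kind of support gap the comment describes.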

Merging Moldavian and Romanian

A couple of editors suggested above that Wiktionary discuss merging Moldavian and Romanian. Let's discuss! I favour merging the two. The issue seems quite like that of Serbo-Croatian: that is, the distinction is politically motivated. It is also similar in that Moldavian can be written in Cyrillic, whereas Romanian is not written in Cyrillic anymore: but we could handle that just like we handle Cyrillic/Latin Serbo-Croatian. A possible vote (not started!) is here. - -sche (discuss) 06:31, 3 October 2011 (UTC)[reply]

Moldavian or Moldovan is essentially dead. I don't think anyone is trying to really revive it and we don't have active editors using it. Moldavian Wikipedia is locked. Romanian is written entirely in Roman script. Maybe Moldavian is worth keeping for historical reasons? There is still material written in Moldavian out there. It doesn't create any maintenance problems, like Serbo-Croatian, as far as I can tell. I don't have a strong opinion on this, though. --Anatoli 06:53, 3 October 2011 (UTC)[reply]
Cyrillic could be treated as an alternative but obsolete script, like Arabic for Turkish. —CodeCat 10:26, 3 October 2011 (UTC)[reply]
Obsolete? It is still used in the de-facto regime of Transnistria, I'd hardly call that "obsolete". -- Liliana 18:17, 3 October 2011 (UTC)[reply]
Right, I would keep (and add) Latin spelling entries and Cyrillic spelling entries, and just explain the use of Cyrillic (that it is no longer standard to write Romanian in Cyrillic in Romania, but that the language continues to be written and published in Cyrillic in the region of Transnistria) on WT:About Romanian. - -sche (discuss) 21:55, 3 October 2011 (UTC)[reply]
Sounds OK to me. --Anatoli 10:47, 3 October 2011 (UTC)[reply]
If Romanian templates are modified like Hindi/Urdu or Serbo-Croatian (Cyrillic/Roman) then we could always add the optional Cyrillic spelling flagged as "Moldavian spelling". Russian/Ukrainians from Transnistria have to use Russian and Romanian to communicate with Moldova. It's just my opinion, prove me wrong, if you disagree. --Anatoli 06:40, 4 October 2011 (UTC)[reply]
I've asked Robbie SWE for input. :) - -sche (discuss) 23:44, 6 October 2011 (UTC)[reply]
I'm kind of torn; the Cyrillic alphabet is a thing of the past for Romanian, most certainly not something most Romanians would want to promote today. The fact that Bogdan Stăncescu - the founder of the Romanian Wikipedia project - cancelled the Moldavian ISO 639 code (mo and mol) back in November of 2008 indicates that there is no substantial difference between Romanian and Moldavian. This initiative was welcomed by Marius Sala, vice-president of the Romanian Academy.
Personally, I don't think that the solution should be in the form of "Cyrillic/Latin Serbo-Croatian". I can't however provide a solution, but will follow this discussion and see how it evolves. --Robbie SWE 10:35, 7 October 2011 (UTC)[reply]
Wiktionary including Cyrillic Romanian does not mean that we 'promote' it, as we only document. If indeed still used in Transnistria (as Liliana has pointed out), then we can't just tag them "obsolete", because we would then be misrepresenting things. However, we could tag them with both "archaic" and "Transnistrian", and problem solved. --JorisvS 12:18, 7 October 2011 (UTC)[reply]
I'm not saying that including Cyrillic Romanian promotes a regression. It just makes things problematic; I mean where do we draw the line? Will we start including runes for old Swedish? From what I've heard (might be wrong; the socio-political distance between Sweden and Moldova is quite far), most inhabitants of Transnistria speak Russian and therefore use the Cyrillic alphabet. --Robbie SWE 12:33, 8 October 2011 (UTC)[reply]
I also checked and it's being taught at schools in Cyrillic. I still don't see a problem in merging (despite being a native Russian). "Moldavian" as a name of the language is still used colloquially but "Romanian" is used increasingly in both Moldova and Pridnestrovye (Transnistria). There are no serious efforts to separate them again (unlike say Serbo-Croatian). Perhaps "Moldavian spelling" is better than "Cyrillic", e.g. "România f (genitive/dative României), Moldavian spelling: Ромыния" --Anatoli 12:49, 8 October 2011 (UTC)[reply]
Of course we'd allow runic Old Swedish entries. Perhaps that's a bad analogy as it's a dead language. We have a couple of runic Old English entries. Mglovesfun (talk) 12:53, 8 October 2011 (UTC)[reply]
Ok, I'm sorry for the bad analogy. I think that the use of "alternative/variant" might work, maybe worth giving it a try. I'm not sure though that we'll be doing the same thing in the Romanian Wiktionary project; the task seems too big for two active users. --Robbie SWE 18:19, 8 October 2011 (UTC)[reply]

When reading w:Moldovan language, it's clear that this is a controversial issue, which would be a good reason to allow Moldovan as a separate language (even if the category is almost empty). This would be a good reason because we must be neutral about controversial issues, and because the definition of language may involve political issues as well as linguistic issues. Also remember that dead languages are allowed here, even when nobody is able to contribute to them and their categories are empty. However, as the use of the ISO code for Moldovan is now discouraged, I don't know. Lmaltier 19:03, 8 October 2011 (UTC)[reply]

How is Serbo-Croatian any more neutral than this? -- Liliana 19:24, 8 October 2011 (UTC)[reply]
@Lmaltier: Separating Moldav(i)an and Romanian is as neutral or non-neutral an approach as unifying the two. If those who consider there to be two languages would find it controversial if we unified them into one language, those who consider there to be one language will find it controversial if we separate it into two languages.
@Anyone who knows: do words have the same Cyrillic spellings in Transnistria today that they had in Romania in the past, when Cyrillic was used there? - -sche (discuss) 01:48, 9 October 2011 (UTC)[reply]
That's where the problem arises: the Wikipedia article states "Its structure is based on the Russian Cyrillic alphabet (excluding three Russian letters and adding another), and does not have a direct resemblance to the historical Romanian Cyrillic alphabet used from the Middle Ages until the second half of the 19th century in the Principalities of Vallachia and Moldavia[1] and until 1932 in the Soviet Union." We're basically talking about two different interpretations of the Cyrillic alphabet. --Robbie SWE 13:09, 9 October 2011 (UTC)[reply]
So we should have three entries for the same word: one using a Roman script, and two different ones using Cyrillic scripts, one of which is tagged as "obsolete" and the other as "Transnistrian", right? Of course all in accordance with WT:CFI. I wouldn't have any problem with that. --JorisvS 14:18, 12 October 2011 (UTC)[reply]
Right, ro.Wikt may wish to wait and not add Cyrillic entries at this time because they do not have enough users to manage such an addition, but we already have entries (in Category:Moldavian language) and presumably enough users to manage them. Robbie and JorisvS, please take a look at Wiktionary:Votes/2011-10/Unified Romanian and see if anything needs to be changed. - -sche (discuss) 00:37, 13 October 2011 (UTC)[reply]
Maybe mention that there can be two different Cyrillic spellings that will be allowed, one archaic, one Transnistrian? Or is that superfluous? --JorisvS 09:00, 13 October 2011 (UTC)[reply]

It's unbelievable, 6 votes in 3 week timeframe deciding on fate of language — This unsigned comment was added by 98.172.161.250 (talk).

First of all, please sign your comments with four tildes (~~~~). No, we haven't decided the fate of the languages. The governments do. As I mentioned in Wiktionary:Votes/2011-10/Unified Romanian, here we don't decide the fate of the language, we are just making the efforts easier - treating Moldavian and Romanian together in Cyrillic and Roman scripts, not dissimilar to the way Serbo-Croatian handles Serbian, Croatian and Bosnian. If you are an enthusiast or speaker of Moldavian (i.e. Romanian in Cyrillic), you can still contribute in it. In Romania#translations you'll find:
* Romanian: {{t+|ro|România|f}}
*: Cyrillic: {{t|ro|Ромыния|f|sc=Cyrl}}
You can add words specific to Moldova, which are not used in Romania, using {{qualifier}}. If you're not an editor and just wish to stir up trouble, then Wiktionary is not for you. --Anatoli (обсудить) 00:05, 16 December 2011 (UTC)[reply]
By the way, are you aware that the Moldavian Wikipedia has been closed? The reason being that the language is no longer officially used in Moldova. --Anatoli (обсудить) 00:09, 16 December 2011 (UTC)[reply]

Loss of usage-context categories

At one time, before the "reform" of our category system, we had categories that indicated the usage context of many specialist terms. We now have topical categories instead. I propose that we need to reinstate the usage context categories.

Topical categories for a specialist field are intended to include senses of all terms that relate to the topic in question. Context tags (of the occupational type) are intended to indicate that a given term is likely to be understood only by those with a specialized knowledge in the area.

I think that all terms in a specialist context connected with a given topic should be member of the topic category, but that not all terms in a given topical area should bear the context tag and be in the usage context. DCDuring TALK 11:45, 3 October 2011 (UTC)[reply]

What reform of the category system are you referring to, and at what timepoint is it supposed to have occurred? From what I recall from 2006, the category for, say, physics was always a topical one. We have many usage context categories; what we do not have are usage context categories for the likes of physics, chemistry, mathematics, etc. A usage context category for, say, mathematics cannot be reinstated, as it never was there in the first place; rather, it can be newly introduced, such as "Category:English terms only used in the context of mathematics", or "Category:English terms restricted to mathematics" or the like. --Dan Polansky 11:57, 3 October 2011 (UTC)[reply]
I had always interpreted Category:Physics as a usage context, defined by a usage context label which was not applied to terms that were not so limited. I had no interest in topical categories and have little interest now at Wiktionary, as I find such categorization information at Wikipedia when I need it.
What I perceive as a reform was probably the unintended result of the actions of those who did/do not perceive there to be a worthwhile distinction between topical categories and usage contexts. The use of context tags to populate the topical categories without also creating appropriate usage-context categories is my evidence of the lack of sensitivity to the distinction. DCDuring TALK 12:37, 3 October 2011 (UTC)[reply]
I think DCDuring has either slightly misunderstood or is being sarcastic (the latter, I think). Labels like {{physics}} are allowed, just they should serve as true contexts and not just a shortcut for convenience. So boson legitimately uses physics, but it would be silly to use it for entries like solid, light, liquid, gas and so on where they are clearly not only or chiefly used in the field of physics. Ditto England shouldn't have a {{geography}} tag. Mglovesfun (talk) 12:41, 3 October 2011 (UTC)[reply]
However, solid could be in a physics category (a category of physics-related terms), without having any sense-line tag. - -sche (discuss) 13:05, 3 October 2011 (UTC)[reply]
Just as we have regional and register context tags that populate usage categories, IMO, we should also have usage categories that reflect occupational usage contexts. Maintaining a consistent distinction between topical and usage categories, while, of course, recognizing the distinction, would be quite worthwhile. For example, terms that are in a given topical category can have some senses (Type I) that are not in any topical category, some senses (Type II) that are in the topical category but properly understood outside any specialist context, and some topical senses (Type III) that are properly understood only by cognoscenti and belong in a usage context category. (The last sometimes verge on being prescriptive, but, to be included here, must show evidence of use by multiple authors.) As categories themselves are limited in usefulness for a dictionary because they do not specify a specific Etymology or PoS, let alone sense, it will always be tempting for well-intentioned contributors to apply usage-context labels to senses of Type II.
To be clear: Entries with senses of Types II and/or III should be in topical categories if folks want to maintain such things. Entries of Type III should certainly be in a usage-context category if we aspire to be useful as a dictionary.
I doubt that it makes sense at this time to have some sense labeling to indicate which sense it is that qualifies a term to be in a given topical category, though such labeling would discourage misuse of context tags that have associated topical categories. DCDuring TALK 13:42, 3 October 2011 (UTC)[reply]
Like how {{slang}} denotes a sense that is likely to be understood only by those with specialized knowledge of slang? —RuakhTALK 13:20, 3 October 2011 (UTC)[reply]
Sorry, I should have also said what -sche said, that in cases where a term is used in a context but not specialized (that is, the term retains its general-use meaning) a written category could/should be added at the bottom. So rather than tagging foul with every sport that has the concept of fouls, add the categories at the bottom by hand. I'm not sure why some users are so reluctant to add categories at the bottom, it's not particularly difficult to do. Mglovesfun (talk) 13:49, 3 October 2011 (UTC)[reply]
I thought it obvious that we have different types of usage contexts (which actually reflect reality). We have register (informal, formal) and regional. We have some contexts which indicate offensiveness and we have some that indicate media-related restrictions (colloquial, IM/internet). There may be some types missing and there are other useful ways of classifying usage contexts. Occupational contexts are another type of usage context. They are a superior approach IMHO to marking some terms as "jargon" and hoping that a user could figure out from topical categorizations the specific places in which a given term could be expected to be understood when used. DCDuring TALK 13:57, 3 October 2011 (UTC)[reply]
It is indeed obvious that we have different types of usage contexts, but I don't think any of them indicate — nor should indicate — who is likely to understand a given term. Rather, they should indicate the context in which a term is used. Hence the term "usage contexts". ;-)   If "solid" is only used in physics contexts, or has a specific sense when used in a physics context, then it doesn't really matter that it's a term everyone knows. —RuakhTALK 14:17, 3 October 2011 (UTC)[reply]
I've been thinking along these lines myself. I think it may be worthwhile. I'm really not sure, though: after all, the benefit of a jargon dictionary is that it provides all the terminology for a field, and (e.g.) solid is terminology in physics, even if it's also used by others. But if this is something we want to do, then a good way to proceed might be as follows: Keep [[Category:en:Physics]] as a topical category and restore [[Category:en:Jargon:Physics]] (or perhaps [[Category:English jargon:Physics]] or even [[Category:Jargon:en:Physics]]) as a term-of-art category.​—msh210 (talk) 16:06, 3 October 2011 (UTC)[reply]
I like this idea. - -sche (discuss) 20:28, 3 October 2011 (UTC)[reply]
I strongly object to the use of "jargon" in any category name or usage-context label. Whatever it may mean in a linguistics context (!), a few of the common senses are definitely pejorative. We have enough difficulty trying to prevent contributors (not just newbies, either) from being prescriptive without providing such encouragement. Even our inadequate entry has one of the pejorative senses, though it is not so labeled. AFAICR that is why {{jargon}} was deleted. DCDuring TALK 23:02, 3 October 2011 (UTC)[reply]
Here's how to use written categories instead of contexts: diff. Mglovesfun (talk) 12:18, 9 October 2011 (UTC)[reply]
The only trouble is that {{sports}} puts the entry in a topic category, not a context category.
Also, there is no reason for the context to be "sports" in general rather than the specific sports in which this is understood.
The usage contexts are one set of categories, which are linguistic. I think they are relatively well defined. The topical categorization is not well-defined, except as it is derived from the usage categorization. For example, bending brake could be in topical categories "Tools", "Metalworking", even "Roofing". I'm not sure about the range of usage contexts, but I doubt that it is in the vocabulary of most of the general population. "Metalworking" and "Roofing" would seem to be possible usage contexts, but not "Tools". DCDuring TALK 02:11, 11 October 2011 (UTC)[reply]

Why are they locked? Engirst 13:00, 3 October 2011 (UTC)[reply]

Dubious; it is normal to lock a page if there's an edit war, but it becomes ethically dubious when one of the contributors in the edit war protects a page with their version instated. It's always better for someone outside of the conflict to lock such a page. FWIW the edit to 英國 looks valid to me; the fact the citation contains some Latin script characters doesn't make it invalid. I'll accept that Engirst is doing it to prove a point that Latin terms are used in Mandarin as 'borrowings' but that doesn't invalidate the citation. FWIW the Middle French version of 'Le Tiers Livre' I read contains some Greek citations in Greek characters, but I'd like to think that doesn't make it invalid as a source for Middle French. Mglovesfun (talk) 13:54, 3 October 2011 (UTC)[reply]
Engirst was edit-warring on Thames河 to prove his point. He never writes anything in Mandarin except when he needs to troll his ideas. I removed his edit, which is 1) not synchronised with the simplified version, 2) doesn't provide pinyin and translation into English - it's a long-time convention Engirst has been violating. And most importantly 3) pushes his English words in Mandarin before the decision is made about the usage of English words in Mandarin. All his edits are generally considered bad by Mandarin contributors. This is not a personal attack. He has been banned for his behaviour (i.e. the attempt was made to ban him multiple times). I will remove the block on 英國/英国 (Yīngguó) when the decision is made about the usage of foreign proper nouns in Mandarin. It's a concern that a person who worsens the quality of our Mandarin entries continues editing. --Anatoli 21:30, 3 October 2011 (UTC)[reply]
英國/英国. I've synchronised the entries, added formatting, pinyin and translation, removed example with a very uncommon English name in the Mandarin sentence. Will have to lock the entries if edit warring starts. --Anatoli 22:22, 3 October 2011 (UTC)[reply]

Company names

I feel there should be a vote on confirming the Company names section of WT:CFI. As it is, too many people disagree with it, and it clearly doesn't constitute consensus anymore. -- Liliana 18:19, 3 October 2011 (UTC)[reply]

There should better be a vote on removing the section "Company names" from CFI. See also Wiktionary:Beer_parlour_archive/2011/April#Poll:_Including_company_names. Rather than not being supported by consensus any more, the section never was supported by consensus in the first place, I figure. --Dan Polansky 18:36, 3 October 2011 (UTC)[reply]
The straw poll you (Dan) link to is interesting. Five of its twelve respondents opined that "No company should have a dedicated sense line in any entry", which is at least as restrictive as our CFI and possibly (depending on how you read our current CFI) much more restrictive. The other seven opined that "Some companies should have dedicated sense lines in some entries", which is possibly (depending on how you read our current CFI and depending also on what those respondents would include in their "some") the same as our current CFI, though possibly more or less restrictive than our current CFI. So while a vote (fairly composed) might lead to some change, I suspect it will not: I suspect that the current CFI are a good compromise in this regard, where there is no consensus.​—msh210 (talk) 18:45, 3 October 2011 (UTC)[reply]
The current CFI on company names, above all, is unsupported by consensus. CFI should not contain an unsupported compromise between two positions; if no consensus has been reached, CFI should state only so much. And if there is no consensus on specific rules for inclusion of company names, then the regulation of company names can be left to the section for names of specific entities, which is achieved by removing the section WT:CFI#Company names, and by removing the second bullet item from the section WT:CFI#Names of specific entities. While the statement "Some companies should have dedicated sense lines in some entries" does not contradict current WT:CFI#Company names, I find it likely that those who support the statement would like to see WT:CFI#Company names removed as unclear and too restrictive. The critical part of WT:CFI#Company names to be removed is this: "To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested." --Dan Polansky 19:05, 3 October 2011 (UTC)[reply]

Requesting short-term block for Special:Contributions/90.205.76.53

As can be seen at http://en.wiktionary.org/w/index.php?title=%E9%AC%BC&action=history, among other places, this user is becoming a persistent nuisance. Re-reverting registered editor fixes multiple times should be grounds for a short-term block, no? -- Annoyed, Eiríkr Útlendi | Tala við mig 22:05, 3 October 2011 (UTC)[reply]

Let's try a bit harder to reach out to this person before blocking, they do seem to be editing in good faith. - [The]DaveRoss 22:23, 3 October 2011 (UTC)[reply]
Is anyone but a checkuser likely to succeed at communicating to an anon? DCDuring TALK 23:17, 3 October 2011 (UTC)[reply]
I am not sure what being a checkuser might do to increase success, everyone can see what the IP address of an anonymous editor is. - [The]DaveRoss 23:20, 3 October 2011 (UTC)[reply]
D'oh. DCDuring TALK 00:57, 4 October 2011 (UTC)[reply]
For future reference, the place for this is [[WT:VIP]].​—msh210 (talk) 23:24, 3 October 2011 (UTC)[reply]
Thanks msh210, I'd posted there in August about a different IP address (that seems to be the same user) and got no response, so I thought I'd try posting here instead. -- Eiríkr Útlendi | Tala við mig 23:31, 3 October 2011 (UTC)[reply]
 :-)  Good point.​—msh210 (talk) 00:20, 4 October 2011 (UTC)[reply]
What do we do with people not acting in good faith but capable of avoiding all administrator blocks like Engirst and his many-many aka's? Maintaining and fixing his entries is time-consuming and unproductive. His whole activity is about proving his points, which is otherwise called trolling. A rhetorical question, perhaps, his activity and entries have been discussed many times. --Anatoli 01:24, 4 October 2011 (UTC)[reply]
There are fancy blocks for people who change IPs frequently. Other than overt vandalism I can only think of two times we have bothered making that effort and both times there was strong community support for banning a particular person outright. - [The]DaveRoss 01:28, 4 October 2011 (UTC)[reply]
Well, to me Engirst (+ many aka's and anons) is such a case where a sophisticated block might be in order or long overdue. It was discussed too but I think the attempt to do it failed. He is just wasting a lot of time "promoting Mandarin written in Roman letters". First pinyin - full of errors, inconsistent and out of synch with both traditional and simplified entries, now English proper names used in Mandarin in Roman letters. I agree with people saying he's clearly got some agenda (pinyinisation, converting Mandarin to Latin alphabet?). --Anatoli 01:44, 4 October 2011 (UTC)[reply]
As for Japanese at least he doesn't have the adequate knowledge to make useful contributions in good faith in the first place even if he wanted to, which he does not seem to. Haplogy 02:10, 4 October 2011 (UTC)[reply]
@Haplogy: Just to be clear, do you mean User:Engirst, or User talk:90.205.76.53? -- Eiríkr Útlendi | Tala við mig 05:24, 4 October 2011 (UTC)[reply]
I mean the IP user. Sorry I should have been more specific. --Haplogy
Based on the conversations people have had with him on wiki I think it is clear that this is a young person who has recently become interested in Japanese. While I agree that language learners are not the most useful editors to have on the project there is certainly merit to having them. If there is any way to channel this person's energy into more useful edits we should try that rather than putting more effort into blocking someone who is probably just trying to figure things out. - [The]DaveRoss 10:48, 4 October 2011 (UTC)[reply]
I generally agree with Dave here. What got up my nose about this particular IP user was their insistence on reverting my fixes, multiple times, in the same entries. Figuring things out is one thing; being a persistent nuisance is another. -- Cheers, Eiríkr Útlendi | Tala við mig 16:16, 4 October 2011 (UTC)[reply]
There is always that question, whether something is willful disregard or simply confusion or ignorance. Depending on how much experience someone has with a wiki community they may not know that reverting multiple times is taboo, or even really understand why their edits are getting undone. - [The]DaveRoss 19:24, 4 October 2011 (UTC)[reply]
True enough. However, when the IP user's own edit summary is "Undo revision ...", it starts to look a lot like they're aware of the editing and history features, and are choosing to ignore other edit summaries. This is just my own perspective, which calls into doubt the user's good faith - I'd be happy to be proven otherwise.  :-/ -- Eiríkr Útlendi | Tala við mig 20:54, 4 October 2011 (UTC)[reply]

Gheg Albanian

We have a category for Gheg Albanian, and we have ten entries with Gheg Albanian as an L2 header. We also have numerous entries which handle Gheg Albanian like this/this (with context tags). Should we convert the ten Gheg entries to use an ==Albanian== header and a (Gheg) context tag, or should we move the Gheg information out of the (standard) Albanian sections like this?

The former is preferable for me. BTW, the second example seems to be missing some important categories - parts of speech. There should not be many under Gheg Albanian header. --Anatoli 03:31, 4 October 2011 (UTC)[reply]
How different are the two variants, anyway? -- Liliana 05:12, 4 October 2011 (UTC)[reply]
We don't have skilled people here but Tosk Albanian is the standard and most common, most entries/translations are in Tosk. Albanian Wikipedia and Wiktionary are not separated into Tosk and Gheg, perhaps we shouldn't separate either, like we don't separate Belarusian. --Anatoli 06:36, 4 October 2011 (UTC)[reply]
The two are sufficiently different to be mutually unintelligible, and so can be considered distinct languages. The old-Tosk derived Arbëreshë and Arvanitika are also unintelligible with Standard Albanian (Tosk), even their "dialects" are perceived as unintelligible by their speakers. --JorisvS 11:44, 4 October 2011 (UTC)[reply]
So, should we split the Gheg and Tosk entries? - -sche (discuss) 07:43, 7 October 2011 (UTC)[reply]
I'd say, therefore, yes. --JorisvS 10:37, 7 October 2011 (UTC)[reply]
Alright, that sounds reasonable to me, as they do have separate ISO and Wikt codes, and it was User:Dick Laurent (who speaks at least some Albanian, sq-1) who created some of the ==Gheg Albanian== entries. I'll start splitting. Less than 100 words will be affected (when split, less than 200). - -sche (discuss) 18:45, 7 October 2011 (UTC)[reply]
I think we should nest Albanian in translations tables (like this). - -sche (discuss) 18:52, 7 October 2011 (UTC)[reply]
Sure, why not? Though Tosk should possibly be the default, ala water. -- Liliana 19:26, 8 October 2011 (UTC) addendum: oh I see someone changed that too, hmm[reply]

We also have a good number of Albanian entries with a Gheg "pronunciation". I suspect, however, that the orthography is actually just Tosk and that it should be different when properly written in Gheg (as opposed to Tosk written by Gheg speakers). While we could add entries by using the key at Wikipedia's Gheg Albanian article, I don't know how reliable the IPA used in these entries is, nor whether the result would be how the words are actually written. --JorisvS 14:06, 12 October 2011 (UTC)[reply]

(Hm, this discussion hasn't had all that much participation...) Any objections to changing the translation adder to nest aln (Gheg) as Albanian/Gheg and sq as Albanian/Tosk? --Yair rand 15:32, 25 October 2011 (UTC)[reply]
No objections; I think it should nest them. - -sche (discuss) 20:33, 25 October 2011 (UTC)[reply]
Done. --Yair rand 17:54, 27 October 2011 (UTC)[reply]
(Pointing out that as of two months ago there are 2187 "Albanian" translations. That's a lot of edits if this is going to be standardized.) --Yair rand 17:59, 27 October 2011 (UTC)[reply]
Maybe something a bot could do? --JorisvS 20:21, 27 October 2011 (UTC)[reply]
Could you also change the translation adder to nest Arbëresh (aae) and Arvanitika (aat) the same way it now nests Tosk and Gheg? --JorisvS 16:23, 2 November 2011 (UTC)[reply]
Done. --Yair rand 16:30, 2 November 2011 (UTC)[reply]

Now that Gheg is separated from Tosk, I've begun wondering about the clarity of our ==Albanian== specifically for Tosk (Albanian) to our users. Thoughts? --JorisvS 16:23, 2 November 2011 (UTC)[reply]

You could compare this to {{de}} "German", {{gsw}} "Alemannic German", {{nds}} "Low German". I don't think clarification is needed, since, similar to the German case, Tosk is the dominating variety. -- Liliana 16:45, 2 November 2011 (UTC)[reply]
So is this completely incorrect? --Yair rand 16:51, 2 November 2011 (UTC)[reply]

Splitting Gheg off from Standard/Tosk is foolish. — [Ric Laurent]19:38, 2 November 2011 (UTC)[reply]

Do you also have a reason for your opinion? --JorisvS 20:27, 2 November 2011 (UTC)[reply]
Separating Gheg and Tosk and the few other minority dialects makes as much sense as splitting American/Scottish/British/Australian English. The differences that cause problems in mutual intelligibility are like flashlight and torch in English. There might be bumps, but they are NOT different languages. This is a fact and that's all I have to say on it. If you all want to do something completely retarded, go for it. — [Ric Laurent]22:35, 2 November 2011 (UTC)[reply]
No, all sources I've seen addressing mutual intelligibility say that this is limited between the varieties of Albanian, which means that they are different languages by this criterion. So, the comparison with the Englishes is wrong. It is more appropriate to compare the situation to German, where we have (at least) Alemannic, Austro-Bavarian, and Low alongside the Standard, all with limited intelligibility with each other. --JorisvS 23:01, 2 November 2011 (UTC)[reply]
When one adds an Albanian translation, it starts nesting (it didn't nest before):
* Albanian:
*: Tosk: {{t|sq|WORD}}
I don't know if it's a good idea. Tosk Albanian is standard, if anything Gheg could be marked using {{qualifier}} or nested. Can we nest only non-standard versions or just mark them? Tosk Albanian shouldn't be nested, just like Norwegian Bokmål, IMHO.
Can we change it back to
* Albanian: {{t|sq|WORD}}

If anyone wants nesting, here are the codes for Albanian varieties:

  • aln – Gheg
  • aae – Arbëreshë
  • aat – Arvanitika

--Anatoli 22:46, 2 November 2011 (UTC)[reply]

As per Liliana, Tosk is the dominating version, like standard German, so please change back. --Anatoli 22:48, 2 November 2011 (UTC)[reply]
What does it mean to be the "dominating" version? --Yair rand 22:50, 2 November 2011 (UTC)[reply]
Standard, most common, most likely to have texts to be written in, more useful for users learning the language or wanting to find translations. Albanian dictionaries are also in Tosk. Tosk Albanian or standard Albanian is used in Kosovo as well, although the native dialect is Gheg. --Anatoli 22:55, 2 November 2011 (UTC)[reply]
Hm. The link I posted above in response to Liliana's comment is to a WolframAlpha query of Tosk and Gheg, and it says that Gheg has about a million more native speakers than Tosk, and about a million more total speakers than Tosk. I don't know how reliable those statistics are, or whether they're contradictory to Tosk being "dominating", since the meaning isn't very clear. (Incidentally, it also lists translations of the numbers one to ten for both languages, and not a single one of them are the same in both Tosk and Gheg.) --Yair rand 23:03, 2 November 2011 (UTC)[reply]
I want to point out that Tosk is not synonymous with standard Albanian, but standard Albanian is based on the Tosk dialect. I'm not really familiar at all with Arvanitika or Arberashja, so I won't comment on whether they should nest in translation tables, but if Tosk and Gheg alternatives exist, those varieties can follow whatever is used in standard Albanian with {{qualifier}}. L2 for Gheg and Tosk should both be ==Albanian==.
Like Anatoli suggested, people from Tirana and Pristina will understand each other fine. Maybe not perfectly, but certainly as well as someone from Valley Forge and Edinburgh. — [Ric Laurent]23:01, 2 November 2011 (UTC)[reply]
I'm not aware of Anatoli having said that. Because Gheg speakers also come into contact with the Standard (Tosk), they learn to understand it. This is passive bilingualism, not mutual intelligibility. (Cf. in former Czechoslovakia speakers of Czech and Slovak could communicate with each other in their own native language, not because Czech and Slovak are mutually intelligible, but because each was a passive speaker of the other. The young generation of today has not (passively) learned the other's language and they have trouble talking to each other). --JorisvS 23:15, 2 November 2011 (UTC)[reply]
You aren't aware of Anatoli saying that because he didn't - he was talking about standard Albanian and the Gheg they speak in Kosovo. I used the city names.
I study standard Albanian and Tosk. I used to talk to a young guy from Kosovo and we understood each other perfectly fine. You can resist this simple fact all you want, but the fact remains that Gheg and Tosk are no more different languages than what they speak in Texas vs what they speak in Dublin. There are certainly dialectal differences, and understanding may come with a bit of strain at places, but the existence of a Gheg incubator does not make Gheg its own language. Ask people who speak Gheg what they speak, and they say Albanian. That's not some wild coincidence. Most words are the same or identical, inflection is identical. Anyone with even the most basic knowledge of Albanian like myself can read that Gheg article about Albania you linked to and understand it with little trouble at all, even if they aren't particularly familiar with the quirks of Gheg beforehand. — [Ric Laurent]23:40, 2 November 2011 (UTC)[reply]
Asking people what they speak doesn't mean anything. Ask a Croat and he'll say "Croatian", even though it is Serbo-Croatian he speaks. Q: How deep was this conversation? --JorisvS 23:54, 2 November 2011 (UTC)[reply]
We sent messages back and forth for at least two weeks. We got pretty involved and talked about a bit of a range of subjects. — [Ric Laurent]00:13, 3 November 2011 (UTC)[reply]
As a matter of fact, it was so easy to understand him, I didn't even realize that it was Gheg he was speaking until long after we met. He used words like "asht" and used n's instead of r's in a lot of places, which I thought was odd or maybe just typos. But no, they were just regional spellings - because that's what Gheg is: a regional dialect. Just like they have in Texas or Georgia or Ireland or New Zealand. — [Ric Laurent]00:19, 3 November 2011 (UTC)[reply]
I understand that it was the written language you're referring to, not the spoken one. Written languages are always easier to understand than when spoken and closely related languages may be completely intelligible to each other's native speakers when written. E.g. being a native Dutch, one can easily read the Afrikaans WP, yet understanding the spoken language is much harder. --JorisvS 14:29, 4 November 2011 (UTC)[reply]
This is a digress but mutual intelligibility is a tricky thing. Languages can be extremely close but still hard to understand without some exposure. What makes languages mutually intelligible is understanding the pronunciation, knowing how sounds change. Even very short exposure to a similar language can open these secrets. Czech and Slovak, like Russian/Ukrainian/Belarusian are extremely mutually intelligible, Slavic languages have up to 60% of common vocabulary and up to 80% or more in closer languages but pronunciation and other factors confuse people who never heard a language. Of course, nationalists will disagree and will highlight differences, rather than making some effort to understand. --Anatoli 23:28, 2 November 2011 (UTC)[reply]
It's not a digress, it's the core of the issue. Yes, mutual intelligibility is tricky. The percentages you mention are actually easily enough to prevent varieties from being intelligible (though usually sufficient for partial intelligibility, which may still remain for far lower percentages), and thus being distinct languages. And no, I don't need to be reminded of those Croatian nationalists who desperately tried highlighting differences, however superficial, to "prove" that Croatian is a different language from Serbian. --JorisvS 23:54, 2 November 2011 (UTC)[reply]
I'm not even trying to prove that Slavic languages are distinct languages (not talking about Serbo-Croatian controversy). Partial but very high intelligibility and learnability is what characterises Slavic languages, far more mutually intelligible than Semitic, Romance or Germanic languages. Long time ago I went with a Russian friend (not a linguistic type) to Poland who had 0% Polish. He could communicate in 4 weeks on a passable level and understood a lot when he learned to transform the Polish phonology in his mind into something closer to him + learned a few linking, most common words. I have a proof that Russians can learn Polish in a month (without actually learning just talking and listening) and I don't think Polish is more complicated than Slovenian (the most distant) to the Russian language. Back to the topic, Albanian shouldn't be split, Albanians don't do it, why should we? --Anatoli 00:18, 3 November 2011 (UTC)[reply]
I've studied standard Albanian and understand Gheg just fine. Not only are the vast majority of words the same or very similar (like sodomize vs sodomise) but inflection is identical. They're not two languages, Tosk and Gheg. This is a simple fact to which one person around here seems to be highly resistant. — [Ric Laurent]23:40, 2 November 2011 (UTC)[reply]
Tell me, then why do the sources I've seen speak of limited intelligibility? --JorisvS 23:54, 2 November 2011 (UTC)[reply]
I don't know, maybe because they're retarded dicks. Like I said earlier and have said numerous times: Even for someone with a basic knowledge of standard Albanian, Gheg is not difficult to understand. Most words are the same, as are inflections. Even skimming the Shqipnia article, I understood it fine. If you don't want to trust the only person in this conversation who has actually studied Albanian fine like I said, if you want to do something utterly retarded, it's on you. If you want to take a month to study basic Albanian grammar based on the templates I made for wiktionary, you'll be able to understand that incubator entry with little difficulty.
I'd apologize for my hostility, but this is as stupid as trying to keep Serbo-Croatian separate. — [Ric Laurent]00:13, 3 November 2011 (UTC)[reply]
(Clueless person speaking: ) Despite the fact that the only person here who speaks the language is saying that they're not separate languages, I'm having a hard time understanding how this could be the case. w:Gheg_Albanian#Differences_between_dialects lists a whole bunch of really common words that are different between dialects/languages: "to be", "I do", "I can", "is" (If it's only those specific tenses then it makes more sense, but...), and it seems that none of the lowish numbers are the same either? Ric, do you think you could give a rough estimate of what percent of words are the same spelling in Gheg and Tosk? --Yair rand 00:43, 3 November 2011 (UTC)[reply]
That's really difficult to say for a couple of reasons. For one thing, Albanian has a really rich system of derivation. Related adjectives, gerunds and verbs are very easy to make, and they easily compound the differences. Albanian makes sort of a dialect continuum so basic words will vary a lot from place to place. Also, the spellings are likely to vary quite a bit because Albanian is a million times more phonetic than English. As for the list on that page, I've seen it and I'm hesitant to trust it completely. I have a feeling it may have been made from words that aren't exactly "standard" in their dialects, but from subdialects - especially since they aren't using the citation forms. Half the verbs listed are participles, like qenë/qënë/kjenë, punuar/punue (never seen that participle in Gheg, I'm almost sure that's a subregionalism).
One of the most common differences between Tosk and Gheg (which you can see in the name of Albania itself) is the r/n variation. Shqipëria vs Shqipnia. There's other stuff like that in the list, like pjekuri/pjekuni, dhelpër/dhelpën Oh also the ë, Tosk tends to keep it even when it's silent in a word. Some Gheg-speakers keep it, some don't.
All that aside, when you're only focusing on the differences it's easy to think they might be different languages. There are some really distinct differences in regional Serbo-Croatian, but they're still one language. For god's sake, they use different names of the months in different places.
I maintain that the differences in the various dialects are no more serious than the dialects of English - they're just spoken in a much more narrow area. — [Ric Laurent]01:13, 3 November 2011 (UTC)[reply]
A whole bunch is easy to find when one is looking. Even in this list there are obvious similarities. If we base our understanding on how American and British are different (looking at a vocabulary list), rather than how similar they are, then we may think that they are different languages. The list of similarities would not be practical, as it would be most of the vocab. --Anatoli 00:54, 3 November 2011 (UTC)[reply]
I wouldn't consider it significant evidence if it were just random words like "desk" or "drive" (well those might be bad examples, but you know what I mean), but if, say, American English had different words for "in", "be", "it", "the", "do", "can", "because", "like", "what", "have", "good", etc., then it would be a different situation. --Yair rand 01:08, 3 November 2011 (UTC)[reply]
The examples in the Gheg Albanian link don't show the most common words, besides, there's synonymity and word choices. An opinion of a native speaker or someone speaking Albanian would be more appropriate to make a good judgement. When I look at Swadesh list of Slavic languages, in many cases it's not that I don't understand e.g. some Macedonian or Czech words, in Russian we may use a different root or a word can be obsolete or less common, the common practice could be slightly different but the same thing could be expressed in various ways. I give an example to demonstrate what I mean. If you look at translations for I have a question, without some knowledge of Slavic languages, you may wonder why Polish/Czech/Slovak phrases are so radically different from Russian/Ukrainian/Belarusian but Slavic people won't have trouble understanding them because we can rephrase Russian/Ukrainian/Belarusian in such a way that would be close to Polish/Czech/Slovak grammatically but that would be less common. The spelling and variations in pronunciation are not unique to Albanian, for someone not knowing English, thru/through, color/colour, "do you have"/"have you got" could look like drastic differences. --Anatoli 02:44, 3 November 2011 (UTC)[reply]
In my ~5 years here I've only known two Albanian speakers. Zeke and Nemzag, though I'd say Nemzag is obsessed with obscure fringe nonsense and his own research and assumptions, which I'd categorize as completely unreliable. Haven't seen Zeke in ages, though she's still in my G-mail contacts. But to Anatoli's point about grammar and vocabulary choice, I would say the differences between Gheg and Tosk are closer to English variants than Slavic languages. Inflection is, if not completely identical, nearly identical in my experience. The main differences come, as in English and Serbo-Croatian, in the form of the phrasal formations. One of the examples in Yair's link was apparently the infinitive construction, which in Tosk and standard Albanian I know for a fact is "për të" plus the participle, whereas the table in the link (I can't personally verify this, having never seen it) suggests that it's "me" plus the participle, which appears to take a slightly different form in Gheg - again based only on the table, not my personal experience. Substantival and adjectival inflection are identical in Gheg and Tosk. If there are differences in verbal inflection, they can be easily noted in my totally beautiful conjugation tables without making whole new ones for Gheg and Tosk. — [Ric Laurent]11:23, 3 November 2011 (UTC)[reply]
We must separate American from English! List of British words not widely used in the United States. --Anatoli 00:59, 3 November 2011 (UTC)[reply]
I love you. lol — [Ric Laurent]01:13, 3 November 2011 (UTC)[reply]
He-he :) My dictionary doesn't have "diaper", "trash" or "checkers". I only know "nappy", "rubbish" and "draughts". How did you manage to deviate so much? --Anatoli 02:44, 3 November 2011 (UTC)[reply]
Apropo, it wouldn't surprise me if there were pockets of people in the Tosk regions who speak some local dialect that's completely unintelligible to all other Albanian speakers, just like if I were to drive 2 hours to the East I have no idea what the fuck any of those damn Hyde-county rednecks are talking about. — [Ric Laurent]23:04, 2 November 2011 (UTC)[reply]
In any case, "sq" is reserved for Albanian, not Tosk Albanian. The language code for Tosk Albanian is "als". So if someone adds a translation using "sq" into Albanian, which is not Tosk, then it gets misleading. Cf to Norwegian - "no" where Norwegian Bokmål's code is "nb" and Nynorsk is "nn". Tosk Albanian is used in education, media, books, etc. in both Albania and Kosovo, although there are writers who still write in Gheg. --Anatoli 23:12, 2 November 2011 (UTC)[reply]

I agree with Ric Laurent, unsurprisingly. And I would like to reiterate what's already been said: Standard Albanian is a Tosk variety, or, at least, a variety heavily influenced by Tosk. Standard Albanian does not equal Tosk, any more than the British version of Standard English equals Cockney. And according to a good friend who is a native speaker of a Gheg variety, since the fall of Communism in Albania, the standard language has been more and more influenced by Gheg varieties. I'm not sure why, exactly. Maybe economic reasons or something? embryomystic 01:39, 4 November 2011 (UTC)[reply]

User persistently inserting examples of proper names written in Roman, using his aliases or his own user account

A user is persistently inserting examples of proper names written in Roman letters into Mandarin entries, or creating "Mandarin" entries using English proper nouns (Leeds, Hyde Park, London, Thames, etc.) with or without Chinese suffixes, using aliases he has no problem generating (this time it was Special:Contributions/2.27.73.75) or his own user account, Special:Contributions/Engirst. I had to fix quite a few entries (converting them to proper Mandarin spelling) and protect them from him. --Anatoli 04:51, 4 October 2011 (UTC)[reply]

He has no problem generating new IP addresses: Special:Contributions/2.27.72.78. --Anatoli 04:58, 4 October 2011 (UTC) (Note: range blocks were tried before). --Anatoli 05:01, 4 October 2011 (UTC)[reply]

User:Liliana-60 has unprotected Nei Mongol with a summary "this is the wrong way to deal with single user issues".

Copying my question to Liliana-60, which may be of interest to others:

It may be wrong to protect pages because of one user but can you suggest anything else? I've been trying not just revert everything he does but fix and use some of the positive information he adds. It's hard though. He is very productive and inventive as far as avoiding blocks goes and his whole activity is about to show that Mandarin can be written in Roman letters, proving his point and edit-warring. --Anatoli 05:35, 4 October 2011 (UTC)[reply]

Use abusefilter to record all changes to Mandarin entries, block all edits which create Mandarin sections in entries with names containing two consecutive Latin letters, block all edits which create Mandarin entries with name containing one word which doesn't contain any Pinyin diacritics (āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜêê̄ếê̌ềĀÁǍÀŌÓǑÒĒÉĚÈ), and block edits which add #: or #* examples to Mandarin Pinyin entries (characterised by presence of ===Romanization=== and/or {{pinyin reading of|...}} and/or {{cmn-pinyin}}). 60.240.101.246 07:23, 4 October 2011 (UTC)[reply]

Pinyin entries (===Romanization===) are allowed but I see what you mean. ;) --Anatoli 10:27, 4 October 2011 (UTC)[reply]
I think that using abuse filters to enforce policy might be a bad idea, I think expanding their scope beyond pure vandalism can have potentially harmful side effects. Does anyone have links to the range blocks or discussions about blocking this user? I am always concerned about getting rid of people who have so much passion about this project, even if that passion seems (or is) completely misdirected. - [The]DaveRoss 10:41, 4 October 2011 (UTC)[reply]
From memory he generated IP addresses way outside his normal ranges. Also, there is a chance that if he knows that he may be blocked somehow (that he is not so "invincible") - temporarily or permanently, he may change his attitude, stop working against the rules and consider the opinions of others. No, not asking for a definite block just yet. In any case, if a complex block were used, that would be a collective decision, not an individual one. I heard something about the possibility of contacting the ISP if there is a serious attack or vandalism, but it's not that bad. Yes, passion should be controlled, otherwise it causes problems or work for others. --Anatoli 12:00, 4 October 2011 (UTC)[reply]

(merged with above) --Anatoli 21:52, 4 October 2011 (UTC) Planck常数 (Planck Chángshù) It is a real word, please see Google Books. 2.25.214.61 21:38, 4 October 2011 (UTC)[reply]

Yes, mixed-language examples are citable all right, but I hope that after the vote only 普朗克常數 / 普朗克常数 (Pǔlǎngkè chángshù) will be allowed; the reasons have been explained many times and I won't repeat them. Your pinyin entries romanising mixed-language terms will also become invalid. You are the only person pushing to write foreign words in Mandarin using Roman letters. Otherwise, there wouldn't be a need for the vote. Also, use your real account, Engirst, no need to pretend you are many.

A freshly generated IP-address - 2.25.214.61

I have started keeping records of the number of IP addresses you are using and will try to find your old user names and IP addresses, as this is a rather rare case of abuse. Nothing personal. --Anatoli 21:52, 4 October 2011 (UTC)[reply]

A recently generated and blocked (not by me) IP Address: Special:Contributions/2.25.212.83. --Anatoli 23:12, 4 October 2011 (UTC)[reply]

For what it is worth, even though there seem to be a lot of IPs it is really only one ISP. That ISP has 3 /16s, but we can narrow it down to maybe 4 or 5 /21s or higher I think. I would want to check and see how much collateral damage that would mean but blocking all of the IPs this person uses would not be hard if that was the consensus. - [The]DaveRoss 01:50, 5 October 2011 (UTC)[reply]
I previously range blocked him before the vote on pinyin entries had passed using a subnet filter of 16 bit. I have been trying to help him in good faith, but he's really testing my patience. His sole purpose is to make romanization of Mandarin the standard in this dictionary. Personally, I think this is simply not viable in the long run. It's the sheer amount of information in the language that will be lost and the amount of confusion that this will cause as a result, should everything be written in pinyin. Mandarin, unlike English, has a huge number of homophones and heteronyms. This is THE reason why it should not be romanized. The syllable , has over 50 known homophonous characters associated with it, each with its own set of meanings and in some cases, its own set of heteronyms. This is also one of the reasons the Japanese adopted Kanji characters to distinguish between the meanings of homophones. If things keep worsening, I will consider blocking him again. JamesjiaoTC 03:23, 5 October 2011 (UTC)[reply]
The "user with many names" seems to be following pinyin entry rules, more or less. It's a new issue. abc123 (his original name) or Engirst is ready to fight (edit war) over Thames河, Hyde公园 and many others, forces examples like "London是英国的首都。" instead of "伦敦是英国的首都。". I had to protect some pages (temporarily) from his edits. As a Chinese speaker, what's your opinion on this type of entries? --Anatoli 03:50, 5 October 2011 (UTC)[reply]

A freshly generated IP-address - 2.25.212.57 --Anatoli 09:41, 6 October 2011 (UTC)[reply]

My proposal: I will block anonymous contributions from the ranges which have recently been abused. We leave Engirst unblocked (unless new reason for blocking arises) for the time being at least until the end of discussions on how to handle the particular Mandarin issues currently in debate. Once those issues have been resolved, Engirst can choose whether or not to abide by the results; if Engirst chooses not to follow the community resolution we formally ask Engirst to leave the project, modify the blocks to include logged in users, and actively block future socks. I think this can be done with minimal collateral damage. - [The]DaveRoss 21:25, 6 October 2011 (UTC)[reply]
At the moment, we are only blocking individual IPs. I propose to range block. I've done some research and it seems that his ISP (in London, UK) gives out dynamic IPs in the range of 00000010 00011000 00000000 00000000 and 00000010 00011011 00000000 00000000 with a subnet mask of 11111111 11111100 00000000 00000000 (in IPv4: 2.24.0.0 to 2.27.255.255 with a 14-bit subnet mask). Blocking the whole range would mean possible collateral damage, but it wouldn't be too bad if we still allow account creation. JamesjiaoTC 22:29, 6 October 2011 (UTC)[reply]
FWIW, here is a list of IPs: 2.25.191.81 (?), 2.25.191.225 (?), 2.25.193.30, 2.25.212.57, 2.25.213.147, 2.25.214.61, 2.27.72.254, 2.27.73.75, 2.27.72.78 (I am not convinced of this one). AFAICT, few other editors have edited recently from IPs in that range. - -sche (discuss) 22:28, 6 October 2011 (UTC)[reply]
All of these are incarnations of the same entity. JamesjiaoTC 22:32, 6 October 2011 (UTC)[reply]
This is why I proposed to do the range block, I checked the potential ranges and can avoid pretty much all collateral damage as well as target the IP address space which is allocated to whatever region this user is in. - [The]DaveRoss 22:46, 6 October 2011 (UTC)[reply]
I actually forgot about my previous reply to your proposal. Silly me. Well at least you have my support. JamesjiaoTC 22:50, 6 October 2011 (UTC)[reply]
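
For readers following the range arithmetic above, here is a minimal sketch using Python's standard `ipaddress` module. It only checks the figures quoted in this thread (2.24.0.0 to 2.27.255.255 with a 14-bit mask, and the IPs listed by -sche); the helper name `in_range` is made up for illustration, and nothing here reflects how MediaWiki itself applies range blocks.

```python
import ipaddress

# The range quoted in the thread: 2.24.0.0 to 2.27.255.255, 14-bit mask.
network = ipaddress.ip_network("2.24.0.0/14")

# The addresses listed in the thread as belonging to the same user.
reported = [
    "2.25.191.81", "2.25.191.225", "2.25.193.30", "2.25.212.57",
    "2.25.213.147", "2.25.214.61", "2.27.72.254", "2.27.73.75",
    "2.27.72.78",
]


def in_range(address, net=network):
    """Return True if the address falls inside the given network (hypothetical helper)."""
    return ipaddress.ip_address(address) in net


first, last = network[0], network[-1]          # lowest and highest address in the block
all_inside = all(in_range(a) for a in reported)

print(network.netmask)       # dotted-decimal form of the 14-bit mask
print(first, last)           # bounds of the range
print(network.num_addresses) # how many addresses a block of this size covers
print(all_inside)            # whether every reported IP is covered
```

The size of the block is what drives the collateral-damage concern: a /14 covers every address from 2.24.x.x through 2.27.x.x, which is why narrower ranges were suggested as an alternative.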

Another IP-address for the record - 2.27.72.125. Does anyone still think it's different people? --Anatoli 01:20, 7 October 2011 (UTC)[reply]

Immediately after me trying to talk to him, he "moved" to a new IP address: Special:Contributions/2.25.212.90. It must be a game of chasey for him. --Anatoli 02:06, 7 October 2011 (UTC)[reply]
Does anyone else think that the range is too wide between 2.27.... and 2.25...? --Anatoli 02:08, 7 October 2011 (UTC)[reply]
The IP hops might be intentional, they might be the way the ISP operates. Seeing as we are not blocking most of these IPs I can't imagine why they would change IPs between edits. There are three much smaller ranges (/23s) which are more realistic. - [The]DaveRoss 02:13, 7 October 2011 (UTC)[reply]
Some ISPs allow a fresh IP address to be assigned when you cold restart your modem. I think Engirst might have found this trick. JamesjiaoTC 02:32, 7 October 2011 (UTC)[reply]
Some ISPs also give their users subnet IPs and force all traffic through proxies, which means that every few minutes or hours they may have a different IP presented to the outside network based on which proxy they end up on. AOL was like this for many years. What you say makes sense if we were blocking each IP, since we hardly have any blocked it makes little sense. - [The]DaveRoss 04:03, 7 October 2011 (UTC)[reply]
I don't understand why, though. I have addressed him several times with no answers but every time he changes his IP address. His user account (Engirst) is not locked. He prefers the backdoor, as if nobody can see what is happening. BTW, his first account was "123abc", not abc123 as I said before. Then, there was "Ddpy". Most of his edits are now gone but there are still many to be fixed or deleted. --Anatoli 02:43, 7 October 2011 (UTC)[reply]
I'd be happy for us to delete any pinyin where we don't have the Hanzi equivalent. Perhaps we could make it a formal rule. I'm not sure such a rule is needed, as I deleted about 100 such entries last night and nobody objected. Mglovesfun (talk) 07:54, 7 October 2011 (UTC)[reply]
Would that be bot-able? Basically check all pinyin entries to see if there are any hanzi entries that list the same pinyin, and delete the pinyin entry if no such hanzi entries are found? -- Eiríkr Útlendi | Tala við mig 16:49, 7 October 2011 (UTC)[reply]
Well, a pretty good but imperfect solution is this edit to {{pinyin reading of}} which checks if the first parameter (aka tra, trad) exists. If it doesn't exist it categorizes the entry in Category:Mandarin pinyin entries without Hanzi. This of course won't work for entries that don't use pinyin reading of, and it will miss entries that exist but lack the correct language (that is, they have only Japanese/Cantonese/Korean or whatever). Mglovesfun (talk) 16:53, 7 October 2011 (UTC)[reply]
Hanzi live in a particular range of Unicode, so I think it would be possible to find all of the pinyin readings that way, regardless of template usage. - [The]DaveRoss 19:56, 7 October 2011 (UTC)[reply]
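
A rough sketch of the Unicode-range idea, in Python. It assumes that checking the main CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF) is enough for illustration; rarer characters in other extension blocks would be missed. Combining the Hanzi check with the "two consecutive Latin letters" test suggested earlier in this thread is also an assumption made here, and the function names are hypothetical.

```python
import re

# Assumption: only the main CJK Unified Ideographs block and Extension A are checked.
HANZI = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

# Two or more consecutive basic Latin letters, as in the abuse-filter suggestion above.
LATIN_RUN = re.compile(r"[A-Za-z]{2,}")


def has_hanzi(title):
    """True if the title contains at least one character from the ranges above."""
    return bool(HANZI.search(title))


def is_mixed_script(title):
    """True if the title contains both Hanzi and a run of Latin letters."""
    return has_hanzi(title) and bool(LATIN_RUN.search(title))


# Titles taken from this discussion.
for title in ["Planck常数", "普朗克常数", "Thames河", "chángshù"]:
    print(title, has_hanzi(title), is_mixed_script(title))
```

A check like this only looks at the page title. It cannot tell whether an entry with a Hanzi title actually has a Mandarin section, so it would complement the template-based category rather than replace it.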
New "additions" of Special:Contributions/2.25.214.239, all mixed language items, the foreign names are all deliberately untranslated: Ohm定律, the correct Mandarin is "欧姆定律" (Ōumǔ dìnglǜ), Banach空间 - "巴拿赫空间" (Bānáhè kōngjiān), Hilbert空间 - "希伯特空间" (Xībótè kōngjiān), also by another 123abc's sockpuppet: Special:Contributions/2.27.73.100 Hausdorff空间 - "豪斯多夫空间" (Háosīduōfū kōngjiān). Happy to block the user and delete all these entries, they are not Mandarin. Soft redirect might be considered if we have the Mandarin, not mixed entries.--Anatoli 21:56, 9 October 2011 (UTC)[reply]
Concurrent discussion: Wiktionary:Requests_for_deletion#Planck.E5.B8.B8.E6.95.B0, Talk:Planck常数

It is a real word, please see Google Books. Anyhow, it shouldn't be deleted and blocked. 2.25.214.61 21:38, 4 October 2011 (UTC)[reply]

I answered above. --Anatoli 21:55, 4 October 2011 (UTC)[reply]
All languages in the world, big or small should be equal. They are real words, only dictators want to ban real words. 2.25.214.61 22:03, 4 October 2011 (UTC)[reply]
Am I banning a language? I don't want you to insert mixed language entries and translations - simply called Chinglish. Wiktionary is not to show how people with poor language skills use it. Quoted "普朗克常数" has 1,500 hits in Google Books, why would anyone want to promote "Planck常数" instead (29 hits)? You are just spreading illiteracy. No-one is trying to force "Planck 상수" (Korean) or "постоянная Planck" (Russian). I have to protect pages because of you, just stop it, will you? --Anatoli 22:32, 4 October 2011 (UTC)[reply]
"poor language" is just your personal idea, but Planck常数 is used in professional books. 2.25.214.61 22:39, 4 October 2011 (UTC)[reply]
Who defines poor language? CFI does not have a quality stipulation on acceptable words, nor is it acceptable for any Wikimedia project to promote anything. If there's 29 hits of Planck常数, then it has the same justification as Planck's constant, and quite probably should have a usage note pointing to 普朗克常数.--Prosfilaes 05:12, 5 October 2011 (UTC)[reply]
I have no choice but temporarily protect pages from you. To anyone, please contact me if you think I'm abusing my administrator rights. I really see no choice at the moment. The reasoning is explained many times, I won't repeat. Seems like déjà vu. "Poor language" is my abbreviation of all said before. --Anatoli 22:42, 4 October 2011 (UTC)[reply]
Yes, you are. You are a language dictator. 2.25.214.61 22:47, 4 October 2011 (UTC)[reply]
Whatever, when a person like you says it, it means I'm doing the right thing, thank you. Knowing your records, I'm 100% sure that if you had administrative rights you would dictate Mandarin without Chinese characters onto Wiktionary or something. I don't think you are a passionate linguist, you're obsessed with your "transition to Roman letters" ideas. If you are Chinese, you must be ashamed, I support the Chinese person who told you off. --Anatoli 22:55, 4 October 2011 (UTC)[reply]
(@2.25.214.61 etc) Please refrain from name calling, it will certainly not further your cause. It is very important to recognize that this website is a collaborative project. Even though we all have our own opinions about what Wiktionary should be, we agree to sacrifice some of what we want so that Wiktionary can be what the whole community wants. Please take some time and consider what your goals are, and then present them to the community for discussion, persuade us, and allow us as a community decide which course to take. A sure fire way to lose any support you may have had is to simply try to impose your ideas on the community against its will and then get defensive about it. If you continue to be as combative as you have been then we will most likely ask you to leave for the good of the project. That may not be a bad thing, Wiktionary is not a good fit for everyone, but I would rather you decide that the community effort is worth some sacrifice and join us under the agreed upon terms. Thanks, - [The]DaveRoss 01:40, 5 October 2011 (UTC)[reply]
Thank you, TheDaveRoss. I just want to comment that similar ideas are shared by people on pinyininfo.com, some of them do make sense to me - standardisation of pinyin and transliteration of Chinese names for all Mandarin speaking countries and areas. I'm sure he will be welcome there. Another thing - he was already told to leave, blocked numerous times, only a few by me. Talk to User:Tooironic. Then he reappeared with no difficulty, making the administrator right to block a contributor who creates more problems than adds value, a joke. --Anatoli 01:52, 5 October 2011 (UTC)[reply]
Problems are caused by somebody abusing power. Wiktionary has no rule to ban mixed scripts till now. 2.25.214.61 02:12, 5 October 2011 (UTC)[reply]
I agree, problems can be caused by those who abuse power. Problems can also be caused by those who refuse to listen to others in the group. The way rules are developed on Wiktionary is very organic. We don't have a rule for something until a disagreement about it arises. Once a disagreement does arise, those who are in disagreement stop what they are doing and open the issue up for discussion, either between those who are close to the problem or the community as a whole. If it makes sense, there is a rule created based on the result of the discussion. Just because something doesn't have a "rule" doesn't mean that it is allowed. We don't have a rule about deleting the main page, yet it is something for which a person would be punished. Thank you for your willingness to discuss the issue. - [The]DaveRoss 02:27, 5 October 2011 (UTC)[reply]
(@2.25.214.61 - Engirst) Really? It must be me? What about your toneless pinyin story? Remember this Wiktionary:Beer_parlour_archive/2010/May#block_list?
BTW, I added many Mandarin translations using mixed scripts, eg. DVD player, edited 卡拉OK, created T恤衫. I have no problems with many others. Good try but you need more correct answers. --Anatoli 02:34, 5 October 2011 (UTC)[reply]

On such topics, decisions should be based on consensus, not on votes based on personal opinions. E.g. even if a majority wants to exclude a language (for political or whatever reasons) while a minority wants to keep it, it should be kept if it can be called a language. It's the same for words. There should be a discussion between open-minded people until a consensus is reached. Lmaltier 18:05, 5 October 2011 (UTC)[reply]

Lmaltier, although I broadly agree with you here, I'm not sure you know what consensus means, and as a result, you seem to contradict yourself.
Regarding individual terms, the crux of the current issue revolves around what constitutes "Mandarin", and the majority opinion (i.e. rough consensus) appears to be that Mandarin does not include "Alzheimer's" or "Einstein" or "Thames" or "Planck". I think all the Chinese editors here would agree that 常数 (chángshù) is Mandarin, which makes Planck常数 a curious mixed-language hybrid term.
I see only two clear paths forward for keeping terms like Planck常数:
  1. Create a ==Chinglish== (or similar) language heading, and categorize such terms under this.
  2. Include such terms, but keep the entries extremely simple, just listing the terms as alternate spellings or misspellings and linking through to the hanzi-spelled entries that contain the definitions, usage examples, etc.
Without any general consensus (there's that word again) as to which course to take, I'm inclined to view these as truly mixed-language terms, that would thus not belong under any single-language header. -- Cheers, Eiríkr Útlendi | Tala við mig 18:42, 5 October 2011 (UTC)[reply]
In any case, there should be an agreement before we allow creation of so many hybrid entries, it's not a common practice here. Only one user (even if under different names or anonymously under different IP addresses) pushes it so ardently. That's why I created the vote - to get a collective decision. Lmaltier, you'll have a chance to vote and express your opinion. If the vote fails (hope not), then we need to discuss the details. I agree with Eirikr that we shouldn't keep them as just Mandarin entries because they are not. --Anatoli 23:12, 5 October 2011 (UTC)[reply]
What I mean is that there should be clear and simple principles, the main ones being all words in all languages and a header for a language means that the word is used in this language. A consensus is not the result of a vote, only the fact that all open-minded people agree, after discussion based on arguments, that principles are met, even if some (or the majority) would prefer not to include the word for personal reasons (because they don't like it, etc.), or agree that principles are not met. Lmaltier 05:21, 6 October 2011 (UTC)[reply]
We don't allow some SoP's, even if some people think they are words, do we? We don't need "blue sky" or "tram no. 20" or "Chinese for London is 伦敦". To me and a few others, as you can see "Planck常数" is not a word but two: Planck + 常数, and one of them is not Chinese, even if it's used in a Chinese text, it's still English inside Chinese. There will be more and more English words used by Chinese but why do we need to include them here if they haven't become part of the language? --Anatoli 05:41, 6 October 2011 (UTC)[reply]
Anatoli, "Planck常数" is not a semantic sum of parts: its meaning cannot be obtained from the knowledge of the meaning of "Planck" and "常数". The same is true of the English "Planck constant", for which we have Planck's constant. This discussion should not be in BP anyway, but rather in RFV (if you think the term is not attestable) or in RFD (if you think the term is a semantic sum of parts or have other reasons to believe the term does not meet CFI). Furthermore, "Planck常数" is not a proper noun, so the vote you have proposed (Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters) will have no effect on the inclusion of "Planck常数". See also WT:RFD#Planck常数, which I have created based on your having tagged the term for RFD on 4 October. --Dan Polansky 06:41, 6 October 2011 (UTC)[reply]
The Chinese term for "Planck constant" is "普朗克常数", not "Planck常数". There's nothing Chinese about the name "Planck". If a person called Mark says "我叫Mark" - my name is Mark, rather than "我叫马克 (wǒ jiào Mǎkè)". "Mark" doesn't become Chinese translation of English "Mark" but "马克" is. (One of the books with "Planck常数" has "Heisenberg 和 Schr6dinger"(?)). Engirst hardly contributes in the main Mandarin area, only pinyin. Why when he does, it's "Planck常数", not "普朗克常数". The ratio is 1,500 to 29. Don't you see he has an agenda? Quite amusing were his examples he was forcing - "London是英国的首都". Is this Mandarin?! You can find "Obama总统", "Cameron首相". So what? Do we start adding them as Mandarin? --Anatoli 08:36, 6 October 2011 (UTC)[reply]
Do you acknowledge that "Planck常数" is not a semantic sum of parts? --Dan Polansky 11:43, 6 October 2011 (UTC)[reply]
That would be recognising it as a term, no I don't recognise it as a word, there are two languages in one sentence. It's artificial and not assimilated at all. "A rare misspelling of" is the best I can give it, it could be a single unit to someone using a hybrid of languages, like the person who could say "我是America人" - "I'm American". Are you actually reading what I said before? --Anatoli 12:02, 6 October 2011 (UTC)[reply]
I am asking about whether you acknowledge that the term is not a sum of parts. I am not asking whether you acknowledge the term to be a word, whether you deem the term worthy of inclusion, or whether the term is "artificial" or "assimilated". This question of whether it is a sum of parts can IMHO be fairly objectively answered in the negative, so I am asking whether you can confirm the observation that the term is not a semantic sum of parts, disregarding for a while your goal of getting the term excluded. If you claim that the term is a semantic sum of parts, can you explain whether you deem "Planck constant" a semantic sum of parts and why? --Dan Polansky 12:10, 6 October 2011 (UTC)[reply]
普朗克常数 (Pǔlǎngkè chángshù) is a semantic term. Planck常数 is European Planck + Chinese constant. If you look up 常数, then you will have the translation of Planck常数 without any need for Planck常数. —Stephen (Talk) 12:17, 6 October 2011 (UTC)[reply]
As another example of a term, which is not assimilated but was uttered and there is one citation, is "сраный ковбой" (shitty cowboy). I requested its deletion because it never caught on, not assimilated in the meaning American (abuse). Mixing English names and words into Chinese is not a new trend and we already have the English names. Most famous English proper nouns can now be found in a Chinese text, rivers, cities, mountain ranges, formulas, theorems will follow the original term with a Chinese suffix. Will "Mont Blanc山" or "California州" become Chinese only because they are followed by 山 and 州? --Anatoli 12:19, 6 October 2011 (UTC)[reply]
@Dan Polansky. "Planck常数" is a sum of parts. --Anatoli 12:25, 6 October 2011 (UTC)[reply]
Are you saying that "Planck常数" is a semantic sum of parts, while the English "Planck constant" is not a semantic sum of parts and "普朗克常数" is not a semantic sum of parts? Is this conjunction of three assertions what you are saying? --Dan Polansky 15:48, 6 October 2011 (UTC)[reply]
@Anatoli, @Dan --
I think you two might be talking past each other. @Anatoli, by saying that "Planck常数" is not SOP, I think Dan is stating that the meaning of this phrase is not clear just from the parts -- if I only know (or only look up) "Planck" and "常数" as individual pieces, I have no idea that "Planck常数" is intended to mean h in physics.
Meanwhile, @Dan, I think what Anatoli is getting at is that "Planck" is a term in English (and other European languages), and "常数" is a term in Mandarin, and while the average Mandarin reader would understand the latter, the former would only be understood by that subset of Mandarin readers who are also at least somewhat familiar with European languages. By saying that "Planck常数" is SOP, I think Anatoli is stating that this term is comprised of two distinct parts, and only one of these parts is recognizable as Mandarin.
@Anatoli, @Dan, have I understood each of you correctly? -- Hoping this helps clarify, Eiríkr Útlendi | Tala við mig 20:05, 6 October 2011 (UTC)[reply]
Eiríkr Útlendi (or just "Eiríkr"?), you understand me perfectly well. I believe my use of the phrase "sum of parts" and "semantic sum of parts" is in perfect alignment with the customary use of the phrase in English Wiktionary, and also fits the natural reading of the phrase "semantic sum of parts". My use refers to WT:CFI#Idiomaticity and its use of the term "idiomatic", defined in CFI in this way: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.' Where CFI says "idiomatic", I say "not a sum of parts" and "not a semantic sum of parts".
Re: 'By saying that "Planck常数" is SOP, I think Anatoli is stating [...]': This would mean that Anatoli has invented a new meaning of "sum of parts" as applied to terms, a meaning that has nothing to do with WT:CFI#Idiomaticity and hence is irrelevant. I reject this new meaning as part of a meaningful discussion about inclusion-worthiness of terms; the term "sum of parts" has, in Wiktionary discussions, a specific bound meaning such that editors are not free to redefine the term as they see fit. --Dan Polansky 20:48, 6 October 2011 (UTC)[reply]
@Dan, one thing I haven't heard articulated yet by you (or at least haven't understood) is your views on the status of the term Planck常数 with regard to language. SOP or not, the main reservation from Anatoli (and myself if I'm perfectly clear about that) is that Planck常数 is not a single-language term. In arguing for this term's inclusion, do you view Planck常数 as common use in Mandarin contexts, and therefore meeting CFI as a single-language term?
Anatoli and IP user 60.240.101.246 are self-identified as Mandarin speakers, and neither are happy including this term, with both stating that Planck常数 as a whole is not Mandarin. James Jiao identifies as a native speaker and weighed in here regarding the term Thames河, and at least some of his points in that thread would seem to apply to Planck常数 as well. The main user(s) adding such terms and arguing for their inclusion, specifically Engirst and multiple IP users who may or may not also be Engirst, have never to my knowledge indicated whether they are Mandarin speakers, even when asked point-blank. (I'm not fluent in Mandarin nor familiar enough with writing styles to say much about specific Mandarin terms on my own authority, but I am concerned about the possible precedent and how that might affect entries classified as Japanese; hence my participation here.)
I would appreciate it if you could explain a bit about your specific reasons for wanting to include Planck常数. Your views on this term's non-SOP-ness are clarified by your post above, so what other reasons do you have? I'm honestly curious, and I do not feel like I understand your position well enough to really agree or disagree in any clear and reasoned fashion. -- Eiríkr Útlendi | Tala við mig 21:12, 6 October 2011 (UTC)[reply]
I have not said anything about whether I want "Planck常数" included. Rather, I wanted Anatoli to stop erroneously claiming that "Planck常数" was a sum of parts. This is a hard subject; the thought about it is not made clearer by fallacious argumentation that involves erroneous claims of sum-of-part-ness, and terms such as "madness", "spread illiteracy", "have an agenda", and "Chinglish". A reason for wanting the term included would be that it meets CFI. A term that meets CFI can still be tagged as "rare"--which it definitely is--or even as "nonstandard"--which it seems to be as well. The dictionary's containing a term does not yet mean that the dictionary somehow endorses the term or recommends its use. The dictionary merely registers observations about the actual use of language. I have no strong feelings about "Planck常数"; it is so rare that it can be considered a rare malformation or something, not much unlike a rare misspelling; I do not really know. I do admit that no one will probably want to look up the term, an indication that it could be deleted. What I am passionate about is elimination of wrong argumentation, though, wrong as far as I am able to tell anyway. Furthermore, CFI does not say anything about "common use", other than in "Any word may be rendered in pig Latin, but only a few (e.g., amscray) have found their way into common use", which is a sentence in a rather poorly phrased section of CFI that has been kept for no consensus for deletion (5:4:0 for deletion) in the vote Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI, but should better be deleted anyway so as not to mislead, as WT:CFI#Attestation does not say anything about "common use". --Dan Polansky 21:39, 6 October 2011 (UTC)[reply]
Ah, thank you, now I have a better understanding of where you're coming from. FWIW, I am slowly warming to the idea of inclusion with a soft redirect to the main entry at 普朗克常数 and a note about rarity, iff acceptable citations can be provided.
As a minor point, WT:CFI#Attestation does state “Attested” means verified through 1. Clearly widespread use -- notably, not as the sole limiting criterion, but "common use" would appear to be one of the criteria. -- Eiríkr Útlendi | Tala við mig 22:08, 6 October 2011 (UTC)[reply]
Above all, point 1. of WT:Attestation is an item of a disjunctive list (A or B or C or D), so it is not a necessary requirement for attestation. Point "1. Clearly widespread use" should IMHO be deleted from CFI; it just misleads. Fact is, point 3. of WT:Attestation ("Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year, [...]") provides more lenient criterion than the point 1., so the point 1. is redundant. What point 1. currently does is make it possible for people to claim in WT:RFV that a questioned term "is clearly in widespread use", but that is IMHO a matter of procedure rather than an extended definition of "attested": CFI does not state how the attestation should be documented, in particular, whether the quotations need to be actually entered into the Wiktionary database. So again, removing point 1. would simplify things without changing the substance of CFI, IMHO. --Dan Polansky 07:16, 7 October 2011 (UTC)[reply]
That's a clear explanation, thank you. Wiktionary:CFI#Attestation_vs._the_slippery_slope suggests that attestation alone should not be the sole justification for inclusion, however; what's your view on that? (And perhaps this particular discussion should be moved into a separate thread? This is getting unwieldy.) Never mind on that second part, just saw your reply over at Wiktionary:Requests_for_deletion#Planck常数, which answers my question. -- (Updated) Eiríkr Útlendi | Tala við mig 16:54, 7 October 2011 (UTC)[reply]
My view on that is that Wiktionary:CFI#Attestation_vs._the_slippery_slope is purely informative; it provides information on interpreting the rest of the document, not actual rules for what passes CFI.--Prosfilaes 17:00, 7 October 2011 (UTC)[reply]
(After an edit conflict)I do not claim to have a new definition of "sum of parts" but "Planck常数" is a sum of parts because it's not a Chinese term at all. I do have strong feelings about NOT keeping this type of entries because they are simply wrong. A physical Chinese dictionary simply uses 普朗克常数 (Pǔlǎngkè chángshù), explaining that 普朗克 (Pǔlǎngkè) is the transliteration of the name "Planck". I'm not skilled at presenting my arguments in English well but allowing "Planck常数" would present a bad precedent, like "Archimedes螺线" instead of "阿基米德螺线" (Archimedean spiral) or similar (don't quote me on the exactness of the possible way of someone writing in a Chinese text). I'm less worried about "sum of parts" rules than the quality of foreign language entries. Sum-of-parts problem is noticed quickly when an entry is English but if they are in FL, many get through unnoticed. Are you angry with me because I used "madness", "spread illiteracy", "have an agenda", and "Chinglish"? It's madness to convince everyone that "Thames河" is Mandarin for "Thames", it's also illiterate, although it's often forgiven to overseas Chinese not knowing how to write a foreign name in Chinese. "Madness" is a strong word but I do have strong feelings about it. I'm not calling Engirst (he doesn’t want to use this account any more?) mad but I DO think he has an agenda. His agenda (it's only one male user, not many) was confirmed many times by Chinese speaking contributors, let me call it "Mandarin in Latin script". Next term - "Chinglish", among other things, means "mixed Chinese and English" or a hybrid language, not offensive. People do use Chinglish, Japlish, Runglish, Konglish, etc. but we don't have CFI for them. I don't think I was offensive to anyone but if I was I apologize. I had an argument with a Russian Wikipedia editor that "Bluetooth" is not a Russian term, well he quoted sources like our case with "Planck常数", still "Bluetooth" hasn't become a Russian word in this spelling. 
Languages not using Roman letters all have different perceptions of what IS part of their language, especially if it is written in a different script, generally, in 99.9% cases - if a word is not in a native script, it's not part of this particular language, with a very few known exceptions. Shall we agree to disagree at this point? You are welcome to take part in the vote and present a summary of your reasons.
After reading new Eirikr's comment - yes, rather than deleting, having a soft redirect could be a compromise I would accept, Chinese struggle themselves knowing how to transliterate a foreign name and there could be variants, not just between China/Taiwan/HK but even in one country. --Anatoli 22:33, 6 October 2011 (UTC)[reply]
Re: '"Planck常数" is a sum of parts because it's not a Chinese term at all': If this is not a redefinition of "sum of parts", then I do not know what a redefinition would look like. It does not seem to have anything to do with WT:Idiomaticity: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components'. --Dan Polansky 07:16, 7 October 2011 (UTC)[reply]
I already expressed my acceptance. --Anatoli 21:59, 7 October 2011 (UTC)[reply]
Right, but if the vote passes, it will ban soft redirects for entries that contain or are proper nouns. If we're OK with making the entries "soft redirect" (point in an explanatory way to the main entries, like 'ave points to have), we shouldn't necessarily hold that vote; we should just make mixed-language/mixed-script entries into soft redirects. - -sche (discuss) 22:35, 7 October 2011 (UTC)[reply]
We have some time before the vote. We need to see the reaction of opponents of the proposal first. --Anatoli 22:42, 7 October 2011 (UTC)[reply]
Besides, I don't see a contradiction of the vote and redirects. Banning will not disallow redirects like Mockba. I may add a clause. --Anatoli 22:45, 7 October 2011 (UTC)[reply]
I am going to oppose the vote Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. It is essentially prescriptivist ('"Thames河", "Planck常数", "Alzheimer病", etc. could be made soft redirects to the correct Mandarin entries.', emphasis on "correct" mine). Furthermore, it seems fairly incoherent at this point. The vote seems to modify CFI for Mandarin, yet the sixth reason stated in the vote claims the discussed terms such as "Alzheimer病" already fail CFI as being sum of parts. The vote seems to be predicated on the assumption that it is the business of a dictionary to "prevent spreading illiteracy", whereas the business of a descriptivist dictionary is to document what can actually be observed, and mark it as "rare" and "nonstandard" if it fits observation. By including a term, a dictionary does not promote the term, especially when the term is marked as nonstandard. In particular, by including vulgar terms, a dictionary does not promote their use; by including terms marked as obsolete, a dictionary does not promote their use. By an analogy, a library of all books ever published on the Earth contains all books, regardless how objectionable the books may seem to the librarians or majordomos of the library. There are some further issues with the vote. --Dan Polansky 07:54, 8 October 2011 (UTC)[reply]
It's normal to have language specific policies, especially if they are more restrictive than CFI for English (not breaking the existing rules). People who know the language they are editing in, will know better, other people may offer decisions that may be wrong or not followed. Dan, languages you work in, are mostly Roman based, not sure you understand that words like iPhone, for example, can be used in a Russian, Mandarin or Japanese text, you'll find millions of citations of it, but they are not part of that language even if they are the official forms - a sign will say "iPhone", not "айфон", does it make sense? Similar with "Planck常数", only Chinese living overseas and mixing languages will know what it means. I start suspecting that you too have some agenda. Why so much enthusiasm towards Mandarin all of a sudden when we are dealing with a mixed script? You are even being aggressive towards me calling my arguments "fallacious". Also, why does it worry you personally, what is included as a Mandarin term in Wiktionary? Do you work with Mandarin? No Mandarin dictionary in whatever country, no matter how large, would include such terms. Anyway, the discussion in Talk:Planck常数 seems to lead to a possible compromise. If we won't reach it, we'll decide on the vote. BTW, I don't think extending it to one month is needed, two weeks will suffice. --Anatoli 11:13, 8 October 2011 (UTC)[reply]
"Have an agenda" and concern with personal motivation are fallacies of irrelevance; my enthusiasm is of no one's concern. I find many of your arguments fallacious ("characterized by fallacy; false or mistaken"), and feel entitled to say so without considering it a personal attack. I am worried with a proliferation of prescriptivist inclusion criteria, and with spread of prescriptivist thought in Wiktionary, as well as with incorrect use of the term "sum of parts" AKA "nonidiomatic", incorrect with respect to WT:Idiomaticity.--Dan Polansky 13:08, 8 October 2011 (UTC)[reply]

I don't understand what this discussion is about. This is a non-Mandarin word (Planck) combined with a Mandarin word (常数). Mandarin speakers don't perceive "Planck常数" to be a Mandarin word (ask any native speaker if in doubt), so even if this is not a SOP, it shouldn't be kept. "Москва" is used in English but English speakers don't regard it as an English word, hence it was deleted unanimously. 60.240.101.246 13:21, 8 October 2011 (UTC)[reply]

Re: 'Mandarin speakers don't perceive "Planck常数" to be a Mandarin word': What evidence for this assertion do you plan to provide? Are you saying that not a single Mandarin speaker considers "Planck常数" to be a Mandarin word or that most Mandarin speakers do not consider "Planck常数" to be a Mandarin word? --Dan Polansky 13:33, 8 October 2011 (UTC)[reply]
With your capability you will not be able to find one that does (exclud. possibly Engirst, who doesn't seem to be Hanzi-literate). 60.240.101.246 13:36, 8 October 2011 (UTC)[reply]
Care to answer my questions? What evidence? Not a single does or most don't? --Dan Polansky 13:39, 8 October 2011 (UTC)[reply]
OK. Here is what you wanted. I can't speak for everyone of course, but I do understand the mindset and perceptions of languages by native Chinese speakers better than you do. There are too many references on this issue, eg. 《直用原文──现代汉语外来语运用中的一个新趋势》,《試論漢語文字和中國人的傳統思維方式》,《原形借词——现代汉语吸收外来语的新发展》,《论外来语对现代汉语的冲击》,《关于外来语及其周边概念的考察》,《关于汉语文字的几点认识》,《2010年中国语言生活状况报告》,《现代汉语中字母词研究综述》,《外来语在汉语中的使用及对汉语的影响》, they basically all comment that the increase in loanwords needs to be noted and become alerted to; they don't fit into Chinese phonology and sound very foreign; a recent trend is words in other languages used directly without being transcribed or translated; these words are used to avoid confusion or for convenience; they do not appear in formal situations where transcription and translation always occur and the general public doesn't regard these words as being assimilated into the Chinese lexicon; phonologically adapted loanwords tend to be replaced by native calques eventually; this tendency contrasts starkly with the Japanese and Korean cases, where massive and indiscriminate importation is currently occurring; and in conclusion the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged. 60.240.101.246 14:32, 8 October 2011 (UTC)[reply]
Essentially the Wiktionary community is a miniaturised version of the general public. Out of those who actively participated in the deletion discussion of these mixed script entries, people who know some Mandarin (Anatoli, Tooironic, Jamesjiao, me) all voted against the inclusion, and people who don't know the language (Lmaltier, Dan Polansky, Prosfilaes, -sche (initially)) tended to keep these. The chance of this occurring assuming equal probabilities for the two cases is 1/256, or 0.4%, low enough to be considered statistically significant. 60.240.101.246 14:40, 8 October 2011 (UTC)[reply]
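For reference, the 1/256 figure above can be reproduced as follows. This is only a sketch of the arithmetic under the stated assumption, namely that each of the eight named participants independently had an equal chance of favouring deletion or keeping; it does not test whether that assumption is reasonable.

```python
from fractions import Fraction

# Eight participants are named above: four who know some Mandarin
# and four who do not.
participants = 8

# Assumption from the comment: each participant independently picks
# "delete" or "keep" with probability 1/2.
p_each = Fraction(1, 2)

# Probability of one specific assignment of opinions to all eight people,
# e.g. the four Mandarin speakers against and the other four for keeping.
p_exact_outcome = p_each ** participants

print(p_exact_outcome)                          # 1/256
print(round(float(p_exact_outcome) * 100, 1))   # 0.4 (percent)
```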
Thus, you do not plan to provide any evidence; instead, you offer yourself as a witness.
Let me highlight this quotation: "[...] the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged". This quotation is outright prescriptivist. A prescriptivist lexicographer sees it as a goal of a dictionary to protect "the structural integrity and purity" of a language. Such a prescriptivism is typical of language academies around the world. By contrast, the English language has no such central regulatory body of language; an Anglo-American descriptivist dictionary does not see it as its aim to protect the purity of language but rather aims at documenting the use of language as it actually occurs, regardless whether language authorities approve or disapprove of its use. Moreover, it is still possible in a descriptivist dictionary to note that some authorities consider a term incorrect, whether by means of the template {{nonstandard}} or by means of a usage note. The current entry "Planck常数" leads the user of the dictionary to synonyms: 普朗克常数, 浦朗克常数, 卜朗克常数. If this entry is deleted, the fact that '"Planck常数" is a nonstandard term whose standard and widely accepted synonyms include 普朗克常数, 浦朗克常数, and 卜朗克常数' remains undocumented in the dictionary, an unfortunate circumstance. --Dan Polansky 14:53, 8 October 2011 (UTC)[reply]
Re: "[...] they do not appear in formal situations": Neither does ain't and gonna; you do propose to delete these as improper English? Should Category:English informal terms be deleted? And what about such foreign importations as English háček, which threatens the purity of the English language? --Dan Polansky 14:59, 8 October 2011 (UTC)[reply]
Dan, I find your doggedness in this issue to be a bit odd. I understand your concerns about prescriptivism versus descriptivism; that part makes sense to me. That said, IP user 60 here, Anatoli, and James Jiao, among others, are basically making the point that terms like "Planck常数" are about as intelligible to Chinese readers as "Москва" is to English readers. If that is the case, and if "Москва" has been deleted as "not English", why are you apparently so opposed to deleting "Planck常数" as "not Mandarin"? I confess I'm confused by your stance, and I must assume it's because I don't fully understand your perspective. -- Eiríkr Útlendi | Tala við mig 03:14, 9 October 2011 (UTC)[reply]
"Москва" is perfectly intelligible to English readers; or at least as intelligible as pemoline, ironbark or votator. Furthermore, we didn't delete Москва because it wasn't an English word; we decided it was Russian in English and deleted the English section and not the Russian section. You want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints.--Prosfilaes 04:33, 9 October 2011 (UTC)[reply]
@Prosfilaes: Are you speaking for yourself, or on Dan's behalf?
That aside, your comment here comes off as disingenuous. English readers unversed in Cyrillic will not find "Москва" at all intelligible, certainly not as "moskva". Moreover, we decided it was Russian in English and deleted the English section sounds an awful lot like "Москва" has been deleted as "not English", leaving me uncertain what distinction you are making. Is your intended point that, since some headword "Москва" still exists, removing the English is acceptable?
Regarding you want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints -- my only constraint is that a term be filed under the appropriate language. "Москва" is not English, so I support that term not being listed under an English heading. Planck常数 doesn't appear to fit under any of our existing language headings, so I support that term not being listed under any of our existing language headings.
Regarding attestation that a particular string exists in use somewhere, ittyshay seems like it might be attestable given the number of hits at google:ittyshay, but WT:CFI, as it's currently written, counsels against including pig Latin. In a similar vein, google:"my+natsukashii" suggests attestability for natsukashii in English contexts, but it is not included here under an English heading, ostensibly as it is not recognized as English. Attestation alone appears to be insufficient for inclusion -- which strikes me as reasonable, for it is unreasonable to argue that attestation in a given language context alone makes a term that language -- which is kind of the whole point of this thread, that Planck常数 is not Mandarin. -- Eiríkr Útlendi | Tala við mig 05:27, 9 October 2011 (UTC)[reply]
@Eiríkr Útlendi: CFI does not advise against pig Latin but rather says that pig Latin can be included as long as it is attested, giving amscray as an example; you should read the relevant "slippery-slope" section again, and read again my response that the "common use" and "general use" used in that section are misleading and match neither the current practice nor WT:Attestation. Again, if in doubt, you can create a new thread here in Beer parlour in which we clarify whether people agree that "common use" should be required for pig Latin. google:"my+natsukashii" searches the World Wide Web, which does not count for attestation; google books:"my natsukashii" finds nothing and google books:"ittyshay" finds nothing; the relevant point of CFI is "Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year". From what I can tell, you still have a poor grasp of how CFI usually gets applied, especially the attestation section. Instead of focusing on idiomaticity and attestation as specified in CFI and as usually applied, the supporters of deletion variously claimed that "madness" must be stopped, that importation needs to be regulated, that we must not "spread illiteracy", or, now, that the phrase is non-intelligible to many native speakers. But the non-intelligibility to many native speakers is not a concern per WT:CFI; "Planck常数" or "Planck 常数" seems attested in Google Books. There are many specialist terms entered into Wiktionary as English that are not readily understood by the majority of the English-speaking population. Wiktionary registers attested terms rather than terms that are readily understood. The assertion that 'Planck常数 is not Mandarin' seems implausible, as the term seems attestable in running Mandarin text; the presence of Latin characters alone does not exclude the phrase from Mandarin, as then "AA制" and "T恤" would not be Mandarin either.
As regards "Planck常数" vs "Москва", "Москва" can be claimed to be Russian embedded in English and is borderline-attestable in Usenet, none of which holds for "Planck常数", which seems attestable in Google books. As regards my motivation described above as "doggedness in this issue", that again is of no one's concern and has no bearing on the correctness of my arguments, and thus, again, is a fallacy of irrelevance. I do not see why my stubborn attempt to defend CFI and lexicographical descriptivism is "doggedness", while the stubborn attempt to import lexicographical prescriptivism into English Wiktionary (most conspicuously documented in one of the responses of the anon 60.240.101.246 above in this thread) should be considered non-dogged or reasonable. I also don't see why your repeated responses, the last of which mostly ignores points made in my post, should be considered non-dogged; you had the option of not butting in in the conversation between me and 60.240.101.246, now disclosed as a marked prescriptivist who wants to protect the purity of language. In any case, "have an agenda", "madness", "doggedness", and similar non-concerns are best avoided in the discussion. --Dan Polansky 07:04, 9 October 2011 (UTC)[reply]
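The attestation criterion quoted above ("at least three independent instances spanning at least a year") can be sketched as a simple check. This is only an illustration: the function name, its parameters, and the 365-day reading of "a year" are assumptions made here, and the sketch does not model whether citations are independent, durably archived, or actually convey meaning, all of which WT:CFI also requires.

```python
from datetime import date

def meets_attestation(citation_dates: list[date],
                      min_count: int = 3,
                      min_span_days: int = 365) -> bool:
    """Hypothetical check: enough citations, spread over at least a year.

    Independence and durability of the sources are NOT checked here.
    """
    if len(citation_dates) < min_count:
        return False
    # Span between the earliest and the latest citation.
    span = max(citation_dates) - min(citation_dates)
    return span.days >= min_span_days

# Three citations spread over more than a year pass the date test.
spread = [date(2009, 1, 1), date(2009, 6, 1), date(2010, 3, 1)]
# Three citations within one month do not.
clustered = [date(2011, 10, 1), date(2011, 10, 8), date(2011, 10, 20)]

print(meets_attestation(spread))
print(meets_attestation(clustered))
```

A check like this only screens dates and counts; deciding whether a citation is a genuine use in running text of the language in question remains a matter for editors.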
I speak for myself, of course. Москва was not deleted; someone looking it up will still find the word. As a practical matter, you haven't changed the entry at all for users. If Planck常数 does not fit under any of our existing language headers, then we need to create one that it does fit under.
I disagree hugely on your reading of WT:CFI. Even ignoring my argument that the whole "Attestation vs. the slippery slope" section is informative, not prescriptive, it starts "This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form." ittyshay, looking at Google Groups, is fair game. We don't use the Web for attestable materials, and the first dozen pages on Google Books and Groups for "natsukashii" don't show many examples that are clearly uses and clearly English.
"it is unreasonable to argue that attestation in a given language context alone makes a term that language"? What? Can you offer a general rule to figure out what language a word is from then? I was going on the general rule of thumb that "Platonic" was English because it's always used by English speakers in English sentences, but apparently that's not good enough.--Prosfilaes 07:30, 9 October 2011 (UTC)[reply]
──────────────────────────────────────────────────────────────────────────────────────────────────── (after edit conflict) @Prosfilaes: I would argue Москва is less intelligible than pemoline (or is not intelligible) because it isn't in the same script as English readers know. One of my English friends, after living in Kyiv for a year, told me how carefully he had guarded slips of paper with the addresses of his destinations on them, because he could read no Cyrillic and had no way other than those slips of paper to tell taxi drivers where he wanted to go. In contrast, if his destination had been "Pemoline-Ironbark Station on Votator Street", he could have lost the paper and still pronounced for the taxi driver where he wished to go, even if he had no idea what the words meant. However, I appreciate your underlying argument about the difference between Москва and Planck常数, even if I don't entirely agree with it. I trust you wouldn't object to usage notes in a [[Planck常数]] entry explaining how it was nonstandard, proscribed? If you wouldn't, I'm trying to advance soft redirects (after others suggested them) as a compromise to banning mixed-script entries, precisely so as to address that concern, which you and Dan and Lmaltier express well.
@Eirikr: what would you think of soft redirects, like this? You're warm to them? I want to know if this compromise is acceptable to more people than a ban is.
If it is... we should all stop arguing XP hehe. - -sche (discuss) 05:39, 9 October 2011 (UTC)[reply]
I'm okay with soft redirects as a general idea, provided the sinophone editors are on board (since they're more the ones to say about Chinese entries anyway :). As an interesting wrinkle, google:allintext:+"planck常数"+は shows some use in Japanese contexts, albeit only seven hits, which I haven't gone through to evaluate. -- Eiríkr Útlendi | Tala við mig 06:35, 9 October 2011 (UTC)[reply]
Historically, there may have been a few English speakers who only knew the w:Deseret alphabet. (Okay, historically they're probably outnumbered by the number of bilingual English/Russian children who only knew the Cyrillic alphabet.) I would find usage notes on Planck常数 almost essential. As a compromise, I'd have no complaint with soft redirects.--Prosfilaes 07:30, 9 October 2011 (UTC)[reply]
(after edit conflict) I removed the controversial point about SoP. It wasn't the main reason, anyway. I agree that mixing words from English and Mandarin when one is speaking or writing in Mandarin is always brushed off as Chinglish by native speakers, no matter how educated the speaker or writer is, who uses it. I stand by what I said before. Leaving people's or place names untranslated is only used when either the writer or the reader may not be able to read or write that name, no exceptions made for small or rare names. @60.240.101.246 Well, if we had an agreement, it would be deleted immediately but obviously we don't. So we have to go through the vote or decide in favour of a soft redirect option (the latest version suggested by -sche). To me it's obvious that "Planck常数" or "Alzheimer病" are not Mandarin just because Mandarin common words are attached to them but I don't want to argue about this forever, let the vote decide, hopefully the common sense will prevail, not the desire to include everything for which there is some attestation. Honestly, it's tiring. --Anatoli 13:52, 8 October 2011 (UTC)[reply]
Then what are they? I don't care if they're Mandarin or not; they're words. Label them how you will, but don't just delete them because you don't like them.--Prosfilaes 04:33, 9 October 2011 (UTC)[reply]

Italian Wikipedia

The Italian Wikipedia is closed in protest over far-reaching plans by the Italian government that threaten its independence; see Jcwf 02:51, 5 October 2011 (UTC)[reply]

How does an Italian law apply to a website edited by users from around the globe, even if it is in Italian? JamesjiaoTC 02:57, 5 October 2011 (UTC)[reply]
One of the many open questions I suppose. I do not know. But if all governments start doing this I do think we are in trouble. Jcwf 03:02, 5 October 2011 (UTC)[reply]
"This proposal, which the Italian Parliament is currently debating". Wait, so it's not actually a law yet? Wikipedia has jumped the gun. The European court of human rights might throw it out, no? As it would seem to contradict the laws protecting freedom of expression. Mglovesfun (talk) 09:15, 5 October 2011 (UTC)[reply]
As Jamesjiao says, who does the law apply to? I thought the Wikipedia server was in the US. Would it only apply to Italian citizens? If so, why only in Italian? And if it doesn't only apply to Italian citizens, and I wrote something on the Italian Wikipedia, could I hypothetically break Italian law and get extradited to Italy? Also, nobody is 'in charge' of the Italian Wikipedia, so if Wikipedia fails to conform with a ruling against it, who gets charged? It doesn't say defamatory statements should not be made, just that any such statements should be removed if requested. So the person making the original statement isn't guilty of anything. I'll look into it. Mglovesfun (talk) 09:19, 5 October 2011 (UTC)[reply]
This seems to be a voluntary action of Italian Wikipedia in protest over a planned law. If this is so, I would like to see the vote on Italian Wikipedia that has led to this decision. it:W:Pagina_principale now redirects to W:it:Wikipedia:Comunicato_4_ottobre_2011, as does W:it:Portale:Comunità, so we cannot even read a discussion in the community portal that could have led to that decision. --Dan Polansky 14:24, 5 October 2011 (UTC)[reply]
This seems to be the vote: it:W:Wikipedia:Bar/Discussioni/Comma_29_e_Wikipedia. --Dan Polansky 14:36, 5 October 2011 (UTC)[reply]
...which says the notice is up for but a day (according to Google Translate, anyway).​—msh210 (talk) 15:39, 5 October 2011 (UTC)[reply]
Read on, and the community decided to make it indefinite, with post-notice discussion w:it:Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione (here). - -sche (discuss) 18:46, 5 October 2011 (UTC)[reply]
The protest action does not seem to affect the mobile version of Italian Wikipedia, so the page is available for reading here: http://it.m.wikipedia.org/wiki/Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione. --Dan Polansky 06:11, 6 October 2011 (UTC)[reply]

m:Wikimedia Forum#Italian Wikipedia -- Liliana 14:40, 5 October 2011 (UTC)[reply]

Now m:Wikimedia Forum/Italian Wikipedia. —Angr 06:46, 6 October 2011 (UTC)[reply]

biblical quotes as example sentences

Should long-winded biblical quotes be used as example sentences? I'm asking because an anon IP (probably User:123abc) has been adding them to many different Mandarin entries (e.g. 肚子, 一切, 什么, etc), but none of them are really practical for learners nor really relevant to the words themselves. Do we have a policy on example sentences? It's also possible that the translations are copyright. ---> Tooironic 03:34, 5 October 2011 (UTC)[reply]

Yes, it's abc123/Engirst. I have wikified and fixed his examples in 肚子. He just copies them from one entry to another. The other issue is that only simplified is given (if the entry is for both) and the traditional version is out of synch 甚麼 or 什麼. It's not answering your question but I wanted to mention this as well. --Anatoli 04:01, 5 October 2011 (UTC)[reply]

His contributions (Special:Contributions/2.25.214.61) are also discussed here. --Anatoli 04:06, 5 October 2011 (UTC)[reply]

(after an edit conflict) There is nothing wrong with quoting the Bible to illustrate the use of a word, or to attest to its existence; I've added lines from the Bible to entries. We prefer sentences which illustrate the usage of a word well, and therefore we shorten overly long sentences by using ellipses and move sentences that do not illustrate words' usage well to the Citations namespace, but we generally do not remove accurate quotations of literature, because these attest to the existence of the word. The sentences in the entries you link to fail to acknowledge their source, however, which is indeed a copyright/credit issue. The sentences also fail to bold the portion of the English translation that corresponds to the Chinese headword. I would remove the sentence in 一切 because it fails to acknowledge its source and is badly formatted and opaque; if the source were added, I would just format it correctly and move it to the Citations namespace (because it is still not good as an illustration of the use of the word). I will try to format and source the sentence in 肚子, and shorten it to "你必用肚子行走,終身吃土", because that is a good example sentence. - -sche (discuss) 04:11, 5 October 2011 (UTC)[reply]
I think it is helpful and good to wikify/linkify the individual words in Chinese example sentences, because it is otherwise unclear where the word-separations are, but I think it is our policy not to linkify any words in example sentences. (Do we make an exception for Chinese? That would be fine by me.) - -sche (discuss) 04:13, 5 October 2011 (UTC)[reply]
I think it is our policy not to linkify any words in example sentences. Oh, I didn't know that! If that's true I wasted my time but I'll seek confirmation. I think it's very useful too (like Wikibooks) and you can also see what's missing. The word forms could link to lemmas. I'll just wait for others to comment on the quotes. --Anatoli 04:18, 5 October 2011 (UTC)[reply]

All of these overly promotional or propaganda-like quotes should go. Adding a few quotes from the Bible is fine, but adding tons of content from the Bible to entries which barely have any citations is unacceptable. 60.240.101.246 07:33, 5 October 2011 (UTC)[reply]

I disagree; if an entry barely has any citations, what it needs is citations! Removing the only current citation seems perfectly counterproductive. We use Bible quotes for other languages, notably English and Hebrew. Can't we just treat the Bible like any other book? I'd be happy for Qur'an, Torah etc. quotes to be used as well. Anyway, these are citations not example sentences; an example sentence is 'made up' for convenience, as it's quicker than finding an actual quote. Mglovesfun (talk) 09:02, 5 October 2011 (UTC)[reply]
How about using the Qur'an for common English or Japanese entries, like "all", "what", "belly" (using translations of course)? It'd be weird, wouldn't it? I'm happy with Buddhism quotations - that's something acceptable and deeply ingrained in Chinese culture. But Christianity quotations? No. And those sentences (e.g. 一切, 肚子) - they are not how Chinese sentences are normally constructed. They just sound so - "preachy". 60.240.101.246 09:47, 5 October 2011 (UTC)[reply]
I'm not saying not to replace the citations with other, better citations. Mglovesfun (talk) 09:57, 5 October 2011 (UTC)[reply]
As Mglovesfun said: entries without many citations are the ones that need citations! Citations which do not show normal, fluent sentence construction should be moved to the citations page, though. - -sche (discuss) 18:54, 5 October 2011 (UTC)[reply]
I have only seen quotes from Genesis, which would make them equally Hebrew and Christian, but that is not the point. I am not sure what the problem is here; it almost seems like we are looking for reasons to get mad at Engirst. I have used quotes from the Bible (Old and New Testaments) and have not heard a thing about it; it is a seminal text in Hebrew, Greek, Italian and English. I do understand that it doesn't have the same cultural weight in Chinese, but that doesn't have any bearing. If you said "these are bad usexes because they don't accurately or readily convey the usage of the word" I would be on board. As it is it seems more like you are either against Engirst or against the Bible and neither of those stances makes a compelling argument. — This unsigned comment was added by TheDaveRoss (talkcontribs).
It's quite obvious that Engirst is here to preach and to promote Pinyinisation of Chinese. Do we really want to see all basic Mandarin entries accompanied by nothing but one or more quotes from the Bible, and the Chinese category dominated by Pinyin not character entries? It's madness really. I'm sure if I were a user who adds uninterruptedly advertising quotations, or a user who constantly writes Chinese Communist Party propaganda by adding English-language quotes from the official PRC press, I would have been banned instantly. There really is no difference. What's more - e.g. in the two quotes added to 一切, both have errors in their Pinyin somewhere. 60.240.101.246 10:53, 5 October 2011 (UTC)[reply]
Since pinyin entries are valid, there's no need to 'promote' them, no more than Russian in Cyrillic script needs 'promoting'. Mglovesfun (talk) 11:02, 5 October 2011 (UTC)[reply]
Pinyin IS promoted and preached by him and by some other people. Please read this site. I agree with the standardisation movement but not with the replacement of Chinese characters with pinyin. Mao planned this too. A few Westerners took it literally, including the owner of pinyin.info and Engirst. Some of the material on the site caused outrage among Chinese people. Anyway, this transition is not happening and writing purely in pinyin is only used for educational purposes, but we may get into a situation where we have more pinyin than Chinese characters. --Anatoli 05:15, 6 October 2011 (UTC)[reply]
Pinyin entries are at present allowed iff the corresponding character entry exists and no quotations should be included in Pinyin entries (Wiktionary:Votes/2011-07/Pinyin entries). Both rules were made to control the Pinyin enthusiasm of Engirst, but neither rule is obeyed by him [1][2][3]. 60.240.101.246 11:13, 5 October 2011 (UTC)[reply]
I personally delete pinyin when there's no corresponding traditional or simplified. I mentioned this on Wiktionary talk:About Sinitic languages but nobody's supported me as of yet. Mglovesfun (talk) 13:05, 5 October 2011 (UTC)[reply]
I did express my weak or tentative support (Sounds like a reasonable suggestion...), read my reply @03:11, 5 October 2011. We only check randomly the pinyin entries, many wouldn't SoP by any standards and we wouldn't create Mandarin entries to match. There are so many of them, he could have spent more time creating the matching hanzi. On the other hand, toned pinyin entries can be good if they are correct, follow the rules, we may not catch up fast enough on creating Mandarin entries, besides, I do a lot of translations, many of them are red-linked, anyway. E.g. qíshǒu is missing at the moment, we don't have 騎手 (qíshǒu) and 骑手 (qíshǒu) (rider, horseman) yet but there is nothing wrong with the term. Not to sound like we are "bullying" him, perhaps the pinyin editor should be invited to the discussion. --Anatoli 05:05, 6 October 2011 (UTC)[reply]
Does 123abc speak much Mandarin? I think if he were a native speaker he'd be able to write in Chinese characters and also I hope would make fewer mistakes in pinyin. The thing is he's immune to blocks; you can block him as much as you like and he just comes back with a new IP address. He's put himself above the rules. Mglovesfun (talk) 19:29, 6 October 2011 (UTC)[reply]
If folks have identified his (assuming this user is male) ISP, it's just a matter of blocking everything from that ISP. Or possibly contacting that ISP and getting the user banned at that level. This single user's disruptiveness is wasting a considerable amount of time and energy, so much that I'm beginning to think that losing the potential contributions of other anons by blocking the whole ISP's block would be more than offset by the actual savings made by getting rid of this one user.
Unless they can somehow be persuaded to change... except they seem immune to any attempt at two-way communication. < sigh. > -- Eiríkr Útlendi | Tala við mig 19:55, 6 October 2011 (UTC)[reply]

I just checked a couple entries on my watchlist that IP user Special:Contributions/2.25.212.57 edited today, specifically and . Both edits added biblical usexes that didn't actually show very clearly how the word in question is used, so I reverted both edits. Looking at this user's contributions shows what can only be described as a crapflood. Would someone please block this IP? The time to assume good faith is long since past. -- Eiríkr Útlendi | Tala við mig 20:16, 6 October 2011 (UTC)[reply]

  • My instinct is that having extensive Biblical quotations in many Mandarin entries is a poor idea. What remains unclear is whether to have a Biblical quotation in a Mandarin entry is better than to have no quotation at all. If someone starts removing these Biblical quotations, I do not think I will object. --Dan Polansky 21:04, 6 October 2011 (UTC)[reply]
I don't object to the idea of including biblical quotes. To me they are just example sentences. However, what I do have a problem with is the fact Engirst is not attaching such a quote to an existing definition (in the cases I've seen - ); as a result, it renders the effort meaningless as users will likely be more confused than enlightened. JamesjiaoTC 21:51, 6 October 2011 (UTC)[reply]
Right. The quotations are sometimes OK, and sometimes great examples of the figurative usage of terms (presuming other Mandarin texts use them figuratively, not just the Bible), but other times they are not good illustrations, and should be moved to the Citations: page for that reason (if sourced). Quotations should be removed if unsourced, as incompatible with the GFDL (because they appear to be quotations created by the user and released under the GFDL, but are in fact quotations created by another person and possibly not released under such a licence). Other times there is no definition and adding incorrectly-formatted quotations is unhelpful. - -sche (discuss) 21:57, 6 October 2011 (UTC)[reply]
I just went through a slew of them out of curiosity -- most didn't clearly show the word in question, most appeared to be copy-pasta of the same few quotes, and many were for words where the entry doesn't even have a def and the usex doesn't really provide one either. What a waste of time and effort. -- Eiríkr Útlendi | Tala við mig 06:44, 7 October 2011 (UTC)[reply]
abc123/Engirst strikes again as Special:Contributions/2.27.72.125 with his biblical examples. --Anatoli 01:17, 7 October 2011 (UTC)[reply]
More biblical examples by a fresh IP address Special:Contributions/2.27.73.100. --Anatoli 00:35, 10 October 2011 (UTC)[reply]

Japanese and Korean affixes

Japanese Wiktionary hasn’t been using a hyphen for Japanese affixes, and they decided officially not to use it (→ ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞). Korean Wiktionary has already decided not to use a hyphen for Korean affixes either (→ ko:위키낱말사전:자유게시판#접두사 및 접미사 and ko:위키낱말사전:자유게시판/2010-12#접미사에 하이픈을).

The affixes with a hyphen in the following categories must be renamed, except the ones written with Latin letters.

Although page names must follow the rule strictly for the sake of interwiki links, entry names in a page can have a hyphen. — TAKASUGI Shinji (talk) 04:29, 5 October 2011 (UTC)[reply]

Note: we also have the option (if our Japanese and Korean editors prefer to include the hyphens in the page titles) of creating unhyphenated pages as redirects, and asking the Japanese and Korean Wiktionaries to create hyphenated versions as redirects. This is how en.Wikt and de.Wikt (which use l') link to and from fr.Wikt (which uses l’). - -sche (discuss) 05:03, 5 October 2011 (UTC)[reply]
FWIW, I prefer hyphenless headwords, but that might just be me.  :) -- Eiríkr Útlendi | Tala við mig 05:10, 5 October 2011 (UTC)[reply]
I have no preference. - -sche (discuss) 05:23, 5 October 2011 (UTC)[reply]
I have no preference either, and I made most of them and hyphenated a lot of them. I can go back and make the necessary changes if we decide to go without hyphens. That's cool. The other languages linked to Category:Japanese suffixes, such as fr:Catégorie:Suffixes_en_japonais, have no hyphens. Only English. There's only one complication I can think of--Template:suffix and Template:prefix automatically add hyphens. Redirect from [[-affix]] to [[-affix]]? Stop using them? Anyway let's vote at Wiktionary:About Japanese. Another newbie like me might make the same mistake. Haplogy 05:32, 5 October 2011 (UTC)[reply]
As you know, I’m talking only about Japanese and Korean affixes. Japanese and Korean Wiktionarians use a hyphen for affixes in languages written in the Latin alphabet, just like English Wiktionarians. — TAKASUGI Shinji (talk) 05:50, 5 October 2011 (UTC)[reply]
We could add a nohyphen= parameter to {{suffix}} et al, or create {{ja-suffix}} etc. - -sche (discuss) 05:53, 5 October 2011 (UTC)[reply]
{{suffix}} already has a language switch, we could easily add ja and ko to it. Mglovesfun (talk) 11:04, 5 October 2011 (UTC)[reply]
{{suffixcat}} would need to be changed as well. —CodeCat 11:14, 5 October 2011 (UTC)[reply]

So it looks like the consensus is Japanese and Korean affixes should have no hyphens. Wiktionary:About Japanese does not address the issue, so are there any objections to making a vote to add a section called Affixes with this information? The page says any changes must be put to a vote so I guess I can't just change it myself without a vote. I assume this means that counters should not be hyphenated as well. AJA is unclear--change that too? Haplogy 13:36, 5 October 2011 (UTC)[reply]

I'd suggest that no vote is needed, since everyone seems to agree. Mglovesfun (talk) 13:41, 5 October 2011 (UTC)[reply]
In that case, I'd like to make that change if there are no objections. Please take a look and change the wording or whatnot if necessary. The sole alterations in extant text are that I removed the hyphen from " e.g., -" under Counter word (助数詞) and added "Do not use a hyphen" to Counter word, and changed Counter word to Counter words since every other POS header is plural there. @Dan: The argument on the Japanese beer parlour is mainly that hyphens are not customarily used in Japanese, and that other languages should follow suit for consistency. If there was anything else I didn't get it, but consistency is good enough for me. Haplogy 15:18, 5 October 2011 (UTC)[reply]
Things were more complicated: counter words are traditionally classified as suffixes in Japanese but as nouns in Korean, even though they function quite similarly. Now we don’t have to show the disagreement. — TAKASUGI Shinji (talk) 00:43, 7 October 2011 (UTC)[reply]

What is the reason to use no hyphens for Japanese and Korean affixes, while we customarily use hyphens for English affixes? I speak no Japanese, so I cannot read any rationale provided in the Japanese Wiktionary. --Dan Polansky 14:09, 5 October 2011 (UTC)[reply]

I urge that no action be taken yet: The 'consensus' referred to above is over the span of but half a day! My initial instinct is that ja and ko be treated the same as en, but I await an answer to Dan's question.​—msh210 (talk) 15:53, 5 October 2011 (UTC)[reply]
Haplogy added an answer above to Dan's question, but I'll chime in too and note that Japanese does not use hyphens at all -- those *exceedingly* rare situations where I've seen a hyphen used in Japanese text, it was used precisely because it looks unusual and out of place. No monolingual Japanese dictionary that I've ever seen uses hyphens. Bilingual dictionaries that I've seen appear to be a bit more varied, suggesting no hard-and-fast convention but rather editor preferences. My gut instinct is to follow the JA WT decision, partly for consistency and partly from my perspective that hyphens in Japanese just seem wrong somehow. -- HTH, Eiríkr Útlendi | Tala við mig 16:32, 5 October 2011 (UTC)[reply]
I don't really support either option over the other one at this point, but if I had to defend the support of hyphens of Japanese affixes in English Wiktionary, it would be thus: The use of a hyphen before or after a term is immediately understood by English speakers as indicating an affix. Thus, it makes sense to use hyphens with Japanese affixes in English Wiktionary, even if Japanese Wiktionary decides not to use them. A notable feature of the decision of Japanese Wiktionary (ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞) is that only two people voted in support (Mtodo, and Goat), with presumably TAKASUGI Shinji (talkcontribs) having proposed the whole thing and thus implicitly having voted in support, making up only three people in total. --Dan Polansky 17:01, 5 October 2011 (UTC)[reply]
Dan makes a good point here -- EN WT is targeted at readers of English, something some of us (myself included) occasionally lose sight of when getting our heads deep into our other languages. Consider me back on the fence for now regarding this issue. -- Eiríkr Útlendi | Tala við mig 17:09, 5 October 2011 (UTC)[reply]
Before I started working on affixes, most of them did not have hyphens, and that leads me to think that I have been the only person to use them with Japanese. Eirikr and I are the only particularly active editors right now in Japanese that I know of, and Eirikr is more knowledgeable than me, so I thought that was consensus enough. I've already changed AJA, too early it seems. I noticed that Goat cited EN WT's decision to delete トランス-, but it was deleted for a completely unrelated reason. By the way Category:Mandarin_suffixes uses hyphens most of the time. Hyphens or none are both okay by me, but not half and half as they are right now. Haplogy 18:04, 5 October 2011 (UTC)[reply]
Just for clarification, I didn’t propose to stop using a hyphen; it was already a de facto rule not to use it. I just proposed to make it official on Japanese Wiktionary. I don’t think the number of voters matters a lot. — TAKASUGI Shinji (talk) 00:14, 7 October 2011 (UTC)[reply]
In light of Eiríkr Útlendi's comments, I lean towards deleting the hyphens (from entries and headwords), so that users of Wiktionary have the correct impression that hyphens are not used in Japanese. We could use etymology sections or usage notes to note that the prefixes etc are prefixes etc, like this. - -sche (discuss) 19:07, 5 October 2011 (UTC)[reply]
Is there some alternative to a hyphen that would make sense? One comment above is that the hyphen "looks wrong" in Japanese. Would U+FF0D FULLWIDTH HYPHEN-MINUS look better; for example, 重－ instead of 重-? (This is similar to what we do with Hebrew, using e.g. ל־ instead of ל-, though we keep the latter as a redirect.) —RuakhTALK 19:45, 5 October 2011 (UTC)[reply]
Hmm, my comment about hyphens looking wrong in Japanese is simply because no one in Japanese uses them. They look about as out of place as using Japanese punctuation in English would look、 a bit like this 「sample」 here。  :) I don't think using these different types of hyphen fixes the "wrongness", simply because they're still hyphens, and still look out of place in a Japanese context. -- Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)[reply]

I agree that Japanese and Korean affixes should not bear hyphens. The same should be applied to Chinese affixes as well. (Btw, there are many more practices in the Japanese Wiktionary which are potentially beneficial here. They disallow romaji, pinyin, or any other romanisation entries; combine Chinese under one header; write wago with kana and kango with kanji, etc. Their entries do look a lot clearer than ours: ja:字.) 60.240.101.246 20:23, 5 October 2011 (UTC)[reply]

It bears noting that JA WT doesn't need to use romaji because they can safely assume that everyone using JA WT already knows at least kana. We can't make that same assumption here on EN WT with regard to kana, kanji, hanzi, Devanagari, Hebrew, Khmer, what-have-you.
Whether we should allow or encourage the creation of Latin-alphabet entries for languages that traditionally use other writing systems is a different question, but the persistence of many editors suggests that there is a demand for such entries, perhaps in part because of the limitations of the MediaWiki software. For instance, I may know that Hindi and Urdu for formal second-person plural is āp, but if I don't know how to write this using the Nastaliq or Devanagari scripts and can only search for the Latin-alphabet rendering, I am instead directed automatically to a page about Tocharian A and B, with no hint that the pages for آپ (āp) or आप (āp) even exist. Similarly, if I know that the Mandarin for stone is pronounced shí but I don't know how to input 石, a search for shí would show me just Irish and Navajo, leaving me confused and frustrated, were it not for the editor(s) who added the romanized Mandarin entry to that page.
Until such serious usability shortcomings are addressed, Latin-alphabet renderings are an easy workaround. -- Cheers, Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)[reply]

Another complication- so we are going with no hyphens. But is that no hyphens in romaji too? For example, for the suffix do we have {{ja-pos|k|suffix|hira=かい|rom=kai}} or {{ja-pos|k|suffix|hira=かい|rom=-kai}}? User TAKASUGI seems to think that Japanese character pages (kanji and kana) should not have a hyphen but Roman character pages should. TIA Haplogy 04:54, 7 October 2011 (UTC)[reply]

I just think they are separate. My understanding is that the use of hyphens is not language-dependent but character-dependent, like spaces, which Japanese don’t use when they write with kanji and kana but they use when they write with Latin letters. Anyway the community should decide it. — TAKASUGI Shinji (talk) 09:00, 7 October 2011 (UTC)[reply]
I understand now. That makes sense, using hyphens with romaji but not using hyphens with kanji or kana. AJA has been updated to reflect this and most affixes have been updated per policy as well. Haplogy 17:07, 10 October 2011 (UTC)[reply]

Edit tools for search

Would it be possible to have edittools for the search bar? Right now, we can use them to type special characters in entries, but not when they appear in the title of an entry. If I want to create new Gothic or Proto-Germanic entries I first have to edit an existing page, use the edittools to type the name there and then copy it into the search bar. It's not very convenient that way. —CodeCat 11:18, 5 October 2011 (UTC)[reply]

You don't have to use the search bar to create a new entry, though. You can just create a new redlink in your sandbox and then click on it. —Angr 11:36, 5 October 2011 (UTC)[reply]
And that's what I said is inconvenient... —CodeCat 12:35, 5 October 2011 (UTC)[reply]
Well, you said you had to then copy the name into the search bar. Clicking the redlink is slightly less inconvenient than that, but admittedly still more inconvenient than having the edittools right there at the search bar. I just wonder how much clutter that would create, considering the search bar is present on every page, regardless of whether it's being edited or not. —Angr 12:46, 5 October 2011 (UTC)[reply]
Maybe the edit tools could appear in a small menu to the left of the bar, and only appear in a small window below when you click on it? —CodeCat 13:01, 5 October 2011 (UTC)[reply]

We used to have a preferences option for this, but IIRC it broke a while ago with nobody having fixed it to date. -- Liliana 14:39, 5 October 2011 (UTC)[reply]

My preferred edittools character set does not appear under the search box by default for me, but can be made to appear and persist if I select a different character set and then select the one I prefer. It would be nice not to have to bother, but this is just two clicks. I am unsure how long the edittools characters persist. DCDuring TALK 15:57, 5 October 2011 (UTC)[reply]
Apparently it disappears after each save. DCDuring TALK 15:58, 5 October 2011 (UTC)[reply]
Yeah, we really need a way to type special characters in the search bar. I'm thinking a little popout keyboardy thing next to the search bar. Started working on a script at User:Yair rand/keyboards.js. --Yair rand 00:44, 7 October 2011 (UTC)[reply]

Gtroy sockpuppets

Does the community want me (or any other sysop) to continue to block sockpuppets of the permanently blocked User:Gtroy? His latest ID was User:Totallynotfairbro, most of whose contributions seemed reasonable (but he still forgets basic formatting issues from time to time). SemperBlotto 07:22, 7 October 2011 (UTC)[reply]

Not a very useful comment, but I don't know why he needed sockpuppets. He seemed to me to be slowly gaining respect after a bad start, then decided whilst not blocked (though has been blocked since) to create a load of supplementary accounts, even working off two accounts simultaneously. Mglovesfun (talk) 07:27, 7 October 2011 (UTC)[reply]
I suggest we allow him to edit, for now. His pronunciation of beefcake is interesting; many entries are borderline SOP, but... we have RFD for those, and his other pronunciations are OK. - -sche (discuss) 07:38, 7 October 2011 (UTC)[reply]
I suggest we unblock Gtroy (talk • contribs), his primary account, give him a stern warning, and indef block if said warning is not sufficiently adhered to! Mglovesfun (talk) 07:49, 7 October 2011 (UTC)[reply]
I agree with Gloves. He's not a perfect editor, but does more help than harm. Also, chasing sockpuppets can last for ever. --Rockpilot 08:03, 7 October 2011 (UTC)[reply]
I agree, I think his entries are mostly quite good, and most of the problems seem to be typos or relating to complex formatting issues. He’s new here and does not know how seriously Wikimedia views legal threats. He should be warned about that. He seems to be allergic to Ric (like Ric was allergic to Razorflame). I don’t really understand this personality friction very well, but I suspect if they knew each other just a little better, they would be friendly. Gtroy takes Ric entirely too seriously. The pronunciation at beefcake, while interesting, is not, I think, very useful and probably should be replaced with a plain vanilla model. —Stephen (Talk) 09:58, 7 October 2011 (UTC)[reply]
Haha just heard this, sounds like the death metal interpretation to me. Yeah, should be deleted. Mglovesfun (talk) 10:08, 7 October 2011 (UTC)[reply]
I agree, unblock and mentor. bd2412 T 20:26, 13 October 2011 (UTC)[reply]
Just to be clear on this, Gtroy = Wonderfool, right? Or at least, Totallynotfairbro = Acdcrocks = Rockpilot = Wonderfool (whether or not Gtroy = Totallynotfairbro). - -sche (discuss) 08:20, 12 October 2011 (UTC)[reply]
Nope, Gtroy appears to be another user entirely. -- Liliana 10:04, 12 October 2011 (UTC)[reply]
I am 99% sure WF doesn't have an American accent, and GT was recording new audio with one, so no, he isn't. Equinox 20:32, 13 October 2011 (UTC)[reply]
My suspicion was aroused because (Rockpilot=Wonderfool) and (Acdcrocks=Totallynotfairbro=?) both nominated words on the WOTDN talk page (Wiktionary_talk:Word_of_the_day/Nominations). I suppose it's simple that Gtroy could have seen WF do it and decided it was a good idea (since neither could edit the semi-protected WT:WOTDN page itself). - -sche (discuss) 21:11, 13 October 2011 (UTC)[reply]
User sent yet another email to info-en wikitionary.org about this block; ticket 2011101210018795. I explained the problem of legal threats to him before but I'm not getting involved in this. I'm going to be blunt and state that en.wiktionary is poorly set up for me to suggest avenues with which to request consideration of an unblock by anyone but the original administrator. There is no {{unblock}} template at all and if you compare MediaWiki:Blockedtext to w:MediaWiki:Blockedtext, it's pathetic. The stated advice to email the OTRS team is not the correct course of action. Your email address is not necessarily monitored by admins at Wiktionary to handle this sort of thing and you leave me no options to provide to the user. Adrignola 03:52, 13 October 2011 (UTC)[reply]
How bout a vote of confidence on whether I should be blocked or not by all the admins that takes everything that both I and Dick have said and done into account with a public comment period?Catch22 09:11, 15 October 2011 (UTC)[reply]
I've updated our MediaWiki:Blockedtext a bit, and recreated the (previously deleted!) {{unblock}} template. - -sche (discuss) 05:57, 13 October 2011 (UTC)[reply]
Thanks. Adrignola 16:09, 13 October 2011 (UTC)[reply]
This is Acdcrocks/Gtroy. I got blocked by Dick again but with no cause; he seems to have willfully ignored this entire discussion and only blocked me for "sockpuppetry" even though I have not created any new accounts. I would like to maintain the ACDCrocks account and be able to maintain a contributions history and watchlist in one place. I can't place unblock on my talk page as me because I am blocked from editing my own talk page, and I also cannot e-mail any users as I am blocked from doing that too. 71.142.74.66 21:07, 13 October 2011 (UTC)[reply]
  • For the record, I blocked Troy (Gtroy/ACDCrocks) indefinitely because he started making weird legal threats at me. No matter how furious I get with other editors (and it does happen, believe it or not) I never lose my mind enough to threaten them. Troy doesn't handle criticism well, constructive or otherwise, doesn't seem to take direction well, and I don't know if he'll ever quite understand the criteria for inclusion - the sum of parts issue in particular. In my opinion, the pros of letting him stay are outweighed by the cons. The quality of his editing weighed against the content of his apparent character... doesn't inspire me. Say what you will about my personality, but I do kickass work, and I listen when you say "hey you did this wrong asshole". (PS Troy, I wasn't ignoring this topic - I just didn't know it existed. I don't frequent the BP. I tend to have more constructive things to do.) — [Ric Laurent]22:58, 13 October 2011 (UTC)[reply]
I made one such claim after Dick made some very offensive and vulgar insults at me, and he continues to use the most uncouth and incendiary rhetoric about me whenever possible. Not much class there. He is using his admin powers despotically and is insincere in his claims of being the victim in this situation. I handle criticism very well; what I didn't do at first was understand how wiktionary differed from wikipedia, but I did figure that out over time, and I learned a lot from the suggestions of others, particularly SemperBlotto and Equinox. Dick's comments here really just show that he does not like the items that I have added. Instead of taking them to verification and deletion, he is justifying blocking me, for not sharing his opinion of sum of parts and his exclusionary worldview, by citing the restraining order comment. But from his own narcissistic comment preceding this one it's clear to me that his true reason for blocking me was the ulterior motive of disliking my lexicographic style and my person. I think when there is clearly just a personality conflict it should be left to the community to decide what to do, not either of the parties involved. Catch22 09:11, 15 October 2011 (UTC)[reply]
      • I think that is what you don't understand. I did not reoffend. Dick did not consider this community discussion a valid reason for me to be unblocked so he blocked my account again. This discussion seemed to me to decide I should be allowed to stay and just that I should be warned about sockpuppetry and legal threats. I didn't make any more legal threats nor was the original one in any way serious. But since my account was blocked, I could not bring it up here. I could not e-mail anyone but the info@wiktionary e-mail address and they said basically they changed the rules and now they don't take e-mails. They said to place an unblock template on my talkpage, which I did. But no response. My IP address got blocked too, so I created a new account so that I could participate in the discussion. Your word said I could stay (ACDCrocks) if I did not create any accounts, but I was blocked anyway because Dick considers this discussion non-binding. Doesn't that make the "I can stay" part null as well, making it pretty fair and logical that I would create another account? I didn't deny it, I did not try to be a sockpuppet, I outed myself right here in fact. And I chose catch22 as the handle to show the position I was in. You guys did not meet your end of the bargain that I could stay. And you say I was disruptive and harassing, but I wasn't, look through the edits, I did not conflict with anyone or ever even contact Dick in any way with the ACDC account. I did contact everyone involved in this discussion to get to the bottom of it and participate in the discussion. I also take offense to the fact that you think me quoting other users that Dick "sucked" when we first started here was any sort of insult. It wasn't. I was trying to relate to him. In any case how could you consider that an insult when he says things like "People who suck dick are troublemarkers" when talking about me. That is outright vile in comparison.
So in closing, are you as good as your word? And did I reoffend? I don't think so. I was as transparent as could be.

??Anybody at all?71.142.74.66 17:48, 18 October 2011 (UTC)[reply]

    • This is really quite ridiculous. So far as I can tell, there has been only one offense mentioned and proven, and it was a small offense of a nonserious nature that only required a warning and explanation, and a retraction by Troy. The warning and explanation were given, and the retraction and an apology were made. The other accusations have not been substantiated: "started making weird legal threats" implies multiple threats, but no evidence of multiple threats has been shown, and "you reoffended" comes with no evidence or explanation of what the "offense" was. Those who are blocking Troy are bullying him and abusing your administrative powers; you are not giving him or us a reasonable explanation of your actions; and you are not allowing him any measure of due process. —Stephen (Talk) 18:03, 18 October 2011 (UTC)[reply]

Twice-borrowed terms

We have categories for twice-borrowed terms, which are words that were borrowed into another language and then later borrowed back from that language into the language it originated from. I've been adding Dutch words to this category but there is a question I have. At what point can you consider something 'the same language'? I would consider Frankish (the source of many French words) a form of Dutch, so any of the French words of Frankish origin that were borrowed into Dutch later would be twice-borrowed terms. But is a word that was borrowed from Old Norse into Norman French and then from French into modern Norwegian a twice-borrowed term? What about words that were borrowed from Proto-Germanic into Latin and then from Old French into Middle English? —CodeCat 12:52, 7 October 2011 (UTC)[reply]

Just a simple comment: The Wikipedia article w:Reborrowing explicitly has an example of a term that is twice-borrowed because its derivations are "Old Norse → English → Swedish". --Daniel 19:19, 18 October 2011 (UTC)[reply]

Usability of translation tables

Translation tables are currently actual tables in HTML, but they don't actually contain tabular data. The two-column layout is nice for people with wide screens, but for those who have less width available it's not really convenient. I also noticed that the 'mobile view' feature still shows the translations in two columns, like here. This is obviously less than ideal for people using mobile phones. I'm not quite sure how this could be improved, but I would like there to be at least some kind of option to show the translations in one column (and in <div> if possible). —CodeCat 13:13, 7 October 2011 (UTC)[reply]

Hmm, yes, it's been a while since I've messed around with CSS and such, but isn't there some way of specifying the minimum and maximum widths of a display element? Would it be possible to rework things like {{translations}} and {{der-top}} to allow for dynamically resizing these lists into however many columns fit best on the user's screen? -- Suddenly feeling the urge to break open my HTML references, Eiríkr Útlendi | Tala við mig 16:47, 7 October 2011 (UTC)[reply]

...-based pidgins or creole languages

Look at the beginning of Category:Pidgins and creole languages and you'll see what I mean. I never put a high value in these "pidgin/creole by source language" categories, and this proves excellently why they're pointless - some languages have so many conceivable sources that you can put them in five or six of these categories. (heck, Category:Gullah language has 15 source categories!) Therefore, I propose to delete them, and no longer categorize creoles by source languages. -- Liliana 16:33, 8 October 2011 (UTC)[reply]

I think that we should only categorize by superstratum languages. Creoles have so many substratum languages that it's very hard to identify them all. —Internoob 21:25, 10 October 2011 (UTC)[reply]

Vote on banning Latin-containing Mandarin

Some thoughts on Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters, in a separate thread.

The vote seems to be a response to the reckless activity of 123abc (talk • contribs) aka Engirst (talk • contribs). The vote seems unneeded to me, going overboard. The reckless activity of the user can be checked by changing the RFV procedure for Mandarin terms containing Latin as follows:

  • A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

This would be a change of procedure rather than definition of what is included in Wiktionary, a change concerning only a well-defined subset of would-be Mandarin terms, many of which are unlikely to be attestable. The process simplification would be major: instead of sending terms created by Engirst to RFV one by one, admins could speedy delete such terms. The only place in which the citations would be collected for these entries would be citations namespace, so the mainspace entry could remain deleted until the citations are provided. --Dan Polansky 07:53, 9 October 2011 (UTC)[reply]
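To make the proposed rule concrete, here is a rough sketch in Python of the check an admin (or a bot) might apply. Everything in it is an assumption for illustration only: the function names `contains_latin_letter` and `speedy_deletable` are hypothetical, "contains Latin letters" is read as "at least one letter whose Unicode character name includes LATIN", and whether the Citations: page holds attesting citations is passed in as a flag, since judging attestation still needs a human.

```python
import unicodedata


def contains_latin_letter(term: str) -> bool:
    """Return True if any character in term is a Latin-script letter.

    Assumption: a letter counts as Latin when its Unicode character name
    contains "LATIN" (this also catches fullwidth Latin letters).
    """
    return any(
        ch.isalpha() and "LATIN" in unicodedata.name(ch, "")
        for ch in term
    )


def speedy_deletable(term: str, language: str, has_attesting_citations: bool) -> bool:
    """Hypothetical reading of the proposed rule, not an actual policy tool.

    A term marked as Mandarin that contains Latin letters may be speedy
    deleted unless its Citations: page already attests it.
    """
    if language != "Mandarin":
        return False  # the proposal is deliberately limited to Mandarin
    return contains_latin_letter(term) and not has_attesting_citations


if __name__ == "__main__":
    for term, cited in [("卡拉OK", False), ("卡拉OK", True), ("α粒子", False)]:
        print(term, cited, speedy_deletable(term, "Mandarin", cited))
```

Under this reading, 卡拉OK is deletable only while its citations page is empty, and α粒子 is untouched because alpha is Greek rather than Latin. Note that the check as sketched would also match pure pinyin entries, which the proposal may or may not intend.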

This appears to be a good idea. Introducing exceptions to the CFI rules is unwise, because it makes things more complex (see KISS principle) and less neutral. Banning a whole class of entries because of a user is very unwise (it's like closing the project because of vandalism). Also don't forget that users specializing in some categories of entries help very much, especially when they specialize in uncommon terms, less likely to be addressed by other editors.
I would propose the same procedure change for all infinite series such as numbers or the like. Lmaltier 08:10, 9 October 2011 (UTC)[reply]
I don't think you should delete 卡拉OK. Fugyoo 08:21, 9 October 2011 (UTC)[reply]
卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think? --Dan Polansky 08:28, 9 October 2011 (UTC)[reply]
(after 3–4 edit conflicts, haha) We should possibly say "non-Hanzi" in place of "Latin letters" (to exclude Cyrillic, Greek etc), but amending procedure in this way is a good, practical idea. Should we generalise it to all languages? (Ie, speedily delete any mixing of scripts? I can think of arguments in both directions, though the arguments in favour of generalisation are more hypothetical: someone could create a flood of inيظ#English entries.) - -sche (discuss) 08:15, 9 October 2011 (UTC)[reply]
Good point, Fugyoo. Maybe we should just be direct (without a vote to make it any formal part of policy or procedure, just using our common sense) that it is a single editor whose contributions we have reason to doubt, while we would allow a month at RFV for doubtful terms from other editors? We'd use the same common sense to speedily delete any flood of inيظ#English entries. - -sche (discuss) 08:38, 9 October 2011 (UTC)[reply]
I would keep the procedure as narrow and non-generalized as possible, tailored to check Engirst. Thus, I would go for "Latin letters" and for "Mandarin". I would not oppose a generalized procedure, though. A more general procedure needs more testing and is more likely to have unexpected side effects. --Dan Polansky 08:53, 9 October 2011 (UTC)[reply]
You want to ban riemannsche ζ-Funktion? -- Liliana 14:16, 9 October 2011 (UTC)[reply]
No, of course not. English and Mandarin use foreign letters, if they have to. Both α粒子 and α-particle are perfectly OK but when they are transliterated, they are transliterated using native scripts - 阿尔发粒子 (ā'ěrfā lìzǐ) and alpha particle. --Anatoli 23:42, 9 October 2011 (UTC)[reply]
I think Liliana was directing that comment at me, anyway, for asking if we should make the rule apply to all languages. I was only asking, though, and I see the arguments against making it apply to all languages are convincing. - -sche (discuss) 23:53, 9 October 2011 (UTC)[reply]
卡拉OK and OK and a few others will be kept, they are legitimate exceptions and they are common nouns. The vote is about proper nouns, not common nouns. The common noun containing Latin proper nouns in full, in particular Planck常數/Planck常数, will not be allowed either. Proper nouns containing Latin or other letters invented by Chinese will be allowed as well. It's all on the page. If we all agree to soft-redirect, there won't be a need for the vote. --Anatoli 09:12, 9 October 2011 (UTC)[reply]
Re: "The vote is about proper nouns, not common nouns": Wrong. From the vote: "This vote only affects proper nouns and common nouns using non-Chinese proper nouns as part of a common noun [...]". "Planck常数" and "Alzheimer病" are common nouns.
Are there any people who oppose having the soft-redirects?
What do you think about the speedy-delete procedure for Latin-containing mixed-script Mandarin terms? --Dan Polansky 09:49, 9 October 2011 (UTC)[reply]
If you reread my comment I mention common nouns containing proper nouns in full - Planck and Alzheimer are proper nouns. We have one person, native Chinese speaker opposing soft-redirects. Speedy-delete procedure? Good idea. Forms like Thames河, London市 should be deleted on sight. If we agree on soft redirects, we have the common and standard Chinese term and somebody insists on having them, they could be converted to soft redirects. This practice should not be encouraged, native Chinese people don't consider them Mandarin. Borrowings are transliterated or translated into Chinese characters, exceptions are abbreviations. --Anatoli 10:10, 9 October 2011 (UTC)[reply]
You would do well to ensure that every sentence you say is true. It is a poor practice to expect me to correct one of your sentences from a later sentence. The sentence "The vote is about proper nouns, not common nouns.", ending in a full stop, is false, and you should acknowledge as much.
If the only person who opposes soft-redirects is 60.240.101.246, there is nothing to worry about: he is a self-proclaimed prescriptivist, who wants to protect the purity of language. --Dan Polansky 10:41, 9 October 2011 (UTC)[reply]
I disagree. 60.240.101.246 is a native speaker. It's not prescriptivism, it's common sense. There is no real equivalent of "Alzheimer病" in English I can quote but think of errare humanum est. Is it attested? Yes. Is it used by English speakers and writers? Yes, a lot. Is it English, though? No. You and Engirst are using citations as a weapon to introduce words into Mandarin, which don't belong there. --Anatoli 23:42, 9 October 2011 (UTC)[reply]
As I've said many times in the course of the debate, I don't care what language they're listed under, so long as someone can look them up. You want to rip them out of the dictionary as a whole, which is against the spirit of a multilingual descriptive dictionary. I'll see your errare humanum est and raise you noli illegitimi carborundum. Is it Latin? Certainly not. It doesn't look like English. So should we delete it from our dictionary and screw over all the users who might want to look it up?--Prosfilaes --70.180.206.122 09:29, 10 October 2011 (UTC)[reply]
Of course, errare humanum est is English. Of course, if it's used in English, it should also get an English section. It's useful, because it's an indication that it's used in English, and for pronunciation (I suspect it's not pronounced the same in English and in French, but I don't know how it's pronounced in English). The most popular French dictionary (Petit Larousse) has a famous section (pink pages) about these foreign phrases used in French. The principle of having a section for a language if the term is used in that language is a very sound principle (and the only possible principle if we don't want to be subjective). Lmaltier 17:13, 10 October 2011 (UTC)[reply]
I would be careful about generalizing and banning all mixed scripts in all languages. Some modern languages mix scripts as standard practice. Examples include some of the Caucasian languages that prefer to use Latin I instead of Cyrillic Ӏ; Ossetic prefers Latin æ to Cyrillic ӕ; and Chuvash prefers ă/ĕ/ç to Cyrillic ӑ/ӗ/ҫ. I know that some of us think we should force everyone in the world who uses a non-Roman script to adopt the recently devised Unicode Consortium ranges to write their languages, excluding all exceptions, but really, the native speakers and writers of each language do have a right to come to an agreement with each other to use the letters and code points that they decided upon. And in technical usage, it is not uncommon to find terms such as u-bend translated into some non-Roman script languages with the Roman letter u. There are many, many valid exceptions to a rule to ban all mixing of scripts. —Stephen (Talk) 09:48, 9 October 2011 (UTC)[reply]
The vote doesn't concern languages other than Mandarin, especially if it's the norm for these languages to mix scripts. There are valid exceptions in Mandarin (and other languages) as well. 三K黨三K党 (Sān-kèi-dǎng) and 三K党 (Sān-kèi-dǎng) (Ku Klux Klan) are perfect examples of Mandarin proper nouns containing Latin letters. They are Chinese inventions. --Anatoli 10:10, 9 October 2011 (UTC)[reply]
Yes, I know, but -sche suggested making this a blanket ban against all mixing of scripts, asking, "Should we generalise it to all languages?" —Stephen (Talk) 10:21, 9 October 2011 (UTC)[reply]
The vote is complicated as is, no need to generalise. I won't agree to generalisation. I think -sche meant Japanese. There is no current controversy there. The few exceptions are known and no-one is pushing unwanted mixed-script terms. --Anatoli 10:29, 9 October 2011 (UTC)[reply]
Good points. Keep it specific to Chinese (perhaps even use our common sense not to speedily delete but to RFV existing Chinese entries which we know are good but which are not cited, as the vote does say they "can" be deleted, not that they "must" be). - -sche (discuss) 20:21, 9 October 2011 (UTC)[reply]
Why are Banach空间, Banach空間 and Hilbert空间 deleted? They are cited. Please see here, here and here. 2.27.73.100 22:41, 9 October 2011 (UTC)[reply]
Why should we reply to you when you never reply to anyone? Anyway, for others, the only compromise the majority of Chinese speaking editors except for one native speaker - Special:Contributions/60.240.101.246 (he is outright against such entries), could reach is a soft redirect, like this one Planck常数, provided the correct Mandarin entry exists. That, of course excludes, city, park, state, people, whatever names entirely in Roman letters with or without qualifiers, "London#Mandarin" or "London市#Mandarin" will be deleted on sight. As your entries are all bad - no value in them, close to 100%, we may use bulk delete of all your entries, under any IP-address you use for the sanity of Mandarin entries. I don't think there will be strong opposition to expelling you completely and deleting all your "work" in one go. --Anatoli 22:55, 9 October 2011 (UTC)[reply]
@2.27.73.100/Engirst: FYI, citations have to be formatted correctly and placed in the entry or the Citations: page, it isn't enough to link to a Google search. Raw Google results are not acceptable citations, anyway; citations (for any word in any language) must be durably archived, which in practice means you should look on Google Books and Usenet (which you can access via Google Groups, but notice that not all Google Groups are Usenet groups). WT:" tells you how to format a citation of a Book, and you can look at entries like rainburn to see a common format for citing Usenet posts. - -sche (discuss) 23:12, 9 October 2011 (UTC)[reply]
Understanding of what is a good Chinese entry will now differ, unfortunately, as we now have 123abc's entries' advocates with no knowledge of Mandarin. "Ohm定律" and "Planck常数" are bad enough but next will be place and personal names in Roman letters - entirely or with place name qualifiers. In 123abc's point of view "London" or "London市" is also a Mandarin word. --Anatoli 22:26, 9 October 2011 (UTC)[reply]
Nah, we'll delete "London" and "London市" (unless someone proves it means something non-SOP); there's been overwhelming consensus on both of those points, because we have London#English already, and because "London市" is sum-of-parts, just like "London city" or "the city of London" would be, except in the narrow, uncommon sense of that term. The "existing Chinese entries which we know are good" I referred to above are entries like "卡拉OK", which Fugyoo brought up. - -sche (discuss) 22:51, 9 October 2011 (UTC)[reply]
Alright, to codify the two ideas we've reached agreement on (soft redirects, and speedy deletion), I will start a Wiktionary:Votes/ page for the soft-redirect policy vote, and then invite everyone to tweak and improve my wording. Dan, would you set up the vote on changing RFV procedure? :) - -sche (discuss) 00:02, 10 October 2011 (UTC)[reply]
Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. Discuss the vote's wording on the talk page, please, where I ask several questions. - -sche (discuss) 02:49, 10 October 2011 (UTC)[reply]
I answered some questions, renamed to "Mandarin", added some comments and made some changes. We need to describe the criteria for the established and standard Mandarin terms containing Latin, Greek, etc. letters. --Anatoli 03:21, 10 October 2011 (UTC)[reply]

@-sche and vote: I would create a vote for my proposal, but I want to let it sit in Beer parlour a bit longer, so people can comment on it, oppose it, and propose changes in wording. I think the discussion should better sit from 3 to 5 days in BP before I create a vote. An updated proposed wording is this:

A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations. Such a term can but does not have to be speedy deleted: each admin can decide to avoid deleting "卡拉OK" in spite of there being no citations in "Citations:卡拉OK".

A deletion summary, which is not part of the vote-to-be, could be this: "Mixed-script Mandarin entry that is not yet attested by quotations in citations namespace; see also WT:Attestation" Anyone please feel free to create a vote if I forget to do so in a couple of days. --Dan Polansky 09:59, 10 October 2011 (UTC)[reply]

I don't like "Such a term can but does not have to be speedy deleted". An admin can choose not to delete any file they want, and this sentence gives no protection for when an admin walks by and does delete 卡拉OK. It provides no guidance and doesn't change the rules at all.--Prosfilaes 13:20, 10 October 2011 (UTC)[reply]
The second sentence merely highlights the use of "can" rather than "should" in the first sentence, as such distinctions get easily overlooked. It emphasizes that a deletion is not a necessary consequence of missing quotations. The second sentence could be dropped, but it seems to me that it makes the first sentence clearer. --Dan Polansky 13:31, 10 October 2011 (UTC)[reply]
I'm not a huge fan of that. I think it better if admins are mop wielders, not deciding whether or not a page is "good enough" to stick around. I also think it provides at best illusionary protection to 卡拉OK; whether that says can or should, an admin can walk by anytime and be fully justified in deleting it. If you want 卡拉OK to stick around, cite it; otherwise accept the fact that your new rule will make it speedyable.--Prosfilaes 13:42, 10 October 2011 (UTC)[reply]
Here's an alternative for you:

A term that contains Latin letters and is marked as "Mandarin" should be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

If there's going to be a vote, both alternatives can be offered for consideration. --Dan Polansky 13:51, 10 October 2011 (UTC)[reply]
An example of a cited mixed-script Mandarin entry could look like this, Banach空间 (this cited example has been deleted by Anatoli):
Mandarin
Noun

Banach空间 (simplified, Pinyin Banach kōngjiān)

  1. Banach space

2.27.73.173 12:30, 10 October 2011 (UTC)[reply]

Oh you can talk? No, they will be deleted on sight in this format. That's a general consensus. --Anatoli 12:36, 10 October 2011 (UTC)[reply]
Engirst AKA 2.27.73.173, feel free to collect three properly formatted quotations at Citations:Banach空间. However, chances are the entry will be restored only days later: you have been evading blocks and showed very little cooperation with other editors, so restoring entries that you have created is no priority for Wiktionary editors. --Dan Polansky 13:43, 10 October 2011 (UTC)[reply]
The entry should be formatted exactly as Planck常数 ("mixed language") - a soft redirect to 普朗克常数 (Pǔlǎngkè chángshù) ("correct term"), Dan Polansky, you and all editors agreed to this. Banach空间 will not be created before 巴拿赫空间 (Bānáhè kōngjiān) exists. Some didn't agree to this condition. We may need to go ahead with the vote - Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 22:00, 10 October 2011 (UTC)[reply]
  • Alright, let's go ahead and make "卡拉OK" speedily-deletable. (Admins should use common sense in deciding what to delete, and defer to Mandarin-speaking editors when uncertain, but let's accept for the moment the presumption that they will not.) As Dan said, "卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think?" I'll start citing some of them. - -sche (discuss) 22:42, 10 October 2011 (UTC)[reply]
Yes, we should save genuine "mixed script" Mandarin entries and make a clear distinction between "mixed script" terms and "mixed language" (code-switching). --Anatoli 23:39, 10 October 2011 (UTC)[reply]
An administrator shouldn't use a double standard. Please see here. 2.27.72.128
A revised version of the vote has started: Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 22:14, 17 October 2011 (UTC)[reply]

Attestation vs. the slippery slope

I would like again to get the section WT:CFI#Attestation vs. the slippery slope removed from CFI. A previous attempt at Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI ended 5:4:0 for deletion.

I argue that the section is needless and misleading.

The section is needless, as, if it gets removed, the following dialogue covers the case:

  • Alice: Adding the entry for the particular term "ttt" will lead to entries for a large number of similar terms. Thus, we should delete "ttt".
  • Bob: That is not a CFI consideration. CFI mandates that a term should be included if it is attested and idiomatic.

Done; no need to list every wrong argument for deletion in CFI.

The section is misleading, as two of its bullet points refer to "common use" and "general use" in contradiction with the "Attestation" section, implying that a term in pig Latin should be included only if it "has found its way into common use". My understanding of how CFI should work is that a term in pig Latin should be included only if it is idiomatic and attestable, regardless of whether it "has found its way into common use".

Do any opposers of the vote find any of this convincing? Are there any new supporters of the removal of the section? --Dan Polansky 08:25, 9 October 2011 (UTC)[reply]

For anyone who would want to respond in a poll-like fashion, in which discussion is of course also welcome, here are some templates: {{subst:support}}, {{subst:oppose}}, {{subst:agree}}. --Dan Polansky 08:34, 9 October 2011 (UTC)[reply]

I supported (and still support its removal) as it's not criteria for inclusion, but rather more of a discussion about what to include and what not to. If anything it's more suited to Wiktionary talk:Criteria for inclusion! Mglovesfun (talk) 11:46, 9 October 2011 (UTC)[reply]

Hello, I propose to merge this template with {{en-noun}}. It automatically displays the plurals and their pronunciations. JackPotte 13:49, 9 October 2011 (UTC)[reply]

We don't indicate the pronunciations of words in the headword line of our entries on en.Wikt, though, we indicate pronunciations in the ===Pronunciations=== section. A very large number of English words have at least two different pronunciations (UK and US); some words have eight or more possible pronunciations (of the singular alone!), like pecan. That would require the headword line to be more a headword paragraph! - -sche (discuss) 20:33, 9 October 2011 (UTC)[reply]
What -sche said. JamesjiaoTC 03:24, 10 October 2011 (UTC)[reply]
Clever thing mind you, it attempts to work out the pronunciation of the plural based on the IPA inputted and it attempts to work out the plural using only the PAGENAME. --Mglovesfun (talk) 12:03, 10 October 2011 (UTC)[reply]
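
The kind of guessing described above (a plural from the page name, and a plural pronunciation from a supplied IPA string) can be sketched roughly as follows. This is only an illustration of the regular English rules, not the logic of the French Wiktionary template; the function names and the sound lists are assumptions made for the example, and it handles a single transcription at a time, which is exactly the limitation raised above for words like pecan.

```python
# Hypothetical sketch: regular English plurals only, no irregular nouns.
SIBILANTS = ("s", "z", "ʃ", "ʒ")          # also covers tʃ and dʒ, which end in ʃ / ʒ
VOICELESS = ("p", "t", "k", "f", "θ")     # voiceless non-sibilant finals

def plural_spelling(word: str) -> str:
    """Guess the regular plural spelling from the page name alone."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

def plural_ipa(ipa: str) -> str:
    """Guess the plural pronunciation from one IPA string (without slashes)."""
    if ipa.endswith(SIBILANTS):
        return ipa + "ɪz"
    if ipa.endswith(VOICELESS):
        return ipa + "s"
    return ipa + "z"

for word, ipa in [("cat", "kæt"), ("dog", "dɒɡ"), ("fish", "fɪʃ")]:
    print(word, "->", plural_spelling(word), "/" + plural_ipa(ipa) + "/")
```

A word with several accepted pronunciations would need one call per transcription, so the output grows with the number of variants.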

I've just finished fr:Template:fr-accord-rég2. JackPotte 18:51, 15 October 2011 (UTC)[reply]

And fr:Template:es-rég and fr:Template:pt-rég2. JackPotte 21:27, 16 October 2011 (UTC)[reply]

Making 'see also' clearer to users

We use the template {{also}} to show links to other pages that are written with the same letters but with diacritics or capitals. The recent discussion at Wiktionary:Feedback#Prestige shows that this can be very confusing to new users. It has to be added to every page and it's easy to miss a few possibilities or even just to forget to add it. And compared to fr:Prestige, it's just too small and doesn't stand out. It's very easy to miss. The 'see also' text itself isn't really always confusing to users, only if the difference is just capitalisation. When you edit a new page beginning with a capital, like Nonsenseword, the wiki software warns you that the title might not be correct. But there is no warning if the page already exists. So for that reason I think it would be nice if warnings about capitalisation could be automatically added to every page, perhaps even outside the wikitext. —CodeCat 11:59, 10 October 2011 (UTC)[reply]

maybe 'the title of this page is {{PAGENAME}}, see also [] '. --Mglovesfun (talk) 12:01, 10 October 2011 (UTC)[reply]
That isn't really any clearer at all, it just repeats the name of the page. The problem isn't that users don't see the name of the page, it's that they don't understand the significance of the capitalisation. The current system with {{also}} helps somewhat to clarify this, but it's not very obvious to users and it's not used consistently enough either. —CodeCat 12:05, 10 October 2011 (UTC)[reply]
It might help to provide a more visible contrast between say Fish and fish. Mglovesfun (talk) 16:27, 10 October 2011 (UTC)[reply]
I'd like to put something Wikipedia does for ambiguous titles, eg "This entry is a name, for other senses see fish" or "This entry is about a German noun, for other languages see prestige". I think a bot could do it. Fugyoo 22:23, 11 October 2011 (UTC)[reply]
Maybe have {{also}} display "Entries for similar words:" or similar.​—msh210 (talk) 00:44, 12 October 2011 (UTC)[reply]

Why is Banach空间 deleted?

Discussion moved to Talk:Banach空间.

Bot generation of Portuguese verb forms

I have noticed that there are quite few entries for Portuguese verb forms (mainly some forms generated by WF's bot) and currently no bot dealing with their creation. So I have modified my User:BuchmeierBot code in order to be able to deal with Portuguese verb conjugation tables. I would like to generate the forms of verbs that already have a conjugation table (of course after checking for correctness of the conjugation). Should I start a vote? Matthias Buchmeier 16:10, 10 October 2011 (UTC)[reply]

We shouldn't use double standard

- -sche said: "deleted, per the precedent and discussion of WT:RFD#Москва". pizza#Mandarin is deleted, so OK#Mandarin should be deleted as well. Actually their meaning can be found from English entries. So, they are not necessary. 2.27.73.173 18:56, 10 October 2011 (UTC)[reply]

You're right, we shouldn't. We don't want to have a double standard for you as opposed to anyone else who won't listen to what people say. —CodeCat 19:01, 10 October 2011 (UTC)[reply]
  • I propose this post by a banned user who does not cooperate with Wiktionary editors, rarely answers questions but feels himself entitled to start a new BP discussion whenever he sees fit, and possibly cannot even read Chinese characters, is left without any further response. --Dan Polansky 19:02, 10 October 2011 (UTC)[reply]
You have NO STANDARDS. That's the problem. You keep inventing stuff AND not being consistent with what you do. You have been blocked again for 3 days for doing this shit: [5]. JamesjiaoTC 21:28, 11 October 2011 (UTC)[reply]
Japanese Romaji used "-" for suffixes as well, please see here. So, you use double standard indeed. Alexando 07:16, 12 October 2011 (UTC)[reply]
Sorry.. you didn't really respond to the 'inconsistency' part. Besides, why are you comparing Mandarin with Japanese again? JamesjiaoTC 21:22, 16 October 2011 (UTC)[reply]

Phrasebook, again

I think Equinox pretty much got it at Talk:I'm transsexual - our phrasebook, as is, is a sick joke. Pretty much half the phrases are about sex (some of them as silly as I'm horny - I mean c'mon, who actually says that?), while actual phrases that you would find in a printed phrasebook (what day is it, can you give me directions, etc.) are curiously absent. This shows it needs some kind of reform, and most importantly a radical pruning. -- Liliana 22:07, 11 October 2011 (UTC)[reply]

Agreed. Having travelled to countries with languages that I speak very little of, I'd say most of the sex-related phrases should be removed, unless you travel for the sole purpose of fornication. JamesjiaoTC 22:14, 11 October 2011 (UTC)[reply]
Being transsexual has nothing at all to do with sex, though. —CodeCat 22:31, 11 October 2011 (UTC)[reply]
is it necessary though? I mean, if you were a real transperson, the last thing you would do is disclosing it to strangers... no? -- Liliana 22:33, 11 October 2011 (UTC)[reply]
Yeah, it's hardly something you'd just drop into a conversation, is it? One that made me laugh earlier was "I'm mute", as though a mute person could actually say it. BigDom (tc) 22:39, 11 October 2011 (UTC)[reply]
I'm illiterate is a good one as well. -- Liliana 22:42, 11 October 2011 (UTC)[reply]
Well, a mute person could write the phrase. - -sche (discuss) 22:44, 11 October 2011 (UTC)[reply]
The pronunciation section is rather pointless though. -- Liliana 22:46, 11 October 2011 (UTC)[reply]
True.. but how hard is it for a mute person to express this notion via body language? I'd bet body language (point at mouth, wave hands) can convey this more swiftly. JamesjiaoTC 22:48, 11 October 2011 (UTC)[reply]
It could be said over the internet? —CodeCat 23:11, 11 October 2011 (UTC)[reply]
Someone who is recognizably foreign might be misunderstood as trying to convey their inability to speak the local language, rather than their inability to speak at all. (But I agree with DCDuring, below. To the extent possible, we shouldn't be asking "Could this be useful?", only "Is this useful?") —RuakhTALK 20:13, 12 October 2011 (UTC)[reply]
Does a phrasebook require more constancy of purpose and contributor discipline than we can sustain? Subtle aspects of policy don't seem sustainable for very long here. We seem to be susceptible to anarchism.
If we had users constantly asking us how to say or write phrasebook-type expressions, we could at least focus on meeting the needs of real users. But we have only the vaguest notion of what we are trying to do. Though pruning might be necessary, I doubt that it is the key breakthrough that a phrasebook needs to achieve success at Wiktionary. DCDuring TALK 23:25, 11 October 2011 (UTC)[reply]
Agree to the clean up. Also agree that "I'm mute" is useful. We are a written dictionary. You can write or print the translation in another language. --Anatoli 06:50, 12 October 2011 (UTC)[reply]
I'll agree as well then, sometimes I think our phrasebook is about who can create the silliest entry without it being deleted. I might go with I'm fucked meaning I'm drunk, I'm tired, I'm disabled/crippled, I'm in trouble (etc.). Mglovesfun (talk) 10:47, 12 October 2011 (UTC)[reply]
For the record, Liliana, I say "I'm horny" all the motherfucking time. Frequently there are some qualifiers between the subject/verb and adjective. But yeah. All the time. I'm horny right now, even. — [Ric Laurent]11:25, 14 October 2011 (UTC)[reply]
Do you frequently feel the need to say it in languages that you don't even speak well enough to construct a simple sentence in? —RuakhTALK 13:17, 14 October 2011 (UTC)[reply]
Uh... What kind of question is that? Was it even serious, or just meant as some sort of affront. — [Ric Laurent]00:44, 4 November 2011 (UTC)[reply]
Oh I am so glad you're unable to hit on me. On a more serious note, what about other phrasebooks? I know it's not a CFI rule, but still a good guideline. -- Liliana 19:53, 14 October 2011 (UTC)[reply]
I could hit on you if I wanted. "Unable" is perhaps the wrong word. — [Ric Laurent]00:44, 4 November 2011 (UTC)[reply]

123abc, again???

See http://en.wiktionary.org/wiki/Special:Contributions/Christofo -- it appears that the many-times-banned user is back, now creating hyphenated pinyin entries as if that is a normal thing to do. Please direct attention to this development. 71.66.97.228 01:13, 12 October 2011 (UTC)[reply]

Yes, your diagnosis was correct. Nuked all his entries. --Anatoli 06:46, 12 October 2011 (UTC)[reply]
Japanese Romaji used "-" for suffixes as well, please see here. So, you use double standard indeed. Alexando 07:18, 12 October 2011 (UTC)[reply]
this has been mentioned multiple times before. Japanese entries have nothing to do with Mandarin entries. If there are issues, they need to be treated separately. Obviously you don't listen. So... I am gonna block you again.. this time, I will not allow you to create new accounts. JamesjiaoTC 03:44, 13 October 2011 (UTC)[reply]
He is very good at avoiding all blocks and generating new IP-addresses whenever he wishes. He was blocked multiple times including range blocks. He doesn't have a lot of linguistic or communication skills but he's got that skill.
As for the issue with Japanese, first of all, it's a language policy. If most editors agree to do it one way, it goes, if not, then there's a vote. Japanese editors may be happy to discuss the issues of triplication related to Romaji entries. The Romaji entries usually contain the minimum information, so no one complained and Romaji entries were created ONLY when Kana/Kanji entries were also there. --Anatoli 06:42, 13 October 2011 (UTC)[reply]

User 123abc again, again

Blocked, and still edits?

He's now assiduously adding bible verses (and links) to Mandarin entries. What is wrong with this project that this has happened several dozen times now, over a period of nearly a year? 71.66.97.228 19:54, 12 October 2011 (UTC)[reply]

I guess the problem is that {users with enough knowledge of Chinese to deal with his edits} and {users with enough technical knowledge to deal with his edits} seem to be two mutually exclusive sets. (The overlap of these sets with {users with enough time and patience to deal with his edits} may also be relevant.) Previously, I've tried to address this by starting a vote that would reduce the amount of knowledge of Chinese that was necessary — in fact, my goal was to make the formatting for Mandarin pinyin entries so restrictive that it could be enforced by a bot — but Chinese-speaking editors' responses to the vote, while positive in tone, just left me more confused than ever. So maybe I should work on it from the other angle: trying to reduce the amount of technical knowledge that is necessary, in the hopes that that will enable the Chinese-speaking administrators to cope with his edits better. —RuakhTALK 20:09, 12 October 2011 (UTC)[reply]
The pinyin entries are now as simple as can be, see yánlì; all entries in Category:Mandarin pinyin should be formatted as per Wiktionary:Votes/2011-07/Pinyin entries. If they are automatically created by a bot and the job is good, we should revisit it. The Chinese entries are, indeed, a bit complicated, notably the "rs" value (radical sort for the initial character) but this info is available in Wiktionary. Anatoli 21:47, 12 October 2011 (UTC)[reply]
The main problem is that some people don't know the function of a Pinyin entry, especially for learners, but oppose Pinyin just because they don't like Pinyin. For your reference, for a good example of making use of a Pinyin entry for learners, please see here. Afex 20:41, 12 October 2011 (UTC)[reply]
The problem is your unwillingness to engage in dialogue unless things are going your way. You're happy to engage in dialogue when people are agreeing with you, and when people stop agreeing with you, you just clam up. Mglovesfun (talk) 20:46, 12 October 2011 (UTC)[reply]
Do you intend to engage in dialogue? Mglovesfun (talk) 06:26, 13 October 2011 (UTC)[reply]
Why not if you like, but don't block me and try to close my mouth first. Sundy 12:26, 13 October 2011 (UTC)[reply]
If you want to engage in dialog, use Engirst. As it is you are abusing multiple accounts which is against the rules. All other accounts will be indefinitely banned on sight. - [The]DaveRoss 20:13, 13 October 2011 (UTC)[reply]
123abc talked on Mglovesfun's talk page, for the first time I see more than one sentence at a time. --Anatoli 22:37, 13 October 2011 (UTC)[reply]

Colloquialisms and nonstandard terms

Are colloquialisms considered nonstandard terms? My take is that they are not, hence my edit to Template:lexiconcatboiler/colloquialism. --Dan Polansky 10:04, 12 October 2011 (UTC)[reply]

Sort of, I suppose all slang terms, informal terms and colloquial terms are nonstandard. Mglovesfun (talk) 10:53, 13 October 2011 (UTC)[reply]

New administrator nomination - User:Haplology and User:Eirikr

Please don't ignore the new nomination - Wiktionary:Votes/sy-2011-10/User:Haplology for admin. He has been very active in Japanese and works quite professionally - Special:Contributions/Haplology.

I also nominated User:Eirikr, another Japanese editor but he is not available at the moment, the vote will start as soon as he accepts it. --Anatoli 06:58, 13 October 2011 (UTC)[reply]

Linking to a particular sense within an entry

Is there any way to link to a particular sense within an entry, rather than to the entire entry? I know how to use the pound sign to link to a section, but for most words (I'm working with Chinese entries) this will only go as far as the section for a particular language, not to the individual senses. I have read about MediaWiki's "subpage" feature, but I don't know if that would work. In particular, I would like to be able to link words in a Wikisource document to the particular sense used in that context. If this is not currently possible, where would I start in proposing this feature, or perhaps in helping to implement it? Craig Baker 20:52, 13 October 2011 (UTC)[reply]

You can use {{senseid}} to link to a particular sense. - [The]DaveRoss 21:10, 13 October 2011 (UTC)[reply]
There is no documentation to {{senseid}}. How does one link to a properly formatted sense ? DCDuring TALK 23:45, 13 October 2011 (UTC)[reply]
Essentially it just sets up a span id which you can then refer to like any other anchor. The formatting is as follows: (taken from peach) # {{senseid|en|fruit}}, the first parameter is the language section and the second parameter is a unique (for the page) gloss which is also the name of the anchor when referring back to the sense. To refer back you include the language and gloss [[peach#English-fruit|peach]] resulting in peach. This certainly should be documented at the template too. - [The]DaveRoss 01:36, 14 October 2011 (UTC)[reply]
I can't get it to work when the gloss contains spaces; for example, neither among: mingling or intermixing nor among: mingling_or_intermixing works. Does anyone know how to do it? If I find out, should I add the senseid doc page, or should I wait for those involved in its development to write docs? Craig Baker 03:51, 19 October 2011 (UTC)[reply]
As no one has stepped up to add the documentation, you might take a run at it. DCDuring TALK 11:31, 19 October 2011 (UTC)[reply]
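
Pending proper documentation, the pattern DaveRoss describes above can be sketched as a pair of small helpers. This is a minimal illustration that only assumes what is stated in the thread: the template takes a language code and a gloss, and the resulting anchor is the language section name, a hyphen, and the gloss. The function names are hypothetical, and the sketch deliberately sticks to a gloss without spaces, since the thread leaves that case unresolved.

```python
# Hypothetical helpers that only assemble wikitext strings; nothing here calls MediaWiki.

def senseid_template(lang_code: str, gloss: str) -> str:
    """Wikitext placed at the start of the definition line, e.g. after '# '."""
    return "{{senseid|" + lang_code + "|" + gloss + "}}"

def sense_link(page: str, language_name: str, gloss: str) -> str:
    """Wikilink pointing at the anchor '<language name>-<gloss>' on the given page."""
    return "[[" + page + "#" + language_name + "-" + gloss + "|" + page + "]]"

print("# " + senseid_template("en", "fruit"))
print(sense_link("peach", "English", "fruit"))
```

The second line printed is the same link given in the example from peach, which is a quick way to check that a hand-written link follows the described pattern.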

An idea... wanted languages

We have pages for wanted entries, but so far we're lacking a list that shows which languages are in the most need of improvement. For example, our Old Norse coverage is quite bad given its popularity, and there aren't many Estonian entries either. It would be nice to see at a glance which languages need the most work, so that editors (also potential new editors) can see if their skills would be especially needed on Wiktionary. —CodeCat 21:04, 13 October 2011 (UTC)[reply]

Some kind of easily available statistics per language would also be good. It won't show the quality of entries or translations but some education. Also, it may sound harsh for small languages but what do people think about ratings or "languages in bad need of contributions"? Well, we have few entries in Old Norse but how important is it? We also have very little Burmese, Lao, Malay, let alone Sinhalese content. These are state languages with millions of speakers but we have very few contributions in these languages. --Anatoli 23:10, 13 October 2011 (UTC)[reply]
Re: "Some kind of easily available statistics per language would also be good": We have Wiktionary:Statistics; and if there's anything that you want that isn't already there, I bet you can convince Conrad to add it. —RuakhTALK 03:10, 14 October 2011 (UTC)[reply]
Thanks for the advice. After posting, I actually found Wiktionary:Statistics. That's useful. --Anatoli 03:41, 14 October 2011 (UTC)[reply]
Why not go ahead and start a draft somewhere? -- Liliana 03:46, 14 October 2011 (UTC)[reply]
Perhaps worth discussing first what we want to achieve. Will a new policy attract new editors? Having a list of languages in need of improvement is a good start or something (better than nothing). Statistics may show only the quantity, not quality.
If the statistics is true for the last year, look at number of entries for some official languages:
  • Sinhalese - 75
  • Malagasy - 84
  • Kazakh - 173
  • Burmese - 134
  • Kyrgyz - 152
  • Malay - 409
  • Lao - 558
Do we need to advertise? --Anatoli 04:15, 14 October 2011 (UTC)[reply]
I adore your Russian bias. Belarusian is very much in need of improvement. -- Liliana 04:17, 14 October 2011 (UTC)[reply]
Not sure whether you were sarcastic, I actually didn't mention Russian or any Slavic languages. Yes, that's right. Belarusian needs improvement but Belarusians themselves do not seem to be worried about their language loss. --Anatoli 04:39, 14 October 2011 (UTC)[reply]
The last sentence is so true. 60.240.101.246 09:42, 16 October 2011 (UTC)[reply]
No comments? I'm throwing in Interlingue then - for a constructed language, its coverage here is poor at best. -- Liliana 09:39, 16 October 2011 (UTC)[reply]
I've created Wiktionary:Languages needing improvement. —CodeCat 13:08, 16 October 2011 (UTC)[reply]
I have added some obvious ones, focusing on state languages. --Anatoli 22:55, 16 October 2011 (UTC)[reply]
It's hard to tell which languages are actually needed. Needed in what way, for what purpose? Russian has many speakers, but only a few English words are loans from Russian. The etymology sections of English entries have more use for French and Latin words (and Old Norse) than for Russian words. But another approach is to ask, for which languages could a recruiting campaign yield good results? In that case, we can compare the ranking of each language in WT:Statistics with its ranking in the list of Wikipedias. The Russian Wikipedia is large and growing fast, and there are many Russian wikipedians, which could be recruited to Wiktionary. There are far fewer active Arabic speaking wikipedians, so trying to recruit among them would be less successful. Of the languages closest to me, Norwegian has the healthiest Wikipedia, far larger than the Danish Wikipedia. And Norwegian is still a very small language in WT:Statistics (smaller than Danish), so recruiting among Norwegian wikipedians could be useful. --LA2 18:01, 17 October 2011 (UTC)[reply]
Quote: "Russian has many speakers, but only a few English words are loans from Russian. The etymology sections of English entries have more use for French and Latin words (and Old Norse) than for Russian words." What does this have to do with wanted languages? --Anatoli 02:53, 19 October 2011 (UTC)[reply]
It has to do with the definition of "wanted" as a measurement of which languages are more wanted than others. If more Russian entries are "wanted", by which definition of "wanted" is that? --LA2 08:39, 19 October 2011 (UTC)[reply]
I never said Russian was wanted more than others. We do have Russian speaking editors but very few in other languages. You're answering a question with a question. So, by your logic, contributions into a language close to English or which gave English more borrowings (German, Dutch, Latin, French?) are more important than those further away from English - Chinese, Hindi, Burmese, Sinhalese? --Anatoli 08:51, 19 October 2011 (UTC)[reply]
I'm just mentioning etymology sections of English words as one possible definition of a want. Q: Why do we need Latin entries? A: Among other things, to explain the etymology of English words. (This means we would focus on those Latin words that have found use in English.) But by that measurement, the want for Russian entries is not so large. To just list the most "wanted" languages is pointless unless we define the want. If the purpose of the list is to recruit new contributors, then we can skip the definition of a want, and instead list languages for which we might successfully recruit new contributors. Russian would be a good candidate, because there are many active Russian wikipedians, who only need a little extra training to become productive in Wiktionary. --LA2 16:47, 19 October 2011 (UTC)[reply]
I'm still puzzled about your idea of a "want for a language" being linked to English somehow - borrowings, etymology of English words, as if other languages only serve the purpose of understanding English words better. Anyway, as was mentioned above, Wiktionary:Statistics shows languages with few contributions, and individual categories like Category:Sinhalese_nouns show how few noun entries Wiktionary has (or other parts of speech). To be fair to all languages - a number of entries alone could be a criterion for "wanted". As discussed on Wiktionary_talk:Languages_needing_improvement, other criteria could be chosen as well - number of speakers (Hindi wins over Cebuano), the status of a language - official (Sinhalese has fewer native speakers than Kannada in India but it's official (along with Tamil) and it may be more important to reach out to a whole country), co-official (do you really need to know Maori, Irish or Hindi to communicate in New Zealand, Ireland or India?), only spoken and seldom written (Hakka, Min Nan have many speakers but is there enough written material or information broadcast in these languages?). As User:CodeCat said, even official languages with a very small population like Marshallese may have lower priority than an unofficial language with a lot of speakers. Is a language or a dialect important for survival in a country? You can't do without Mandarin in China but you can get away without any Indic language in India. Not meaning to lower the status of any language or dialect, it's just some ideas for consideration coming to mind. --Anatoli 21:59, 19 October 2011 (UTC)[reply]
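
As a rough illustration of the simplest criterion discussed in this thread, entry count alone, the figures Anatoli listed above can be sorted so that the languages with the fewest entries come first. This is only a sketch: the numbers are the ones quoted in the thread, the function name is hypothetical, and other criteria raised here (number of speakers, official status, size of the corresponding Wikipedia) would need data that is not given in this discussion.

```python
# Entry counts as quoted in the thread; not refreshed from Wiktionary:Statistics.
entry_counts = {
    "Sinhalese": 75,
    "Malagasy": 84,
    "Kazakh": 173,
    "Burmese": 134,
    "Kyrgyz": 152,
    "Malay": 409,
    "Lao": 558,
}

def rank_by_need(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Order languages from fewest to most entries (fewest = most in need)."""
    return sorted(counts.items(), key=lambda item: item[1])

for language, count in rank_by_need(entry_counts):
    print(f"{language}: {count}")
```

Sorting on a single number keeps the list easy to regenerate, at the cost of ignoring entry quality, which the thread notes statistics cannot show.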

Not the kind of jumper that makes you itches

[6] "He said, I know a little Latin, man a cus man a kai / I said I don't know what it means; he said neither do I". Do any of us know? Sounds more like Greek. Equinox 21:05, 13 October 2011 (UTC)[reply]

Maybe it is w:Manacus, Manacī. —Stephen (Talk) 22:45, 13 October 2011 (UTC)[reply]
I think it's a garbled version of amicus amici. Fugyoo 00:23, 14 October 2011 (UTC)[reply]

Block page spoiled with JavaScript

When you have to block someone, the page now has an extra dropdown box that disappears or reappears depending on your selection. It disappears with a stupid JavaScript "delayed fade" effect. This means you cannot efficiently use the Tab key to move from one UI control to the next. Who makes these retarded decisions? Equinox 21:39, 13 October 2011 (UTC)[reply]

I dunno, the tabbing works pretty O.K. for me. Even if I tab to the control right before it disappears, Firefox remembers my position in the tab order, so if I hit tab again, it moves me to the next field. How does it behave in your browser? —RuakhTALK 01:46, 14 October 2011 (UTC)[reply]
If I tab while the "tabbee" is in mid-fade, the focus apparently vanishes. It could be a problem with Opera, since the focus should certainly never be on an invisible thing, but I can't be sure exactly where the focus is, and anyhow given the general awfulness and incompatibility of browsers you'd hope that stuff like this would be tested thoroughly. My main objection is that the "fading" is purely a cosmetic gimmick, offering nothing useful (modally hiding controls is nasty anyway — why not disable them?), and yet manages to get in the way. Equinox 01:52, 14 October 2011 (UTC)[reply]
I see. What happens if you hit tab after the fade-out? By the way, I think that if — if — you're going to hide controls this way, then the fading is actually a good idea, since it gives the user time to register what's happening. Otherwise they'll just catch that something changed, but they won't understand what. But yeah, I agree with you that it would be better to just disable the control. We can probably override this somehow with site-wide JavaScript, though I don't know if it's a good idea to do so, since I doubt it's intended to be messed with. —RuakhTALK 03:00, 14 October 2011 (UTC)[reply]
Duh! I knew something annoying had happened, but couldn't quite figure out what it was. If it ain't broke, don't fix it! SemperBlotto 07:18, 14 October 2011 (UTC)[reply]
This is still very annoying. Can someone hack a way around it? Equinox 21:47, 25 November 2011 (UTC)[reply]

Brand names and physical products

WT:CFI (WT:BRAND in particular) says this: "A brand name for a physical product should be included if it has entered the lexicon". Some people in RFV (DCDuring, Equinox, and others) have been acting as if the part "for a physical product" were not there, arguing that WT:BRAND is intended to cover banking services, among other things. I have repeatedly argued that, whatever the part of CFI is intended to do, what it actually does is speak only of physical products, which are tangible, space-extended objects with non-zero mass, such as food, clothing, footwear, consumer electronics, and cars, but not software, databases (data collections), books, movies, and the like.

Please, let those who want WT:BRAND to apply to all brand names including "Citibank" and "Lufthansa" create a vote that removes "for a physical product" from CFI's section for brand names. Then the repetitive discussions in RFV are over.

By contrast, I would like to see WT:BRAND removed from CFI. There is IMHO no serious risk of commercial spam relating to inclusion of brand names. Above all, single-word brand names can host interesting lexicographical material, including pronunciation and etymology. --Dan Polansky 07:59, 14 October 2011 (UTC)[reply]

Our entry physical doesn't cover it, but I think there's a difference between two senses of physical here. For example, is a table physical in the same way that wind or heat is physical? So a website isn't a 'physical product' like a table is, but it can be considered physical in terms of bits on a server, which correspond to electricity (um, I think, I'll let the experts explain it).
Specifically in response to Dan Polansky, I agree that some products are non-physical. Cartoon characters like Mickey Mouse are non-physical. They may have physical representations (toys, etc.) but are by nature non-physical. It would be nice to clean up WT:BRAND and WT:COMPANY. Mglovesfun (talk) 09:57, 14 October 2011 (UTC)[reply]
Thanks for raising this issue. I agree that editors have been wrongly trying to enforce WT:BRAND's rules for things that are not physical products — just because something has some physical reality, that doesn't make it a "physical product" — but I support resolving the issue by removing the "physical product" bit. —RuakhTALK 11:17, 14 October 2011 (UTC)[reply]

Patrolling enhancements now on by default, and now include deletion.

Admins —

I've been bold and made two big changes to the patrolling enhancements. If anyone disagrees with either of them, please either revert, or let me know and I'll revert.

The changes are:

  • A "delete" button is now added for each newly-created page that has not yet been marked as patrolled. A text field also appears at the bottom of the page; whenever you click an edit's "delete" button, the current contents of the text field will be used as the deletion reason (the edit-summary-like message that appears in the deletion log). For example, if you are an administrator who knows Chinese, you can just visit http://en.wiktionary.org/wiki/Special:NewPages?hidepatrolled=1 every day or two, type something like "Engirst cruft" in the text-field, and go to town.
    • The text field looks kind of crappy, and is probably confusing. I welcome any improvements.
    • There's no drop-down to choose one of the predefined deletion reasons at MediaWiki:Deletereason-dropdown. Anyone who's better at UI design than I am, please feel free to add this. :-)
  • The patrolling-enhancement Gadget is now turned on by default for anyone with the "patrol" right.
    • If you dislike it, you can turn it off via Special:Preferences: in the "Gadgets" tab, uncheck "Patrolling enhancements – makes it faster and easier to mark edits as patrolled.".
    • Also, if you dislike it, please comment here. If it turns out that multiple admins dislike it, then we should probably de-defaultize it.
    • Edited to add: Of course, it would be even better if we could improve it so that all admins do like it, if that's possible.

I welcome any questions, comments, suggestions, concerns, threats, . . .

RuakhTALK 14:52, 14 October 2011 (UTC)[reply]

Addendum: By the way, I should have mentioned: the code for the Gadget itself is at MediaWiki:Gadget-PatrollingEnhancements.js. The bit of wikitext that turns it on by default is at MediaWiki:Gadgets-definition. —RuakhTALK 15:29, 18 October 2011 (UTC)[reply]
Thanks!​—msh210 (talk) 18:05, 17 October 2011 (UTC)[reply]
I suspect it just isn't working properly, but shouldn't individual new pages have a 'delete' button next to them, not just a single delete button. Or else how do I know what I'm deleting? A small 'delete' button next to every new page that's also an unpatrolled edit sounds fine to me. But currently, that isn't what this is. Mglovesfun (talk) 15:12, 14 October 2011 (UTC)[reply]
For me, in Firefox 7, in IE 8, and in Chrome, I do have a small "delete" button next to individual new pages. What browser are you using? I can try to debug . . . —RuakhTALK 15:26, 14 October 2011 (UTC)[reply]
I've just tried to delete "super-calli-frage-listic-epi-ali-doctus" with delete reason of "tosh" and I get a message saying that a token must be set. SemperBlotto 15:31, 14 October 2011 (UTC)[reply]
Yup, bug. (Introduced during migration from my personal JS to the Gadget's JS.) I noticed and fixed it a moment ago. Sorry about that. :-/   —RuakhTALK 15:35, 14 October 2011 (UTC)[reply]
I use Firefox. Will clear my cache now to see what the current version is like. Mglovesfun (talk) 15:50, 14 October 2011 (UTC)[reply]
I have two patrol buttons and two delete buttons. Using the deletion summary didn't work, it just displayed the default. Mglovesfun (talk) 15:54, 14 October 2011 (UTC)[reply]
Re: two patrol buttons and two delete buttons: keeping up my string of excessive boldnesses for the morning: http://en.wiktionary.org/w/index.php?title=User:Mglovesfun/vector.js&diff=14073538&oldid=14039687. Re: deletion summary not working: Oops, thanks, you're right, it doesn't work for me anymore, either. It worked yesterday, though, so hopefully it's a quick fix. —RuakhTALK 16:00, 14 October 2011 (UTC)[reply]
O.K., that's working now. Thanks again. :-)   —RuakhTALK 16:08, 14 October 2011 (UTC)[reply]
I just tried the delete button in Firefox and in Opera; it worked in FF and in Opera. :) Is the "mark" button intended to be used in conjunction with another feature? If not, it just seems to allow marking as patrolled with checking, which seems odd. (Nonetheless, it works in both browsers.) - -sche (discuss) 17:03, 14 October 2011 (UTC)[reply]
I'm sorry, I don't understand the question. What do you mean by "marking as patrolled with checking"? :-/   —RuakhTALK 17:43, 14 October 2011 (UTC)[reply]
Oops, I mean "without checking". In the past, I had to click on "diff" and look at the diff to find the "mark as patrolled" button. Now, I could just click "mark" in Recentchanges, without checking the diff to see if it was vandalism or not. Why would I do that...? - -sche (discuss) 18:19, 14 October 2011 (UTC)[reply]
Ah, I see. You're right, of course, but there are a number of cases where it's useful:
  • Whitelisting (whereby the button gets "clicked" automatically when you load the page).
    • A number of pages in the Wiktionary: namespace are whitelisted. These are pages that are so high-traffic that we don't really have to worry about vandalism going unnoticed and unreverted. Similarly, all pages in the User talk: namespace are whitelisted, as are users' edits to their own user-pages and sandboxes (e.g., in my case, User:Ruakh and User:Ruakh/Sandbox).
    • An IP address can be whitelisted, which has roughly the same effect as granting a user the "autopatrolled" privilege (except that it's mediated by this Gadget, rather than being built-in).
    • When granting the "autopatrolled" privilege to a user, we can also whitelist him/her temporarily, so that their existing unpatrolled edits can be quickly marked as patrolled.
  • If there are a bunch of edits to a single page, I can just go to its history, view the overall diff of edits, and if the overall result is O.K., then I don't need to view each individual diff to mark all the edits as patrolled.
  • If there are a bunch of similar-looking edits by a single editor (e.g., creating thirty Khmer nouns in an hour, with the automated edit-summaries that show you the initial page contents), then I can just look at a representative sample of edits to confirm that there's no funny business going on, then mark a bunch of edits as patrolled in short order.
    • I also have code in my own common.js that applies the patrolling enhancements to user-contributions pages, which makes this a bit easier for me. I haven't added it to the Gadget, though, because I'm not sure if it's ready for prime-time.
In addition, you were right that there's another feature that someone (maybe Connel MacKenzie?) intended for it to be used in conjunction with:
  • If I have Lupin's Popups turned on, then I don't have to actually click on the diff to see what changed. (That's the Gadget whose description reads, "Navigation popups, page previews and editing functions popup when hovering over links".)
but I find that feature very annoying, so I almost always have it turned off. (Still, you might as well try it out and see what it does. Even if you find it as annoying as I do, you still might find uses for it.)
RuakhTALK 18:53, 14 October 2011 (UTC)[reply]
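The whitelisting rules listed above could be modelled roughly as follows. This is only a sketch under stated assumptions: the `whitelist` structure, the `isWhitelisted` function, the shape of the `edit` object, and the placeholder page and user names are all hypothetical, and none of it is taken from MediaWiki:Gadget-PatrollingEnhancements.js.

```javascript
// Hypothetical whitelist data; the real Gadget may store this differently.
// The page and user names are placeholders, not actual whitelist entries.
var whitelist = {
  pages: ['Wiktionary:Example high-traffic page'],
  users: ['192.0.2.1', 'ExampleUser']
};

// Returns true if an edit could be marked as patrolled automatically.
// edit = { title: page title, user: name or IP address of the editor }
function isWhitelisted(edit, list) {
  // Explicitly listed high-traffic pages.
  if (list.pages.indexOf(edit.title) !== -1) return true;
  // Every page in the User talk: namespace.
  if (edit.title.indexOf('User talk:') === 0) return true;
  // A user's own user page and its subpages (e.g. a sandbox).
  var ownPage = 'User:' + edit.user;
  if (edit.title === ownPage || edit.title.indexOf(ownPage + '/') === 0) return true;
  // Explicitly whitelisted users and IP addresses.
  return list.users.indexOf(edit.user) !== -1;
}
```

With this model, an edit by Ruakh to User:Ruakh/Sandbox would count as whitelisted, while the same edit made by a different, non-whitelisted user would not.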
Good points. Thank you for the comprehensive reply. Oh, and I had tried Lupin pop-ups before, but they only showed a bunch of links to the page's talk page, whatlinkshere, deletion logs, etc, without any page content. I tried them again now, though, and found that if I hover over the link long enough (and wiggle the mouse a bit, but that's probably just voodoo), the changed content also shows up. (I'll probably come to be as annoyed by it as you are, but for now it seems useful, when patrolling.) - -sche (discuss) 04:37, 18 October 2011 (UTC)[reply]
The red and blue stuff is too large and garish for me. It leads to more scrolling and visual annoyance. Small icons instead of the large words, or just a lesser font size, would be good. Equinox 11:51, 15 October 2011 (UTC)[reply]
How about now? —RuakhTALK 13:14, 15 October 2011 (UTC)[reply]
That's definitely better for me. Equinox 13:15, 15 October 2011 (UTC)[reply]
Is it possible to allow default deletion summaries? Perhaps by specifying something in one's JavaScript? Mglovesfun (talk) 16:54, 15 October 2011 (UTC)[reply]
Done. Actually, doubly done. You can set either a default value named GPE.initialDeleteReason that gets put into the input-box initially, but which you can override by clearing out that box, or a default value named GPE.deleteReasonIfBlank that gets used when you click the delete-button if the input-box is blank. Or you can even set both, in which case the latter is used if you explicitly clear out the former. To set them, you would put something in your common.js (or vector.js or whatnot) that looks like
GPE.initialDeleteReason = "I forgot I could specify a deletion reason!";
GPE.deleteReasonIfBlank = "I couldn't think of a deletion reason to enter!";
RuakhTALK 21:42, 15 October 2011 (UTC)[reply]
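How the two settings interact can be sketched like this. It is an illustration only: the helper names `initialBoxValue` and `reasonOnClick` and the `exampleSettings` object are hypothetical, and the Gadget's real code may be organised quite differently.

```javascript
// Hypothetical stand-in for the two GPE settings described above.
var exampleSettings = {
  initialDeleteReason: 'I forgot I could specify a deletion reason!',
  deleteReasonIfBlank: "I couldn't think of a deletion reason to enter!"
};

// Value shown in the text field when the page first loads.
function initialBoxValue(settings) {
  return settings.initialDeleteReason || '';
}

// Reason used when a "delete" button is clicked,
// given the current contents of the text field.
function reasonOnClick(boxValue, settings) {
  if (boxValue == '') return settings.deleteReasonIfBlank || '';
  return boxValue;
}
```

So if both are set and the admin clears the box before clicking, `reasonOnClick('', exampleSettings)` falls back to the `deleteReasonIfBlank` text; anything typed into the box is used as-is.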
Could we set the site-wide JS so that if any admin leaves the field blank, it uses as the deletion summary a link to --explanation of deletion--? - -sche (discuss) 01:28, 18 October 2011 (UTC)[reply]
Yes, that could easily be done, by changing the line
GPE.deleteReasonIfBlank = '';
to
GPE.deleteReasonIfBlank = '[[Wiktionary:Sysop deleted|--explanation of deletion--]]';
(Individual admins could still override it by setting their own defaults.) I'm not sure how I feel about that, though . . . I don't know. What do other people think?
RuakhTALK 02:31, 18 October 2011 (UTC)[reply]
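One way to picture a site-wide default that individual admins can still override is to merge two sets of settings, with the personal values winning. This is a sketch, not the Gadget's actual mechanism: `siteDefaults`, `effectiveSettings` and the sample personal setting are hypothetical names introduced for illustration.

```javascript
// Hypothetical site-wide defaults, using the value proposed above.
var siteDefaults = {
  initialDeleteReason: '',
  deleteReasonIfBlank: '[[Wiktionary:Sysop deleted|--explanation of deletion--]]'
};

// Combine site-wide defaults with whatever an admin has set personally.
// Any key present in userSettings wins over the site-wide value.
function effectiveSettings(defaults, userSettings) {
  var result = {};
  var key;
  for (key in defaults) {
    if (Object.prototype.hasOwnProperty.call(defaults, key)) {
      result[key] = defaults[key];
    }
  }
  for (key in userSettings) {
    if (Object.prototype.hasOwnProperty.call(userSettings, key)) {
      result[key] = userSettings[key];
    }
  }
  return result;
}
```

An admin who sets nothing would get the site-wide link as the blank-field reason; an admin who sets only `deleteReasonIfBlank` would replace that one value and inherit the rest.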
Can we "force" a more substantive choice, ie, not allow a default, but require a positive choice by the admin? I'm not in love with that default. DCDuring TALK 02:40, 18 October 2011 (UTC)[reply]
You mean, such that the delete-button won't even work unless the admin has entered some sort of deletion summary? Yes; we could replace
if (reason == '')
    reason = GPE.deleteReasonIfBlank;
with something like
if (reason == '')
{
    alert('Error: a deletion reason must be specified.');
    return;
}
But I'm a bit loath to do that, since my goal is to lower the barrier to patrolling . . .
RuakhTALK 03:40, 18 October 2011 (UTC)[reply]
Mostly @DCDuring: The software allows admins to give no reason when deleting terms the traditional way... which isn't a convincing argument for or against forcing a choice here. Many bother to choose sysop-deleted, broad as it is, as their reason... which struck me as odd when I realised it (I had thought it was the default if the traditional deletion-reason box was left empty, but no, that default is the page content). It's a fairly comprehensive page, so it does seem a decent default, even if a specific reason would be better. After all, an admin isn't likely to find it convenient or possible to use this tool to delete something that's failed RFD or otherwise needs a specific deletion summary; things deleted while patrolling are most likely to be vandalism, misplaced Wikipedia pages, etc (like sysop-deleted covers).
@Ruakh: this does make patrolling easier for me. Thank you! - -sche (discuss) 04:37, 18 October 2011 (UTC)[reply]
Edit summaries communicate in a timely fashion mostly to readers of pages such as their watchlists or Recent changes. That would not include new and casual users. (Of course, the edit summaries are important for reviewing history, but that is neither timely nor likely to be attended to by inexperienced users.)
If we could have another channel for contributor-targeted communication, then edit summaries could be explicitly targeted toward experienced users' watchlists and entry history.
Just spitballing, but would it be possible to have a deletion of a recently created page trigger a message, presumably canned, but also including some entry-related specifics, on the creating user's (yes, even an anon's) talk page? Could it be limited to users/IPs with no previous contributions? Such a message might be a good way to try to convert would-be contributors to actually useful contributors. What would be the risks of doing so? Are there other ways of classifying users or contributions to generate appropriate messages? DCDuring TALK 14:57, 18 October 2011 (UTC)[reply]
Re: "would it be possible to have a deletion of a recently created page trigger a message [] ?": In the general case, that would require a bot that watches the deletion log, which we're unlikely to have in the near future. (It's quite possible to create such a thing, and several editors here have the technical know-how, but our available programmer-hours of technical expertise are quite limited.) But in the specific case of entries deleted via the patrolling-enhancements Gadget, it would not be very difficult to add that functionality into the Gadget. (It also would not be very difficult to have the functionality controlled by a checkbox, with the message only being sent if the admin so chooses. We could then decide whether or not the checkbox should be checked by default.) —RuakhTALK 15:25, 18 October 2011 (UTC)[reply]
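The checkbox-controlled message floated here, together with the "no previous contributions" limit suggested just above, might look something like the following. Everything in it is an assumption made for illustration: `shouldNotify`, `buildNotice`, the option names and the wording of the canned message are hypothetical, and the step that would actually save the message to the talk page is left out.

```javascript
// Decide whether to leave a message for the page creator.
// options = { notifyChecked: state of the hypothetical checkbox,
//             creatorEditCount: creator's contributions other than the deleted page }
function shouldNotify(options) {
  return Boolean(options.notifyChecked) && options.creatorEditCount === 0;
}

// Build the talk-page title and a canned message with entry-specific details.
// The message wording is a placeholder, not an agreed template.
function buildNotice(creator, pageTitle, reason) {
  return {
    talkPage: 'User talk:' + creator,
    text: 'Your entry [[' + pageTitle + ']] was deleted. Reason: ' + reason + ' ~~~~'
  };
}
```

Keeping the decision separate from the message text would let the default state of the checkbox, and the criteria for who gets a message, be changed without touching the wording.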
I was thinking of it in the context of patrolling, which for me means Recent changes and my watchlist with the Gadget. Is it hard? Is it worth the effort? If some of those who patrol regularly (I am only occasional) would be willing to use it, that would be evidence of value. DCDuring TALK 18:43, 18 October 2011 (UTC)[reply]
I don't know. The thing is, even when an admin is patrolling via Special:NewPages?hidepatrolled=1, it seems unlikely to me that all of his/her deletions will be via the Gadget: in most cases, (s)he won't know that the entry should be deleted until (s)he has actually clicked through to it, and at that point I think (s)he's more likely to use the regular deletion interface — which it would be more difficult/awkward to add this feature to, IMHO — than the Gadget's delete-button. —RuakhTALK 00:23, 19 October 2011 (UTC)[reply]
My own habit would be to look at an entry, then use back-button functionality to return to the watchlist or Recent changes page, from which I delete or mark as patrolled, so the Delete functionality would be useful. DCDuring TALK 02:01, 19 October 2011 (UTC)[reply]

Straw Poll: each section of our CFI

I apologise if I have chosen a poor format, but (following comments on WT:RFV#Finnair) I propose a straw poll to gauge the community's opinion of each section of CFI. (This is broader than just the necessary changes to BRAND CFI that Dan has a section for, above.)
  • If you think the section is good pretty much as-is, vote "keep as-is" (or "support").
  • If you think we should change a section, but still have a section (for example, change our criteria for including brand names, but still have criteria for brand names that are different from our general criteria), vote "change". If you can explain what you would change briefly (not three paragraphs), please do so.
  • If you think we should remove a section (for example, remove our specific criteria for including brand names, so that only general criteria apply to them), vote "remove" (or "oppose").
  • If you want to add a section (for example, to handle taxonomic names), add it under its own header in the "Sections to add" section. If your proposed section's text is very long, consider posting it in your userspace and simply putting a link under the header. Sign your section so we know who added it.
This way, we develop a clear idea, all in one place, of which sections are liked as-is, which (if any) a majority of editors would put on the chopping block, and which (if any) a majority would change. I used fake==== for some subsections so this page's TOC wouldn't explode, but I left some real headers so everyone could edit one section at a time and perhaps avoid edit conflicts. There's a section for general discussion at the bottom, if you'd rather comment there that you would "remove sections A and B, and change C, but keep the rest". - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
PS: Where CFI has a section, followed by text, followed by subsections, I have commented here on the subsections in their own (well) subsections, and my comments on the general section only apply to the text that is not part of any of the subsections. For example, my comments on the section "Attestation" are about the bit "“Attested” means verified [...] include the ISBN." My comments on the subsection "Conveying meaning" are in a subsection for that subsection. - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]

Sentences 1, 2, 3

"As an international dictionary, Wiktionary is intended to include “all words in all languages”. A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

  • My vote: keep as-is, a good statement of purpose, clarified by subsequent sections. I'm not opposed to changing it, though, if that's what a majority wants. - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Keep as is, except to suggest that it should guide the drafting of other sections of specific application, not be a substitute for them. DCDuring TALK 20:43, 14 October 2011 (UTC)[reply]
  • Keep as is. I think that you should be able to take any text (in any language), wikify it, and get no red links (I'm still not sure about red links due to capitalization at the beginning of sentences). SemperBlotto 07:15, 15 October 2011 (UTC)[reply]
  • Change to add "Broadly speaking," to the beginning of the second sentence. Also, I'm not sure what purpose the first four words serve, or even what they mean: perhaps remove them. Finally, since the next section explains term, much as the following two explain attested and idiomatic, term should be boldfaced here just as the other two are.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Sentence 1: Keep as is (per Semperblotto). Sentences 2 and 3: change to something such as A term of a language should be included if it's used in the language and can be considered as belonging to the vocabulary of the language (e.g. it can be useful to a learner of the language to learn it). Use in the language is normally shown through attestations.. A dictionary is not used only when you run across a word and want to know what it means. There are many other uses. Lmaltier 19:47, 18 October 2011 (UTC)[reply]
  • Change in part per msh210 and Lmaltier. Specifically, remove the superfluous "As an international dictionary", which is captured by "all words in all languages", prepend "Broadly speaking" to the second sentence, and indicate that a word would be part of the vocabulary of a language. However, also keep the notion that we should provide definitions for things that people might run across and wish to learn the meaning of. bd2412 T 00:08, 22 October 2011 (UTC)[reply]

"Terms" to be broadly interpreted

Attestation

  • Keep as-is for now, continue to change as necessary (there have been several successful and unsuccessful votes to change this section). Perhaps refine the paragraph which follows the list, and which could be argued to be more explanation and discussion than criteria. - -sche (discuss) 19:08, 14 October 2011 (UTC). I support Ungoliant and CodeCat's proposal to add a definition/clarification of the term "extinct". - -sche (discuss) 04:04, 17 October 2011 (UTC)[reply]
  • Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)[reply]
  • Change 4th. It would be nice to have an exact definition of "extinct" in this sense. Also, does a transliterated form count as a contemporary source? To be honest I'd like this criterion removed, but since there was a vote for it there is nothing I can do :-( Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)[reply]
  • Keep concept. I'm not sure about all of the details, though. And we're really misusing the term "attested", a fault which we compound by structuring the section as a definition of the term! —RuakhTALK 02:24, 15 October 2011 (UTC)[reply]
    • Also — due in part to the way the MediaWiki presents header levels, it's not immediately obvious that there are three subsections clarifying various aspects of list-item #3. So a bit of re-formatting might be in order. —RuakhTALK 15:46, 17 October 2011 (UTC)[reply]
  • Keep as-is - I'm reasonably happy with this (and related) section(s). SemperBlotto 09:54, 15 October 2011 (UTC)[reply]
  • Change to explain what extinct means and whether transliterations for the purposes of study count as attestations. —CodeCat 10:12, 15 October 2011 (UTC)[reply]
  • Change. It currently emphasizes Usenet over books, which is terrible. It says we don't quote WMF sites, which is false: we don't count on them for attestation, but we have no problem quoting them. It says we allow recorded audio and video without mentioning spelling issues. It mentions ISBN but not ISSN or DOI, and should mention all or none.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
    At least we commented out "<!-- removed: blogs and -->". I spotted that in the raw text — we should actually remove it. - -sche (discuss) 16:19, 17 October 2011 (UTC)[reply]
  • Change, very much so. Remove "1. Clearly widespread use"; remove "2. Usage in a well-known work"; rewrite the paragraph after the bullet list. --Dan Polansky 08:57, 17 October 2011 (UTC)[reply]
  • Change. Remove "1. Clearly widespread use": rare words are welcome, provided that we can make sure that they clearly exist. Add the case of words formed in a systematic way, such as -able or -like adjectives (considered by my Pocket Oxford Dictionary as always existing): in this case, one attestation should be sufficient, to prove that their existence is not only virtual. Remove "spanning at least one year": this would be the main added value for an Internet dictionary to define words as soon as they appear, when readers are most likely to want to look for their sense! This kind of restriction is understandable only for paper dictionaries. Lmaltier 20:02, 18 October 2011 (UTC) Lmaltier 19:54, 18 October 2011 (UTC)[reply]
    Just so we're on the same page: right now, words only need to meet one of the criteria, not all of them. Rare words are welcome, and aren't excluded by the fact that we accept words which are "clearly in widespread use": rare words just have to meet one of the other criteria, since they don't meet that criterion of clearly widespread use. awkwardnessful, for example, has been used in ≥3 books and Usenet posts, so we keep it. - -sche (discuss) 20:40, 18 October 2011 (UTC)[reply]
    Yes, but this seems to be useless and misleading, as we sometimes read that something should be deleted because it's not in widespread use. Note that the -ful suffix is not one with systematic applicability, unlike -like. Lmaltier 21:04, 18 October 2011 (UTC)[reply]
    I see no need to give special importance to nonce words systematically formed, especially as having a definition for them at all is less important (as can be seen by traditional dictionaries that just have a list of words formed with un-, as no definition was necessary.) Some time limit seems useful for keeping words without serious use out of Wiktionary; it also makes making up a word and adding it (via three Usenet accounts, for instance) much harder. If you're using a word that young seriously, you should provide a definition somewhere near.--Prosfilaes 20:43, 18 October 2011 (UTC)[reply]
    I limit my proposal to a very small number of cases, the -like adjectives being typical. But we should not require 3 attestations when the POD does not require any attestation to consider that the word exists in English...
    It may happen that a word is very widely used worldwide just after being introduced. It should not be excluded. However, when only a few uses can be found, and there is a doubt about the authenticity of these uses, it should be excluded. Lmaltier 21:04, 18 October 2011 (UTC)[reply]
  • Also change the permanently recorded media bit: note that all Internet pages can be recorded permanently (archived) by the software when we want to. But it's difficult to accept purely oral (and not recorded) attestations... This should be clarified. Lmaltier 21:09, 18 October 2011 (UTC)[reply]

Idiomaticity

  • Change. This section is messy. Its passing mention of the Phrasebook should be a separate section, establishing the Phrasebook with clear purpose and different CFI (especially with regard to idiomaticity). - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change and also add some information about how to add single-word terms that are not idiomatic. This may seem strange for English, but in Finnish there are suffixes like -kin that can be added to almost any word, and this would be considered idiomatic in Finnish. Similar cases would also apply for unusually long compounds in German like the name of that law, or the name of that very long protein. —CodeCat 22:28, 14 October 2011 (UTC)[reply]
  • Change. Mglovesfun (talk) 08:56, 15 October 2011 (UTC)[reply]
  • Change. Current practice is to allow megastar in English no matter what it means, and the section should reflect that if that's what the community thinks appropriate; but the community has agreed certain Finnish and Hebrew single-word terms are not idiomatic, and the sections should reflect that, too. Or it should be broader. Also, the whole section is too wordy without being precise enough. And of course we need phrasebook criteria, but that's an issue under debate.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Change. Remove the "megastar" paragraph. Probably remove the paragraph that starts with "This rule must be applied carefully and is somewhat subjective, ". Further rewrite seems to be in order. --Dan Polansky 09:11, 17 October 2011 (UTC)[reply]
  • Change. RuakhTALK 14:57, 17 October 2011 (UTC)[reply]
  • Change. In particular I'd like to see a considerably more inclusive first sentence: ‘A term is considered idiomatic when it is particularly characteristic of a given language, especially when it shows unusual grammar or when its meaning is not obvious from its component parts.’ Ƿidsiþ 15:19, 17 October 2011 (UTC)[reply]
  • Change. (and remove under this title). Idiomaticity should not be a requirement. But belonging to the vocabulary of the language is a requirement. It's often the same, but not always (e.g. Atlantic salmon is not idiomatic as defined here, but belongs to the vocabulary of English nonetheless). Lmaltier 20:07, 18 October 2011 (UTC)[reply]
  • Change per Lmaltier. Idiomaticity should not be a requirement. If terms are consistently used as a phrase, such as garbage bag, then the ability to deduce the meaning from the component parts should be irrelevant. bd2412 T 00:42, 22 October 2011 (UTC)[reply]
  • Change: "page the" to "page, the", and add a link to ELE(?); possibly remove the subsection header or remove the subsection entirely. I do not feel strongly about this; I would not mind keeping it as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • If kept, then, yes, it needs a comma, per -sche. Also, it should probably be merged into the next section, which see my comments on.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Remove; does not belong to criteria for inclusion. If kept, at least remove the sentence "Once it is decided that a misspelling is of sufficient importance to merit its own page the formatting of such a page should not be particularly problematical." --Dan Polansky 09:11, 17 October 2011 (UTC)[reply]
  • Change per -sche. I don't think we should just remove the section outright, because even when we include a misspelled word, we don't really include it as a word: we include it as a spelling, and the entry is geared toward pointing readers at the correct spelling. —RuakhTALK 14:57, 17 October 2011 (UTC)[reply]

Idiomatic phrases: Pronouns, Articles, Verbs

Proverbs

Languages to include: Natural languages

  • Change. I would keep the first sentence; the second sentence is more explanation than criteria, and is unclear: "a proposed language is considered a living language, or a dialect of or alternate name for another language" — I would at least remove "living" (surely there are debates over whether dead tongues were languages or dialects). - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change. Should give some parameter as to what is a language and what is a dialect (ISO codes?), and make it clear that dialectal forms/pronunciations are also allowed (because some people might think "dialect" means "non-standard"). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)[reply]
  • Change to reflect (or link to) agreed-upon rules of what dialects to count as languages and what not.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Change. All languages with an ISO code, or a Wikimedia code, or meeting some criteria (to be discussed) should be accepted automatically. Other languages should be accepted only after discussion. This is what we have decided for fr.wikt, and this is a good thing (no need for too many polemics). Lmaltier 20:16, 18 October 2011 (UTC)[reply]
Sign languages
Constructed languages
Reconstructed languages

Exclusions: Vandalism

Protologisms

Fictional universes

Wiktionary is not an encyclopedia
  • I have no strong opinion on this section; lean keep as-is. (Why is "the successor of Saul" allowed a sense-line at David under this section, as it stands?) - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change - I would be happy with short encyclopaedic content if it helps to explain the meaning of a term. SemperBlotto 10:18, 15 October 2011 (UTC)[reply]
  • delete as is. This is better handled by specific criteria for specific types of terms, and long contradicts common practice at Wiktionary (note how Houdini is listed as an example of what not to include, yet this very sense passed RFD!) -- Liliana 10:41, 15 October 2011 (UTC)[reply]
  • Remove; if not that, remove the Houdini paragraph. --Dan Polansky 11:00, 15 October 2011 (UTC)[reply]
  • Merge whatever of this is not already in the "Names of specific entities" section thereinto and remove this.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Keep, somewhere, but probably not here. Note that this explains that Houdini may be included, but not as a page about the escapologist, only as a page about his name, about the word. I add that even a very long mathematical definition such as the mathematical definition of a vector space is encyclopedic, but should be kept nonetheless, as this is the definition. I suggest adding that the definition may be considered as the intersection between an encyclopedia page and a language dictionary page with the same title. Lmaltier 20:51, 18 October 2011 (UTC)[reply]
Language-specific issues

Names

Company names

Brand names

In-depth discussion of this section: Wiktionary:Beer_parlour#Brand_names_and_physical_products.
  • Change. Per many RFV discussions, "a physical product" should be changed to either "a product" (if it is meant to include all products), or something like "a tangible/three-dimensional product" (if it is meant to exclude non-tangible products). - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change to include all commercial names, advertising, and political slogans. DCDuring TALK 20:52, 14 October 2011 (UTC)[reply]
  • Change to include commercial names and advertising, e.g. Internet service providers and banks as well as tangible items, and commercial creations like toy brands and cartoon characters. I'm not so sure about political slogans; they are not brands per se and I imagine most of them would fail CFI for other reasons. Equinox 20:59, 14 October 2011 (UTC)[reply]
  • Remove; keep all single-word attested brand names of pharmaceuticals, at least. --Dan Polansky 11:00, 15 October 2011 (UTC)[reply]
  • I personally think it should be changed (and tweaked) to include company and organization names, names of (non-brand) computer programs, titles of books of the Bible, and other things, and to clearly include brand names even not of "physical" products. But I'm not sure that has consensus. In any event, it should be changed to reflect consensus, if possible.​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Change They should be kept when they are words. But there could be more stringent criteria (at least x independent attestations not originating from the company) to prevent abuse (these words can be created at will, with a legal status). Lmaltier 20:46, 18 October 2011 (UTC)[reply]
Given and family names
Genealogic content
Names of specific entities
  • Change. The section admits that it is incomplete. ("Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which.") We should complete it. - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change with largely exclusionary intent, possibly allowing for phased inclusion of types, (eg, in "populated places": countries, then provinces/states, then cities with greater than 100K population). DCDuring TALK 20:52, 14 October 2011 (UTC)[reply]
  • Not my personal cup of tea, but seems, in its ambiguity, to reflect whatever consensus exists. Keep as is for now (viz, until consensus develops further one way or another).​—msh210 (talk) 01:14, 17 October 2011 (UTC)[reply]
  • Change Should be accepted when they are words making possible the creation of a page with useful linguistic contents. e.g. Confucius is OK, not Winston Churchill. More generally, belonging to the vocabulary of the language and allowing a linguistic description should be the main criteria. Lmaltier 20:46, 18 October 2011 (UTC)[reply]

Issues to consider: Attestation vs. the slippery slope

In-depth discussion of this section: #Attestation vs. the slippery_slope.
See also

Sections to add

Translingual entries

Note: there is Wiktionary:About Translingual, but it is not formal policy.
  • I propose that we develop criteria for including translingual or might-be-translingual entries such as taxonomic names and Latin phrases such as caveat emptor: specifically, we should have criteria for determining which language(s) to consider them: Latin? Translingual? English? German? - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Change to clarify that a phrase from a language does not become translingual unless it assumes a meaning inconsistent with its meaning in that source language, eg, two-part species names are Latin. DCDuring TALK 21:02, 14 October 2011 (UTC)[reply]
  • There was a discussion here too: "What language is smithii?" (not involving pizza). I don't have a very strong opinion on which headings should be used when, but I would like a decision to be made official. It's worth noting that there is almost always a specific translingual scientific meaning which is different to the original Latin. E.g. carex is Latin for reed, but in translingual scientific vocabulary it refers only to a particular genus of reeds. Should that meaning be under translingual or Latin? I think it should probably be under "Translingual". (For Carex it's currently under "English"). Regardless, it would be great to have some decision for official policy. Pengo 11:59, 20 October 2011 (UTC)[reply]
    • carex is Latin, Carex is translingual (but using other language headers as well should be allowed, to provide the English, French, etc. pronunciations, the gender in each language, and examples of use in the language). About smithii (alone), it does not mean anything in international conventions (they only require that it should follow the rules of Latin grammar), and I think that it can be considered as Latin. Lmaltier 19:52, 20 October 2011 (UTC)[reply]
      • I agree with your first sentence (carex/Carex). But not sure what you mean by smithii not meaning anything. In all instances where it's used in taxonomy it means "Smith's" (named for or by). Pengo 23:18, 21 October 2011 (UTC)[reply]
        • I mean that, in international conventions, binominal species names have a very precise meaning, but conventions don't define any meaning for smithii used alone. Lmaltier 20:07, 24 October 2011 (UTC)[reply]
      • carex is English too, hence the Anglicised plural carexes. A genus name is often adopted into English in lower case to refer to specific instances of the genus. "Nice wellingtonias in your garden." Equinox 23:25, 21 October 2011 (UTC)[reply]
      • Classifying Chinese characters as translingual (they are, well, Chinese, even if shared by four languages and Chinese dialects/topolects) is wrong but I don't see an easy fix here. The issue also exists in assigning all individual characters to a part of speech. The original meaning from Classical Chinese may be lost, it may be a pure phonetic with no separate meaning or never used separately. We also have a large amount of radicals, not words. The meaning, usage and readings will differ across CJKV languages. --Anatoli 00:29, 31 October 2011 (UTC)[reply]

Phrasebook

Note: there is Wiktionary:Phrasebook, but it is not formal policy.
  • Per my comments above about the section "Idiomaticity", I think we should have formal Phrasebook criteria. We might have them on a separate page and only link to that page from a section on the main CFI page. - -sche (discuss) 19:08, 14 October 2011 (UTC)[reply]
  • Delete There needs to be a project with active participants and serious intent. There is no evidence of such interest. DCDuring TALK 21:02, 14 October 2011 (UTC)[reply]
  • Delete In its current state it's a shame for Wiktionary, and I doubt it has any chance of improving fast enough. Maybe reopen in half a decade or so. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)[reply]
  • Keep as long as we can find proper criteria. —CodeCat 22:35, 14 October 2011 (UTC)[reply]
  • Keep in some form, deleting this section entirely will simply mean that the phrasebook will have no rules. Mglovesfun (talk) 08:52, 15 October 2011 (UTC)[reply]
  • delete Similar to a company which is running losses monthly, we need to concentrate on our core topic, which is building a dictionary. The phrasebook can come back later once there is interest. -- Liliana 10:47, 15 October 2011 (UTC)[reply]
    You assume that Wiktionary is a single company that can focus on only one project at a time. But there are many Wiktionary users who can do many different things at a time. If they want to help, let them help in whatever way they feel is best, as long as it is an improvement. I would agree with you if I felt that a proper phrasebook would not improve Wiktionary, but I think it would. I don't think it's really our job to tell users what to focus on by banning everything else. —CodeCat 11:03, 15 October 2011 (UTC)[reply]
  • We should probably consider not just the phrasebook, but also Category:English non-idiomatic translation targets, since those entries are also not idiomatic but kept for the sake of translations. Perhaps a sensible rule would be to just disallow definitions altogether for English non-idiomatic terms, because definitions like 'Indicates that the speaker is hungry' are silly. The whole idea of non-idiomaticity is that no definition should be needed. —CodeCat 12:13, 17 October 2011 (UTC)[reply]
  • Change This should be an actual phrasebook (same principles as the thesaurus), with pages such as At the restaurant (French). This is what would be useful to readers. Lmaltier 20:37, 18 October 2011 (UTC)[reply]

Placenames

Discussion

I have a question. Can anyone vote? Ungoliant MMDCCLXIV 20:09, 14 October 2011 (UTC)[reply]

This isn't a formal vote, so go ahead. -- Liliana 20:10, 14 October 2011 (UTC)[reply]
Right! Everyone should give input. :) - -sche (discuss) 20:12, 14 October 2011 (UTC)[reply]
Usually any registered user can vote. DCDuring TALK 21:04, 14 October 2011 (UTC)[reply]

Thanks to User:-sche for this. DCDuring TALK 21:04, 14 October 2011 (UTC)[reply]

I'd like a more holistic approach, though I know that's not easy at all. The document shouldn't contradict itself and should be clear. It should define any potentially ambiguous terms, for example, what is a 'word', what is a 'language'? Examples of contradictions are that "all words in all languages" can contradict the rules on fictional universes, the rules on brand names and the rules on company names. I find such contradictions are a natural product of wikis, where one editor edits one part of the page, another editor edits another part independently. Mglovesfun (talk) 16:35, 15 October 2011 (UTC)[reply]
I like what User:DCDuring wrote on his user page about that. -- Liliana 16:44, 15 October 2011 (UTC)[reply]
I'm flattered. It is just cautionary, though. DCDuring TALK 18:05, 15 October 2011 (UTC)[reply]
In legal drafting, there are usually clauses beginning with notwithstanding that indicate that a given clause is to be read as superseding the ones mentioned in the "notwithstanding" clause. There are also standard rules of construction for interpreting apparent contradictions in the absence of their explicit resolution. Obviously, it is best to be as explicit as possible about conflicts that are noted at the time of drafting and to attempt to identify as many of them as possible at that time. For example, in our case, attestation seems to override other considerations in that an absence of attestation (at least for lemmas) is deemed to be fatal to includability. DCDuring TALK 18:05, 15 October 2011 (UTC)[reply]
I concur.​—msh210 (talk) 01:30, 17 October 2011 (UTC)[reply]
This poll remains open, but I'll comment on the results so far:
Some of us want to tweak Sentences 1, 2, 3 in an exclusive direction, some want to change them in an inclusive direction, but at least half of us can live with them as-is, which is what's likely to happen, in the absence of a majority for any particular change. I plan to roll msh210's format tweaks (bold terms, change the header) and some missing commas into one "cleanup" vote, which may also include clarifying "extinct" in the section on "Attestation". About half of us would keep "Attestation" as-is, only tweaking it to clarify "extinct", and it seems likely we will keep it as-is, in the absence (again) of a majority for particular change: Ungoliant would remove criterion 4, Dan would remove 1 and 2, Lmaltier 1 and part of 3, but I'd oppose most of those changes. Almost all of us are OK with "Conveying meaning".
All of us agree "Independence" should be rewritten. I plan to revive February's discussion so we can decide how to rewrite it, and make that a second vote. (Maybe we can get Ben to help us; he has experience cleaning up declarations of Independence.) A majority of us say keep "Spanning at least a year" as-is; Dan thinks the section is unnecessary; Lmaltier thinks the criterion itself should go. All of us agree "Idiomaticity" needs to be rewritten, but we need to discuss how. A majority want to change "Spellings", but in unrelated ways, so I'm marking it as something we should discuss further. We agree that "Formatting" should be reformatted. (I'll roll that into the "cleanup" vote.)
Most of us (who expressed an opinion) are OK with "Inflections" as-is. Msh210 suggests an update to the "Idiomatic phrases" section to bring it into line with actual practice; I plan to include that in the "cleanup" vote unless someone is opposed to it. All or most of us want to change "Natural languages" and "Constructed languages", but we need to come to a consensus about specific changes. "Sign languages" and "Reconstructed languages" are good as-is or with a little tweaking. Most of us agree on removing "Vandalism" and "Protologisms", or moving them to a different page / to the "Attestation" section, respectively; I plan to make that a third vote (structured so people can vote to delete both, keep both, or delete one and keep the other). We're OK with "Language-specific issues".
I'll comment on "Fictional universes", "Wiktionary is not an encyclopedia", "Names", "Company names", "Genealogic content", "Names of specific entities" and "See also" later. We unanimously dislike BRAND, and though we're divided on how to change it, the dedicated section further up the page shows that a vote on the words "physical product" is in order. I suggest a vote with three options: change "physical product" to "product" (or similar, to make clear that it includes intangibles), change it to "tangible product" (or similar, to make clear that it includes only tangibles), or keep the status quo (unclarity); if neither of the first two options passes, the unclear status quo continues.
We agree: change "Given and family names" only to remove or answer the question of patronymics. Most of us agree: remove "Issues to consider: Attestation vs. the slippery slope"; I'll include that in the Vandalism-Protologisms vote. - -sche (discuss) 08:11, 21 October 2011 (UTC)[reply]
The BRAND vote should include an option that goes beyond "tangible or intangible product" to something like "commercial offering". For example, British Telecom is not a product but it ought to fall under brand rules IMO. Equinox 14:58, 21 October 2011 (UTC)[reply]
Even that is too narrow, because it would exclude noncommercial brand names like Debian, Anthrocon and so on. —CodeCat 00:28, 22 October 2011 (UTC)[reply]

Chinese radical changes

What the hell is going on? See http://en.wiktionary.org/wiki/Special:Contributions/213.79.124.126

Also, please archive this page so it doesn't take forever to load. It's just simple common sense.

71.66.97.228 07:38, 15 October 2011 (UTC)[reply]

I oppose radical changes (lol). Mglovesfun (talk) 08:51, 15 October 2011 (UTC)[reply]

Did you look at it? 71.66.97.228 23:20, 15 October 2011 (UTC)[reply]

Reverted... changes like this need to be discussed first. JamesjiaoTC 21:16, 16 October 2011 (UTC)[reply]
I don't know any Chinese, but just looking at that page it's hard to discern the specific details of a character. I imagine that page will be used relatively often by people who know little Chinese (students or just curious people), and it would be hard for them to read the characters. So could they be made bigger on that page? Maybe twice the size? —CodeCat 12:45, 19 October 2011 (UTC)[reply]

Sanskrit dictionaries - parts of speech + language portals

Background
  1. I am planning to write a bot to import definitions from publicly available Sanskrit-English and Sanskrit-Sanskrit digitized dictionaries to some wiktionary so that it may be collaboratively edited.
  2. Ideally, for a given word or word-root like 'अङ्ग', I would want (English, Sanskrit, Hindi) definitions from various dictionaries to be collected in a single place.
Observations
  1. en.wiktionary.org records English definitions of many Sanskrit word-roots. A part of sa.wiktionary.org interestingly seems to mark a beginning of duplication of some of these words.
  2. Parts of speech used in these definitions are inadequate.
    1. It is important to distinguish word roots from words in Sanskrit. Eg: गम् is a verb-root. But it is never used in a sentence uninflected. With inflection, according to time, mood, number and case, forms like 'गच्छति', अगच्छम् are used. So, it will be important from the perspective of dictionary users to ideally record all these forms (or at least the roots) and distinguish the verb-root from the many inflected forms.
    2. Further, word-roots are classified into different groups (including grammatical gender in case of noun-roots and इट्, गण in case of verb-roots), which determine how they may be inflected and used in a sentence. The same string can appear in multiple classes, and have different meanings. For the dictionary to be useful to the users of (and translators to and from) the Sanskrit language, these should be indicated while classifying word roots.
  3. en.wiktionary.org lists definitions in different languages for a given string. Instead for the purpose of definitions and translations to or from Sanskrit, we would like definitions in different languages
Questions

Given the requirements mentioned (collating multiple language definitions in a single place, need for richer part-of-speech tags), should I plan to upload definitions to en.wiktionary.org or to sa.wiktionary.org? I am fairly certain that, in the latter case, no one will object, the only downside will be partial duplication in the two dictionaries.

Vishvas vasuki 15:15, 16 October 2011 (UTC)[reply]

Welcome! Such a bot would be very welcome here, overall. A few comments:
  • We accept English definitions/translations of Sanskrit words, and Sanskrit translations of English words, but we do not accept Hindi definitions/translations of Sanskrit words.
  • Some duplication between projects is expected — even desirable. For example, en.wiktionary.org entries for English words include translations into French, and fr.wiktionary.org entries for English words also include translations into French. But the target audiences are different, so the presentation is different.
  • Just because something is publicly available, that doesn't mean that you can copy it here. It must be either "in the public domain" (meaning that no one owns any copyright on it, for example because it's very old), or else publicly available under an appropriate free license.
  • You're very, very new here. Before starting to run a bot, you need to become familiar with our norms and practices; that will take time.
  • We do use different part-of-speech headers for different languages, where necessary. If other Sanskrit-speaking editors agree with you about what part-of-speech headers are needed, you can start working on documenting the system, at Wiktionary:About Sanskrit.
RuakhTALK 16:31, 16 October 2011 (UTC)[reply]
It's very interesting. Sanskrit definitely needs some boost. Yes, before importing, you should try and create Sanskrit entries manually, look at the existing ones, like Category:Sanskrit_nouns, Category:Sanskrit_verbs, etc. Seek advice and see if the entries come out in an acceptable format. --Anatoli 07:56, 21 October 2011 (UTC)[reply]

en-verb: person derivatives, e.g. "baker", "maker", "actor"

Verbs in English very frequently have a derivative based on a person performing the verb (anyone know the correct term for this construct?). For example, bake has baker, make has maker, act has actor. Would it be worthwhile to add to template:en-verb such that we could show the (one or possibly more, eg: actor, actress) of these derivatives? This would allow an organized and easy way to find the appropriate term rather than looking through the "Derived terms". Facts707 03:03, 18 October 2011 (UTC)[reply]

But these words are derived terms, they're not considered part of the inflection of a verb. To add one or possibly two derived terms to the headword line doesn't really make much sense to me. We could also add -ness to adjectives otherwise, or -like to nouns. —CodeCat 09:46, 18 October 2011 (UTC)[reply]
I don't think we should include agent nouns in verbs' inflection lines, not because they wouldn't be useful there — maybe they would be — but just because the inflection line is already pretty long. —RuakhTALK 11:51, 18 October 2011 (UTC)[reply]
I oppose this idea. Mglovesfun (talk) 14:24, 18 October 2011 (UTC)[reply]
Great opportunity to work on one's Javascript skills, designing a custom inflection line for English verb lemmas. Perhaps it could even be made a gadget. I am a little skeptical that there is much call for this, however. DCDuring TALK 15:21, 18 October 2011 (UTC)[reply]
I'm opposed too. The inflection line is for inflectional morphology; this is derivational morphology. Also, what would we do with forms like cooker? —Angr 10:05, 19 October 2011 (UTC)[reply]
There's also the problem that English actor is derived from Latin actor, and not from English act. Some of these agent nouns developed in other languages and then were borrowed into English, rather than developing in English from the English verb. So, they're not always Derived terms, but sometimes are Related terms. --EncycloPetey 15:20, 6 November 2011 (UTC)[reply]

Portuguese bots?

Hi, anyone have a Portuguese conjugation bot? Or do I have to run one myself? --Rockpilot 10:06, 19 October 2011 (UTC)[reply]

I think BuchmeierBot (talkcontribs) now does Portuguese. Mglovesfun (talk) 12:53, 19 October 2011 (UTC)[reply]

Using a serif-like font for headwords in Chinese characters

In Chinese characters, 'serif' fonts (I'm not sure if it's the proper term) have the advantage that they show the individual strokes of the character more clearly than 'sans-serif' fonts. This is important for a dictionary which may possibly be used by students of Chinese or Japanese. Could the default font for headwords in Chinese characters (and maybe Japanese Kana as well) be changed? —CodeCat 12:49, 19 October 2011 (UTC)[reply]

It looks a bit ugly having serif for Chinese when the entire rest of the dictionary is in sans-serif. Additionally, I don't know how it would look in the font size we use. -- Liliana 12:28, 20 October 2011 (UTC)[reply]
Oppose. If you can change your browser's settings, it should be relatively easy to tell it how to display text depending on language tag, whereas if Wiktionary sets a default, it will be considerably more difficult for someone who does not like Wiktionary's default to override that and get it to display in another way. I think I've had that experience on Wikipedia where somebody thought it a good idea to impose their favourite IPA font on everyone else.
For people who can NOT set browser settings, it might be helpful if Wiktionary offered an equivalent user preference, but I don't know if that is technically possible. On a larger note, it would be great if browser settings could be set to handle display by script or language+script instead of just language. For example, many CJK fonts have ugly glyphs for the (extended) Latin range, but if I mark pinyin as zh-Latn it will be displayed using the same font that's used for all “Chinese” text, rather than a non-CJK font that would look better. --Dustsucker 03:36, 14 November 2011 (UTC)[reply]

Pinyin and Romanization headers

Why do some Mandarin entries use the header Pinyin, some use Romanization, and some use both? What's the difference? WT:RFC#shí mentions this. Mglovesfun (talk) 12:55, 19 October 2011 (UTC)[reply]

I have cleaned up some. --Anatoli 22:18, 19 October 2011 (UTC)[reply]
But how? Since both headers are valid. Mglovesfun (talk) 11:02, 21 October 2011 (UTC)[reply]
I haven't seen Pinyin headers but we have been using and agreed on Romanization. Pinyin is the romanisation method for Mandarin, anyway. The entry is OK in terms of formatting but a native speaker may need to sort some characters I wasn't able to check properly - some are rare and not very productive. I'm also not sure if we need to list radicals like /, they are never used to make words, they are character components. My suspicion is the entry was created by running a tool like Wenlin, which can generate lists of characters for a pinyin reading, not very useful, IMHO. --Anatoli 11:10, 21 October 2011 (UTC)[reply]

Chinese character "etymologies"

This is concerning edits at and other characters (eg. ), and the dual use of the word "etymology", referring to 1) word etymology; 2) graphical origin of a glyph. The latter sense is not "etymology" per se - which, according to Wikipedia, is "the study of the history of words, their origins, and how their form and meaning have changed over time", not the development of the written form of a word, which Wikipedia treats as "origin" or "history".

At present, Wiktionary treats the origin of glyphs as "etymology", which is fine for non-word glyphs, for example "b"; but not so clear with glyphs which also carry meanings on their own, for example "". This dual use of the word "etymology" is much more notable when dealing with Chinese characters, almost all of which represent morphemes. The current format of Chinese character entries involves a "translingual" section at the top, which is where graphical "etymology" of a character is supposed to be discussed. Because most Chinese characters are partially phonetic and convey some information regarding similar-sounding characters at the time of coinage, this practice produces an inconsistency in the format, i.e. information which is non-translingual (applicable to certain stage of Chinese only) is placed under a heading "Translingual". To use an example, this old revision of "字" says the graphical "etymology" of the character is phono-semantic - which is true iff Old Chinese is the language being discussed, where the similarity in pronunciation of 子 ("child") and 字 ("to nurture; word") formed the basis for the invention of the character 字. This is some language-specific information, not "translingual". Readers would be misled to think that the concepts "child" and "word" are pronounced alike in all languages below on that page, which is not true.

A more obvious inconsistency is in ("to be old"), which is said to be "cognate" with ("to examine"). Graphically, the common origin of these two glyphs is obvious, but to say these two glyphs are "cognate" is again a language-specific issue. In Chinese, these two characters are doublets and cognate, but this is not true translingually. Similarly, the "cognacy" between ("to participate") and ("three"), if it were mentioned in the "translingual etymology" of 參, is neither "translingual" nor true "etymology". 129.78.32.23 03:20, 21 October 2011 (UTC)[reply]

I agree. is the worst offender, it kinda mixes everything in the Translingual etymology section, it hardly makes sense. -- Liliana 03:24, 21 October 2011 (UTC)[reply]
Proposal: use "Etymology" for the origin/coinage of the phonetic version of the character (if a single-character entry) or the origin/coinage of the word (if a multi-character term), and use "Graphical significance" (which could be a subheader of "Etymology") for the way the character looks and has developed over time (for single-character entries). 71.66.97.228 07:42, 22 October 2011 (UTC)[reply]

Trademarks

Hi folks. At the Foundation, we've come across an interesting problem, and we need some guidance from you. We've had two companies contact us with concerns over terms in the English Wiktionary. Both are terms that are trademarked, but are defined on Wiktionary as generic terms. One of them (pycnogenol) has a citation list showing generic usage going back to the early 80's and a mention of the trademarked term as a brand name in the Usage Notes, the other (threatscape) has only one citation showing generic usage and no mention of a trademark or brand name usage.

What we need some help with is understanding what kind of policy Wiktionary has for trademarked terms and how to handle them when they arise. I've read WT:BRAND, and while it mentions genericized trademarks it does not mention strong criteria for determining when a trademark has been genericized, or what to do when a term is assumed to be generic but is also trademarked.

Has there been a policy discussion over this that I haven't found you can point me to? Or is this something that hasn't had a consensus decision yet? I would venture the opinion that this is something that needs a consensus decision, as these instances are going to continue (we've had these two come in quick succession, and I know it's a matter of time before we get more). If the community can come to a policy decision, it will give the community direction in creating definitions, and can give the Foundation something to point to when future instances arise.

There also needs to be a determination on what to do with these two entries; the makers of Pycnogenol ask for all generic definitions to be removed and only describe it as their proprietary product. The holders of the Threatscape trademark would at least like to add something to the entry mentioning their trademarked property.

Thanks for your help! Philippe (WMF) 04:29, 21 October 2011 (UTC)[reply]

  • Interestingly, @Philippe, we're currently revising the sections of WT:CFI that pertain to brand and company names (BP#1, BP#2), though for a different reason: a few users have proposed that we include many more trademarks (by which I mean: strings of letters which exist only because they were coined to be trademarks, and which have not become genericised, as distinct from pre-existing words which have been trademarked and as distinct from genericised trademarks) than we currently do — but a greater number of us seem poised to enact changes that will reduce (possibly eliminate) the number of such trademarks we include. We will continue to include genericised trademarks like hoover, and we will especially continue to include strings of letters which have been words for longer than they have been trademarks, because it is our mission and most fundamental policy to include attested words. AFAICT, "Pycnogenol" was trademarked [in the US] in 1993, so as SemperBlotto notes, both of these terms were words before being trademarked. They have also continued to be generic words after being trademarked. (Edit: Actually, I'm finding conflicting reports of when "Pycnogenol" was trademarked.) - -sche (discuss) 19:58, 21 October 2011 (UTC)[reply]

These companies don't object to the inclusion of their trademarks here, only to the mention that their mark might be used in a generic way. The same happened on fr.wikt about fr:qualimétrie (see the lengthy discussion page). In some countries, companies have to protect their trademarks, and this might be the only reason of these contacts: getting a proof that they protect their trademarks. In any case, mentioning in the page that it's a trademark is useful, and necessary if the word was created by the company. Lmaltier 20:42, 21 October 2011 (UTC)[reply]

  • (Intellectual property attorney hat on) We have had some discussions on this point previously, including a vote that I can not readily find to not put TM symbols in entries, and I have pointed out in every instance that Wiktionary has no legal obligation whatsoever to indicate the trademark status of a word. We face no legal liability of any kind for including and defining a word that happens to be a trademark, because we are not making trademark use of the word (i.e., we are not using the word to "sell" Wiktionary). I think our policy should state as much. If we assume the burden of noting the trademark status of words, we will be stuck with the fact that millions of common nouns in the English language (and others) are used as trademarks with respect to some products (dove, ace, coach, apple, cricket, fiesta, eagle, west, Mars, planters, and so on ad infinitum). Of course, noting the identity of a company that coined a word is important for etymological, not legal purposes. (Intellectual property attorney hat off). bd2412 T 20:51, 21 October 2011 (UTC)[reply]
Putting a trademark gloss or symbol on words that are only trademarks is very different from adding notes at common nouns that they are trademarks for specific things (e.g. Dove soap). Most cases of the latter would fail WT:BRAND. Equinox 21:05, 21 October 2011 (UTC)[reply]
Trademarks are transitory. They are abandoned from time to time (as when a car company discontinues a certain model); they can lose their federal registration status if the owner fails to file five year, ten year, and twenty year renewals. Furthermore, trademark registrations are country-specific, meaning that a mark can have different owners under different conditions in different countries. It's a morass we need not get into. Indicating the origin of a word or a sense as a trademark fulfills our mission. However, I think that giving an indication of the current legal status of a mark goes beyond our reasonable scope. bd2412 T 23:25, 21 October 2011 (UTC)[reply]
What bad things could happen if we included "Claimed to be a trademark in some jurisdictions by IPCo. See their website/Contact them for details." as a usage note in cases where a company provides the information? DCDuring TALK 23:57, 21 October 2011 (UTC)[reply]
First, by doing so we make ourselves advocates for claims of trademark ownership. My primary practice has been as a trademark attorney, either seeking to obtain trademark protection for a client, or advocating that one or the other of competing parties was the legitimate owner of the mark. Do we want to be in the position of stating in our definition that two different parties each (inconsistently) claim the exclusive rights to the mark? Do we want to be in the position of having parties seek to influence courts based on their ability to convince us that their claim is legitimate? Second, where do we draw the line if trademark owners in fact ask us to include such language with respect to common nouns, to "tide" and "whirlpool" and "crest" and "scope"? I am concerned that if we hold ourselves out as willing to include language indicating trademark status for some entities, then we do open ourselves up to liability for those companies for whom we refuse to provide the same service. Third, once such a notation is added to a word, it will have to be checked from time to time to be sure that it is still valid. Companies will have an interest in having trademark claims added while they use a mark, but have no interest in having it removed once they abandon that model. bd2412 T 01:23, 22 October 2011 (UTC)[reply]
I tend to agree with / be persuaded by bd2412: including information on a word's trademark status is outside our scope, especially to the extent that we do not include trademarks as such. - -sche (discuss) 01:46, 22 October 2011 (UTC)[reply]
I absolutely agree with BD. There is no need for it and no advantage in it. Ƿidsiþ 20:49, 22 October 2011 (UTC)[reply]
Thanks, everyone. This is really very helpful. I'd like to request that if anyone else wants to provide some input on this, I'll continue to watch this page, or you can send it to me directly by email at philippe wikimedia.org. Thanks again!! — This unsigned comment was added by Philippe (WMF) (talkcontribs) at 22:50, 22 October 2011 (UTC).[reply]
Just as a follow-up, I just talked to our legal team, and we're very comfortable with both this discussion and the way that things are headed. We'll be happy to support this and it gives us a good understanding of the community's views on it. Thanks! Philippe (WMF) 22:31, 24 October 2011 (UTC)[reply]
I would propose a brief statement to be inserted into the appropriate policy page indicating that we will note the origin of a word as a trademark where this is an etymologically significant fact, but that because trademark status varies by time and place, we do not include such information in our entries. Our entries are not intended to provide a legal opinion as to the trademark status of a word, or the legitimacy of any claim to rights in a word. bd2412 T 23:56, 24 October 2011 (UTC)[reply]
I support inclusion of brands, if we don't allow them yet. Microsoft, McDonald's, Toyota are examples of world-known words, which are used too often to be ignored. There is no benefit in excluding them and users need to be able to understand brand names in a foreign language text, same with place names, personal names. I don't see much benefit in translating Roman letter brands into other Roman script languages, well Apple (company) is Apple in Swedish, German, Afrikaans, etc. --Anatoli (обсудить) 22:59, 7 December 2011 (UTC)[reply]
This discussion resulted in WT:TM.

A proposition: All words in all dictionaries.

I would like to propose that we should pursue a goal of incorporating in our lexicon all words in all dictionaries. If a reasonably reliable published dictionary of real-world terms (as opposed to fictional-universe terms or manufactured-language terms) contains a real-world definition for a word or a phrase, even a phrase that is encyclopedic or a sum-of-parts phrase, we should include that phrase somewhere in Wiktionary, even if it is only in a glossary. For example, I have a recent edition of Black's Law Dictionary. It has entries (to pick a random page) for Pennoyer rule, Pension Benefit Guaranty Corporation, pension plan, perfection of security interest, and perils of the lakes. I believe that we should not be in the position of lacking words that can be found in other dictionaries, and even if we do not include these terms in our corpus, we should have a glossary to which these terms redirect providing a quick sense of their meaning and, perhaps, referring the reader to the appropriate Wikipedia article on the subject. This is not a proposal to amend the CFI itself, but to maintain broad openness to inclusion among our entries or in a glossary of terms appearing in serious dictionaries of legal, medical, scientific, professional, and cultural terms. bd2412 T 03:08, 22 October 2011 (UTC)[reply]

Oppose very strongly, we should not try to be other dictionaries. We cannot be better at being Websters than Webster, or better at being Oxford than Oxford. We should also not reproduce the errors in other dictionaries. We should use our own criteria to avoid duplicating other people's work. Mglovesfun (talk) 09:44, 22 October 2011 (UTC)[reply]
I specifically stated in my proposal that This is not a proposal to amend the CFI. We would still require independent attestation, and would still write our own definitions. Encyclopedic or sum-of-parts terms appearing in other dictionaries (and my proposal is aimed primarily at technical and professional dictionaries) would be relegated to a glossary or an appendix. Websters and Oxford are merely trying to be "the best dictionary", so why should we not want to be better than Websters and Oxford, and better than all the technical and professional dictionaries out there? bd2412 T 14:27, 22 October 2011 (UTC)[reply]
Other dictionaries are not perfect and we should not ape them. We especially don't want to copy ghost words. We should be doing our own research on attestation: uses, not mentions. Equinox 09:48, 22 October 2011 (UTC)[reply]
Equinox, that is not what I have proposed to do at all. bd2412 T 14:36, 22 October 2011 (UTC)[reply]
It would certainly reduce the number of RfD debates if inclusion in even a single approved reference work qualified an entry for inclusion. Instead of dealing with so many entries at retail we could deal wholesale with the adequacy of a given reference work as a source of entries. I am not sure what criteria would work for assessing the adequacy of a reference work for this purpose, though. In English the "unabridged" dictionaries, obviously, would qualify. If we have a phrasebook, then there need be no question about learner's dictionaries. Wordnet includes some SoP terms (IMO), but ones that would be highly attractive as translation targets. I don't think we want Urban Dictionary without major qualification. I could imagine that it would not be hard to come to agreement on legal dictionaries, though some of the entries have seemed encyclopedic to me. OTOH, I am not at all convinced that business, finance, management, and investment glossaries have lexicographic authority.
And I would expect that we would want to apply our current practices to allow inclusion of items not in any approved reference. DCDuring TALK 13:34, 22 October 2011 (UTC)[reply]
It's one case where I'd prefer 'usefulness' and 'accuracy' to 'simplicity'. Mglovesfun (talk) 13:39, 22 October 2011 (UTC)[reply]
@Mglovesfun, Equinox: I think you're both arguing, at least in part, against a straw man. Even if we include a given word just because other dictionaries do, that doesn't mean we have to trust in the accuracy of their definitions. We could include zzxjoanw, for example, while making clear that it's a joke or a hoax. —RuakhTALK 14:04, 22 October 2011 (UTC)[reply]
If zzxjoanw has ever been used and not merely mentioned, fine. Equinox 14:10, 22 October 2011 (UTC)[reply]
It has not. That's why we can't trust other dictionaries' definitions. ;-)   (Though BD2412 is obviously talking about accusations of SOP-ness and {{rfd-redundant}}cy, rather than unattestability, so my example was a poor one. Also, he explicitly said he's O.K. with using appendices for these, and we already do include zzxjoanw in an appendix.) —RuakhTALK 14:32, 22 October 2011 (UTC)[reply]
That is correct, my proposal is primarily directed towards 1) making sure that we are not missing words that are attestable and meet the CFI, and are included in other dictionaries (including technical and professional dictionaries), and 2) allowing attestable SOP and encyclopedic terms to be included in a glossary or appendix here if they are listed in a technical or professional dictionary elsewhere. I am not proposing at all to include terms that are not found in the real world. As an example, I initially voted to delete angstrom unit in the VfD for that term. After researching and discovering that the phrase appears in several technical dictionaries, I changed my vote to keep. bd2412 T 14:42, 22 October 2011 (UTC)[reply]
I have no objection to the use of dictionaries or similar reference works as a source of words (or terms) - as long as you are prepared to write your own definition, and be prepared to verify the word from the real world. SemperBlotto 14:09, 22 October 2011 (UTC)[reply]
@Ruakh and BD2412 I just oppose the idea as a whole of trying to be more like other dictionaries. The details beyond that don't matter too much. Mglovesfun (talk) 14:50, 22 October 2011 (UTC)[reply]
I don't want Wiktionary to be in the position of someone needing to look up a technical term and having to say to themselves, "Wiktionary won't have this, I'll have to look elsewhere". I want that someone to think of Wiktionary as the most likely place to have whatever definition they seek. bd2412 T 14:58, 22 October 2011 (UTC)[reply]
My rebuttal to that specific point is a lot of dictionaries for technical terms include terms which would fail WT:CFI#Idiomaticity, because that's our rule, not theirs. A bit like in baseball, games played, which refers to the number of [[games]] [[played]]. Mglovesfun (talk) 15:38, 22 October 2011 (UTC)[reply]
(An interesting example, since in the case of "games played" there is a different definition for a defensive player, an offensive player and a team, none of which would probably match any pair of definitions on either game or played. - [The]DaveRoss 00:35, 25 October 2011 (UTC))[reply]
I would propose in that case that games played, failing idiomaticity, should not be within our main body of definitions, but should be in an appendix or glossary of baseball terms (probably along with such things as left-handed pitcher and batting order). bd2412 T 20:26, 22 October 2011 (UTC)[reply]
If this is not a policy proposal but a general ideal that all words found in specialist dictionaries which meet WT:CFI should be included, then it seems a bit pointless to me, as all words that meet WT:CFI should be included, both ones which are in other dictionaries, and ones which aren't. Mglovesfun (talk) 10:33, 24 October 2011 (UTC)[reply]
My thinking on this is primarily driven by my review of technical and professional dictionaries. As it happens, I currently work for a court that deals with technical definitions with unusual frequency, so the court's library has mind-boggling shelves upon shelves of technical dictionaries in every conceivable field. For example, there is the Dictionary of Mining, Mineral, and Related Terms, which I was able to get a CD version of, and which TheDaveRoss is presently crunching into a format uploadable to our project space. Of course, many definitions included in these dictionaries are SOP or possibly encyclopedic, but the fact that they are defined there indicates a market for people looking for those definitions in a dictionary format. Hence my concern that Wiktionary aim to become the place where people go to look up the definitions of words and phrases, no matter how arcane or specialized, and no matter whether they are in a gray area of being possibly SOP or encyclopedic (even if terms facing these questions are only included in appendices). bd2412 T 20:29, 24 October 2011 (UTC)[reply]
Finding a phrase in another dictionary or technical glossary is a very strong clue that it's worth a definition (specialized technical writers are much more likely to know what should be included than us, because they know their subject better). I assume that this is what BD2412 means. Of course, I exclude encyclopedic dictionary entries such as Charles Darwin; I am thinking only of set phrases. And when it's worth a definition, it's also worth a normal page. But this is not a reason to copy errors from other dictionaries, of course, I agree with SemperBlotto. Lmaltier 20:00, 24 October 2011 (UTC)[reply]
@Lmaltier yes that's one possibility, but also as you discuss and as BD2412 and I discuss above, 'specialized' dictionaries are also likely to include things that wouldn't meet our CFI like games played in baseball. They often behave more like Appendix:Glossary than our main namespace. Mglovesfun (talk) 12:40, 25 October 2011 (UTC)[reply]
Oppose strongly, partly for some of the reasons above, but also because: (1) The American Heritage Dictionary has entries for Cleveland, (Stephen) Grover, Cosby, William Henry, Jr., Guevara, Ernesto, etc. and (2) B.D. Jackson's Glossary of Botanic Terms likewise has ridiculously useless definitions for terms such as necklace-shaped and rope-shaped, as well as a host of terms that appear to be peculiar to individual authors, such as flag-apparatus, paronychietum, and meridisk. These are all real-world terms, often cited from published authors, but of no particular value in a dictionary. We have developed our own criteria for inclusion for a reason, and if you look at a host of other dictionaries, that reason becomes quickly apparent. --EncycloPetey 15:10, 6 November 2011 (UTC)[reply]
of no particular value in a dictionary: why? Anyway, we don't select words on their value, we include all words, even rare ones. Lmaltier 06:09, 8 November 2011 (UTC)[reply]
EP, would you object to including such terms in a glossary or appendix? I don't propose that everything found in another dictionary automatically belongs in our mainspace. bd2412 T 17:19, 17 November 2011 (UTC)[reply]
I think a method of adding common collocations, vocabulary terms, set terms, and SOPs that have legitimate usefulness for the medical, legal, and other speciality fields needs to have a place here somewhere. Lucifer 15:50, 27 November 2011 (UTC)[reply]
One thing we already do for some highly specialist fields is to create appendices. Equinox 15:53, 27 November 2011 (UTC)[reply]
My point is not about "specialist fields" but about idiosyncratic authors. If only one person ever used a particular term, only in obscure works, then how could it ever meet WT:CFI in a meaningful way? --EncycloPetey 17:30, 27 November 2011 (UTC)[reply]
  • Strong Support: We're not paper, there's no need to arbitrarily restrict the amount of entries we have. We're supposed to be better than paper dictionaries, not worse. I see nothing wrong with having references to Bill Cosby or Grover Cleveland or anything else that has influenced words. Keep in mind that a lot of dictionaries (including the AH, which happens to be right here next to me) often have not only the definitions we do, but some SOP definitions (for the record, SOP needs to go) and some historical/biographical definitions. But I'm perfectly fine with us having them too. Purplebackpack89 (Notes Taken) (Locker) 16:10, 27 November 2011 (UTC)[reply]
But we're also supposed to be a dictionary and not an arbitrary list of encyclopaedic topics (like actors). You have Wikipedia for that. Why do you think the Oxford English Dictionary doesn't include Bill Cosby? Hint: it's not because they don't have enough paper. Equinox 16:17, 27 November 2011 (UTC)[reply]
Some dictionaries are encyclopedic, and include entries such as Bill Cosby. But we are a language dictionary (this point should be made clearer). We are a linguistic work. And it's not possible to deal with Bill Cosby in a linguistic way, because this name is composed of two words, and only these words can be addressed linguistically. Lmaltier 16:34, 27 November 2011 (UTC)[reply]
I don't understand what you mean by: it's not possible to deal with Bill Cosby in a linguistic way, because this name is composed of two words. Does this imply that you now agree with my comment above about excluding necklace-shaped? How does it pertain to other two-word entries we have like Sri Lanka? Why is it that we have Sri Lanka but not Bill Cosby? Your argument does not make this point clear. --EncycloPetey 17:30, 27 November 2011 (UTC)[reply]

Etymologies of place names

In the discussion about CFI above I mentioned that it would be useful to include the etymologies of place names even if they are small and don't meet CFI otherwise. For example there is little to be defined about a small place like w:Eersel, but its etymology would nonetheless be useful and not encyclopedic at all. So for that reason I'd like to propose that we allow the etymologies of all place names, but only their etymologies, to be entered in an appendix of some sort if they don't qualify for CFI. If there is enough support we can try a vote. —CodeCat 15:14, 22 October 2011 (UTC)[reply]

I support Ungoliant MMDCCLXIV 17:27, 22 October 2011 (UTC)[reply]
  • From what I see, current CFI does not exclude "Eersel". CFI contains no restriction on the size of a geographic entity whose name is considered for inclusion. The criteria for geographic names that were in CFI and were removed from it recently only referred to the ability of the geographic name to carry lexicographical information such as etymology and pronunciation. From what I can see, no vote is needed. For a rather small village currently in Wiktionary, see Rückingen, which has 5,800 inhabitants. For clarification: in current CFI, geographic names are governed by the section WT:CFI#Names of specific entities, which says this: "A name of a specific entity must not be included if it does not meet the attestation requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which." --Dan Polansky 18:41, 22 October 2011 (UTC)[reply]
    • In this case the question then becomes how small places meet those requirements. They ought to be in widespread use at least because anyone in the village uses the name, but usage in permanently recorded media would usually be limited to maps, surveys and legal documents (many early Dutch and German place names are found only in Latin texts). The village may be known by only a small amount of people outside it, and most place names are derived from the local dialect which is often otherwise undocumented. So in a way, place names are highly dialectal terms that are mentioned only in maps, and used only in the community for which the place is significant but for which no written language may exist. And since it was coined in the local dialect, you can also wonder what the language of a place name really is? Sometimes the official name of the place is actually an exonym and is not the name used in the place itself. As an example, Girona was until recently known only by its Spanish name Gerona, despite being predominantly Catalan-speaking, and similarly Hiiumaa in Estonia was known officially by its Russian exonym Хийумаа (Khiyumaa). All of this makes it hard to treat them the way we treat regular words, it would make more sense to group them by geographic area than by language. —CodeCat 19:07, 22 October 2011 (UTC)[reply]
      • Yes, it's sometimes difficult, but this is not a reason to deal with them in an appendix (other words may be difficult too). Village names are words, they should be addressed the same way as other words. For small places, it may happen that no other dictionary addresses them, their presence here is a real added value. And not only for their etymology: their pronunciation is very useful too, and is different in different languages. And demonyms, etc. Please keep it simple. Lmaltier 19:48, 24 October 2011 (UTC)[reply]
        • I am not saying we shouldn't allow place names in the main namespace. Many place names have useful information that could be added. But what if we include not just all words in all languages, but all place names in all countries as well? Even a small country like Luxembourg could contain thousands and thousands of place names, do we want to allow all of them? And if not, then what kind of criteria can we apply to place names? Eersel may be readily attested because it is fairly large still. But what about tiny places with only a few houses? Essentially, maps and surveys are the only things that document every place name, but they are really like dictionaries, and we have a policy against including material from dictionaries blindly. Can maps be used to attest place names, even though they are mentions and not uses? —CodeCat 20:11, 24 October 2011 (UTC)[reply]
          • Yes, there is a huge number of placenames; but it's no more difficult than including all words: we all know that it's impossible, but it's our objective. And I think that placenames on maps should be considered as uses in the language used by the map. When you write Y is a small town..., you use Y, and it's the same for maps. When you write Y is the word used in French to refer to a small town..., this is a mention. Lmaltier 21:01, 24 October 2011 (UTC)[reply]
            • What should the definitions of these places be? Should every place in Luxembourg be defined as A place in Luxembourg? —CodeCat 21:44, 24 October 2011 (UTC)[reply]
              • In my opinion, there should be an indication about where the place is + ideally a map showing its position (such a map makes the definition much clearer and more precise). But, of course, no demographic or economic data. Lmaltier 19:19, 25 October 2011 (UTC)[reply]
  • I would like some more opinions from others on this if that's possible. I would like to start adding place names but I don't want to get in people's way if I do that. —CodeCat 19:50, 25 October 2011 (UTC)[reply]

{{look}}

AFAICT place names are all acceptable under the CFI. However, IMO places aren't. Just their names. In other words, not every place named Eersel gets its own sense line. Rather, if there is more than one Eersel and all of them are municipalities (towns or cities or the like), then the only definition should be "{{non-gloss definition|A municipality name.}}" or "{{non-gloss definition|A name of several municipalities in the Dutch-speaking countries.}}" or the like. Or if some Eersels are neighborhoods, some cities, and some counties, then "{{non-gloss definition|A place name.}}" or "{{non-gloss definition|A name of several counties, municipalities, and neighborhoods in the Netherlands.}}" or the like. —msh210 (talk) 15:45, 2 November 2011 (UTC)[reply]

But it would be possible for each Eersel to have a distinct etymology, in which case it would be unavoidable to have separate senses for each of them. For example, America, the continent, has a different etymology from America, a small village in the Dutch province of Limburg. And similarly for Californië in Gelderland, Nederland in Texas, Colorado and even the Dutch province of Overijssel (so there is a Nederland in Nederland). —CodeCat 16:02, 2 November 2011 (UTC)[reply]
I've just (once again) come across this section, so will answer now, though belatedly. We can have "===Etymology=== The name of the capital of England is from whatever. Most other cities were named after the capital of England; however, the town in San Serriffe was named after James London, who discovered the islands."​—msh210 (talk) 22:47, 7 December 2011 (UTC)[reply]
Re: "If there is more than one Eersel and all of them are municipalities (towns or cities or the like), then the only definition should be [...]": No such agreement has been reached, AFAIK. Thus, more definition lines of municipalities are accepted in the entry for "Paris", while, at the same time, not every place needs to have a dedicated definition line. The last thing on which we have agreed at least for a limited period of time until it was removed from CFI again was this: 'If the name is shared by several places, some of the places bearing the name can have a dedicated sense line, while other ones can be covered under a summary sense line such as "Any of a number of cities in Anglophone countries"', per this revision. --Dan Polansky 16:03, 2 November 2011 (UTC)[reply]
I've just (once again) come across this section, so will answer now, though belatedly. I never said there was such agreement: I said "IMO". (See [[IMO]].)​—msh210 (talk) 22:47, 7 December 2011 (UTC)[reply]
  • We already have multiple etymologies for multiple places with the same name. Christchurch is a good one. SemperBlotto 16:08, 2 November 2011 (UTC)[reply]
    • The way the etymologies are written in Christchurch is not how it's normally done on Wiktionary. Is it how we want to do it? —CodeCat 16:30, 2 November 2011 (UTC)[reply]
      • I think the etymology of Christchurch should above all mention the words Christ and church - it's not necessarily transparent to a student in China. Why each place received this name might be called encyclopedic. In any case explaining after which Matti every one of the 50+ Finnish places named Mattila were named would be out of question. I've defined common placenames as "Any of a number of places in Finland", with a separate definition if some place is particularly important. I don't think entries for even small villages are likely to be deleted if they contain a good etymology and pronunciation, but I'd be wary of creating entries for compound words (X River, South X) and for names of minor places without anything else than a definition.--Makaokalani 17:09, 3 November 2011 (UTC)[reply]
        Why a name was given to a thing or to a place is not encyclopedic, it's etymologic. would be out of question: why? Inhabitants would be happy to find the etymology of the name of the place where they live. And don't forget associated demonyms (e.g. in French, there are many places named fr:Beaulieu sharing the same etymology, but not the same demonym). Lmaltier 21:04, 8 November 2011 (UTC)[reply]
        to Msh210: each sense of a word deserves its own definition. Some words have an etymology but no actual sense, such as surnames or 1st names. But placenames have senses. Nobody is obliged to add all these senses, but they should not be removed when present: they are useful, and sometimes required for etymologies (sometimes), pronunciation (sometimes), translations, demonyms... Lmaltier 21:13, 8 November 2011 (UTC).[reply]

Yes, we should have entries for the words which make up place names, especially to include their etymologies.

But no, specific signified places are not “senses” to be defined. The dictionary entry Paris oughtn't be a gazetteer of three dozen specific cities, towns, counties, and neighbourhoods (which is rightfully at w:Paris (disambiguation)#Geography), any more than the surname Smith should be a phone directory of a few million specific people (the prominent ones go in w:List of people with surname Smith).

The origins and etymologies of these words and names—toponymy and onomastics—belong in the dictionary. But the identities and locations of each of these places, who named them, and when, and why—geography and history—not so much. Michael Z. 2011-11-09 02:49 z

Don't you feel the difference with surnames? The sense of a surname might be anybody with this surname, but there is no real sense. Placenames have senses (most often one or two). And everything related to the names (including their etymologies and their senses) belongs here. Lmaltier 06:07, 9 November 2011 (UTC)[reply]
That's not true. A famous individual will commandeer a surname just as a famous city will. The placename Paris immediately makes one think of the capital of France, regardless of how many cities, towns, villages, and hamlets bear that name. Likewise, Darwin, Tolstoy, Picasso, Bismarck, Rousseau, and Gandhi all immediately make one think of a particular individual, despite the number of other people bearing each surname. I don't see the difference you claim exists, and I don't believe there is such a difference except in terms of temporal scale and biological processes (places usually last longer than individuals and reproduce themselves far less often). --EncycloPetey 17:40, 27 November 2011 (UTC)[reply]
Yet a surname is associated with a family (this is the sense, more or less, although the same surname and the same etymology may be shared between several families), not with individuals. When a name is associated with an individual (e.g. Confucius), it should be included. A placename is associated with a place, a specific entity. This is the sense. Places don't reproduce themselves. Lmaltier 06:25, 28 November 2011 (UTC)[reply]
Places don't literally reproduce themselves but people might name a place after another place. That's how Harlem was named, for example. —CodeCat 11:28, 28 November 2011 (UTC)[reply]
Lmaltier, you are generalizing rules based on some common cases, where they don't really exist. Not all people in all cultures get a surname from a family. Look at w:Icelandic name, for example, in which each person gets a new surname from their same-gender parent's personal name. And conversely, some place names do get inherited, e.g., w:North Kildonan, Winnipeg is a city ward that takes its name from the former super-entity, the City of West Kildonan, which kept its name from when it had been the R.M. of West Kildonan, previously split off from the R.M. of Kildonan, which inherited its name from the place in Scotland its first European settlers came from.
So if there is some fundamental distinction between the names of all persons and all places, this isn't it. Michael Z. 2011-12-07 03:57 z
I know that family names are not used in Iceland, only patronyms. And the definition for patronyms should be name given to the son (or daughter) of somebody with ... as his first name.
Place names are created for each place; they are not taken from a set of available names, even if a name may be taken from another place. The only possible definition is the place. This is the difference. Lmaltier 21:27, 7 December 2011 (UTC)[reply]
Around here we have the town of Churchill, Churchill Drive, Churchill Drive Park, Churchill High School, Churchill Park Church, &c. There are thousands more Churchills, worldwide, including places, people, inanimate things, and organizations. I don't believe any of these specific entities should be mentioned in the dictionary, much less “defined.” Their naming is the result of the productive use of one toponym, originating independently in several places having churches on hills, and widely known thanks especially, but not exclusively, to one statesman. That's the information about this name that the dictionary should present. Michael Z. 2011-12-11 22:30 z
Place names are allowed - we had a few votes, let's not reopen this can of worms. Etymologies are always good to have, but having one should not be a requirement for inclusion: it's not always known. --Anatoli (обсудить) 22:45, 7 December 2011 (UTC)[reply]

Definitions versus Descriptions

Dictionaries are commonly supposed to contain definitions; but most dictionaries seem to contain merely descriptions. For example, the Wiktionary definition of chaconne is currently "A slow, stately Baroque dance"; but not all slow, stately Baroque dances are chaconnes.

So my question is: should such descriptions be expanded to be as specific as possible, or would that be considered undue clutter? Paul Magnussen 20:59, 22 October 2011 (UTC)[reply]

  • Unhelpfully: ‘it depends’. That definition could probably use a little more detail, but in general if you need to write more than one sentence, then it's probably too much. Where Wiktionary editors will get jittery is where a definition becomes ‘encyclopaedic’, but that is of course a subjective line. Ƿidsiþ 21:05, 22 October 2011 (UTC)[reply]
    • At some point it becomes too hard to define a term, because the specific defining characteristics aren't easy to write out in a single sentence. At that point referring to Wikipedia becomes a good alternative. —CodeCat 21:12, 22 October 2011 (UTC)[reply]
  • (See also user:msh210/specificity and its talkpage. —msh210 (talk) 06:53, 23 October 2011 (UTC))[reply]
  • In a way, I'd say don't worry about it. In a sense, a definition is a description which has only the information necessary and sufficient to describe something completely, but we have dictionary entries, not true definitions. I think a dictionary is where the grammatical features belong, and the real-world description should be an abbreviation of a complete encyclopedic one, with the knowledge that the encyclopedia has the responsibility for that. Real-world descriptions in a dictionary and an encyclopedia tend to overlap, but grammatically, words have more discrete states, such as person (first person, second person), tense (past, present), mood, aspect, being countable or uncountable, and whatever else the grammar provides for. This information, grammatical information, especially about verbs, is hard to find in an encyclopedia. For example, if you look up "marries" in WP, you get redirected to the noun "Marriage". The "Third-person singular simple present indicative form of marry" definition is the type of information that should have a complete treatment here, and a full understanding of marriage is not our problem. Haplology 08:12, 23 October 2011 (UTC)[reply]
I concur. Generally speaking, we cannot achieve perfect definitions. But we can (and should) in some cases, when it's easy, e.g. by providing the full mathematical definition of topological space, or the scientific name as a complement to the definition of Atlantic salmon. It makes the definition unambiguous, even if it does not help everybody... Lmaltier 20:52, 24 October 2011 (UTC)[reply]

“Chaconne — a slow, stately Baroque dance.” Is it

  • Any slow stately dance of Baroque Europe?
  • A particular style of Baroque dance that is slow and stately?
  • A specific Baroque dance which happens to be slow and stately?

If we can't answer this question, then we can't even know whether the existing definition is defining or descriptive. This is as much a problem of syntax and grammar as it is of the facts provided. Michael Z. 2011-11-09 00:57 z

On my talk page, a user suggested that {{figuratively}} should redirect to figurative instead of the other way around. What do we think? {{literal}} does the same thing, in that it redirects to {{literally}}. Mglovesfun (talk) 10:39, 23 October 2011 (UTC)[reply]

See also template talk:figurative and template talk:figuratively.​—msh210 (talk) 16:57, 23 October 2011 (UTC)[reply]
Neither. Each displays its pagename in the context label, so they have different uses. They should categorize identically and display differently. If, however, the community disagrees with me and decides, as Martin says someone suggests, to swap the redirect, then we'd better not do so before checking uses of the templates: we don't want (e.g.) "(literally or figuratively)" to become "(literally or figurative)".​—msh210 (talk) 16:57, 23 October 2011 (UTC)[reply]
Keep them the way they are. The user seems to have very strong feelings about it but didn't provide any reasoning. I can't imagine what it would be, and I for one don't share their feelings about it. Both labels make sense to me and both figurative and figuratively sound fine in my opinion. One could imagine an entry which is figurative in its main sense and also has another figurative sense, being a second level of figurative-ness if you will. In that case figuratively is better--the main sense is figurative and another is meant figuratively. Haplology 17:36, 23 October 2011 (UTC)[reply]
But the way they are doesn't allow that: both display as "figuratively".​—msh210 (talk) 18:07, 23 October 2011 (UTC)[reply]
I'd be happy to allow {{figurative}} as a separate context label to figuratively. Mglovesfun (talk) 10:38, 24 October 2011 (UTC)[reply]
On reflection, I'm not sure there is value in having them separate. A word could be figurative and hence used figuratively, so really they're the same thing. Since neither categorizes, it barely matters. Mglovesfun (talk) 12:37, 25 October 2011 (UTC)[reply]

Hello all, this is an announcement about the latest installment of our user competition, to tie in with Halloween. It is all about writing a short Halloween story, and hopefully we can give some of the most common words here some much-needed improvement. Let me know if you think some things should be altered or added. Sign up at Wiktionary:Halloween Competition 2011. --Rockpilot 09:17, 24 October 2011 (UTC)[reply]

Not sure I have the time or the inclination for another competition. Can't we wait until Xmas? Mglovesfun (talk) 10:41, 24 October 2011 (UTC)[reply]
Not with that theme. Halloween stories are better than Christmas stories. --Daniel 23:19, 24 October 2011 (UTC)[reply]

Wonderfool (Rockpilot)

Just for everyone's info, I've blocked him again. This was the immediate trigger [7] but I think people may agree that he was gradually making more and more trouble, and not really contributing much of use. Equinox 21:51, 24 October 2011 (UTC)[reply]

(spectation) How is "Harvey" an example phrase of spectation/regard? It looks like a mistake to me. If it is a mistake, then Rockpilot was right to remove it. If it turns out to be correct, I think it needs an explanation, because it is meaningless to me. —Stephen (Talk) 22:50, 24 October 2011 (UTC)[reply]
The entries I have been adding are from Webster 1913 (though I have been checking to ensure that the words are reasonably attestable). With that dictionary, the habit was to name a major author who had used the word, and not necessarily to include the quotation (due to limitations of space and printing techniques, I suppose). This obviously isn't perfect for us, but I think they are worth including as an easy way to find a citation for a word (search Google Books for the word and the given author). I am adding the Webster entries in a semi-automated way (involving a significant initial effort on my part in writing Webster-to-Wiktionary conversion code) and felt these were worth keeping. Ideally people would be improving these entries by finding the named citations and including them in full. Rockpilot is not interested in doing this kind of work and would rather create deliberately divisive competitions and rude comments. I am certain that he only removed the citation names from my entries in order to cause trouble. If he wanted to do useful work he could have found the citations. Equinox 22:57, 24 October 2011 (UTC)[reply]
Just to make the point, I have found the Harvey citation in question and added it (see spectation) though I can't identify the original publication. Can you? Would you like to add it? Or perhaps everyone would rather add ridiculous phrasebook entries for do you love transsexuals?. Dorks. Equinox 23:02, 24 October 2011 (UTC)[reply]
I have the same reaction as Stephen. "Harvey" is intelligible to those who know it is a placeholder for a quotation by Harvey, but to others, it is indistinguishable from vandalism — it looks like someone named Harvey inserted his name into an entry. If an IP had removed it, I would assume the IP was acting in good faith, removing apparent vandalism! Of course, RP/WF knew it wasn't vandalism, and probably knew it was a useful placeholder. - -sche (discuss) 23:11, 24 October 2011 (UTC)[reply]
Then what do you suggest? Perhaps we need a new template. Please make it. Equinox 23:14, 24 October 2011 (UTC)[reply]
I see no reason to think that Rockpilot meant to make any sort of trouble by removing it. I think he was acting in a perfectly reasonable manner. It was incomprehensible to me, and if I had seen it, I would have removed it myself.
I don’t know anything about what is involved in the semi-automated addition of entries, but maybe these citation names could be hidden from view and a template added asking for help finding the citation.
At the very least, I would unblock Rockpilot, because I don’t think he had anything but the best of intentions in this particular matter. —Stephen (Talk) 23:20, 24 October 2011 (UTC)[reply]
I agree with Stephen. I have unblocked him. He has helped me a lot with German conjugations! -- Liliana 23:25, 24 October 2011 (UTC)[reply]
Rockpilot cannot claim ignorance of the underlying phenomenon (material taken from Webster's), as I had explained it to him, along with Webster's practice of noting the author of a usage citation or a dictionary that had included the term. I think it was in the context of a previous deletion.
The existence of bare surnames and surnames associated with usage examples is quite widespread among the many English entries that have not been much revised since being copied from Webster 1913. DCDuring TALK 00:24, 25 October 2011 (UTC)[reply]
I think a template is a good idea; we already have a category. I've created {{rfquotek}} ("k" stands for "known", because we "know" a bit of information, either the person or the work we want a quotation from). It takes the name of the person or work as its first and only parameter, and is used like this. Feel free to improve and/or rename. - -sche (discuss) 23:48, 24 October 2011 (UTC)[reply]
The template is handy, but even handier would be a reliable means of identifying the entries and sections therein likely to need such a template. As the process of bringing such citations to our standard doesn't seem to be something many contributors find worth undertaking, it would seem we need a cleanup list for these things, on which the few contributors who are willing to do this can focus their efforts. DCDuring TALK 00:24, 25 October 2011 (UTC)[reply]
WT:Abbreviated Authorities in Webster lists all the author abbreviations used (and their fuller forms). The list could be used to search for and templatize all the occurrences within entries. --Bequw τ 13:12, 25 October 2011 (UTC)[reply]
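A rough sketch of how such a search-and-templatize pass might look, in Python. Everything here is an assumption made for illustration: the three-name `AUTHORITIES` set stands in for the real list at WT:Abbreviated Authorities in Webster, the function name is made up, and the sketch assumes the bare name sits alone on a `#:` or `#*` line under a definition. The only detail taken from the discussion is that {{rfquotek}} takes the name as its single parameter.

```python
import re

# Hypothetical sample only; the real abbreviations are listed at
# WT:Abbreviated Authorities in Webster and are not reproduced here.
AUTHORITIES = {"Harvey", "Milton", "Shak."}

# Assumed layout: a definition sub-line ("#:" or "#*") holding nothing
# but the authority's name.
SUBLINE = re.compile(r"(#[*:]+)\s*(.+?)\s*")


def templatize_bare_authorities(wikitext, authorities=AUTHORITIES):
    """Wrap bare authority names on definition sub-lines in {{rfquotek}}."""
    out = []
    for line in wikitext.splitlines():
        match = SUBLINE.fullmatch(line)
        if match and match.group(2) in authorities:
            prefix, name = match.group(1), match.group(2)
            out.append(prefix + " {{rfquotek|" + name + "}}")
        else:
            # Definitions, real usage examples, and everything else
            # pass through unchanged.
            out.append(line)
    return "\n".join(out)


sample = "# The act of looking.\n#: Harvey\n#: This is a usage example."
print(templatize_bare_authorities(sample))
```

A real run would still want a human to review each change, since a sub-line consisting of a single word could occasionally be a genuine usage example rather than an authority name.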
I would omit the surnames, and add {{R:Webster 1913}} to every imported entry instead, thereby making the surnames available one click away. Otherwise, the request templates are likely to sit in Webster 1913 entries for ages, providing close to no useful information to readers of Wiktionary. But even if the surnames stay, kudos to Equinox for doing the import! --Dan Polansky 07:35, 25 October 2011 (UTC)[reply]
No hard feelings, Eq. And thanks for the unblock, Liliana. They were all good-faith edits, and I'm glad something positive is appearing as a result of my controversial edits. It's not the first time, and I'm sure it won't be the last time I do something controversial here either! --Rockpilot 07:53, 25 October 2011 (UTC)[reply]
Outcome: I am using the new template on surnames (thanks, -sche) and I already put the Webster template on all these entries (unless the word seems remarkably common and I can't believe we don't have it). Equinox 22:56, 26 October 2011 (UTC)[reply]

Template:ja-forms - what is this for?

I just ran across and was confused to see that the Japanese entry includes kanji boxes for "simplified" and "traditional" forms of the kanji -- both of which are only relevant in a Chinese context. I was surprised to find that this box is put in by the {{ja-forms}} template. Since simplified and traditional are in fact not Japanese forms, I'm strongly tempted to remove this template from those Japanese entries that use it and place it in the ==Translingual== section of those pages, where it appears to belong. By way of reference, and (neither of them character forms used in Japanese) place this template in the ==Translingual== section. I'm also tempted to edit the template slightly to clarify that "shinjitai" is for Japanese use, and that "simplified" and "traditional" are for Chinese use.

Does anyone else have an opinion on the matter? And perhaps the template should be moved to a more appropriate name, since this cross-referencing of Chinese character forms is applicable to all Chinese character entries that have alternate forms, not just Japanese? -- Eiríkr ÚtlendiTala við mig 15:26, 26 October 2011 (UTC)[reply]

I support moving it to the Translingual section and renaming the template. - -sche (discuss) 19:24, 26 October 2011 (UTC)[reply]
Me too. Haplology 16:09, 27 October 2011 (UTC)[reply]
Certainly don't keep it as it is. However, do Translingual CJKV characters have traditional and simplified forms? I think not. Mandarin does, for other Chinese languages I don't know. Unless I'm very very wrong, Japanese, Korean and Vietnamese do not have traditional and simplified forms. Mglovesfun (talk) 16:11, 27 October 2011 (UTC)[reply]
No, Mglovesfun, you're right, Chinese has simplified and traditional, Japanese has shinjitai and kyūjitai, with kyūjitai in Japanese writing basically the same as Traditional Chinese. I'm not aware of any simplified characters specific to Korean; South Korea at least seems instead to just be phasing out Hanja characters in their entirety, which simplifies education considerably but does regrettably reduce shared written vocabulary with the rest of the Chinese script world. Vietnamese is wholly outside of my realm of expertise.
Extrapolating a bit, my suspicion is that the Japanese-speaking (-reading?) editors who created {{ja-forms}} might have been responding to the use of {{Hani-forms}}, which is generally added to the ==Translingual== section, as over at . This makes sense since these forms are used (at least historically) across the breadth of the Chinese writing for all dialects. Perhaps {{ja-forms}} was intended to expand upon this to include all (or at least more) attested forms of a Chinese character? The Simplified Chinese page has no Japanese entry (nor should it), and it uses {{ja-forms}} to point to both the Traditional Chinese form of and the Japanese shinjitai form of ; this begins to make sense to me from the perspective of Chinese character forms being akin to alternate spellings, and thus a Chinese character entry (in any language) should list these alternates, just as we have both colour and color.
That said, using the ja- prefix on the template name seems misguided at best. I'd be happy to just make sure that {{Hani-forms}} can handle shinjitai and kyūjitai, and replace all calls to {{ja-forms}} and then delete it. And possibly tweak {{Hani-forms}} and its documentation to specify use only for single-character entries. What say you all? -- Eiríkr ÚtlendiTala við mig 21:13, 27 October 2011 (UTC)[reply]
I support Eiríkr Útlendi's proposal above. However, as for kyūjitai, I'm not sure how much it would make sense to have another cell for it in the new {{Hani-forms}}, since for almost all the characters a kyūjitai form is identical to its traditional form (at least when represented as a Unicode character). I believe it would suffice to add a note like 'traditional (or kyūjitai)', having an optional parameter kyujitai=x to separate the two for the very rare cases of disagreement. --Whym 04:15, 4 December 2011 (UTC)[reply]

Terms of Use update

I apologize that you are receiving this message in English. Please help translate it.

Hello,

The Wikimedia Foundation is discussing changes to its Terms of Use. The discussion can be found at Talk:Terms of use. Everyone is invited to join in. Because the new version of Terms of use is not in final form, we are not able to present official translations of it. Volunteers are welcome to translate it, as German volunteers have done at m:Terms of use/de, but we ask that you note at the top that the translation is unofficial and may become outdated as the English version is changed. The translation request can be found at m:Translation requests/WMF/Terms of Use 2 -- Maggie Dennis, Community Liaison 00:42, 27 October 2011 (UTC)[reply]

I don't think you need to apologize for English on this project! — lexicógrafa | háblame 02:31, 27 October 2011 (UTC)[reply]
That's alright. We forgive you. --Daniel 08:24, 27 October 2011 (UTC)[reply]
I'm afraid it's completely incomprehensible. Do we have any English-to-Wiktionarian translators around? --Yair rand 16:13, 27 October 2011 (UTC)[reply]
Luckily for me, I have an automatic translator. I know they are unreliable and whatnot, but the English-to-English option is perfect. It's flawless, I swear. I understood everything. --Daniel 17:44, 27 October 2011 (UTC)[reply]

Categories of names 3

Wiktionary:Votes/2011-10/Categories of names 3 started. --Daniel 08:25, 27 October 2011 (UTC)[reply]

Moving large discussions to subpages

A problem I often have with discussions in the Beer Parlour and related rooms is that the pages often get very long. Once more new discussions are added it becomes hard to keep track of all of them because of all the scrolling involved. We do archive discussions but that doesn't always help because there is just too much in between. So I wonder if it would be a good idea to move larger discussions to subpages, and link to them from the main page? That way, the BP is kept clean and another advantage is that you can watch that discussion page, which is much more effective in following a discussion than watching all of the BP at once. Perhaps a system similar to how WT:VOTE works? —CodeCat 15:53, 27 October 2011 (UTC)[reply]

For information, fr.wikt uses a different subpage for each month. Lmaltier 20:03, 27 October 2011 (UTC)[reply]
I have no strong objection to either of those methods. I do worry that sorting pages by month will lead to some confusion as to where a current discussion is ongoing, so I think making subpages for discussions that become very long is the better solution. bd2412 T 20:27, 27 October 2011 (UTC)[reply]