Wiktionary:Votes/pl-2014-04/Unified Chinese

Unified Chinese

  • Voting on: Treating the various Chinese varieties (Mandarin, Cantonese, Wu, Min Nan, Min Dong, etc.) under a single header, "==Chinese==". Re-introducing the language code "zh". (See Wyang/歷史 for an example of how Unified Chinese entries may look.) Changing the way translations are formatted to:
* Chinese: {{t+|zh|北京}}
*: Mandarin: {{t+|cmn|北京|tr=Běijīng}}
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ}}
  • Rationale:
  1. There has been an overrepresentation of Mandarin at Wiktionary (20467 Mandarin nouns, 317 Cantonese nouns, 10 Wu nouns), but 99% of the Mandarin content here is actually cross-topolectal, not restricted to Mandarin. The reason for the marginalisation of other varieties is that it is practically troublesome and unnecessary to have to duplicate everything (templates, simp-trad tables, etymologies, definitions) except the pronunciation for all 17 ISO-coded Chinese topolects. The proposed approach will produce a more succinct format and allow editors to add more regional Chinese content (i.e. coverage of Chinese topolects) by providing pronunciations for a written word shared across Chinese topolects; such shared words account for about 99% of the Chinese-language corpus.
  2. The remaining 1% of words, which are never used in Standard Chinese (Mandarin), can still be handled by common templates and modules (the pronunciation and other information will only be given for that word in a given dialect). See User:Atitarev/佢哋 for a Cantonese-only example entry (an example of a word which is not used in formal writing even in Cantonese speaking areas).
  3. Differences in senses/usage will be handled with context tags, usage notes, etc. E.g. the word User:Wyang/告白, which has different senses in different topolects:
# [[announcement]], [[public]] [[announcement]]
# [[expression]] of one's thoughts; especially, [[declaration]] of love, [[confession]] of one's feelings towards someone
# {{cx|Cantonese|regional Mandarin|Min Nan}} [[advertisement]], [[ad]]
  1. This vote only concerns words written in Han/Chinese characters (Hanzi). Words written in non-Han scripts devised specifically for particular topolects may keep their topolect headings. This applies to Cyrillic Dungan, Xiao'erjing Mandarin (in Arabic script) and POJ romanisation of Min Nan. The formats of templates in the examples above are not the subject of the vote; they can be discussed separately if needed.
  2. The voters only vote on the proposed action, not on the rationale.
  • Background: The topolects were separated mainly because differences in their pronunciation of the same words/characters are very large. This change will allow us to add multiple pronunciations in the same entry.
  • Vote starts: 00:01, 30 March 2014 (UTC)
  • Vote ends: 23:59, 28 April 2014 (UTC)
  • Discussion:
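
The translation layout proposed at the top of this page is regular enough to be read mechanically. The following is a minimal Python sketch, not part of the proposal, that pulls the language code, the term and the optional `tr=` transliteration out of lines in that layout. The function name `parse_translation_lines`, the `nested` field and the regular expression are illustrative assumptions; the pattern only covers the simple `{{t|…}}` and `{{t+|…}}` forms shown in the example and is not a general wikitext parser.

```python
import re

# Assumption: only the simple {{t|code|term|...}} / {{t+|code|term|...}} forms
# from the example above are handled; nested templates are out of scope.
T_TEMPLATE = re.compile(r"\{\{t\+?\|([^|}]+)\|([^|}]+)((?:\|[^|}]*)*)\}\}")


def parse_translation_lines(wikitext):
    """Return one dict per translation line that contains a t/t+ template."""
    entries = []
    for line in wikitext.splitlines():
        match = T_TEMPLATE.search(line)
        if match is None:
            continue
        code, term, rest = match.groups()
        # Collect named parameters such as tr=...
        params = {}
        for part in rest.split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key] = value
        entries.append({
            "code": code,
            "term": term,
            "tr": params.get("tr"),
            # Lines starting with "*:" sit under the "Chinese" line.
            "nested": line.startswith("*:"),
        })
    return entries


SAMPLE = """* Chinese: {{t+|zh|北京}}
*: Mandarin: {{t+|cmn|北京|tr=Běijīng}}
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ}}"""

for entry in parse_translation_lines(SAMPLE):
    print(entry["code"], entry["term"], entry["tr"])
```

Because each nested line keeps its own code (cmn, yue, nan), a consumer of this layout can still tell the topolects apart after they are grouped under "Chinese".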


  1.   Support --Anatoli (обсудить/вклад) 10:49, 30 March 2014 (UTC)
  2.   Support Wyang (talk) 22:49, 30 March 2014 (UTC)
  3.   Support --kc_kennylau (talk) 09:00, 31 March 2014 (UTC)
  4.   Support JamesjiaoTC 20:14, 31 March 2014 (UTC) Although I do not support the idea of continuing the practice of specifying the HSK level of a compound. We could probably deal with this in a later vote.
    HSK is just a category. I prefer to keep them in categories too (they need a clean-up). --Anatoli (обсудить/вклад) 22:59, 31 March 2014 (UTC)
  5.   Support DTLHS (talk) 23:00, 31 March 2014 (UTC)
  6.   Support Mr. Granger (talk • contribs) 23:38, 1 April 2014 (UTC)
  7.   Support A step long overdue. -- Liliana 22:36, 3 April 2014 (UTC)
  8.   Support -- but only weakly, because I personally don't have a lot to do with Chinese entries outside of pointing to them in etymologies, and thus I feel that my vote should have less weight. ‑‑ Eiríkr Útlendi │ Tala við mig 23:40, 4 April 2014 (UTC)
  9.   Support -- I'm not seeing any problems with the proposed implementation. Bumm13 (talk) 19:09, 16 April 2014 (UTC)
  10.   Support -- I support this proposal in theory (after all, "Mandarin" dictionaries don't really exist in the Chinese world), but I hope the technical side of this can be worked on, as though I am a prolific Mandarin editor, I would be unable to help with that as I have no relevant knowledge or skills. ---> Tooironic (talk) 22:45, 23 April 2014 (UTC)
    @Tooironic Thank you for the support. Prolific Mandarin editors will be needed to help apply the new way of handling various Chinese topolects under one L2 header "Chinese". It's a big change. --Anatoli (обсудить/вклад) 00:01, 24 April 2014 (UTC)
    Do you mean that you would never expect to see a Cantonese-Mandarin dictionary like this one? DAVilla 05:39, 27 April 2014 (UTC)

  11.   Support —Stephen (Talk) 13:02, 28 April 2014 (UTC)


  1.   Oppose - strongly harms reusability of data; is probably not how readers use en.WT in practice. - Amgine/ t·e 02:03, 31 March 2014 (UTC)
    @Amgine Could you please elaborate and give one or two examples where it harms reusability of data? In my opinion it is quite the contrary. Note that current Cantonese, Wu and other topolect speaking editors favour the change (as in the latest discussion topic linked above). On your second point, changing the existing practice is usually what the vote is for. --Anatoli (обсудить/вклад) 02:14, 31 March 2014 (UTC)
    • The following sinitic wikipedias exist:
      • zh
      • zh-yue
      • zh-min-nan
      • gan
      • wuu
      • zh-classical
      • hak
      • cdo
      A current effort to create localized captchas for these languages faces challenges due to script issues. However, were these languages collapsed under a single heading, the multi-script result would likely make the data too expensive to work with. We are facing this exact issue with en.Wiktionary's decision to collapse all Serbo-Croatian languages under the family L2. Bluntly, decisions like these, made for the convenience of editors but not for technical requirements, make the data less desirable or useful because of the increased costs and manipulation required.
    • Changing the current practice of editors should not change the practice of readers. What we understand of en.Wiktionary readers is most are not native English speakers; many are looking for a specific word in a specific non-English language for the purpose of translation or learning. Our readers are not linguists or logophiles: they are students. It is likely they do not care for or know how to interpret the highly-technical information we currently present. If I understand this proposal, the complexity will likely stay the same, or slightly increase. However, the ability to find a word in a known language will be more difficult, even impossible in some cases if the implementation is similar to Serbo-Croatian. - Amgine/ t·e 04:51, 31 March 2014 (UTC)
      • @Amgine and any others -- Amgine states, “What we understand of en.Wiktionary readers is most are not native English speakers...” Where is such usage information available? I ask not to dispute anything here, but out of a broader curiosity regarding our audience. Solid usage data would be invaluable in making many different decisions about what we do here. ‑‑ Eiríkr Útlendi │ Tala við mig 05:36, 27 April 2014 (UTC)
      • I think you have misunderstood the proposal. The 'Mandarin' is already multi-script (simplified and traditional) pre-merger as it currently stands. Merging them applies to Chinese character-scripted words only, as stated in the rationale. There will not be any increase in script complexity or diversity, and consequently the proposal does not incur a greater difficulty in data processing. Similarly, the proposal does not result in a greater difficulty in data retrieval, as all information (forms, pronunciation, definitions, etc.) is stored in the respective section, more visually accessible (succinct) and complete than if information is fragmented as is now. Wyang (talk) 04:57, 31 March 2014 (UTC)
    @Amgine. Thanks for clarifying your position. I disagree with you completely, though. It seems to me that the complexity you describe is of the type: "I can't find a Cantonese or Croatian word in this dictionary, since I refuse to accept to look for them under "Chinese" or "Serbo-Croatian" header, I refuse to accept these names", to put it sarcastically :). Your perspective is rather political, IMHO, not really related to real user requirements. I assure you that Cantonese, Wu and Min Nan users will be happy to find entries like 歷史, it has everything needed for a good entry. Since you mention Serbo-Croatian unification, all opposition to the unified approach was mainly nationalistic, not linguistic, there is no point in duplicating all info in Serbo-Croatian entries from 3 to 8 times. I agree that the complexity of our entries is not for everyone. We may not be able to address your concern if my suspicions are correct. --Anatoli (обсудить/вклад) 05:00, 31 March 2014 (UTC)
    I'm quite sure that a Min Nan reader would be happier to find le̍k-sú than the equivalent in Chinese characters. That's the script in which the largest work in Min Nan is written, namely the Min Nan Wikipedia. I'm also a bit concerned that, in the example, you're treating romanizations as pronunciation. Have you conflated in your mind the idea of transliteration and oral production? Not everything works like pinyin: sometimes other scripts are actually used in writing. DAVilla 19:23, 26 April 2014 (UTC)
    Just to settle your concerns, as far as I am aware (and I am an amateur genealogist) I have no ethnic connections to anyone in the regions where Serbo-Croatian languages are natively spoken, nor fiscal, academic, or political connections either. C'mon, I'm a Merican; I probably can't find any such state on a map. However, I *do* have a programming problem creating separate dictionaries for sr, hr, sh, and bs captchas from en.WT dumps. If you have a simple solution to such a problem, I would very much appreciate knowing what it is. (Incidentally, a simple transclusion of a subpage would obviate any maintenance issues. I'd make a snarky, sarcastic, and probably wildly ignorant comment here to illustrate my point but I prefer to remain civil even when my peers are engaging in verbal attacks.) - Amgine/ t·e 06:27, 31 March 2014 (UTC)
    Sorry if I sounded as if I were attacking you; no, I thought your reasons for objecting were different. In any case, where did I attack you? Perhaps you could consider the gains for topolect learners achieved by the proposed change. I can't help you with your immediate query but many problems can be solved, when asked nicely:). We have quite a bunch of technically skilled people and achieving what you're trying to achieve may be possible. I'll wait for others to address the technical side. --Anatoli (обсудить/вклад) 06:47, 31 March 2014 (UTC)
    But, you see, it is now a problem to be fixed which was not a problem previously. Now we need to add back the language identification which was removed. It may be more convenient for the editors, but there are (and were) technical solutions which would have the same effect for the editors while not creating the problem for the future. - Amgine/ t·e 07:08, 31 March 2014 (UTC)
    Have you actually seen the count of current non-Mandarin entries at Wiktionary? The number is nowhere near a dictionary level. No serious analysis or study of a Chinese topolect can be done with the current level of non-Mandarin presence. My point is, you claim you're losing a lot of valuable information. You may want to save the bits and pieces we currently have; it's not much right now, but there will be more.
    Like with Serbo-Croatian, any written Chinese term is also applicable to any Chinese topolect written in Chinese characters in one form or another (script and dialect differences are taken care of with Serbo-Croatian, which was unified YEARS AGO), even if some terms may be considered literary, formal, rare or regional, in short, a list of Chinese words is a list of e.g. Wu words as well. That's why I thought you were going to talk about the small number of topolect-specific terms or differences in senses. A future technical solution I referred to is for something yet to be created, such as Cantonese, Min Nan, Wu pronunciations and labels (regional, formal or specific to a topolect, which is rare) - listing dialects that have some contents, and creating structure for any other. --Anatoli (обсудить/вклад) 07:43, 31 March 2014 (UTC)
    As previously, we will have to agree to disagree. You wish to remove data from the set, which will create problems with the data reuse in the future. I have just demonstrated that the previous action has an ongoing negative impact. Yes, that data may be rebuildable from the existing templated metadata, but you have no idea how and you just wave your hands and hope someone, somehow, can do so. I have looked at the templates and I do not agree; I believe only those words which existed previous to unification could, by searching back through revisions (and therefore possibly including errors which were later corrected), be properly assigned. Until you can show me that this maneuver is technically required, I will assume it is not, and merely a personal preference which harms the existing data completeness. - Amgine/ t·e 13:23, 31 March 2014 (UTC)
  2.   Strong oppose. Many of these languages are more closely related to Middle Chinese than they are to each other. The treatment of "Chinese" as a single language is a political maneuver by a heavyweight government, one that claims more territory than it even has administrative control over. Using a single header will be a clear indication to non-Mandarin Chinese editors that they are not welcome here. The only good thing I can say about this proposal is that it would be consistent with the grouping under translations, but on the other hand Chinese is the only language family where we use these groupings. It would be more consistent still to ungroup the translations so that they correspond to our current language headings. DAVilla 19:05, 26 April 2014 (UTC)
  3.   Oppose I am changing to oppose. I found this: "Statistics is a guess by educated Chinese speakers, talking about WRITTEN Chinese in Chinese characters only", written by Anatoli, now on the talk page of the vote. After this redefinition of what statistics means, this looks very suspect. There is no sourcing provided for any claims made by the supporters of the vote. --Dan Polansky (talk) 19:19, 26 April 2014 (UTC)
    I'm sorry for the confusion I caused. There is a list of the Hong Kong Supplementary Character Set (HKSCS), which is comparatively small. Among the characters there are many colloquial particles and characters used to write names. Any discussion, article or post is actually struggling to highlight the difference between dialects, using the same known words. --Anatoli (обсудить/вклад) 06:06, 27 April 2014 (UTC)
    The differences can't be large simply because written Chinese is understood by speakers of all dialects, not only because they know Mandarin as well but because each character has a local pronunciation, so they just read words in their way. @Wyang could you help provide statistics on reused written vocabulary among Chinese topolects? It may not change the votes but it's useful to have, anyway. --Anatoli (обсудить/вклад) 09:36, 27 April 2014 (UTC)
  4.   Oppose For simplicity and consistency with fundamental principles of the project: language headers for all languages should be accepted. I have nothing against using the zh code, in addition, for practical reasons, but sections with other codes should not be deleted, and people should be allowed to create new ones. The same applies to Arabic. If Mandarin, etc. sections are deleted, it's more than probable that they will be recreated in the future, when coming back to a sounder policy. Lmaltier (talk) 09:00, 27 April 2014 (UTC)
    Mandarin, Cantonese, etc. apply, first of all, to pronunciations or readings of the characters. --Anatoli (обсудить/вклад) 09:36, 27 April 2014 (UTC)
    This is also the case for French and English for many, many words, such as most -tion words (e.g. interjection). And some written words are shared by many languages (e.g. Nepal). The only difference here is that this is more systematic for Chinese languages. Other possibly different data: homophones (?), paronyms (?), examples (when a regional word does not exist in all Chinese languages, examples including this word don't apply to all languages), quotations (if a writer is known to use a given language, his quotations should be considered as applying to this language only), etymology (of the pronunciation...), usage notes, definitions and translations (when there are slight differences in senses depending on the language)... In other words, all sections (or almost all sections) may be different in some cases. Please, let's keep sound, simple, neutral, principles. Again, I'm not against using zh, I'm against forbidding other Chinese language codes. This might be less of an issue for Wikipedia, but we are a linguistic site, don't forget it. Lmaltier (talk) 14:51, 27 April 2014 (UTC)
    The fact that there are bad precedents does not make this decision better. And anyway, you cannot compare Serbo-Croatian, which may be considered as a single language by some people, with Chinese languages: nobody considers them as a single language. And nobody answered about concerns I mention, e.g. what about the etymology of the spoken word? Lmaltier (talk) 05:40, 29 April 2014 (UTC)
    I'm not sure I understand what your concerns are. Several of the points you raised are clearly unsubstantiated, for example "with Chinese languages: nobody considers them as a single language". Although I do not share that view, the lede in Chinese language could easily disprove your argument. You used this claim to argue that the Serbo-Croatian precedent is not citable here. Wyang (talk) 05:54, 29 April 2014 (UTC)
    I was meaning "no linguist". The page you mention compares them to the variety of Romance languages. Lmaltier (talk) 21:21, 29 April 2014 (UTC)


  1.   Abstain What does an abstain vote do? I wish there were YouTube videos that teach the Min Bei, Min Zhong, and Gan dialects of Chinese. --Lo Ximiendo (talk) 03:45, 17 April 2014 (UTC)
    An abstain vote does nothing. It's the same as not voting, but shows that you read and considered the vote. --WikiTiki89 22:52, 23 April 2014 (UTC)
    @Lo Ximiendo The current vote doesn't mean to destroy or ban any Chinese dialect. Quite the contrary, it would be easier to add any Chinese topolect entry which also uses Chinese characters - it doesn't matter if a term is only used in a dialect, not Mandarin. Please re-read the above and let us know if you have any questions/concerns. --Anatoli (обсудить/вклад) 23:29, 23 April 2014 (UTC)
    It's not exactly the same as not voting, in that it contributes to quorum. (We don't have a specific definition of quorum, but I think a vote with one supporter and ten abstainers saying "I'm O.K. with this, whatever, who cares" is more likely to be closed as a pass than a vote with one supporter and the sound of crickets.) —RuakhTALK 02:21, 27 April 2014 (UTC)
    But that's not because of the abstains, but because of the comments. --WikiTiki89 03:07, 27 April 2014 (UTC)
      Abstain --Dan Polansky (talk) 16:59, 25 April 2014 (UTC) I worry a bit about this, especially in light of W:Chinese_language#cite_note-5. However, I do not have the time and enthusiasm to look into the issue. Google translate has "Chinese" and Bing Translator has Traditional Chinese and Simplified Chinese, so let us hope this is workable, reasonably accurate and fair, and the supporters of this vote know what they are speaking of. Here is the note 5 of Wikipedia:
    Several authors note that Chinese varieties are as diverse as a family of languages:
    • David Crystal, The Cambridge Encyclopedia of Language (Cambridge: Cambridge University Press, 1987), p. 312. "The mutual unintelligibility of the varieties is the main ground for referring to them as separate languages."
    • Charles N. Li, Sandra A. Thompson. Mandarin Chinese: A Functional Reference Grammar (1989), p. 2. "The Chinese language family is genetically classified as an independent branch of the Sino-Tibetan language family."
    • Norman, 1988, p. 1. "[...] the modern Chinese dialects are really more like a family of languages [...]"
    • DeFrancis, 1984, p. 56. "To call Chinese a single language composed of dialects with varying degrees of difference is to mislead by minimizing disparities that according to Chao are as great as those between English and Dutch. To call Chinese a family of languages is to suggest extralinguistic differences that in fact do not exist and to overlook the unique linguistic situation that exists in China."
    --Dan Polansky (talk) 16:59, 25 April 2014 (UTC)
    Your concerns are well founded. DAVilla 18:58, 26 April 2014 (UTC)
  2.   Abstain. Mandarin is a fruit to me. This should be decided by knowledgeable editors. --Vahag (talk) 11:51, 27 April 2014 (UTC)

Point of order

  • It's not clear to me that this vote specifies which languages are to be reduced, of the many languages the Chinese government claims to fall under the umbrella of "Chinese". If one day the People's Republic decides to include Tibetan in that list, assuming it doesn't already, would this vote mean we'd have to follow suit? If the entry at the suggested term 歷史 is to be taken as an example, would Korean and Vietnamese be affected as well? DAVilla 19:39, 26 April 2014 (UTC)
    Mate, that is seriously an absurd argument. You've been brainwashed by western media's demonising portrayal of the Chinese authority for too long. Wyang (talk) 22:20, 26 April 2014 (UTC)
    To be honest, I don't think the Chinese government needs any help demonising itself. It does fine on its own by letting its actions speak for it. In any case, Chinese is definitely a language family rather than a language, but that only applies to the spoken form. In writing it's not so clear because of the nature of the script. I feel somewhat ambivalent about whether to merge the languages. On one side I understand that it's more practical, but on the other I have no idea what trouble we might run into in the future that would marginalise non-Mandarin varieties. I think this is DAVilla's concern as well; the last thing we want to do is continue to promote the idea that only Mandarin is True Chinese™. —CodeCat 22:42, 26 April 2014 (UTC)
    Well obviously we want to keep Dungan separate, as that one's written in Cyrillic and not in Chinese characters. -- Liliana 23:19, 26 April 2014 (UTC)
    Wyang, I apologize for politicizing the argument myself. It's counter-productive to throw around allegations in a debate. I said that it's not clear which languages are to be reduced, and I should have left it at that, or asked if there's already a known list of languages grouped under Chinese in translations, and if we could take this vote to mean that it applies to that group. My point of order still stands as such. DAVilla 04:36, 27 April 2014 (UTC)
@DAVilla. Do you really think that shared written language between varieties of Chinese is the result of the actions of the Chinese government?
The language situation in China and with Chinese topolects (even if one calls them languages) is unique. Nobody denies substantial pronunciation differences. Any other differences are exaggerated. The lists of terms which differ across Chinese varieties are really small, or the changes are predictable.
You should ask yourselves why there are no Mandarin-Cantonese or Mandarin-Wu, etc. dictionaries - only pronunciation guides and guides on some common phrases and points on which varieties of Chinese differ.
The opponents should read the discussion linked above, where users of affected topolects took part.
The official language of Hong Kong is "Chinese", even if it's pronounced the Cantonese way. Why? Because they write the same way. The same can be said about any Chinese topolect, which uses Han characters. It's not just the formal Chinese writing but any Chinese literature, song lyrics, TV speech transcripts, etc.
In order to vote, no knowledge of Chinese is required but I'm afraid you have to use concrete examples where it seems to be a problem using a common "L2" header. Otherwise, it all really seems like brainwashing. An example of a non-Mandarin entry is given above.
@Dan Polansky please re-read your own last quote (DeFrancis).
@CodeCat please provide an example of a future problem, which we haven't addressed yet. --Anatoli (обсудить/вклад) 05:33, 27 April 2014 (UTC)
@Wyang I think it would be interesting to show some examples of both the huge commonalities between major Chinese topolects (an article in a Hong Kong or Taiwanese newspaper) and examples of differences. I've just found a recent Cantonese example phrase 租金係,等我諗諗先,噢係嘞,總共萬一文。, which is in Cantonese vernacular, i.e. written NOT in a way used by Mandarin speakers. It's pure Cantonese. (Note, however, like with Serbo-Croatian case, we are dealing with separate vocabulary words here, not long sentences). --Anatoli (обсудить/вклад) 05:48, 27 April 2014 (UTC)
Ultimately there is no technical barrier in placing all Chinese words under the same level 2 header. We could do this with other languages as well. For instance, hola means hello and amor means love in both Spanish and Catalan. Wouldn't it be logical enough to specially mark any words that only exist in one language or the other? I mean, we could go as far as to have a single Earthican language. On the flip side, we could split languages like English, too, if it was really in our interest. So no, I don't "have to use concrete examples of where it seems to be a problem", and you don't get to frame the question that way, because given the logistical possibility of what we can agree is much more absurd, it's not a question of technical feasibility. It's a question of what's in the best interest. You list simplification as a pro, I list oversimplification as a con.
Now, you've got to be kidding yourself if you don't think government has any influence on language, whether it be in China or elsewhere. You must remember, that's why many member states gave up Russian as soon as the USSR fell apart. With Beijing as the seat of power, it's no accident that not just Mandarin but specifically the Beijing dialect became the standard. We can at least agree that the Communist Party simplified the writing system, correct? Speaking of which, since Hong Kong still uses traditional characters, it's a great generalization to say that "they write the same way" as in Beijing. Hey, that was your example, not mine.
Standards bodies exist for many languages, for instance the French Académie, whose dictionary is regarded as official in France. Yes, there are many similarities in the Chinese family, and part of the reason is that the languages are highly standardized, at least when it comes to simple, common terms. What you won't find as often in dictionaries are the idioms and slang. Film and television is monitored for content in Hong Kong and the mainland alike. That doesn't happen in France, where the official standard doesn't have the weight of law. English Wiktionary made a decision a long time ago to be descriptive rather than prescriptive, so the censors can cry all they want about words like LP or its more vulgar form, and the editors can cry all they want about words like OK not being transcribed , that is, in a way they weren't said at all.
I'm amazed at how many Asians seem to think I've been brainwashed by Western media when there's so much of that I don't trust either. Possibly you've confused me for the average American, or possibly you're just repeating what's been said of you. Regardless, you have me wrong. And while it's all well and good to discuss whether or not this step should be taken, no one has yet to address my point, which is to ask what exactly is being done. Is there a list of affected languages/dialects or not? DAVilla 07:51, 27 April 2014 (UTC)
Perhaps you have an idea that the aim of this vote is to suppress any Chinese variety? It is quite the opposite. If you follow the BP discussion, then you will see that no-one intends to do so. I assure you, all Sinitic languages/dialects will benefit from the change. Cantonese, Wu and other entries will be encouraged. It will be much easier to add vernacular dialectal forms, whether they have a standardised form or not. The USSR analogy doesn't count here, as ex-USSR republics are non-Russian, except for Russia proper. On the other hand, Han people never deny their origin, even if they highlight their topolect.
No, no language list has been done. Cantonese (Yue), Wu, Min (excluding romanised Min Nan) were used in examples, they are definitely affected. Only Sinitic languages which also use Han characters are affected. (It's ridiculous to mention Japanese, Korean and Vietnamese, which, apart from different writing systems, have different grammar). Missing from the list are Jin, Xiang, Kejia (Hakka), Gan, Hui (Huizhou) and some other major topolects and their dialects in mainland China (only those which use Han characters). --Anatoli (обсудить/вклад) 08:13, 27 April 2014 (UTC)
One reason only a few varieties were mentioned is that some of them lack writing systems or standard transliteration. E.g. if you want to add Xiang or Gan, etc. contents, you can only use IPA on existing Chinese entries, which is easily done and the current vote provides for that. No-one ever stopped dialectal contents but lack of templates, modules and enthusiasm made it difficult. I can show how any dialectal word of your choice can be added but User:Atitarev/佢哋 is just one such example - a Cantonese-only term. (Cantonese HAS standard transliteration, so it's not just IPA). --Anatoli (обсудить/вклад) 08:23, 27 April 2014 (UTC)
It would affect languages listed in the table of Wiktionary:About Sinitic languages#The Chinese language group. Wyang (talk) 08:43, 27 April 2014 (UTC)
Just to make sure no further speculations are made, minority languages of China (inc. Tibetan, Uyghur, Mongolian, Zhuang) are not affected, including those that may use Chinese characters. --Anatoli (обсудить/вклад) 09:36, 27 April 2014 (UTC)
@DAVilla Re: "they write the same way". Yes, it's my example and I stand by what I said. Traditional/simplified are two sets used by all Chinese communities everywhere and Wiktionary provides both sets equally. I was talking about the actual words used, though. A newspaper in Hong Kong is basically written the same way as it is in Taiwan, if you wish to be specific about the set. There is a vernacular form of Cantonese, of course. It makes it more different from Mandarin (including the vocabulary used). The words that are written differently are very small in number, though, even if they are the most common words, like copulas, negation particles, pronouns and sentence-final particles.
Let's look at the vernacular Cantonese phrase I gave above: 租金係,等我諗諗先,噢係嘞,總共萬一文。 (zou1 gam1 hai6, dang2 ngo5 nam2 nam2 sin1, o1 hai6 laak3, zung2 gung6 maan6 jat1 man1) "The rent is, just let me get it right, oh yes, 11,000 dollars altogether.". It's written in traditional Chinese here. It is understandable to a Mandarin speaker. Copula 係 is different, the use of 諗 is slightly different from Mandarin (see BP discussion about handling of senses), sentence-final particle 嘞 is not used as often, but it is used, AFAICT; 文 for "dollar" must be highly colloquial and regional. I see no obstacles for Wiktionary, though. The majority of words are still the same, one has to search for such extreme examples. --Anatoli (обсудить/вклад) 12:24, 27 April 2014 (UTC)


  • This vote has finished: 11-4-2 (support:73.3%). Could an admin or bureaucrat please check the results and close this vote? Thanks. Wyang (talk) 02:01, 29 April 2014 (UTC)
  • Yes, it passes - 11-4-2.
Congratulations! No need to be a bureaucrat to close the vote, I think. Closing. (Now, we'll have to make sure the concerns of people who voted against and think non-Mandarin Chinese will be suppressed are unjustified).
  • Opposer 1 based his arguments on the reusability of the data, and I have had discussions with him about this issue on my talkpage. My opinion is that the amalgamation does not make data retrieval and analysis more difficult, at least for his Captcha work, and that Wiktionary categories of multi-scripted languages are not the best way to generate Captcha, a point with which he seemed to agree ("Yesterday a new dump of wiktionary was produced, and I'm working on automating a process to update the word lists generated from it. However, I will be recommending that we not use en.Wiktionary data in the future, and instead derive word lists from the wikipedia dumps.").
    Opposer 2 was concerned that this vote might be politically motivated, and that the result will decrease the level of participation by non-Mandarin editors. The former is unsubstantiated and is contradicted by the proposal. We have also argued in the proposal that this vote will have an effect opposite of that envisaged by Opposer 2.
    Opposer 3 was concerned about the dubiousness of the statistics cited in the proposal. I agree that the statistics are uncited, but they are easily supported by any dictionary of a non-Mandarin variety published in the missionary era (such as this one here), which covers a more comprehensive part of the vocabulary. Modern Cantonese or Hokkien dictionaries are heavily biased towards the variety-specific terms.
    Opposer 4's arguments were based on his objection to the disallowance of variety codes. I do not agree that merger and disallowing some codes are bad in terms of consistency. There are precedents, for example the Serbo-Croatian case.

I'd like to make a start on the merger work once this vote is settled. Wyang (talk) 03:26, 29 April 2014 (UTC)

Could an admin please add 'zh' to Module:languages/data2 now that the vote is closed? Thanks.

m["zh"] = {
	names = {"Chinese"},
	type = "regular",
	scripts = {"Hani"},
	family = "sit"}

Wyang (talk) 11:46, 29 April 2014 (UTC)

Done --Anatoli (обсудить/вклад) 11:54, 29 April 2014 (UTC)

As "opposer 2" I want to make it clear that I never meant to imply any of you supported this change out of political motivation. China and really any of the superpowers do push their positions through education and other means, not just military. That doesn't mean, however, that the Chinese citizenry is knowingly involved in a mass conspiracy. I would label this more as politically inept than politically motivated. Certainly our usage statistics must be tracked by country, and I look forward to being proven wrong. DAVilla 03:14, 3 June 2014 (UTC)

You may be right (in a way) but the reality is there is very little written Chinese that is NOT Mandarin. As much as we would like to support dialects, if there is no written material or even transliteration, and audio recordings are missing in that dialect, we won't create entries here. Whose fault is it that American Indian languages are not developed well? Even if we know the answer, what's next? There is research into the pronunciation of various Chinese topolects, dialects and subdialects, but it is mainly concerned with the SPOKEN, not the written, language.
On the bright side, we now have the ability to support languages/topolects/dialects that were previously neglected and lacked templates. The merger has already helped increase Cantonese content from about 200 badly formatted entries to over a thousand - with IPA, usage examples, etc. Wu has grown from nothing to a few hundred entries. Didn't you notice? Four major topolects are experiencing growth, which has also generated interest among new editors - Cantonese, Min Nan, Wu and Hakka, all of which have some written form and are well documented. Your position IS politicised, even if you don't admit it! Now, how can you help support Chinese dialects and enhance content? Please do, we don't suppress anything. All languages and dialects are welcome. --Anatoli (обсудить/вклад) 03:28, 3 June 2014 (UTC)
  • Is this the consequence of a mass conspiracy too? Wyang (talk) 03:38, 3 June 2014 (UTC)