Wiktionary talk:Votes/pl-2014-04/Unified Chinese

HSK compound difficulty level in headword template

I find the thread on BP to be extremely hard to follow (inherent issue with WT's thread format), so I am just going to post here. I've been looking at User:Wyang/歷史. I noticed the headword template has been changed to include HSK's categorisation of compound difficulty level. Personally I am not familiar with it, so I will struggle with including it when creating a new compound. Is it going to be compulsory? Is it really worth including it? We are a dictionary after all and I don't really see the value of including information like this. JamesjiaoTC 21:15, 25 March 2014 (UTC)

I'm not familiar with the template either. I think it definitely shouldn't be compulsory, like any non-SoP categorisation. I'd also prefer the pronunciation template to allow optional parameters. It seems that Wyang prefers to edit the template itself. E.g., most entries will just have Mandarin or just Cantonese. Not sure if it's a good idea if e.g. Min Nan pronunciation is generated automatically. I don't know Min Nan, so I won't be able to check the accuracy but someone who knows should be able to do it on the entry, not on the template. --Anatoli (обсудить/вклад) 21:26, 25 March 2014 (UTC)
I don't get the need to include these tags in entries either - I always thought they should have been removed. Clearly they are not dictionary content, just something for learners, like Pinyin. There are appendix pages for these lists already (Appendix:HSK list of Mandarin words/Beginning Mandarin etc.). It's worse if they are treated as context tags, which means they have to be placed at the beginning of every sense, as in 上下.
I am fairly confident that automated IPA for every variety is possible, given that the romanisation is sufficiently informative. Complications in IPA pronunciations mostly result from non-simple compounding (eg. verb phrase) and presence of word boundaries. For Min Nan, I disabled automatic IPA on POJ containing ' ' (which means there are word boundaries) and '--' (which means it is a verb phrase and uses different tone sandhi patterns), i.e. automatic IPA is generated only for simple words two to four syllables long. There shouldn't be problems with this methodologically. There are plenty of Min Nan audios on the Internet (eg. [1]) so one can always check. Wyang (talk) 04:09, 26 March 2014 (UTC)
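The restriction described above (no automatic Min Nan IPA when the POJ contains a space or '--', and only for simple words of two to four syllables) can be sketched roughly as follows. This is a minimal Python illustration, not the template's actual code: the function name `is_auto_ipa_eligible` is hypothetical, and counting syllables by splitting on single hyphens is an assumption.

```python
def is_auto_ipa_eligible(poj: str) -> bool:
    """Decide whether automatic IPA would be attempted for a POJ string.

    Hypothetical helper; it only mirrors the rule stated in the discussion.
    """
    # A space means the string contains word boundaries.
    if " " in poj:
        return False
    # A double hyphen marks a verb phrase with different tone sandhi patterns.
    if "--" in poj:
        return False
    # Assumption: syllables of a simple word are joined by single hyphens.
    syllables = [s for s in poj.split("-") if s]
    return 2 <= len(syllables) <= 4


if __name__ == "__main__":
    for sample in ["im-ga̍k", "chò-chhân-lâng", "lâng", "khì--lâi"]:
        print(sample, is_auto_ipa_eligible(sample))
```

Anything the check rejects would simply get no automatic IPA, leaving the pronunciation to be supplied and verified by an editor.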
My point is, Cantonese, Min Nan, Wu transliterations/IPA shouldn't be added by default when editors make entries for e.g. Mandarin or Cantonese only. If you personally add them, that's fine, you edit it, it's your responsibility. That's why I think those transliterations should be added separately, even if they can be automatically generated. Someone has to check them. A person without Min Nan or Cantonese knowledge won't even notice if 音樂 is romanised as "im-lok" (nan) and "jam1 lok6" (yue) instead of "im-ga̍k" and "jam1 ngok6". --Anatoli (обсудить/вклад) 22:46, 26 March 2014 (UTC)
Yes, readings should always be checked against a word dictionary, not a character dictionary. By the way, im-lok or jam1 lok6 would sound funny... Wyang (talk) 22:55, 26 March 2014 (UTC)
Yes, by a human editor. If Cantonese, Min Nan, Wu transliterations are added automatically when a standard Chinese entry is created, it may put off editors (who know only Mandarin) from making Chinese entries. So, while the current pronunciation format of User:Wyang/歷史 looks great in principle, I disagree with having all transliterations in the template and with non-Mandarin sections being loaded automatically. What does everyone else think? --Anatoli (обсудить/вклад) 23:03, 26 March 2014 (UTC)
The readings there are either from dictionaries (Min Nan and Cantonese) or personal knowledge (Mandarin, Shanghainese, Cantonese). Automatic readings for non-Mandarin varieties are very difficult, due to the multiple layers of borrowing and hence the presence of many more readings for one character. Wyang (talk) 23:13, 26 March 2014 (UTC)
I'm not questioning YOUR edits or sources but I'm saying that the pronunciation template shouldn't combine all varieties in one automatically; they should be added by a knowledgeable editor, e.g. you. Even better, separate templates for each variety should exist, so that each could be added on the entry as a list - *Cantonese *Min Nan. I guess I'm not explaining well. --Anatoli (обсудить/вклад) 23:24, 26 March 2014 (UTC)

I'm still not sure I fully understand what you meant. This is the format of an empty pronunciation template:


. Each of these lines then calls a separate variety pronunciation template, namely {{Pinyin-IPA}}, {{Jyutping-IPA}}, {{Min Nan-pron}} and {{wu-pron}}. No readings except Mandarin will be generated when the entry is created with {{cmn-new}}. Hence the format of an entry would be


by default. One can add new parameters |c= if the reading in Cantonese can be verified by personal knowledge or dictionaries. Otherwise, it would not exist. Does this... correspond to your point? Wyang (talk) 23:31, 26 March 2014 (UTC)
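The behaviour described above, where a variety's pronunciation line exists only if a reading was supplied for it, can be sketched as follows. This is a hedged Python illustration rather than the template's implementation: the function name `supplied_readings` is hypothetical, the parameter key `m` for Mandarin is an assumption (only `c`, `mn` and `w` are mentioned in this discussion), and the pairing of keys with sub-templates is inferred from the template names listed above.

```python
# Assumed mapping from parameter name to (variety, sub-template name).
VARIETY_TEMPLATES = {
    "m": ("Mandarin", "Pinyin-IPA"),     # key "m" is an assumption
    "c": ("Cantonese", "Jyutping-IPA"),
    "mn": ("Min Nan", "Min Nan-pron"),
    "w": ("Wu", "wu-pron"),
}


def supplied_readings(params: dict) -> list:
    """Return (variety, template, reading) for every reading actually given.

    Varieties whose parameter is missing or blank are omitted entirely,
    so nothing is generated for a variety no editor has verified.
    """
    result = []
    for key, (variety, template) in VARIETY_TEMPLATES.items():
        reading = params.get(key, "").strip()
        if reading:
            result.append((variety, template, reading))
    return result


if __name__ == "__main__":
    # Entry created with a Mandarin reading only.
    print(supplied_readings({"m": "lìshǐ"}))
    # The same entry after an editor adds a verified Cantonese reading.
    print(supplied_readings({"m": "lìshǐ", "c": "lik6 si2"}))
```

Under this reading, a newly created entry carries a single Mandarin line, and other varieties appear only once someone adds and checks them.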

I see. So editors can add mn=, c=, w= if they know the transliteration. I've checked 老番 and I'm happy now. Thanks and sorry for the confusion. Are you going to work on Mandarin->Chinese conversion at some stage? No pressure, just curious. I need to fix regional categories and labels; thanks to @-sche, who showed me how to do it. "Cantonese Mandarin" looks weird. --Anatoli (обсудить/вклад) 23:42, 26 March 2014 (UTC)
Yeah, definitely. No further input was given at the BP discussion Wiktionary:Beer parlour/2014/March#A new format for Chinese entries (multisyllables), and the pronunciation template is ready. I think we are ready to go. Wyang (talk) 23:46, 26 March 2014 (UTC)


Am I correct to assume that this vote is only about treating Chinese variants as one language, but does not specify any details about how they are to be unified? —CodeCat 01:54, 27 March 2014 (UTC)

User:Wyang/歷史 is a VISUAL example (since some templates need to be reworked/created; ignore the See also part), but technically there will be hurdles. Here and Wiktionary:Beer_parlour/2014/March#A_new_format_for_Chinese_entries_.28multisyllables.29 are the places to express any objections or concerns. Wyang is handling this at the moment but please join if you're willing to help. Your skills would definitely help. --Anatoli (обсудить/вклад) 02:03, 27 March 2014 (UTC)
(E/C) There will definitely be questions but I think when Mandarin (cmn) entries/translations are converted to Chinese (zh), then handling other varieties will get easier; they are smaller in number too. What details exactly seem to be missing? --Anatoli (обсудить/вклад) 02:07, 27 March 2014 (UTC)
Yes, if I understood you correctly. Specifically, "allowing Chinese as a header, gradually replacing topolect-level headers" and "allowing translations to be added after '* Chinese: ', without having to specify the topolect". Wyang (talk) 02:05, 27 March 2014 (UTC)
What I'm mainly concerned about is giving Mandarin special status or worse, assuming that "Chinese" implies Mandarin. Unifying is one thing, but systematically marginalising the others is another. That discussion is too confusing for me to follow, so I'm expressing my reservations here. —CodeCat 02:07, 27 March 2014 (UTC)
"Chinese" is a neutral term for any topolect (also used in Taiwan, Hong Kong, etc., think "Han", not (mainland) China). No dialect/topolect will be marginalised. The pronunciation section is open for any variety sharing the same spelling, specific ones will have their own only, such as Cantonese-only. --Anatoli (обсудить/вклад) 02:12, 27 March 2014 (UTC)
Could you give an example? The most vociferous opponents of "Chinese", if there really are any, are perhaps some Cantonese and Taiwanese speakers but Hong Kong specifies Chinese (rather ambiguously) as their state language and the official and most common press and literature is entirely in standard Chinese (=Mandarin). It's similar with Taiwanese topolects. --Anatoli (обсудить/вклад) 02:18, 27 March 2014 (UTC)
In the discussion, I saw that translations will list Mandarin directly under "Chinese" and not under a "Mandarin" subitem. I disagree with that, if that's what is being planned. (If all of this has already been worked out and decided on, it should be summarised in the vote, regardless of whether it's actually part of the vote. Otherwise I have no idea what agreements have been made outside the vote, that I would implicitly be agreeing to by supporting.) —CodeCat 02:20, 27 March 2014 (UTC)
The participants agreed that Mandarin translations go under "Chinese" because Mandarin is universally accepted as standard Chinese. The topolects will stay as they are, so as not to clutter the table with qualifiers and cause confusion with non-Mandarin transliterations. --Anatoli (обсудить/вклад) 02:24, 27 March 2014 (UTC)
* Chinese: {{t+|zh|北京|tr=Běijīng|sc=Hani}}
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ|sc=Hans}}
As above (perhaps yue, nan, etc. should stay for interwiki purposes). --Anatoli (обсудить/вклад) 02:27, 27 March 2014 (UTC)
This change is actually the opposite of marginalisation. It enables editors to increase the non-Mandarin presence here, simply by adding topolect pronunciations ([2]) to words shared across Chinese topolects, which represent 99% of the Chinese-language corpus. It is not "Mandarin" that is promoted to "Chinese" in headers and translation tables; it is anything shared across Chinese topolects being promoted to "Chinese". Wyang (talk) 02:54, 27 March 2014 (UTC)

"See also" section

If the "see also" section in Wyang's example page is included as part of this vote, I can't support it. The format is totally different from anything else we have on Wiktionary, but for no reason (at least none that's explained in the vote). —CodeCat 02:11, 27 March 2014 (UTC)

Wyang has confirmed that this is not part of the proposed change, so he should remove/fix it. --Anatoli (обсудить/вклад) 02:13, 27 March 2014 (UTC)

Older Chinese varieties

Are these going to be merged too? If so, what happens to terms that didn't exist until modern times, like say "television"? —CodeCat 02:15, 27 March 2014 (UTC)

If they are spoken by Chinese people and are attested - e.g. Dungan, Hakka, etc. then yes. --Anatoli (обсудить/вклад) 02:20, 27 March 2014 (UTC)
I meant more like Old Chinese and such. —CodeCat 02:21, 27 March 2014 (UTC)
Do you mean Classical Chinese? Any other old Chinese - tentatively yes but I'd need some details. --Anatoli (обсудить/вклад) 02:25, 27 March 2014 (UTC)
BTW, we don't have Classical Chinese entries "zho" or anything similar. --Anatoli (обсудить/вклад) 02:48, 27 March 2014 (UTC)
Old Chinese and Middle Chinese refer to the various phonological reconstructions, notably based on rime dictionaries, of the Literary Chinese spoken at that time. They are concepts virtually non-existent outside the linguistic reconstruction context, and their presence should be limited to phonological reconstructions of terms attested in those respective times. Literary Chinese terms will continue the practice of treatments as before; please see 不數日, 豺獭 for examples. User:Wyang/電視 is how the entry for "television" would look. Wyang (talk) 03:25, 27 March 2014 (UTC)


I have reformatted User:Wyang/歷史 to match the scope of the vote: removed the new etymology format (not in use and not part of this change), the confusing "See also" and the Wikipedia link, and added the actually used {{zh-hanzi-box|历史|歷史}}. I suggest using this revision for the vote. --Anatoli (обсудить/вклад) 00:43, 28 March 2014 (UTC)

Looks good, thanks for the edit. Wyang (talk) 01:03, 28 March 2014 (UTC)
I would go so far as to say that the precise format of even the pronunciation information should be left outside the scope of this vote, and that this should be a vote solely on whether or not to merge the Chinese varieties under one header, with this revision offered as one possible way of formatting Unified Chinese entries (to show that it is possible to unify them without marginalizing dialects, etc), but with precise formatting details left up to BP discussion. I am concerned that if you make it seem like one of the things this vote will set in stone is that pronunciation format, some users may vote "oppose" because they don't like that pronunciation format (and favour e.g. an untemplatized format like the one used on other languages' entries, or a more heavily templatized format, etc, etc). You may not think my concern is likely to be borne out, but just look at the kind of comments that have been made in most of the currently-ongoing votes. - -sche (discuss) 01:50, 28 March 2014 (UTC)
Thanks. It has now been mentioned in the vote itself. Wyang (talk) 03:11, 28 March 2014 (UTC)

Non Hanzi-based Chinese

@Wyang Please comment on Chinese written in other scripts - Dungan (Cyrillic, e.g. вә), romanised Min Nan (see chò-chhân-lâng), Teochew, Hakka (there are words for which there may be no hanzi equivalent), Hui dialects (anything written in Arabic, like Xiao'erjing). Are they part of this vote? Should we just focus on topolects using hanzi? --Anatoli (обсудить/вклад) 00:53, 28 March 2014 (UTC)

They all can keep the topolect-level headings, since the words written in those scripts are not supposed to represent any other topolect. Wyang (talk) 01:03, 28 March 2014 (UTC)


Should the implementation (changing the template/module names) be left outside this vote? --Anatoli (обсудить/вклад) 01:10, 28 March 2014 (UTC)

Which template/module names do you have in mind? Do you mean templates like {{zh-noun}}, {{zh-verb}}? Wyang (talk) 01:16, 28 March 2014 (UTC)
Yes. We probably just need to move "cmn" to "zh" templates after some changes. Could you add more to "rationale"? --Anatoli (обсудить/вклад) 01:19, 28 March 2014 (UTC)
Yes, that would be the most convenient option. I have tried to add a bit more there; feel free to change or add anything. Wyang (talk) 01:41, 28 March 2014 (UTC)
I have added more and have given 佢哋 as an example of a Cantonese-only entry. I don't know what (Taishanese) IPA(key): [kɪɛk] is on 佢哋. If you change User:Atitarev/佢哋, please update the revision number in the vote. --Anatoli (обсудить/вклад) 02:05, 28 March 2014 (UTC)
I have modified that page slightly. [kɪɛk] is the pronunciation of the word for "we" in Taishan Cantonese (Taishanese), and someone just put that there as if written Guangzhou Cantonese can be used to write all Cantonese dialects. Wyang (talk) 03:11, 28 March 2014 (UTC)

Vote start

@Wyang It doesn't have to start in March, it has 2014-04 in the name, in case you need more time. --Anatoli (обсудить/вклад) 02:13, 28 March 2014 (UTC)

OK, I see. I didn't notice that. I think it should be all right. Wyang (talk) 03:11, 28 March 2014 (UTC)

Headword structure change

@Wyang We should mention the proposed simplification of the headword - minus rs, trad./simp., pin and pint parameters. Sorting all terms by numbered pinyin should be addressed eventually. I agree now with losing pinyin from the header. It would be fair for all topolects, even if Mandarin and Pinyin will still be given preference for many reasons. --Anatoli (обсудить/вклад) 08:06, 29 March 2014 (UTC)

Wyangbot has been removing |rs= from entries, since it was made obsolete quite a while ago. For this vote we could probably just assume that minimal further changes will be made to those templates. Once a unified approach is agreed upon, |pin= and |pint= will go in the future, and the pronunciation template will be the categorisation coordinator, generating categories based on the presence of topolectal pronunciations and additional PoS information and sorting them accordingly. Wyang (talk) 12:17, 29 March 2014 (UTC)
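The "categorisation coordinator" idea above, with categories derived from which topolect pronunciations are present plus the part of speech, might look roughly like this. It is only a Python sketch under stated assumptions: the function name `categories_for`, the parameter key `m`, and the "<Variety> <PoS>" category naming are all hypothetical, and sort keys are not modelled.

```python
# Assumed parameter names for each variety (key "m" is an assumption).
VARIETIES = {"m": "Mandarin", "c": "Cantonese", "mn": "Min Nan", "w": "Wu"}


def categories_for(params: dict, pos_plural: str) -> list:
    """Build one category name per variety that has a pronunciation supplied.

    pos_plural is the plural part-of-speech label, e.g. "nouns".
    The "<Variety> <PoS>" naming pattern is an illustrative assumption.
    """
    categories = []
    for key, variety in VARIETIES.items():
        if params.get(key, "").strip():
            categories.append(variety + " " + pos_plural)
    return categories


if __name__ == "__main__":
    print(categories_for({"m": "lìshǐ", "c": "lik6 si2"}, "nouns"))
```

With a scheme like this, adding a verified Cantonese reading to an existing entry would be enough to place it in the corresponding Cantonese category, without separate |pin= or |pint= parameters on the headword.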

Scope and placement of rationale

The following item is placed in the "Rationale" section:

  • This vote only concerns words written in Han/Chinese characters (Hanzi).

Since this item seems to clarify the effect of the vote, it should not be in rationale, IMHO.

Furthermore, putting the rationale on the vote's talk page is preferred by some editors, including User:Ruakh. The advantage of doing so is a very clear separation of what is being voted on and what is not being voted on. Thus, I propose that the rationale be copied to the talk page, and that the main vote page merely reference the rationale. --Dan Polansky (talk) 09:41, 29 March 2014 (UTC)

The rationale is meant to be more or less descriptive and to give ideas of how Chinese varieties may be handled in the unified approach and what is meant to be excluded, so that a person supporting the idea itself but not liking e.g. the use of contexts or the structure of a non-Mandarin (Cantonese in this case) entry is not put off by the details described in the rationale, even if it's the current plan. The rationale should stay here, in my opinion, for everybody to see, but I think the vote start should be moved further out to iron out the vote setup. The reason is that the discussions and the agreement reached were mainly between editors working with Chinese and some Chinese speakers (including non-Mandarin), and only a few people outside Chinese editing took part. --Anatoli (обсудить/вклад) 10:06, 29 March 2014 (UTC)
I agree that the rationale should be kept brief but easily accessible for visitors. Wyang (talk) 12:17, 29 March 2014 (UTC)

Moved from vote

  • I can't see any other place to discuss the premise without casting a vote so I'll do it here:
    Could you please provide a source for the 99% vs 1% statistic quoted, clarify that the statistic is made up, or change it to something not involving invented statistics? — hippietrail (talk) 05:02, 24 April 2014 (UTC)
    I see this discussion is already full of wording misrepresenting the non-Mandarin varieties as "dialects" of Chinese or of Mandarin. I see this entire proposal also motivated in part to manufacture evidence to support this viewpoint.
    If Wu, Min-Nan, et al should be called "dialects" of Chinese, then at the very minimum Mandarin should likewise be called a dialect. For it is utterly certain that others are not dialects of Mandarin.
    I can't see much if anything here about Classical Chinese, which is perhaps more different from modern Standard Chinese than any of the contemporary varieties, at least in that it was mainly monosyllabic and contemporary varieties are mainly disyllabic/polysyllabic.
    If this unification were to proceed I would then be in favour of having entries for all varieties with non-Han spelling possible, including Zhuyin Fuhao entries for any varieties of Chinese it supports. I believe that's Hokkien=Taiwanese and possibly Hakka as well as Standard Chinese=Mandarin of course. — hippietrail (talk) 05:13, 24 April 2014 (UTC)
@Hippietrail a quick reply: The voters only vote on the proposed action, not on the rationale. The statistic is a guess by educated Chinese speakers, talking about WRITTEN Chinese in Chinese characters only. Classical Chinese is almost entirely monosyllabic. This vote is primarily for multisyllabic entries but will affect further actions. --Anatoli (обсудить/вклад) 05:24, 24 April 2014 (UTC)
@Hippietrail (E/C): Further answers. There is no reference to dialect/topolect/language/whatever wording here. It's just Cantonese, Wu, Min Nan, etc. A variety may be a dialect of another prestigious topolect; it doesn't matter in this case, as it won't have any effect on the structure of Chinese entries. The pronunciations are added one line at a time as in the sample entries, and variants (erhua, neutral-tone variant, other) can be added on the right-hand side.
This vote doesn't deal with Cyrillic/Romanised/Arabic forms, and Pinyin and Zhuyin entries are not part of this vote either. I'm personally against adding another system not used as a proper writing system, such as Zhuyin entries, but as I said, it's outside the scope of this vote. --Anatoli (обсудить/вклад) 06:51, 24 April 2014 (UTC)
Well it appears not all concerns have yet been thought through, in which case I will have no choice but to vote against such a hasty action. — hippietrail (talk) 06:42, 24 April 2014 (UTC)
Making big changes is a difficult job for a small group of people. Which part exactly is a concern to you? All I said is that it's not addressed yet; it has a smaller scope. --Anatoli (обсудить/вклад) 06:51, 24 April 2014 (UTC)

Chinese terms spelled with one or all non-Chinese characters

How does this proposal cope with terms used in one or more Chinese varieties that are sometimes or always spelled with some or only non-Han characters? Here's a couple I can think of off the top of my head:

Neither seems to have a Chinese entry at the moment. — hippietrail (talk) 07:17, 24 April 2014 (UTC)

An example of a non-Han character entry is OK#Mandarin - highly discouraged. Note that the above may go through RFV. I haven't checked them but they look includable; they are Chinese inventions, anyway. The main discussion about regional contexts and structure is linked from the vote page: Wiktionary:Beer_parlour/2014/March#A_new_format_for_Chinese_entries_.28multisyllables.29. It's a bit hard to read, perhaps, but it may answer your questions. --Anatoli (обсудить/вклад) 07:23, 24 April 2014 (UTC)