Wiktionary talk:About Chinese

untitled

Please see Wiktionary talk:Entries on Chinese characters#Sortkeys and subcats for single-character entries for discussion of how to categorize the single-Chinese-character entries since they (may) apply to Chinese, Japanese, Korean and Vietnamese (CJKV). - dcljr 08:39, 25 January 2006 (UTC)Reply

Autoformat has identified a number of entries that have the non-conforming language name "Chinese (traditional/simplified)". There are others that it has not yet flagged as well. I could not be trusted to correct this properly. DCDuring TALK 17:32, 9 May 2010 (UTC)Reply

As a rule, assume it's Mandarin. Traditional/Simplified entries go on a single, they're not really different 'scripts' but more like the French spelling reforms, where paraître becomes paraitre as the circumflex doesn't serve any purpose. Mglovesfun (talk) 08:26, 29 September 2011 (UTC)

?

(Note: I don't know a thing about Chinese.) A few questions/issues:

  1. What's up with the categories? There are Category:cmn:All topics, Category:zh:All topics, Category:zh-cn:All topics and Category:zh-tw:All topics. What's the difference?
  2. WT:AZH#Min_Nan says that Min Nan "has four main branches... This poses a problem for Wiktionary, since these dialects are not mutually intelligible, and only one L2 header may be used per ISO 639 code. ... To date, virtually all entries for Min Nan have been based on the Amoy dialect, which is widely considered to be a de facto standard. The disposition of other dialects such as Teochew and Qiongwen Hainanese remains undecided at this time." I'm pretty sure that standard practice for branches among languages is to use context labels for words that don't exist in some branches. Why should this language be different?
  3. I seem to recall some consensus about not allowing toneless pinyin entries? If there was, shouldn't this be mentioned on WT:About Chinese?
  4. WT:AZH lists {{infl}} as being the standard template to use, and repeats it many times for all the languages that do not yet have templates built for them specifically. Rather than showing an explanation for {{infl}} over and over again, wouldn't it make sense to make the page say that for dialects that don't have specific templates yet, use infl, and then explain how to use it once?
  5. Are these languages treated as separate languages or as dialects of one language? If they're separate languages, why do things like Category:Chinese templates exist, instead of being split into sections?
  6. What is the Wiktionary code for Mandarin, zh or cmn?

--Yair rand (talk) 07:02, 24 May 2010 (UTC)Reply

Just one answer for the moment: #What is the Wiktionary code for Mandarin, zh or cmn?. This is annoying but the assisted method doesn't work well with cmn: it creates {{ tø|cmn| for translations, thus they can't be linked to zh:wiki. zh works better but bots change them to cmn. ZH is short for Chinese 中文 (Zhōngwén), CMN is Chinese Mandarin, but both have the word Mandarin in templates. I learned to live with this :) The reasons for the existence of both Chinese and Mandarin are historical. Mandarin is standard Chinese and most written Chinese material is in Mandarin. There are no YUE, NAN, etc. Wiktionaries but there are some new Wikipedias in dialects. --Anatoli 12:36, 24 May 2010 (UTC)
I proposed on WT:BP, and still do propose, eliminating zh, zh-cn and zh-tw from category names. zh is used for translations as the Mandarin Wikiprojects use the code zh, not cmn. Mglovesfun (talk) 08:28, 29 September 2011 (UTC)

Move debate

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Wiktionary:About Chinese

I'd prefer Wiktionary:About Chinese languages as a title. It makes it clearer that we don't allow Chinese as a language. Furthermore, as much content as is reasonable/possible should be moved to the individual languages involved - Wiktionary:About Mandarin shouldn't be a redirect. Mglovesfun (talk) 12:55, 9 November 2010 (UTC)Reply

I support moving the contents into Wiktionary:About Mandarin, Wiktionary:About Min Nan, etc. Despite these languages naturally sharing common characteristics, they conceivably have different conventions as well, such as grammar and names of templates. --Daniel. 13:02, 9 November 2010 (UTC)Reply
Wiktionary:About Chinese (or a renamed version) should still exist, at the very least it could give context on what we call 'Chinese' here, and then link to the individual languages' pages. Mglovesfun (talk) 13:22, 9 November 2010 (UTC)Reply
I support moving to About Chinese languages. IMO as long as there is no Mandarin-specific information to be split off of that page, hard-redirect from About Mandarin. Precedent, fwiw, is About sign languages, redirected to from both About American Sign Language and WT:AASE (ase is American Sign Language) as well as from WT:ASGN (sgn is the group (or whatever it's called) code for sign languages).​—msh210 (talk) 21:03, 10 November 2010 (UTC)Reply

Moved. Mglovesfun (talk) 16:17, 25 November 2010 (UTC)Reply

Move debate (2)

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Wiktionary:About Chinese languages

We don't have a Category:Chinese languages; we have a Category:Sinitic languages for that.

For that reason, I suggest moving Wiktionary:About Chinese languages to Wiktionary:About Sinitic languages. (And keeping the old name as a redirect.) --Daniel 19:05, 25 May 2011 (UTC)Reply

Done. Nobody objected. --Daniel 02:27, 8 June 2011 (UTC)Reply
Next time, please remember to check for double-redirects; in this case, that would be pages that redirect to Wiktionary:About Chinese languages. MediaWiki only supports one level of redirection, so once Wiktionary:About Chinese languages became a redirect to Wiktionary:About Sinitic languages, those redirects stopped working. (Don't worry, I've updated them now. Just something to remember for next time.) —RuakhTALK 03:09, 8 June 2011 (UTC)Reply
OK, I will check for all double-redirects next time. I've fixed some double-redirects to Wiktionary:About Sinitic languages, and missed others, before your help. Thanks. --Daniel 03:58, 8 June 2011 (UTC)Reply
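A minimal sketch of the double-redirect check discussed above, assuming the redirects are available as a plain title-to-target mapping. The function name and the sample mapping are illustrative only; this does not use any real MediaWiki or bot API.

```python
def find_double_redirects(redirects):
    """Return {page: final_target} for pages whose redirect target is itself a redirect.

    `redirects` is a hypothetical dict mapping a redirect page title to its target title.
    """
    doubles = {}
    for page, target in redirects.items():
        if target in redirects:
            # Follow the chain to its end, guarding against redirect loops.
            seen = {page}
            final = target
            while final in redirects and final not in seen:
                seen.add(final)
                final = redirects[final]
            doubles[page] = final
    return doubles


# Illustrative data: the old title redirects to a page that was later moved.
redirects = {
    "Wiktionary:About Chinese": "Wiktionary:About Chinese languages",
    "Wiktionary:About Chinese languages": "Wiktionary:About Sinitic languages",
}
print(find_double_redirects(redirects))
```

Each page reported this way would need its redirect updated to point directly at the final target, since only one level of redirection is followed.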


Banning foreign proper nouns as Mandarin

I propose to make it a language policy to ban all proper names used in a Mandarin context if they are not in Hanzi, regardless of whether there are citations - Chinese speakers do write in foreign languages occasionally, but these foreign words don't become Chinese. Foreign words should be and are transliterated into Chinese characters; otherwise they should not be considered Mandarin. The complexity is not a justification for not following this rule. This is to avoid entries such as Thames河, Alps山, Alzheimer病, etc. once and for all. PRC and RC policies both regard using names in Roman letters as incorrect, which is widely accepted. --Anatoli 05:18, 29 September 2011 (UTC)

I support this. Japanese speakers also use Latin-based foreign words in their writing occasionally, when there is a perfect katakana equivalent. Sometimes, it's done for stylistic reasons (as, very unfortunately, Western cultures are considered trendy in Asian countries), sometimes, well, some just want to show off. You can find this aspect especially in their song lyrics. Quite often the English lines don't even make sense whatsoever. Anyway, I digress. As I noted, writing in foreign scripts especially Latin-based languages is especially trendy among younger generations. Ok let me put it another way. I have seen English speakers putting words in Japanese hira or kata characters in their writing, when the same concept can be written in English perfectly. It's the result of a change in people's perception towards the Japanese (language or otherwise), which is now considered trendy and also the proliferation of Japanese learners in the past decade. Again, does it mean these words are now considered borrowed into English? If you say yes, then I have no problem with Thames河 being included in this dictionary. JamesjiaoTC 06:00, 29 September 2011 (UTC)Reply
Re: setting up a vote (something mentioned in the BP): do you want to set up a vote that would only ban proper nouns? Or do you think common nouns like e-mail地址 should be banned, too? If so, then the vote could be broader. But your comments on RFV suggest you wouldn't delete all mixed-script entries (eg Y字). Presuming you'd like to ban e-mail地址 but not Y字, how can the vote be worded, so that it does that? - -sche (discuss) 06:03, 29 September 2011 (UTC)Reply
@-sche, don't get me wrong, mixed scripts are perfectly normal, like the ones you listed and many more, e.g. AA制 (ēi'ēi zhì). Karaoke can only be written as 卡拉OK in Mandarin. I'm talking about proper nouns; I don't want to mislead users into believing that Oslo is Oslo市 in Chinese, even if you find examples of usage. I have seen a Chinese map of Australia on a Chinese site on the internet where only the biggest cities were translated into Chinese. A user like Engirst would start quoting the untranslated names as Mandarin, which is wrong.
@Jamesjiao, sorry you lost me, I don't know what you mean. Could you rephrase it, please?--Anatoli 06:20, 29 September 2011 (UTC)Reply
I was just comparing the use of Japanese hiragana/katakana in English (esp. among Japanophiles) with the use of English (or other Latin-script-based languages) words in Chinese (due to trendiness probably?). This might not be a perfect analogy, but it's a start. You will also find that people are more inclined to use Latin characters, especially for proper nouns, when using a computer keyboard (as opposed to handwriting). I also mentioned the fact that monolingual Chinese speakers wouldn't understand a mixed construction like this. JamesjiaoTC 06:45, 29 September 2011 (UTC)
Oh another thing is pronunciation. For a word to exist in a language, there has to be a way to pronounce it. I can't imagine a non-English speaking Chinese speaker trying to pronounce Thames河 even if he/she is able to recognize and even pronounce the individual letters. JamesjiaoTC 06:52, 29 September 2011 (UTC)Reply
I definitely don't think that Kana words in English are to be considered English but I haven't seen it, that's why I couldn't understand what you mean. Yes, you're right, most Chinese speakers wouldn't have a clue how to pronounce Thames河 or Seine河, Hudson河 or Volga河. --Anatoli 09:47, 29 September 2011 (UTC)Reply

There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but also for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 13:02, 30 September 2011 (UTC)

In which part of the world is the standard Chinese name for Bush "Bush"? 60.240.101.246 13:13, 30 September 2011 (UTC)Reply
There is not only one standard. A dictionary just records the words that exist. 2.25.212.4 14:09, 30 September 2011 (UTC)

Wow, I get such a strong sense of déjà vu here... Engirst, do you have any original arguments? Your points above have been refuted. As noted elsewhere:

  1. we already have a record of Thames and a record of 河;
  2. using a term from one language in a sentence of another language may represent w:code-switching instead of borrowing;
  3. there is nothing intrinsically Chinese about Thames;
  4. the use of Thames in Thames河 is an example of an English term used as an English term in a Chinese context;
  5. the use of Thames in Thames河 is a collocation of two independent terms;
  6. as a non-idiomatic sum-of-parts phrase, Thames河 fails WT:CFI, just as yellow sweater or tasty kumquat fail WT:CFI for the same reason.

So, to extrapolate a basic list of criteria for including any word from Language A under the heading for Language B, not just proper nouns:

  1. Is the term used in Language B to convey any meaning that is different from its meaning in Language A?
  2. Alternately, is the term used widely enough in Language B that most speakers and/or readers of Language B should be expected to know and readily use the term?

Well, that's it, actually. I can't think of any other solid reasons for including a term from one language under the heading for another language. Use in Language B does not necessarily mean that the term has been adopted into that language. As soon as the term is used as Language B, i.e. where it has some meaning that is specific to that language or where it is well-known and widely used, then I am happy to advocate listing under both Language A and Language B headings. -- HTH, Eiríkr Útlendi | Tala við mig 23:04, 30 September 2011 (UTC)Reply

Your list seems good for the vote. I suggest to add the Mandarin romanisation entries, like Thames Hé vs Tàiwùshì Hé, the former falls into the same category. --Anatoli 21:43, 2 October 2011 (UTC)Reply
This is a very comprehensive list. Code-switching is what I had in mind, but I couldn't remember the term at the time. Code-switching occurs extremely often in Taiwan, not just between Mandarin and English, but Japanese, Korean and even their local flavour of Hokkien dialect as well. I often see short Japanese phrases like かわいいね。。。 in Taiwanese online blogs mixed in with Chinese characters. This is a very typical case of code-switching in writing. JamesjiaoTC 02:06, 5 October 2011 (UTC)Reply
The vote to ban this kind of entry is set up here: Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:05, 3 October 2011 (UTC)
Not being a speaker of Mandarin or Japanese, I have a question which might help to clarify the issue for those in a similar position. Which of the following examples in English best equates to "Thames河" in Mandarin: "résumé" (a French word, wholly adopted but retaining glyphs which are not properly in the English alphabet), άλφα (a Greek word which, when used, is italicized to indicate that it is from a different language), or something completely different? I do think it might be a bit early for voting, since in all of the discussions around this topic I have only seen 5 or 6 contributors. - [The]DaveRoss 02:37, 5 October 2011 (UTC)
In answer to your question: this is like Москва#English (a foreign word, which indicates that it is from a different language by being in a different script). - -sche (discuss) 03:41, 5 October 2011 (UTC)Reply
TheDaveRoss, it's only one user, not many (who creates/recreates them), trust me, with different IPs. The issue at hand is that this user claims that "Thames河" - English "Thames" + 河 (river) - is a Mandarin word, citing examples from books. Note that river names are always followed by 河 or other similar words in Mandarin. There are other examples where foreign names are written in Mandarin without translating, showing the foreign name in the original script. My argument is that the Chinese word for Thames is 泰晤士河 (Google Books - 3,150 hits) and there is no reason to include the SoP term Thames河; there is nothing Chinese in Thames. The rule and common practice is to transliterate/translate names of people, cities, etc., no matter how small. There are borrowings into Mandarin, and a very few also contain Roman letters (三K黨 / 三K党 Ku Klux Klan), but writing full names in Roman letters is a case of code-switching. OK#Mandarin is a common noun, not a proper name; it has become partially naturalised. Like any other language, Mandarin uses its native script to write words, using other scripts only when it absolutely has to. "London市" or "Hyde公园" are not exceptions; they are cases of code-switching (simply Chinglish) - the correct and common terms are "伦敦" and "海德公园". The issue is not just Mandarin-specific. Some argue that bluetooth should be the right way to write the word in Russian. A similar situation could arise for Japanese, Russian, Hindi, Korean, Arabic or other languages, where people insert Roman-letter names. I believe these names don't become naturalised. I hope I expressed myself well. If a word in Roman letters becomes naturalised, then we can include it; we are still discussing pizza#Mandarin (a common word). --Anatoli 03:07, 5 October 2011 (UTC)

Pinyin with no tra or sim

Is there any sensible way to find these? I have been speedy deleting some of these; given that {{pinyin reading of}} links to the tra and sim, it seems reasonable. For example, we don't allow plurals that don't have a singular ({{plural of|xyz}} when xyz doesn't exist yet). If anyone wants to create Hanzi entries for these, then recreate the pinyin, it is with my blessing. Mglovesfun (talk) 12:27, 2 October 2011 (UTC)

I don't understand what you said. Engirst 12:40, 2 October 2011 (UTC)Reply
He is saying that we don't allow a plural form entry for English words when the singular form does not yet exist. He is asking if that also means that we shouldn't have the pinyin form when the traditional or simplified Mandarin forms do not yet exist. He has been deleting them when he sees them. - [The]DaveRoss 02:39, 5 October 2011 (UTC)Reply
I think Engirst considers the character entries too complex and not worth his time to create. I digress. There is in fact a vote on this (That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling.). It doesn't, however, explicitly exclude pinyin entries when there are no character entries present. Maybe the wording can be changed to something like: That a pinyin entry, using the tone-marking diacritics, only be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling. JamesjiaoTC 02:47, 5 October 2011 (UTC)
Sounds like a reasonable suggestion. There are not enough resources for validating Romanisation entries (SoP, attestability, etc.), let alone the Chinese characters (often one version is omitted). Not sure how this can be done but I support voting on this. Maybe Engirst will start creating some Chinese character entries before adding pinyin? (wishful thinking) --Anatoli 03:11, 5 October 2011 (UTC)
I'm OK with users adding valid pinyin (attestable / with correct tone-markings) without adding hanzi, I'm also OK with users creating valid plurals (attestable) without creating the singulars... we allow that on de.Wikt, we even have bots to create forms without regard to the presence of the lemmata, because in that way, a user who looks up the form or the pinyin will at least have a bit of information, better than nothing. Having said that, I think all of you, as the active Chinese editors, could form a consensus and agree that you interpret the vote as requiring hanzi to exist first (this is how I always interpreted the vote), and delete pinyin entries that have no hanzi form, without having a new vote. - -sche (discuss) 03:36, 5 October 2011 (UTC)Reply
Seems like without any vote, nothing can be achieved in Mandarin space, most active Chinese editors (except for this user) all disagree with Engirst (he may now be avoiding his own user account) but changing or deleting his entries causes edit wars or someone may think he is just being bullied. --Anatoli 03:55, 5 October 2011 (UTC)Reply
Just as another reference for comparison --
If I understand it correctly, the current policy for Japanese entries is to have the main entry with most of the information located under the kanji headword when there is one, or under the kana headword otherwise, and for the romaji (Japanese pinyin, as it were) entries to *only* serve as disambig pages pointing users to the relevant other headwords. Consequently, romaji entries should not have any "See also", "Derived terms", "Usage notes", or other headings. The kōgai (kōgai) entry is a good example of this in action. -- Eiríkr Útlendi | Tala við mig 04:56, 5 October 2011 (UTC)Reply
Pinyin romanisation rules went further - parts of speech are not allowed but we do have many pinyin entries without hanzi. --Anatoli 00:23, 7 October 2011 (UTC)Reply
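As a rough answer to the "is there any sensible way to find these?" question, here is a minimal sketch assuming two inputs that would have to come from a database dump or similar: a mapping from each pinyin entry to the hanzi titles it links to, and the set of page titles that exist. Both names and the sample data are hypothetical.

```python
def orphaned_pinyin(pinyin_links, existing_titles):
    """Return pinyin entry titles none of whose linked hanzi entries exist.

    pinyin_links: hypothetical dict, pinyin title -> list of linked hanzi titles.
    existing_titles: hypothetical set of all existing page titles.
    """
    return sorted(
        title
        for title, hanzi in pinyin_links.items()
        if not any(h in existing_titles for h in hanzi)
    )


# Illustrative data only.
pinyin_links = {
    "bàoyuàn": ["抱怨"],
    "Thames Hé": ["Thames河"],
}
existing_titles = {"抱怨"}
print(orphaned_pinyin(pinyin_links, existing_titles))
```

An entry is reported only when every linked hanzi form is missing, so a pinyin page with at least one existing traditional or simplified entry is left alone.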

Are the tone-markings on these words correct?

Talk:Nèi Ménggǔ, Talk:Ménggǔ. (Other editors: feel free to list entries in this section if you doubt they have correct tone-markings. It should be helpful to have a single place to gather them for cleanup. If there is such a place already, other than the clogged WT:RFC page, please move these there.) - -sche (discuss) 12:01, 3 October 2011 (UTC)Reply

It's Nèi Měnggǔ and Měnggǔ. --Anatoli 12:51, 3 October 2011 (UTC)Reply
Your examples show the tone sandhi where the original third tone is pronounced as second in front of another third tone but it's usually not reflected in pinyin romanisation. --Anatoli 12:54, 3 October 2011 (UTC)Reply
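The sandhi rule mentioned above can be sketched as follows. This is a deliberate simplification that only handles a third tone directly before another third tone; longer runs of third tones depend on phrasing and are not modelled. Tones are given as plain numbers, and the function name is illustrative.

```python
def spoken_tones(tones):
    """Apply the basic third-tone sandhi: a 3rd tone before another 3rd tone is spoken as 2nd.

    `tones` is a list of underlying (written) tone numbers, one per syllable.
    """
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


# Nèi Měnggǔ is written with tones 4-3-3 but spoken as 4-2-3.
print(spoken_tones([4, 3, 3]))  # [4, 2, 3]
```

The written pinyin keeps the underlying tones (Měnggǔ), which is why a spelling based on the spoken form (Ménggǔ) looks plausible but is not the standard romanisation.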

Wakie-wakie, the vote is on. --Anatoli 04:50, 19 October 2011 (UTC)Reply

Mandarin part of speech template

Templates like {{cmn-noun}} allow p for pinyin as a first parameter. This should be phased out. There's an effort to remove all pinyin from part of speech categories and have them only in Category:Mandarin pinyin and subcategories, at some point the templates will have to follow suit, though we're months away from being ready. So this is a heads up. --Mglovesfun (talk) 21:22, 10 November 2011 (UTC)Reply

But this parameter serves the same purpose as tr - transliteration and the hyperlink allows to see if there are other hanzi with the same pinyin. I have no strong opinion on your suggestion at the moment.
I've been checking your list at User:MglovesfunBot/cmn-parts-of-speech-Latn, as you have noticed. It's quite big and very time-consuming, so I'm inviting other Sinophone editors to join the effort. If the entry's hanzi are red-linked, it can be deleted, rather than converted. Sometimes I also leave entries if they only have a Japanese but no Mandarin entry (planning to add them later). --Anatoli 21:54, 10 November 2011 (UTC)
{{cmn-noun|p}} is used for Mandarin nouns in the Latin script. It can go, since we no longer use {{cmn-noun}} (cmn-adj, adv, abbr, etc.) for pinyin entries. Something like {{cmn-noun|ts|pin=fú}} will still work! --Mglovesfun (talk) 18:37, 11 November 2011 (UTC)
I misunderstood, sorry, I was thinking about pin parameter. Can you give an example, please? --Anatoli 21:47, 13 November 2011 (UTC)Reply

Category:Mandarin Wade-Giles

Err, when did we approve Wade-Giles transliterations for inclusion? I can kinda understand Pinyin, but this? -- Liliana 15:23, 10 December 2011 (UTC)Reply

Thanks. Binned. --Anatoli (обсудить) 03:20, 11 December 2011 (UTC)Reply

Audio files

See commons:Commons:Village_pump#Category:Chinese_pronunciation. Mglovesfun (talk) 13:00, 19 February 2012 (UTC)Reply

I have some doubts about your request. The main reason is the many homophones, and the request should also specify whether we want jiantizi, fantizi or both (there are variant characters too). The conversion is far from straightforward. Perhaps using audio files based on toned pinyin was the right choice, even if it's more complicated to use bots to add audio files to hanzi entries. I see some of the audio entries miss tone marks. --Anatoli (обсудить) 21:43, 19 February 2012 (UTC)
I think the audio files should stay at the pinyin filenames, because if I am not mistaken, multiple characters with the same pinyin romanization X have the same pronunciation. Giving the file a pinyin filename allows it to be uploaded to all characters that have pinyin X. It seems easier to write a bot to do that, than to host the same file under dozens of names. - -sche (discuss) 21:48, 19 February 2012 (UTC)Reply

{{commonsrad}}

Note: the title of this section was previously {{Commonsrad}}.

Sarang (talkcontribs) has created {{Commonsrad}}, and would like me to run a bot that will add it to all entries and indices for radicals (e.g. 一 and Index:Chinese radical/一). Does everyone agree that this should be done? —RuakhTALK 15:38, 11 April 2012 (UTC)

If it's going to be bot-added, there is no harm in giving it a clearer name first. Maybe {{Commons radical}}? —CodeCat 16:29, 11 April 2012 (UTC)Reply
If Commonsrad seems not clear enough, I have no objection to giving the name 5 more bytes; the data space of 2000 more bytes won't matter either. I chose a name close to {{Commonscat}} because it is very similar to it. In fact, Commonsrad can be called a variation of Commonscat, but with a display better suited to its usage and the possibility of easy expansion whenever wanted. If it is later used to link non-radical Chinese glyph Wiktionary pages to their Commons categories, Commonsrad is not as misleading as a clearer descriptive name like {{commons radical}}. -- sarang사랑 18:09, 11 April 2012 (UTC)
It seems to be a use at Wiktionary to have template names with lower case initials (with upper case redirects)? Another question to decide! -- sarang사랑 05:48, 12 April 2012 (UTC)Reply
I'm not exactly sure from the description what the template will do, but it looks harmless enough. -- A-cai (talk) 22:26, 17 April 2012 (UTC)Reply
Template has been moved to {{commonsrad}}, hence the red links above. Mglovesfun (talk) 22:28, 17 April 2012 (UTC)Reply

Baxter-Sagart

I'm not really active in this project, but I did add a new appendix, Appendix:Baxter-Sagart Old Chinese reconstruction. It's referenced, and the table data is programmatically generated from the reference data with a program whose source code I also made available. I hope this in some way can be of help. - Gilgamesh (talk) 22:57, 31 May 2012 (UTC)Reply

葡文

If someone knowledgeable could check that the pronunciation and pinyin of [[葡文]] are correct, it would be appreciated. :) - -sche (discuss) 21:00, 26 December 2012 (UTC)Reply

Thanks for that. Does "(written)" mean that 葡文 refers to written Portuguese, or that 葡文 is {{literary|lang=cmn}} and mostly used in written Chinese and not in spoken Chinese? - -sche (discuss) 21:52, 26 December 2012 (UTC)
It refers to written Portuguese (normally). 葡萄牙語 / 葡萄牙语 (Pútáoyá yǔ) and 葡萄牙文 (Pútáoyá wén) are more common words. The suffix 語 / 语 (yǔ) more commonly refers to the spoken and 文 (wén) to the written language. --Anatoli (обсудить/вклад) 22:34, 26 December 2012 (UTC)
Ah, interesting! - -sche (discuss) 00:01, 27 December 2012 (UTC)Reply

, 𡰪

The transliteration and four-corner number, respectively, of these characters were tagged {{fact}}; can anyone verify them? they and the Japanese character (the On-reading of which has been questioned) are the last remaining Han characters tagged {{fact}}. - -sche (discuss) 00:01, 27 December 2012 (UTC)Reply

Toneless pinyin usage notes

Currently, our toneless pinyin entries all have a usage note at the bottom which says:

  • English transcriptions of Chinese speech often fail to distinguish between the critical tonal differences employed in the Chinese language, using words such as this one without the appropriate indication of tone.

I don't have much of a problem with it (although maybe "Chinese" should be changed to "Mandarin"), but I realized that if we do want to change it, it will be somewhat difficult, and some of them may be edited and fall out of synch. To solve that, I propose that we create a template called {{cmn-toneless-note}} or something similar and ask an editor with an AWB account to change all instances of the text into a template call. What do you guys think? —Μετάknowledgediscuss/deeds 19:13, 6 January 2013 (UTC)Reply

Support. - -sche (discuss) 19:43, 6 January 2013 (UTC)Reply
Support. Also, "using words" should probably be "writing syllables". (We don't have toneless-pinyin entries for whole words, only for individual syllables.) —RuakhTALK 20:28, 6 January 2013 (UTC)Reply
Well... sort of. On one hand, you are correct that this is only used for specific syllables, but OTOH the syllables are words, in the loose Chinese way of looking at what constitutes a word. (One Chinese man was trying arduously to convince me that all words in Mandarin are one syllable long. I was unsuccessful in my attempts to get him to revise his native definition of what a word is to the Western linguistic concept.) Incidentally, the entries (like nu#Mandarin) also point to forms like nǚ, which not only is marked for tone but also has a different vowel, and perhaps the note should reflect that. (Of course, I'm not sure how useful that is anyway, because when my friends don't have access to the character ü, they type nv3, not the equally inaccessible diacritic form.) —Μετάknowledgediscuss/deeds 21:13, 6 January 2013 (UTC)
Well, if our goal were to conform to "the loose Chinese way of looking at" their languages, then we'd treat all of them as dialects of a single language. It isn't, so we don't. By most linguistically-well-informed accounts, the vast majority of Mandarin words are bisyllabic. —RuakhTALK 22:50, 6 January 2013 (UTC)Reply
I personally find your comment rather arrogant and disparaging. 129.78.32.21 04:36, 10 January 2013 (UTC)Reply
I don't find it arrogant, but one needs to know that Chinese (also Vietnamese, Thai, etc.) is traditionally called monosyllabic, as all or almost all polysyllabic words are made of component words; the exceptions are phonetic transcriptions and characters that have lost their meaning over time, but that is less the case with Mandarin. --Anatoli (обсудить/вклад) 04:44, 10 January 2013 (UTC)
I was referring to the "dialect/language" comment, where he regarded "we" as identical to himself in having the personal stance of considering "Chinese is not a single language" to be false. It is a language, by Wikipedia at least. 129.78.32.21 05:04, 10 January 2013 (UTC)Reply
Views on this differ, but I agree that Chinese topolects are more like dialects than separate languages: even if they may not be mutually comprehensible when spoken and are quite different on the written level, they are often closer than dialects of other languages (provided they are written the Chinese way, using hanzi, not Roman, Cyrillic, Arabic or other scripts). Wiktionary treats Chinese topolects separately as far as language headers go, but translations are all nested under "Chinese", e.g. Chinese/Mandarin, Chinese/Cantonese, etc. --Anatoli (обсудить/вклад) 05:13, 10 January 2013 (UTC)

Please note that full words in toneless pinyin were explicitly forbidden by votes and almost unanimous agreements, it happened before Metaknowledge became active. --Anatoli (обсудить/вклад) 22:54, 6 January 2013 (UTC)Reply

So do you support this? —Μετάknowledgediscuss/deeds 06:03, 9 January 2013 (UTC)Reply
Yes, Support. --Anatoli (обсудить/вклад) 04:31, 10 January 2013 (UTC)Reply
Erm... so do any of you AWBers/botters want to actually do it? —Μετάknowledgediscuss/deeds 04:59, 10 January 2013 (UTC)Reply
Delete all pinyin, whether toned or not. Move it to Appendix at least. It is merely a transcription scheme, not even official orthography. 129.78.32.21 05:06, 10 January 2013 (UTC)Reply
It doesn't work this way. IP users (anonymous) with no or little contributions have little influence and structure is decided after discussions, votes, etc. Entries in Category:Mandarin pinyin do not claim they are proper writing, they are a helpful tool for users to help them find hanzi entries. They have limited information, all information is contained in hanzi entries. Compare bàoyuàn and 抱怨 (bàoyuàn). --Anatoli (обсудить/вклад) 05:19, 10 January 2013 (UTC)Reply
I knew they contain limited information. Still, they should not exist in the main namespace. This is a dictionary, much more specific than a "tool". The search function is sufficient in directing users to character entries for polysyllabics. With the monosyllabics a link to an Appendix page is all that is necessary. Keeping everything in the main namespace is unworthily energy-consuming. 60.240.101.246 06:40, 10 January 2013 (UTC)Reply

Proposal to change topical categories for Mandarin to match other languages, sort by pinyin, not radical

See Wiktionary:Beer_parlour/2013/April#Some small changes to Mandarin (also Cantonese, Min Nan) entry structure and about topic categories - suggestion. --Anatoli (обсудить/вклад) 00:20, 11 April 2013 (UTC)Reply

Chinese entries with vowelless pronunciations

The pronunciation transcriptions in the following entries do not list vowels, though I suspect they should:

  1. 妒嫉
  2. 积累
  3. 積累
  4. 妓男
  5. 喊叫
  6. 水汽
  7. 冷靜
  8. 坚固
  9. 堅固
  10. 记住
  11. 記住
  12. 评价
  13. 評價
  14. 相机
  15. 相機
  16. 即将
  17. 即將
  18. 前门
  19. 前門
  20. 经历
  21. 經歷
  22. 金牌
  23. 决不
  24. 決不
  25. 绝不
  26. 絕不
  27. 告罄
  28. 模具
  29. 顺其自然
  30. 順其自然
  31. 布拉吉

- -sche (discuss) 23:22, 23 May 2013 (UTC)Reply

How did you find them: at random, or do you have a script for that? User:Tooironic used to add IPA but he is less active now. User:Wyang has developed an entry creation template, Template:cmn new, which also generates the IPA, so for 积累, the IPA is /t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/. My preference is to delete the IPA altogether (replace it with {{rfp}}) rather than showing the wrong info. --Anatoli (обсудить/вклад) 23:36, 23 May 2013 (UTC)
I found them at random(ish). I used WP:AWB to find entries containing deprecated IPA characters, and happened to notice that in addition to containing deprecated characters, all of these entries also lacked vowels. - -sche (discuss) 23:47, 23 May 2013 (UTC)Reply
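A check like the one described above could be scripted. The following is only a minimal Python sketch, not what AWB does: the vowel inventory `IPA_VOWELS` and the function name `lacks_vowel` are assumptions made for illustration, and the second sample transcription is made up.

```python
import unicodedata

# Hypothetical, non-exhaustive inventory of IPA vowel letters (an assumption).
IPA_VOWELS = set("aeiouyɑɐɒæɛɜəɚɪɨʉɯʊʏøœɔɤʌ")


def lacks_vowel(ipa: str) -> bool:
    """Return True if no character of the transcription is a known IPA vowel."""
    # Decompose so that letters carrying diacritics expose their base letter.
    decomposed = unicodedata.normalize("NFD", ipa)
    return not any(ch in IPA_VOWELS for ch in decomposed)


samples = {
    "积累": "/t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/",  # transcription quoted in this thread
    "made-up": "/t͡ɕ⁵⁵ l²¹⁴/",         # hypothetical vowelless transcription
}

# Collect the entries whose transcription contains no vowel at all.
flagged = [entry for entry, ipa in samples.items() if lacks_vowel(ipa)]
print(flagged)
```

Transcriptions that legitimately use syllabic consonants would be flagged as well, so the output would still need a manual review.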

Unified Chinese vote

Wiktionary:Votes/pl-2014-04/Unified Chinese is starting tomorrow. --Anatoli (обсудить/вклад) 00:45, 28 March 2014 (UTC)Reply

Capitalisation of demonyms and language names - a mini-vote

Hi,

@Tooironic, @Jamesjiao, @Kc_kennylau, @Wyang

Demonyms and language names are common nouns in Chinese. I suggest using lower case for pinyin and no space, even if dictionaries are inconsistent. Please vote below and invite anyone who might be interested. So, for example: 中國人/中国人 (Zhōngguórén) would be zhōngguórén and 中文 (zhōngwén) would be zhōngwén, not Zhōngguórén/Zhōngguó rén and Zhōngwén.

Rationale: they are nouns, and automatic pinyin generation produces them in lower case. Japanese has already implemented this. --Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)
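As an illustration of the proposed form only, here is a minimal Python sketch that lower-cases a pinyin string and joins its syllables. The function name `normalise_demonym_pinyin` is hypothetical and this is not the code used by any Wiktionary module; it also does not insert the apostrophe (隔音符號) that some joined syllables would need.

```python
def normalise_demonym_pinyin(pinyin: str) -> str:
    """Lower-case a pinyin string and join its syllables without spaces."""
    # str.split() with no argument drops every run of whitespace.
    return "".join(pinyin.split()).lower()


# The capitalised and spaced variants mentioned in the proposal.
for form in ("Zhōngguórén", "Zhōngguó rén", "Zhōngwén"):
    print(form, "->", normalise_demonym_pinyin(form))
```

Both "Zhōngguórén" and "Zhōngguó rén" collapse to the same string, which is the point of the proposal: one canonical spelling per demonym or language name.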

Support
  1.   Support Use lower case: these are common nouns (not proper nouns). Spell pinyin without a space for most demonyms and language names. --Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)
  1. (material that could be used for the 'support' position) Xiandai Hanyu Guifan Cidian (现代汉语规范词典) doesn't use capitalization in Pinyin (but no one uses or even knows about the existence of this dictionary). Taiwan's 教育部國語辭典簡編本 & 教育部重編國語辭典修訂本 don't give capitalized Pinyin, but they also have spaces between every syllable and don't use 隔音符號/隔音符号 (géyīn fúhào). --Geographyinitiative (talk) 00:01, 27 May 2019 (UTC)
Oppose
  1.   Oppose The official instruction is to use capital letters and spaces. See w:Pinyin#Capitalization and word formation. --kc_kennylau (talk) 09:00, 8 May 2014 (UTC)Reply
    I don't mean place or personal names. It's about languages and demonyms--Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)Reply
    They're just names anyways. Do you capitalize the word English? --kc_kennylau (talk) 09:56, 8 May 2014 (UTC)Reply
    I do, in English, but nihongo or nihonjin is not capitalised. Russian and Finnish don't capitalise those. French only capitalises demonyms, not languages. It can go both ways with language names and demonyms; dictionaries go one way or the other. That's why we are having this discussion. --Anatoli (обсудить/вклад) 10:20, 8 May 2014 (UTC)
    Okay, please find me examples of both cases, and I'll switch to abstain (I'm so lazy). --kc_kennylau (talk) 10:28, 8 May 2014 (UTC)Reply
  1.   Oppose Xiandai Hanyu Cidian (现代汉语词典) uses capitalization in Pinyin (and the dictionary is considered the standard dictionary of Mandarin Chinese in Mainland China). It's the normal practice to use capitalized Pinyin forms in Mainland China. The 1996 & 2012 versions of 汉语拼音正词法基本规则 both say location names should be capitalized. Also, 臺灣閩南語常用詞辭典 has capitalized POJ. --Geographyinitiative (talk) 00:01, 27 May 2019 (UTC) (modified)Reply
    @Geographyinitiative: Can you make up your mind whether you support or oppose this proposal? You voted twice on the same day. It's illegal (your vote won't count) and supporting both options is not part of this vote, but you can comment or abstain. Also, please check the topic of the vote. This is about capitalisation of Mandarin pinyin only, and only for demonyms, e.g. 中國人/中国人 (Zhōngguórén), and language names, e.g. 中文 (zhōngwén). Country, city names, etc. are not part of this vote; they are capitalised. --Anatoli T. (обсудить/вклад) 01:01, 27 May 2019 (UTC)
    Whoops, I see what you are saying. But anyway, here's my response: 汉语拼音正词法基本规则 (2012) 6.3.3 says "专有名词成分与普通名词成分连写在一起,是专有名词或视为专有名词的,首字母大写。例如:Míngshǐ(明史)Hànyǔ(汉语)Yuèyǔ(粤语)Guǎngdōnghuà(广东话)Fójiào(佛教)Tángcháo(唐朝)"
Xiandai Hanyu Cidian 7: p513 Hànyǔ / p1620 yuèyǔ (but Yuè by itself is capitalized; I don't know what's going on there) / p488 no entry for 广东话, but there is one for "Guǎngdōng níngméng" and "Guǎngdōng yīnyuè" / p396 Fójiào / p1273 no entry for 唐朝, but Táng by itself is capitalized
From experience, I assume all these are lowercase in Xiandai Hanyu Guifan Cidian. If you want me to list them out, I can. --Geographyinitiative (talk) 01:32, 27 May 2019 (UTC)
Just put all the forms of Pinyin out there, standard and not. More fun that way anyway. Document linguistic phenomena. --Geographyinitiative (talk) 07:47, 28 May 2019 (UTC)Reply
I find your comments in this section about "do them all" disturbing. Do you actually realise we're building a dictionary? It's not a playground. I don't think this minivote is going anywhere, anyway. --Anatoli T. (обсудить/вклад) 08:24, 28 May 2019 (UTC)Reply
It's a bit extreme, but I think there is sense to it. We could avoid debate about whether chengyu are hyphenated or whether 不知道 is bu zhidao or buzhidao. And including bu zhi dao AND buzhidao might make searches easier. —Suzukaze-c 08:48, 28 May 2019 (UTC)Reply
Abstain
  1. Don't really have any preference for this as I am generally not interested in Pinyin. Wyang (talk) 01:02, 8 May 2014 (UTC)Reply
    What about proper vs common nouns. Is 普通话 or 美国人 a common or a proper noun? --Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)Reply
    My opinion: ALL FORMS (capitalized and not, etc etc etc) of all romanizations that exist in any dictionaries (modern and historical) should be incorporated in some way. --Geographyinitiative (talk) 02:06, 26 May 2019 (UTC)
  2.   Abstain: Survey major dictionaries and 1950-era documents. —Suzukaze-c 07:55, 26 May 2019 (UTC)Reply

Comments

I ran into a capitalized pinyin entry today: Lai2 (linked to from ). Capitalized tone-number pinyin like that does look weird to me. Capitalized diacritical pinyin looks less weird. - -sche (discuss) 20:11, 31 May 2014 (UTC)Reply

BOTH/ALL

 
gauntlets being thrown down

Let's incorporate all extant and historical methods (and future ones when they arise) of writing Hanyu Pinyin and every form of Chinese romanization into this dictionary. Why limit ourselves to rigid rules when we could embrace all the many flavors of Chinese romanization? See also my comments in other places on this page.


It's all about respect for the people who use or used the different romanization systems. It's about ensuring that people in the future understand what they are looking at when they read the old books that contain the romanizations that will be 淘汰了. Help the future now by using your position in the present to understand not just the present but also the past.

We need the capitalized and the lowercase versions, the spaced and the unspaced versions, the dashed and the undashed versions. All of it.

No one version is right by itself. Every form is right only when it is put in the context of the other forms. --Geographyinitiative (talk) 10:50, 26 May 2019 (UTC)Reply

I oppose eliminating the capitalized forms, but that doesn't mean I oppose the creation of pages with the lowercase forms. I'm saying do them all. --Geographyinitiative (talk) 01:34, 27 May 2019 (UTC)Reply

Capitalisation and part of speech of month names

Related to the preceding topic, capitalisation and part of speech of month names is being discussed at User talk:LlywelynII#Chinese_months_as_proper_nouns. - -sche (discuss) 18:23, 30 August 2016 (UTC)Reply

Transliterations of months should definitely be lower case and common nouns in Chinese. --Anatoli T. (обсудить/вклад) 23:25, 26 September 2016 (UTC)Reply

Single-character entry format

@Suzukaze-c Hi. Are you willing and interested in expanding the Wiktionary:About_Chinese#Entry_format section regarding single-character entries, the use of Definitions header and the need for {{zh-hanzi}}, parameters for |cat= in {{zh-pron}}? --Anatoli T. (обсудить/вклад) 23:18, 26 September 2016 (UTC)Reply

I am for most of it, but what about the blurry area of "what belongs on Template:zh-pron/documentation" and "what belongs on Wiktionary:About Chinese"? (For example, how much do we write about |cat= on each page, etc.) —suzukaze (tc) 23:38, 26 September 2016 (UTC)Reply
This is a policy document. That's the difference. It's OK to duplicate a bit and link to Template:zh-pron/documentation for detail. --Anatoli T. (обсудить/вклад) 23:47, 26 September 2016 (UTC)Reply
@Atitarev Wiktionary:About_Chinese#Entries_for_single_characters. What do you think? —suzukaze (tc) 03:49, 27 September 2016 (UTC)Reply
Looks good, thank you! The document can be tweaked over time but it's a good start.--Anatoli T. (обсудить/вклад) 03:52, 27 September 2016 (UTC)Reply

Obsolete policies on Middle and Old Chinese

==Historical languages==
{{wikipedia|History of the Chinese language}}
{{wikipedia|Historical Chinese phonology}}

Historical Sinitic languages include the spoken languages {{w|Middle Chinese}} (ltc) and {{w|Old Chinese}} (och), the written language {{w|Literary Chinese}} (lzh), and the protolanguage {{w|Proto-Sino-Tibetan}}. Entries for words in these languages are used, except for Proto-Sino-Tibetan, which is a protolanguage and thus in the Reconstruction namespace. These terms can also appear in etymologies for entries in modern Sinitic languages, and in entries for languages that have borrowed from Chinese, notably Japanese, Korean, and Vietnamese.

Finer distinctions are possible, such as Late Middle Chinese and Early Middle Chinese for the spoken language, and Literary Chinese versus earlier Classical Chinese for the written language. These distinctions can be made in the text of etymologies, but these do not have ISO 639 codes, and thus are not used for level 2 headings.

The precise meaning and status of these “languages” is complicated: narrowly speaking “Middle Chinese” and “Old Chinese” refer to various phonological reconstructions, notably based on rime dictionaries, and do not necessarily refer to a specific historical dialect or common language. Nevertheless, they are useful designations for historical periods.

Most modern Sinitic languages descend from Middle Chinese, with the notable exception of Min, which diverged earlier, with Proto-Min also descending from Old Chinese; see [[w:Historical Chinese phonology#Branching off of the modern varieties|branching of modern varieties of Chinese]]. A notable example of this difference is {{m|zh|茶}}: English {{m|en|tea}} comes from Min, while {{m|en|chai}} comes from other varieties of Chinese.

Literary Chinese is significantly different from the spoken languages; this may be compared with Medieval Latin versus Romance languages. Literary Chinese (lzh) is the correct source language for literary terms in modern Sinitic languages, notably {{w|chengyu}} ({{w|four-character idiom}}s), and in borrowings such as the corresponding Japanese {{w|yojijukugo}}.

===Middle Chinese===
{{wikipedia|Middle Chinese}}

As Middle Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *.

===Old Chinese===
{{wikipedia|Old Chinese}}
{{wikipedia|Old Chinese phonology}}

As {{w|Old Chinese phonology}} is not attested (it is only reconstructed), please be sure to mark pronunciations with *. As sources differ, please carefully cite specific references (author and year) for any reconstructions.

Obsolete policies on cognates and stubs

==Cognates and stubs==
Across Sinitic languages, a single written form is very frequently shared across a long historical period and wide geographical area. Thus cognate entries in different languages appear on the same page; this occurs quite frequently for cognates in closely related languages in other scripts, but to nowhere near the same degree as in Sinitic languages. Due to this, it is generally unhelpful, and possibly incorrect, to create an entry for one Sinitic language simply by copying the heading and definitions for Mandarin. It is unhelpful because this adds no information beyond what a reader could guess themselves (cognate, so probably the same meaning), and possibly incorrect because words do differ between these languages; blindly copying without a reference is not reliable.

Thus, when creating a new Sinitic entry, please try to add ''some'' information distinctive to the particular language, particularly pronunciation, references, or citations.

For etymologies, each entry should include an Etymology section indicating its immediate ancestor term. For native words in modern Sinitic languages this is either Middle Chinese (most) or Proto-Min (thence Old Chinese) for Min languages. Per usual practice (see [[Wiktionary:Etymology]]), it is acceptable to include full etymologies back to Proto-Sino-Tibetan in modern entries. However, unless there is something specific to the etymology of a term in a given language, this is tedious to repeat for all modern languages. It is thus preferred (and sufficient) to only include the full history at representative languages, namely Mandarin and Min Nan (most used in each branch), with other languages just indicating the immediate predecessor and having a link reading “more at Mandarin/Min Nan”.

Similarly, it is tedious and not helpful to list contemporary cognate terms ''unless'' some particular relationship or contrast is being given. Instead, ancestral relationships can be given both backwards (in the Etymology section), to Middle Chinese, Old Chinese, and Proto-Sino-Tibetan, and forwards (in the Descendants section), from Middle Chinese, Old Chinese, and Proto-Sino-Tibetan to later forms. In these Descendants sections, listing pronunciations of descendant terms along with the spelling allows easy comparison, and avoids the duplication of the same listing in all modern forms. These are more useful than sibling relationships between cognates.

New font for Chinese?

@Justinrleung, Suzukaze-c Is it just me who feels the font for Chinese is not as pretty as Japanese? I updated my Mac and it has become even uglier. It lacks the 'weight' (is this the correct term?) in comparison. For example, - even the cangjie input looks prettier than the Chinese font. Thoughts? (Disclaimer: I know nothing about fonts...) Wyang (talk) 06:25, 12 October 2016 (UTC)Reply

I don't know what it looks like on a Mac, nor what fonts are available on a Mac... —suzukaze (tc) 06:34, 12 October 2016 (UTC)Reply
Some screenshots: Hani, Hant and Hans. The Hani font can perhaps be improved... edit: screenshot of the zh-ja comparison. Wyang (talk) 06:44, 12 October 2016 (UTC)Reply
What does this look like? —suzukaze (tc) 07:20, 12 October 2016 (UTC)Reply
It looks like this. I feel that all the ones below are more aesthetically pleasing. Wyang (talk) 07:25, 12 October 2016 (UTC)Reply
Here are how they look on my Mac on three browsers. I think I've changed my browser's font settings, so I don't have the problem that you have. — justin(r)leung (t...) | c=› } 07:36, 12 October 2016 (UTC)Reply
It looks like the browser default (and thus probably the best choice) is "PingFang SC". SimSun seems to be imposed on readers by MediaWiki:Common.css. (man, there are some questionable font choices there...) —suzukaze (tc) 07:41, 12 October 2016 (UTC)Reply
Code2000? That's the last font I (or anyone) would want for Chinese. And why would the generic sans-serif be put first? — justin(r)leung (t...) | c=› } 07:47, 12 October 2016 (UTC)Reply
(holy shit someone shares my hatred for code2000) It's also weird how the fonts for .Hans and .Hant are defined a second time later on. —suzukaze (tc) 07:48, 12 October 2016 (UTC)Reply
I think it may have been me who messed it up before (羞慚). Any recommendations on what the /* Chinese (Han) */ block should be changed to? Wyang (talk) 08:32, 12 October 2016 (UTC)
Maybe this:
/* Chinese (Han) */

/* Hani: generic */
/* Hans: simplified */
/* Hant: traditional */

.Hani,
.Hans {
	font-family: PingFang SC, Heiti SC, DengXian, Microsoft Yahei, SimHei, Source Han Sans CN, Noto Sans CJK SC, SimSun, NSimSun, SimSun-ExtB, Song, sans-serif;
}
.Hant {
	font-family: PingFang TC, Heiti TC, Microsoft Jhenghei, Source Han Sans TW, Noto Sans CJK TC, PMingLiU, PMingLiU-ExtB, MingLiU, MingLiU-ExtB, Ming, sans-serif;
}

.Hani,
.Hans,
.Hant {
	font-size: 1.2em;
}

.Hani, .Hani *,
.Hans, .Hans *,
.Hant, .Hant * {
	font-style: normal;
	font-weight: normal;
}

big.Hani, strong.Hani, b.Hani, b .Hani,
big.Hans, strong.Hans, b.Hans, b .Hans,
big.Hant, strong.Hant, b.Hant, b .Hant {
	font-size: 137%;
}

.Hani b,
.Hans b,
.Hant b {
	font-size: 125%;
}
suzukaze (tc) 01:58, 13 October 2016 (UTC)Reply
Ooohhh, I like this. It definitely looks better and more solid than before. If no one objects, we will change it to this until someone proposes an improvement. Wyang (talk) 07:49, 13 October 2016 (UTC)Reply
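To illustrate why the order of names in a font-family stack matters, here is a simplified Python sketch of how the first available family gets picked. It is an approximation only: real browsers also fall back per glyph, and the function name `first_available` and both sets of installed fonts are hypothetical.

```python
# The simplified-Chinese stack proposed above, copied as a plain string.
HANS_STACK = (
    "PingFang SC, Heiti SC, DengXian, Microsoft Yahei, SimHei, "
    "Source Han Sans CN, Noto Sans CJK SC, SimSun, NSimSun, SimSun-ExtB, "
    "Song, sans-serif"
)


def first_available(font_family: str, installed: set[str]) -> str:
    """Return the first family in the stack that is installed (case-insensitive)."""
    installed_lower = {name.lower() for name in installed}
    families = [name.strip() for name in font_family.split(",")]
    for family in families:
        if family.lower() in installed_lower:
            return family
    # Nothing matched: fall through to the last entry, the generic family.
    return families[-1]


# Hypothetical sets of installed fonts, for illustration only.
mac_fonts = {"PingFang SC", "Heiti SC", "Helvetica"}
windows_fonts = {"Microsoft YaHei", "SimSun", "Arial"}

print(first_available(HANS_STACK, mac_fonts))
print(first_available(HANS_STACK, windows_fonts))
```

With these assumed font sets, the Mac case resolves to PingFang SC and the Windows case to Microsoft Yahei, ahead of SimSun, which is the behaviour the reordering is after.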

Simplified Chinese in all templates and modules

@Wyang, Justinrleung, Suzukaze-c, Tooironic, Kc kennylau, Bumm13

I think we should stick to the promise of providing simplified Chinese in all templates, modules. The dialectal data tables currently don't show simplified forms. Do people think we need to cater for that? I understand this will be formatting and other work involved but simplified Chinese users shouldn't feel neglected. --Anatoli T. (обсудить/вклад) 09:31, 14 October 2016 (UTC)Reply

Yeah, it is disabled for now. Displaying both made the table look very cluttered. I was thinking about developing a js switch for all Chinese entries, allowing the user to choose trad/simp in all Chinese texts (zh-l, zh-x, zh-der, zh-dial, etc.). Wyang (talk) 11:01, 14 October 2016 (UTC)Reply
But that will only work for registered users. How about we have the simplified characters display as ruby, like this: , ? (We might want to increase the size of the ruby.) — justin(r)leung (t...) | c=› } 16:36, 14 October 2016 (UTC)Reply
The switch may be a dropdown underneath the ==Chinese== header, similar to how this page hides the romanisation on a click. The Ruby method is potentially good too, if we can increase the size and align them well, though making links may be more complicated. I think User:Suzukaze-c was trying to write some sort of gadget for this some time ago, but I can't find it now. Wyang (talk) 21:21, 14 October 2016 (UTC)Reply
Why not just display 我們我们 with a suppressed romanisation? The columns may need to get wider and care should be taken to have correct conversions with the ability to override. What does everybody think? --Anatoli T. (обсудить/вклад) 02:53, 15 October 2016 (UTC)Reply
I support the idea of showing simplified Chinese wherever possible and when it doesn't look cluttered. —suzukaze (tc) 05:24, 29 October 2016 (UTC)Reply

Wiktionary:Statistics

Ranked first (+4089) when sorted by change in #gloss definitions. Wyang (talk) 03:51, 7 November 2016 (UTC)Reply

Still going strong - number one (+3738) in November 2016. Wyang (talk) 16:44, 13 December 2016 (UTC)Reply
First again (+3450). 再接再厲! (壓力山大) Wyang (talk) 05:33, 4 February 2017 (UTC)Reply
First again (+4868). 再接再厲! 奔向100000個詞。 Wyang (talk) 12:26, 9 April 2017 (UTC)Reply

~州

Are we having entries like 印第安納州, or do we treat them like 上海市 (redirect to 上海?) —suzukaze (tc) 08:16, 11 November 2016 (UTC)Reply

I'd say nah, unless it's an abbreviation, like 安省. Wyang (talk) 09:14, 11 November 2016 (UTC)Reply

Definitions format overhaul

Hi all. I'm thinking about overhauling the format of Chinese definitions, by using a templated approach which strictly associates word information (part of speech, synonyms, antonyms, measure words, examples, dialectal equivalents, etc.) with the individual senses. It may be along the lines of User:Wyang/zh-def. I think this is more conducive to the efficient expansion of the Chinese content with more synonyms, antonyms, ... etc. information. What does everyone think about the changes? Wyang (talk) 09:16, 23 November 2016 (UTC)Reply

@Suzukaze-c, Justinrleung, Atitarev, Tooironic, Hongthay, Mar vin kaiser Wyang (talk) 10:20, 23 November 2016 (UTC)Reply

+1, very attractive, but I fear it's too radically different from the standard entry format. —suzukaze (tc) 09:35, 23 November 2016 (UTC)Reply
I think the formatting of Chinese definitions should match the formatting of definitions for other languages. —Granger (talk · contribs) 12:17, 23 November 2016 (UTC)Reply
It looks great, and I know you've worked hard on this, but here are potential problems I see:
  1. Like the others have said, it's too different from other languages.
  2. The wikicode would probably be harder to pick up for new editors. (It'll take me some time to get used to.)
  3. There's a bit of repetition, like putting |pos=part multiple times for 的. Is that something we really want to do?
  4. It would probably take up more Lua memory, which would not be necessary if we keep the current format. — justin(r)leung (t...) | c=› } 13:31, 23 November 2016 (UTC)Reply
Thanks guys. It is a big change, but my feeling is that this sort of sense-synonym/antonym/... integration has to be done sooner or later; there were some calls before (for example User:DTLHS/export, which was referenced in this layout), but no one has really tested doing it. The reason for the integration is that synonyms etc. are only valid on a sense-specific basis, the same as classifiers (which has already been adapted to be sense-specific) and dialectal equivalents. Moedict and Cantodict also do the same.
The code can probably be simplified, for example by switching pos to argument 1 and definition to argument 2. The enclosing zh-def template may be omittable too, if we can automatically generate the <ol> ~ </ol> using some css magic. If a JavaScript gadget could be designed to allow GUI editing of the individual senses, while the raw code remains unchanged, that would be fantastic. The increase in Lua memory usage seems quite small: I tested with the equivalent current code, which used 18.96 MB, slightly less than the new version (19.62 MB). A good thing about enclosing senses is that sense ids can be created and used to reference individual senses elsewhere. Bot conversion of the definitions should be reasonably straightforward too. Wyang (talk) 14:24, 23 November 2016 (UTC)
I removed the need for the outer enclosing template, and integrated all the code into a single template. It looks like this:
{{zh-def
|n|[[sugar]]
|syn: 食糖
|ant: 鹽
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|x2: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
|-
|n|[[candy]]; [[sweets]]
|mw: m:塊-“piece”,c:嚿-“piece”
|syn: 糖果
|x1: {{zh-x|棒棒糖|lollipop|C}}
|x2: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
|-
|n|{{zh-alt-form|醣|[[saccharide]]}}
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}
}}
where the different senses are separated by |-, and the effect is the same. It should be easier to use now. The memory requirement is slightly reduced in the process: 18.97 MB, nearly the same as the current format (18.96 MB). Wyang (talk) 23:18, 23 November 2016 (UTC)Reply
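To make the sense-by-sense grouping concrete, here is a minimal Python sketch that splits a call in the format above into one dictionary per sense, using |- as the separator. It is an illustration only and not how the template or its Lua module works; the function names `split_top_level` and `parse_zh_def` are hypothetical, and the sample is a shortened copy of the example above.

```python
def split_top_level(text: str) -> list[str]:
    """Split on '|' characters that are not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == "|" and depth == 0:
            parts.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current).strip())
    return parts


def parse_zh_def(wikitext: str) -> list[dict]:
    """Group the arguments of one {{zh-def ...}} call into per-sense dicts."""
    inner = wikitext.strip()[len("{{zh-def"):-len("}}")]
    args = split_top_level(inner)[1:]  # drop the empty text before the first '|'
    senses, current = [], []
    for arg in args:
        if arg == "-":  # '|-' closes the current sense
            senses.append(current)
            current = []
        else:
            current.append(arg)
    senses.append(current)

    parsed = []
    for sense in senses:
        # The first two positional arguments are part of speech and definition.
        entry = {"pos": sense[0], "definition": sense[1]}
        for extra in sense[2:]:
            key, _, value = extra.partition(":")
            entry[key.strip()] = value.strip()
        parsed.append(entry)
    return parsed


sample = """{{zh-def
|n|[[sugar]]
|syn: 食糖
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|-
|n|[[candy]]; [[sweets]]
|syn: 糖果
}}"""

parsed = parse_zh_def(sample)
for sense in parsed:
    print(sense)
```

Each sense keeps its own synonyms and examples, which is the association the proposal is arguing for.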
@Atitarev, Tooironic, Hongthay, Mar vin kaiser: Perhaps the previous ping didn't work correctly. Wyang (talk) 09:32, 24 November 2016 (UTC)Reply
Hi. Sorry, I got the ping but I'm a bit confused. It's a good effort, but I agree that this is a radical change from the current format and again too different from other languages. Displaying PoS's in front of definitions (translations) is definitely worth considering. --Anatoli T. (обсудить/вклад) 09:38, 24 November 2016 (UTC)
Thanks Anatoli. I accidentally discovered something I wrote > 2 yrs ago on Talk:一致, and it seems my desire to change the format has been long-standing... The division of definitions by part of speech is really not ideal for analytic and less inflecting languages (努力, 保險, 可能). IMO treating synonyms, antonyms and so forth as belonging to senses is also important, as we add more and more of these see-also-type words. At the moment 生 (shēng) looks fairly neat (albeit not as clear as if the PoS info were next to the senses), but if I add synonyms, antonyms and see-also terms as in User:Wyang/zh-def#生, the page could become quite confusing. Wyang (talk) 11:22, 24 November 2016 (UTC)
  • As already mentioned above, these are radical changes you are proposing. Since they go to the heart of Wiktionary's layout, you'd be better off seeing if you can carry them out by getting support from other members of the community for ALL languages, not just Chinese. ---> Tooironic (talk) 12:49, 24 November 2016 (UTC)Reply
  • I really have no faith in the Wiktionary community in this. Haiz.
    If we, as the Chinese-editing community, believe that a current practice is unfittingly designed for Chinese, we should strive to achieve what we think is most suitable. I myself only have limited power in making a difference. It's like the opposition to {{zh-pron}} formatting and the Chinese merger before; other people are unfamiliar with this, so unless we adopt what is right, we won't progress efficiently. Wyang (talk) 11:58, 25 November 2016 (UTC)Reply

Sichuanese

I'd like to add entries for Sichuanese, but it appears that it doesn't have an ISO code, so one would have to be created. I don't think it should be included under the Mandarin section for zh-pron and other places, due to the differences between the two (47.8% lexical similarity and < 60% intelligibility) and also the sheer number of potential listings that could be under Mandarin (i.e. Shandong, Shaanxi, Dongbei etc.). Maybe listing it under Southwest Mandarin would be okay though. Most of the coverage would probably be on the Chengdu dialect, but I'm not sure how other dialects, some of which are quite different, would be accounted for. --Prisencolin (talk) 02:13, 10 December 2016 (UTC)

(TBH it's currently impossible to nest it under Mandarin with the current zh-pron code —suzukaze (tc) 03:53, 10 December 2016 (UTC))Reply
It is a variety of Mandarin though - it would make more sense to group it under Mandarin and reorganise the Standard Mandarin tags accordingly. Wyang (talk) 16:44, 13 December 2016 (UTC)Reply
Sichuanese should definitely be nested under Mandarin but I don't have the guts to modify Module:cmn-pron. —suzukaze (tc) 12:47, 14 December 2016 (UTC)Reply
I'm not contesting that it's part of the Mandarin branch, I'm just concerned that at some point we might have over a dozen different entries under "Mandarin" that could appear and it might be a bit disorganized. Why don't we just create a new module for "Southwest Mandarin" and put it under that? It still has "Mandarin" in the name after all. It also has the benefit of being able to group related varieties together in specific subcategories.--Prisencolin (talk) 04:55, 15 December 2016 (UTC)Reply

'The body of this page needs to be updated to explain the new policy'

Hi, regarding the message reading 'The body of this page needs to be updated to explain the new policy.', I'd like to know when the update is going to be carried out, or at least where I can read the new policy. Thanks in advance. --Backinstadiums (talk) 15:48, 8 June 2017 (UTC)Reply

@Backinstadiums: I think it's pretty much up to date now. @Suzukaze-c, Atitarev, Wyang, is there any old policy still lingering around on the page? Can we remove that notice? — justin(r)leung (t...) | c=› } 16:02, 8 June 2017 (UTC)Reply
Yes, the notice can go now. I put it there after we moved to the unified Chinese L2 header but the policy described the old standards. Now it's matching what we are doing.--Anatoli T. (обсудить/вклад) 22:15, 8 June 2017 (UTC)Reply
Agreed. Wyang (talk) 23:01, 8 June 2017 (UTC)Reply
The Etymology section is still outdated (I'm not sure how to update it), but otherwise I think it's OK. —suzukaze (tc) 23:08, 8 June 2017 (UTC)Reply
@Suzukaze-c: Could we just remove the part that mentions literary Chinese altogether for now? — justin(r)leung (t...) | c=› } 23:17, 8 June 2017 (UTC)Reply
@Justinrleung I don't know if this policy is a draft. It's official - either endorsed by a vote or unchallenged by the community. The format of soft redirects wasn't endorsed, though but wasn't challenged either. There are still thousands of unconverted Mandarin and Cantonese hanzi entries, which are hard to convert for obvious reasons. Things to discuss are pinyin, jyutping and POJ entries (headers, categories and templates), Cyrillic Dungan and Arabic Xiao'erjing. What to do with topolects without an established writing system and lack of transliteration standards.--Anatoli T. (обсудить/вклад) 23:47, 8 June 2017 (UTC)Reply
@Atitarev: Is there a better template to use? {{policy}} seems too strong, but {{policy-DP}} / {{policy-ED}} seem too weak. — justin(r)leung (t...) | c=› } 23:51, 8 June 2017 (UTC)Reply
@Justinrleung I see your point, thanks. Yes, leave it as is. Another thing we need to do for Chinese (and any language in scriptio continua, including Vietnamese) is to define CFI. The definitions of "word" and "sum of parts" are not exactly the same in Chinese as in languages with spaces. Even German or Finnish criteria for inclusion differ from English. --Anatoli T. (обсудить/вклад)
────────────────────────────────────────────────────────────────────────────────────────────────────
On that note, I would like to suggest that we relax the part of CFI on personal names slightly, to allow names which are directly found in idioms and set phrases, such as
Wyang (talk) 03:54, 10 June 2017 (UTC)Reply

Looking to improve Wenzhounese coverage

I started the outline of an "about" page at User:Prisencolin/wenzhou. Wenzhounese should be distinct enough from other Wu dialects to warrant a page by itself.--Prisencolin (talk) 18:12, 5 July 2017 (UTC)Reply

(See also Template_talk:zh-pron#Wenzhou_dialect. suzukaze (tc) 18:15, 5 July 2017 (UTC))
@Prisencolin, Suzukaze-c: I think the first step is to add Wenzhounese to {{zh-pron}}. We do have an editor from Wenzhou, @Mteechan, so it would be great if we could start adding Wenzhounese to Wiktionary. We need to determine which romanization system we should be using. @Wyang, Atitarev, any thoughts? — justin(r)leung (t...) | c=› } 18:40, 5 July 2017 (UTC)Reply
Wupin, or Wu romanization, the one wu-chinese.com uses will do. Nevertheless, it could be improved to some extent. Mteechan (talk) 18:52, 5 July 2017 (UTC)Reply
I'm still curious as to how irregular the phonology (esp. tone sandhis) is - this will determine the kind of system that would be ideal for use. Wyang (talk) 23:01, 5 July 2017 (UTC)Reply
Well, the tone sandhi is pretty complicated. I've made a lookup table for 2-word sandhi, but it's based on my accent, not the de facto "standard" accent in urban Wenzhou. Other than that, the phonology is not that irregular. Mteechan (talk) 04:38, 6 July 2017 (UTC)Reply
@D.s.ronis has done some work on Wenzhounese on Wikipedia, such as creating Wenzhounese romanisation.--Prisencolin (talk) 06:33, 9 July 2017 (UTC)Reply
Glossika has a Wenzhounese course as well, for those interested. Wyang (talk) 07:42, 11 July 2017 (UTC)Reply

Taishanese and Teochew

Taishanese and Teochew now have codes, pursuant to the discussion archived at Wiktionary talk:Language treatment/Discussions#Taishanese_and_Teochew. - -sche (discuss) 07:24, 19 January 2018 (UTC)Reply

Header of non-Chinese script entries

Wiktionary:Votes/pl-2014-04/Unified Chinese decided that words written in Chinese characters should be unified under the Chinese header. However, it also says that the formats of templates in words written in non-Han scripts devised specifically for particular topolects are not the subject of the vote and can be discussed separately if needed.

Sinitic terms (lemma or not) written in non-Han scripts include:

  1. Pinyin romanization of words
  2. Jyutping romanization of characters
  3. POJ form of words
  4. Cyrillic Dungan
  5. Xiao'erjing words
  6. others, like zhuyin fuhao

There are two possible headings to use:

  1. Use the topolect as heading (e.g. Mandarin, Cantonese, Min Nan, Dungan)
  2. Use Chinese as heading for all terms (like this and this)

Also worth pointing out:

  1. Currently the heading of Pinyin entries is inconsistent (29822 with a Mandarin header vs 1318 with a Chinese header; MediaWiki:Gadget-AcceleratedFormCreation.js uses "Chinese")
  2. There is precedent for not using a dialect-specific header for terms in an orthography exclusive to a specific dialect, see Wiktionary:Votes/2011-10/Unified Romanian

I propose to migrate all Sinitic terms (lemma or not) to the Chinese header and eliminate any topolect header, to finish the unification of Chinese. Any thoughts? Note that this proposal only concerns headers and says nothing about categories. --Zcreator (talk) 02:24, 4 February 2018 (UTC)

Support. Wyang (talk) 03:06, 4 February 2018 (UTC)Reply
Weak Oppose, since romanizations like Jyutping are made specifically for Cantonese, unlike hanzi spellings, which can be shared across dialects.
AcceleratedFormCreation.js seems to be using "Chinese" because the accelerated creation links are found under a Chinese header. —Suzukaze-c 07:05, 3 May 2018 (UTC)Reply
(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Can we agree on some of the existing non-Chinese headers, maybe one at a time? I think pinyin entries could look like this under the Chinese L2 header:
Hanyu Pinyin Běijīng (Zhuyin ㄅㄟˇ ㄐㄧㄥ)
The only difference from the current Běijīng entry would be a different L2 header (Chinese) and a linked name of the romanisation. Since Hanyu Pinyin is only used for Mandarin, it becomes obvious, which lect the romanisation applies to. --Anatoli T. (обсудить/вклад) 07:29, 3 May 2018 (UTC)Reply
My understanding is that unification of Chinese reduces duplication due to the large number of shared written forms across lects.
There is no such concern for romanizations, which are unique to a certain lect, so I think they should not use the "Chinese" header. Min Nan is Chinese, but I am not yet convinced that chai-iáⁿ#Chinese is helpful. I imagine that a "unified Chinese" plan would never have taken place if China used phonetic scripts, and there were no hanzi to "bind" lects together.
Suzukaze-c 08:33, 3 May 2018 (UTC)Reply
Thanks for the response. Let's see what other people think. Converting Min Nan Pe̍h-ōe-jī to Chinese L2 was looked at favourably but not everyone thinks we should have Hanyu pinyin entries in the first place. --Anatoli T. (обсудить/вклад) 13:22, 3 May 2018 (UTC)Reply

Dungan Cyrillic transliteration

(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Hi. I think Dungan Cyrillic should be transliterated into Roman letters in Chinese entries.

I also think we should review the method itself, which is not quite standard, anyway - e.g. get rid of Cyrillics in the translit and make it more meaningful. --Anatoli T. (обсудить/вклад) 03:20, 11 April 2018 (UTC)Reply

Translit in {{zh-pron}}: definitely.
About the translit itself: I based it on w:ru:Дунганская_письменность#Таблица_соответствия_алфавитов, which is where ь came from. I'm not sure what it should be replaced with. î? —Suzukaze-c 03:25, 11 April 2018 (UTC)Reply
@Suzukaze-c: Thanks, let me think about it when I have a bit more time ("î" may not be a bad suggestion) and let's see what others think about it. --Anatoli T. (обсудить/вклад) 03:36, 11 April 2018 (UTC)Reply

Superscript tone numbers

(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Would tone numbers for Gan, Xiang, Jin, Teochew, etc. look better if they were made superscript by default, as for Cantonese, e.g. 所有 (so² jau⁵)? This does not currently apply to all templates, e.g. 所有 (so2 jau5). Pretty sure it was implemented by Kenny. --Anatoli T. (обсудить/вклад) 11:40, 15 April 2018 (UTC)

I agree. Wyang (talk) 12:54, 15 April 2018 (UTC)Reply
I agree as well. I think this should also be automatic in {{zh-l}}. — justin(r)leung (t...) | c=› } 18:19, 15 April 2018 (UTC)Reply
Sure. —Suzukaze-c 22:14, 15 April 2018 (UTC)Reply
What's the more recognised standard? I am not familiar with pronunciation schemes for dialects other than Mandarin. JamesjiaoTC 22:02, 16 April 2018 (UTC)Reply
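
For illustration, the proposal above (tone digits rendered as superscript by default) could be sketched roughly as follows. This is not how {{zh-pron}} or {{zh-l}} actually do it; the function name `superscript_tones` is hypothetical, and the use of Unicode superscript digits rather than markup is an assumption made to keep the sketch self-contained.

```python
import re

# Map ASCII digits to Unicode superscript digits (an assumption for this sketch;
# the templates themselves may well use markup instead).
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_tones(romanisation: str) -> str:
    """Turn tone digits that directly follow a letter into superscript digits."""
    return re.sub(
        r"(?<=[^\W\d_])\d+",  # digits immediately preceded by a letter
        lambda m: m.group(0).translate(SUPERSCRIPT_DIGITS),
        romanisation,
    )


print(superscript_tones("so2 jau5"))
```

Digits that do not follow a letter (a year, for instance) are left alone, so the function only touches syllable-final tone numbers.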

"other" dialects edit

Having {{zh-dial}} but not adding relevant IPA to entries seems odd to me. Perhaps we need to rethink {{zh-pron}}. —Suzukaze-c 04:43, 25 September 2018 (UTC)Reply

By all means do please. Something similar to Xiaoxuetang would be ideal, but it would also mean a large maintenance requirement. Wyang (talk) 05:00, 25 September 2018 (UTC)Reply

Romanisations of Chinese

According to the present policy, Pinyin romanisations of monosyllables and polysyllables for Standard Mandarin (aka Putonghua), such as "" and "bùguò" are allowed. However, for Standard Cantonese, only Jyutping romanisations of monosyllables are allowed (e.g. jyut6, ping3), while those of polysyllables are disallowed. Why is there such unequal treatment of the two languages? I believe that Jyutping romanisations of polysyllables should be allowed and created en masse, as Pinyin romanisations of polysyllables are allowed and exist in large quantities. Jonashtand (talk) 06:34, 9 December 2018 (UTC)

@Jonashtand: It was a result of this vote. The only reason for only monosyllables I see in that vote is that "this is also what is done for pinyin with tone numbers". — justin(r)leung (t...) | c=› } 06:49, 9 December 2018 (UTC)Reply

Proposed change to zh-der

zh-der currently automatically provides the Mandarin pinyin for entries that have Mandarin pinyin in zh-pron. But for those entries which don't have Mandarin pinyin in zh-pron, no romanization is given. I propose including the non-pinyin romanizations like the Yueyu Pinyin and Min Nan POJ. It does not have to be well thought out or well planned at this stage. It just needs to happen and then be refined over time. --Geographyinitiative (talk) 22:51, 4 April 2019 (UTC)Reply

It would be very confusing to mix different romanisations. Also, I think this topic is for Chinese editors only, so it can be discussed at Wiktionary talk:About Chinese rather than here. --Anatoli T. (обсудить/вклад) 23:14, 4 April 2019 (UTC)
moved to Wiktionary talk:About Chinese per suggestion
If it's really that confusing, then we should split up all Chinese into different dialects. Right now we are basically saying, if an entry has a Mandarin pronunciation, we can tell people about that, but if it has only dialect pronunciations, we're just going to ignore those and not let you see them.
This [1] whole zh-der section is only dialect words. Why can't we let people see the dialect romanizations? --Geographyinitiative (talk) 23:33, 4 April 2019 (UTC)Reply
Of course we can but it needs to be split by dialects, if they are not Mandarin. I don't know we have a version of {{zh-l}} for specific lects, if not, the transliterations need to be provided manually. --Anatoli T. (обсудить/вклад) 23:54, 4 April 2019 (UTC)Reply
Note that if some feature is missing for a language or dialect, it's because nobody has done it. --Anatoli T. (обсудить/вклад) 23:57, 4 April 2019 (UTC)Reply
I don't think zh-der needs to be split by dialects in all cases. My reasoning is, we currently put everything in a Chinese word entry under the "Chinese" heading and pretend it's one language, so why can't we just add the non-Mandarin romanizations in zh-der? Take the farce to its logical conclusion. My point is, why cover up the non-pinyin romanizations? Here's an entry where I manually added a POJ into a list of primarily pinyin zh-der; I've done a lot of these and there was no objection from the powerful editors [2]. Now I'm asking for us to take my edits to the logical conclusion and automatically generate all different "Chinese" romanizations in the zh-der, even if pinyin remains the default. --Geographyinitiative (talk) 00:36, 5 April 2019 (UTC)
If you will come with me on a thought experiment: If we had a header called "Germanic" used for all words of English, German and Dutch "dialects" (euphemism for language) and the vast majority of the compounds under a Germanic word gave an English "dialect" pronunciation next to the "Germanic" words in the list, but then there were some exclusively Dutch "dialect" or German "dialect" compounds in the list of "Germanic" terms that didn't have a pronunciation next to them, I would support adding the Dutch "dialect" pronunciation for those words into the compounds list in some capacity, if only just for the words that were exclusively Dutch and had no English pronunciation. --Geographyinitiative (talk) 00:53, 5 April 2019 (UTC) (modified)Reply
As it stands right now, words with no Mandarin pinyin pronunciation are given no automatically generated romanized form in zh-der. That is a situation that needs to be changed. --Geographyinitiative (talk) 00:57, 5 April 2019 (UTC)Reply
This is easier said than done. Something we need to consider is which lect to display (if we're only displaying one). Mandarin has that status of being the standard for Chinese, but there's no rule to say which one should be chosen if a term is non-Mandarin but used in many other lects. Do we want to display several lects at a time? Do we need a separate parameter to specify which lect(s) we want to display? There are just so many things to consider that editors like me haven't actually bothered to address about it. — justin(r)leung (t...) | c=› } 03:08, 5 April 2019 (UTC)Reply
@Geographyinitiative: Please don't claim you received endorsement, if nobody reverted your edit. A good practice is, at the very minimum, advise the users with a {{qualifier|CHINESE VARIETY}}, that the term is dialectal or doesn't belong to the same lect, as in 斯普特尼克 (sīpǔtèníkè) and 史撥尼克史拨尼克. --Anatoli T. (обсудить/вклад) 03:43, 5 April 2019 (UTC)Reply
@Atitarev, Justinrleung: Thank you for your responses to my ranting. I agree that this is easier said than done. But what I would like to say is that taking the first step may not be as hard as it seems. My goal is to get some - any kind of romanized form (of any dialect whatsoever) next to the romanizations automatically. Let readers who are looking at the pinyin in the compound list run into a little dialect romanization! After all, those romanizations are a part of "Chinese" aren't they? Yeah, it will be confusing, but so is putting multiple dialect languages under the name "Chinese" as if it's one thing. (I'm not really asking you to split them up, but yeah, you look at all the stuff encompassed by "zh-pron" and tell me that's one language!) I would say that if you have to set up a list of "which dialect languages get priority" if one word is used in multiple dialects (but not in Mandarin) in the automatically generated romanizations, then I would just go by whatever order the dialects are currently listed in in the zh-pron thing. If someone feels that this isn't appropriate for certain words, then they can go in and manually change the specific words they are concerned about. If Yueyu, Min Nan, Jin etc are included under 'Chinese' and can be part of zh-der, then their romanizations must be represented too. I understand what you are saying Anatoli, but I usually feel that if I make a really bad edit that one of the big editors will come by and hit me with a stick, haha. So I kind of see an edit after which there is no controversy as at bare minimum a grudging acceptance of my edit. I know too that I can use the qualifier things for the dialect synonyms; I do use them but probably not perfectly. --Geographyinitiative (talk) 04:10, 5 April 2019 (UTC)
I added 斯普特尼克 to the 斯 普 特 尼 & 克 pages and also added 史撥尼克 to the 史 撥 尼 & 克 pages with the si2 but6 nei4 hak1 pronunciation included manually. --Geographyinitiative (talk) 04:17, 5 April 2019 (UTC)Reply
I don't like your e.g. diff because, again, you are mixing dialects together without supplying the important information, eg. {{q|Cantonese}}: (Cantonese) next to the word (before or after) or under a different subheader. --Anatoli T. (обсудить/вклад) 04:39, 5 April 2019 (UTC)Reply
@Atitarev I think I understand what you are saying, but I'm not sure exactly what you want me to do. Could you implement your idea on one of these pages and then give me a link to your edit so I can have a concrete example of what you think I should do? --Geographyinitiative (talk) 00:25, 7 April 2019 (UTC)Reply
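
To make the fallback idea from this thread concrete (show Mandarin when it exists, otherwise the first lect in a fixed priority order, always labelled with its lect as Anatoli asks), here is a minimal sketch. It is not the zh-der implementation: `LECT_ORDER`, `pick_romanisation` and the dictionary shape are all hypothetical, and the order shown is only a placeholder for whatever order {{zh-pron}} lists lects in.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priority list; the thread suggests reusing the order in which
# lects are listed in zh-pron, which is not reproduced here.
LECT_ORDER = ["Mandarin", "Cantonese", "Hakka", "Min Nan"]


def pick_romanisation(
    prons: Dict[str, str], order=LECT_ORDER
) -> Optional[Tuple[str, str]]:
    """Return (lect, romanisation) for the highest-priority lect that has one."""
    for lect in order:
        romanisation = prons.get(lect)
        if romanisation:
            return lect, romanisation
    return None  # nothing to display


# A Cantonese-only term falls back to Jyutping and keeps its lect label.
print(pick_romanisation({"Cantonese": "si2 but6 nei4 hak1"}))
print(pick_romanisation({"Mandarin": "sīpǔtèníkè", "Cantonese": "si1 pou2 dak6 nei4 hak1"}))
```

Returning the lect name alongside the romanisation lets the caller print a qualifier such as "(Cantonese)" next to the term, so different romanisation systems are never mixed unlabelled.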

Problem with zh-see- simplified character version 有边读边

Hello! From late 2017, I have been noticing simplified Chinese pages with the types of problems you can see on the 有边读边 page. Looking at the page with my eyes, I can see: “{{n-g|A method people use to pronounce a Chinese character when they do not know its exact pronunciation: one of the components of the character is taken as a [[phonetic”. I shouldn't be able to see {{n-g| or [[.

In previous instances of this problem that were resolved, a work-around was found to avoid usage of the {{n-g| or whatever weird thing was displayed on the page.

I am going to start mentioning these problems here every time I see them.

--Geographyinitiative (talk) 01:17, 22 April 2019 (UTC)Reply

([3] Suzukaze-c 01:20, 22 April 2019 (UTC))
As a temporary measure, what about making extract_gloss return "" if extracting failed (that is, if the gloss contains characters like |, =, {, etc.)? --Dine2016 (talk) 07:15, 23 April 2019 (UTC)Reply
Here's another variant of this problem: 官话 which reads (“Mandarin; Guanhua {{gl|group of Northern Chinese dialects including Putonghua; etc.”) Shouldn't be able to see {{gl| --Geographyinitiative (talk) 22:10, 24 April 2019 (UTC)Reply
Here's another variant of this problem 朱鹭 It shows up as “dot=: crested ibis, Nipponia nippon” --Geographyinitiative (talk) 02:01, 26 April 2019 (UTC)Reply
You have to decide: is this the best Chinese-English dictionary on Earth, or is it a nonsense dictionary that can't even be formatted correctly? --Geographyinitiative (talk) 02:05, 26 April 2019 (UTC)Reply
Do we want to manually fix every simplified characters page like this??? [4] --Geographyinitiative (talk) 04:05, 26 April 2019 (UTC)Reply
You need to have a more positive attitude. The template can't handle other templates inside it at the moment, which can, I'm sure, be resolved. What can YOU do to help this "nonsense dictionary"? Edits are linked like this: diff. --Anatoli T. (обсудить/вклад) 04:11, 26 April 2019 (UTC)
Okay, I understand. I just didn't realize that this was something we needed to do too. --Geographyinitiative (talk) 04:32, 26 April 2019 (UTC)Reply
This is also a technical issue, not specific to Chinese. WT:GP is the place to raise such problems, rather than with Chinese editors specifically. Wyang has left and the rest of us may not be Lua and template savvy, not on the same level, anyway. I didn't ask for full definitions to be added to {{zh-see}}, which caused the issue in the first place. We can revert to simple soft-redirect without any definitions, which will invariably cause problems. --Anatoli T. (обсудить/вклад) 04:37, 26 April 2019 (UTC)
Similar issue (but this time I don't know how to solve it manually). The explanation for the character 荇 in 荇菜 on the 荇菜 (xìngcài) page reads "taxlink|Limnanthemui nymphoides|species|ver=190423" --Geographyinitiative (talk) 21:16, 27 April 2019 (UTC)
@Geographyinitiative That is a separate issue, which I have fixed. — justin(r)leung (t...) | c=› } 05:01, 28 April 2019 (UTC)Reply
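
Dine2016's temporary measure above (return an empty gloss whenever extraction leaves template syntax behind) can be sketched like this. The real `extract_gloss` lives in a Lua module and is not reproduced here; `safe_gloss` and `UNSAFE_CHARS` are hypothetical names, and including the square brackets in the character set is an assumption based on the stray "[[" reported at the top of this thread.

```python
# Characters that suggest unprocessed template or link syntax survived extraction.
# "|", "=" and "{" come from the suggestion above; "}", "[" and "]" are assumptions.
UNSAFE_CHARS = set("|={}[]")


def safe_gloss(gloss: str) -> str:
    """Return the gloss unchanged, or "" if it still contains wiki markup."""
    if any(ch in UNSAFE_CHARS for ch in gloss):
        return ""
    return gloss


print(repr(safe_gloss("{{n-g|A method people use to pronounce a Chinese character")))
print(repr(safe_gloss("crested ibis, Nipponia nippon")))
```

Showing no gloss at all is a blunt fallback, but it keeps raw fragments such as "{{gl|" or "dot=:" off the page until the extraction itself is fixed.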

POJ entries

Hello again. Sorry to be a constant bother, but I have a question about the POJ entries. I have seen @Justinrleung reduce POJ entries (entries something like the khai-sí page) into a mere "zh-see". Should all POJ entries be changed into "zh-see"? The entry tòe has an example sentence in which the Romanized form 'tòe' was actually used in the original text. Should this example sentence be moved to the (zhuì) page? Thanks for your time. --Geographyinitiative (talk) 06:06, 1 May 2019 (UTC)

@Geographyinitiative: Using zh-see for POJ is still experimental, but I think they all should be converted to zh-see eventually. (I can't seem to find the relevant discussions though.) About the quotation, it should either be put on the page or in Citations:tòe. BTW, tòe does not mean "with" but "to follow". — justin(r)leung (t...) | c=› } 07:57, 1 May 2019 (UTC)Reply
I support the use of {{zh-see}} for Min Nan POJ when there is a matching hanzi entry, to be decided or simply kept with entries like o͘-tó͘-bái. --Anatoli T. (обсудить/вклад) 08:24, 1 May 2019 (UTC)Reply

shǎnshuò qící 閃爍其詞

现代汉语词典第7版 (2016) p1139 "shǎnshuò-qící"

现代汉语规范词典第3版 (2014) p1146 "shǎnshuò-qící"

现代汉语词典第6版 (2012) p1133 "shǎnshuò-qící"

现代汉语规范词典 (2004) p1137 "shǎnshuò-qící"

--Geographyinitiative (talk) 21:52, 20 May 2019 (UTC)Reply

It's such a big deal, wow. All Chinese editors on board! LOL diff, diff, diff--Anatoli T. (обсудить/вклад)
shǎnshuò-qící --Geographyinitiative (talk) 08:01, 21 May 2019 (UTC)Reply
We are a dictionary too. The references give the correct pronunciations. As for capitalisation, spacing and hyphenation of the transliterations, those are up to each dictionary's own rules. Pinyin is not that important and the hyphen was used for etymological purposes in those dictionaries, nothing to do with the Chinese spelling or how the word is pronounced. --Anatoli T. (обсудить/вклад) 08:10, 21 May 2019 (UTC)
Is 《汉语拼音正词法基本规则》the rules for Hanyu Pinyin? If there are other rules, I will edit according to those rules. --Geographyinitiative (talk) 08:32, 21 May 2019 (UTC)Reply
  Support. Suzukaze-c 22:35, 21 May 2019 (UTC)
@Geographyinitiative, Atitarev, Suzukaze-c: I'm pretty sure we've talked about this many, many times before. Is there a need to revisit this issue? We've basically settled on no hyphens in chengyu because the structure is often debatable. — justin(r)leung (t...) | c=› } 23:01, 21 May 2019 (UTC)Reply
I'm not upset about the status quo, but I wouldn't mind if someone wanted to comprehensively take the time. —Suzukaze-c 23:08, 21 May 2019 (UTC)Reply
Relevant discussion: Talk:興高采烈, Talk:無精打采, User talk:Tooironic#Hyphens in chengyu. If we do decide to use hyphens in chengyu, it will take a lot of work to actually go through all the chengyu entries and correct them accordingly. — justin(r)leung (t...) | c=› } 23:14, 21 May 2019 (UTC)Reply
I believe in doing the right thing. If this is the right thing to do, then it must be done, sooner or later. I'm not really 100% sure that it is the right thing to do, but I have an inclination that it MAY very well be the right thing. The ONLY thing I support is doing the right thing. Of course, I'm not going to pretend like I'm a master of Chinese. Last weekend I took the TOCFL Test Band C, and I got 49/80 on listening, one point below Level 5's "fluent" score, and 60/80 on reading, well within the "fluent" score for Mandarin reading. I also have two Chinese teaching certificates from Mainland China (as well as one that I have passed the speaking part of, but I can't pass the written part of). Mastery of Modern-day Mandarin Chinese (at least the Taiwanese version of it) is scientifically defined as Level 6 on TOCFL, so I'm not there yet. I don't want to push mistaken beliefs etc: facts only, facts always, facts forever, wherever the facts may lead. --Geographyinitiative (talk) 23:58, 21 May 2019 (UTC)
There's no such thing as "the right thing" if policies and conventions differ between different sources. Using correct Chinese is important. Pinyin is a tool, not a language.
As dictionary editors, we can and should decide on rules, make a proposal, have a vote, write it up as a language policy. Pinyin is not a writing system, this should only be used as such, even if we allow soft-redirect entries for pinyin.
Both suggestions - "no space in chengyu" or "follow certain dictionary standard" have merits. I suggested a vote in Wiktionary_talk:About_Chinese#Capitalisation_of_demonyms_and_language_names_-_a_mini-vote on this very page, which was mostly ignored. Chinese editors are less worried about how Chinese words are rendered in Roman scripts, if it doesn't impact pronunciations, such as spaces, capitalisation or hyphenation, but we can still define some rules and follow them. --Anatoli T. (обсудить/вклад) 00:45, 22 May 2019 (UTC)
My opinion: ALL FORMS of all romanizations that exist in any dictionaries (modern and historical) should be incorporated in some way. --Geographyinitiative (talk) 02:06, 26 May 2019 (UTC)
I oppose adding all romanisations from all dictionaries. We may want to add zhuyin for Min Nan and Hakka, but we have covered almost all romanisations for varieties in use; adding all possible variants is silly, and no dictionary does it. Any dictionary defines the romanisations it uses and sticks to those definitions consistently. --Anatoli T. (обсудить/вклад) 07:34, 26 May 2019 (UTC)
(@Geographyinitiative: I invite you to comment on User:Suzukaze-c/p/mul#Chinese at its talk page. —Suzukaze-c 07:47, 26 May 2019 (UTC))Reply

Glyph origin of 𤆬

I don't believe the glyph origin given for this character. I do believe it might be a useful and/or popular saying used for teaching people this character, and as such is a part of the cultural background of this character and should not be outright deleted and ignored. --Geographyinitiative (talk) 02:01, 26 May 2019 (UTC)Reply

@Geographyinitiative: I haven't found a better explanation. The glyph origin comes from 臺灣閩南語按呢寫. (BTW, this discussion should be at WT:TR or WT:ER). — justin(r)leung (t...) | c=› } 02:58, 26 May 2019 (UTC)Reply
moved --Geographyinitiative (talk) 04:34, 26 May 2019 (UTC)Reply

Gloss of ()

Hello all. I tried to correct the gloss for () by deleting the entry here: [5]. How can I link the gloss to the definitions given in the entry? Do I need to manually add those definitions to the gloss list? Thanks for any help. --Geographyinitiative (talk) 08:31, 6 July 2019 (UTC)Reply

edit

Does the jiao pronunciation exist for this word? A brief glance didn't find it--Geographyinitiative (talk) 02:09, 8 July 2019 (UTC)Reply

Add HSK level of the Hanzi

The Japanese section shows the 常用漢字 level of the Kanji, so I'd like to propose adding the HSK level of the hanzi too --Backinstadiums (talk) 14:39, 14 July 2019 (UTC)Reply

Here is the official word list that I have used [6]. There were more extensive lists for the old HSK. If we do HSK, then I'd like to do TOCFL [7] and other tests too. --Geographyinitiative (talk) 11:43, 15 July 2019 (UTC)Reply
@Geographyinitiative: where have you used it? The best lists are here http://www.hskhsk.com/word-lists.html. I really like the one of homophones http://hskhsk.pythonanywhere.com/homophones --Backinstadiums (talk) 19:18, 15 July 2019 (UTC)Reply
@Backinstadiums: When I say 'used' I should have said 'used in my personal study'. Those character and word lists are great (they are ultimately derived from this list as far as I can tell). --Geographyinitiative (talk) 19:34, 15 July 2019 (UTC)Reply
@Geographyinitiative: what other tests are you referring to? --Backinstadiums (talk) 19:39, 15 July 2019 (UTC)Reply
@Backinstadiums: I'm pretty sure Putonghua Proficiency Test has some official character and word lists that would be interesting to work on. There are other tests that may have word lists- ZHC etc. --Geographyinitiative (talk) 19:46, 15 July 2019 (UTC)Reply
@Geographyinitiative: I see. Why do kanji only show the list of 常用漢字? --Backinstadiums (talk) 20:05, 15 July 2019 (UTC)Reply
@Backinstadiums I don't know why- you will have to ask someone else. --Geographyinitiative (talk) 22:21, 15 July 2019 (UTC)Reply
@Geographyinitiative: In any case, most tests classify the same characters in the same levels, so only a group of characters would have two different levels at most. --Backinstadiums (talk) 22:54, 15 July 2019 (UTC)Reply
@Geographyinitiative Where can I add my proposal? --Backinstadiums (talk) 17:39, 17 July 2019 (UTC)Reply
@Backinstadiums [8] I support your proposal 100% --Geographyinitiative (talk) 20:51, 17 July 2019 (UTC)Reply
@Geographyinitiative in the discussion page? --Backinstadiums (talk) 23:13, 17 July 2019 (UTC)Reply

Proposal: adding elasticity/flexibility

I'll be concise for those knowledgeable, and refer to brief and basic bibliography for those who are not.

Chinese elasticity/flexibility is a lexical property of Chinese terms, two sides of the same coin, which must be reflected in the very same entry for a given lemma.

Therefore, for example the fifth version of the prestigious XDHYCD (Xiandai Hanyu Cidian) applies mutual annotations in the respective entries, so that the entry for 煤 mei ‘coal’ reads "noun, … also called 煤炭 mei-tan ‘coal-charcoal’", and the entry for 煤炭 meitan ‘coal-charcoal’ is annotated as "noun, 煤 mei ‘coal’".

Unfortunately, in Wiktionary this is currently reflected incorrectly in the broadly termed 'compounds' section, as a synonym or after 'see also', and only for the monosyllabic version.

Please, before commenting read the following brief article (and if necessary further references within it); if you still have any questions, I'll be glad to try and answer them.

http://www-personal.umich.edu/~duanmu/2014Elastic.pdf

Finally, elasticity from Xiandai Hanyu Cidian 2005 has been tabulated in the following open access thesis

deepblue.lib.umich.edu/bitstream/2027.42/116629/1/yandong_1.pdf

I hope an enriching discussion ensues for this critical lexicographical issue --Backinstadiums (talk) 13:43, 19 August 2019 (UTC)

phoneticity

According to DeFrancis' "Visible Speech", Chinese "phoneticity" reaches up to 90% for "the two thousand or so that are necessary for basic literacy". --Backinstadiums (talk) 12:23, 21 September 2019 (UTC)Reply

Adding homophones irrespective of tone

As the entry ping already exists as "Nonstandard spelling of pīng, píng and pìng", it stands to reason to add a section in Chinese characters' entries for homophonic words irrespective of tones too --Backinstadiums (talk) 09:59, 1 October 2019 (UTC)Reply
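
As a rough illustration of what "homophones irrespective of tone" would mean in practice, the sketch below strips the four pinyin tone diacritics and groups syllables by the toneless result. It is only a sketch under stated assumptions: `strip_tones` and `group_by_toneless` are hypothetical names, and nothing here reflects an existing Wiktionary module.

```python
import unicodedata
from collections import defaultdict

# Combining macron, acute, caron and grave: the four pinyin tone marks.
# The diaeresis of ü (U+0308) is deliberately not in this set.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str) -> str:
    """Remove pinyin tone marks while keeping ü intact."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def group_by_toneless(syllables):
    """Group toned pinyin forms under their shared toneless spelling."""
    groups = defaultdict(list)
    for syllable in syllables:
        groups[strip_tones(syllable)].append(syllable)
    return dict(groups)


print(group_by_toneless(["pīng", "píng", "pìng", "bùguò"]))
```

Decomposing to NFD first is what makes it possible to drop only the tone mark: a letter such as ǜ splits into u, diaeresis and grave, and recomposing after the filter yields ü rather than plain u.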

Wiedenhof's A Grammar of Mandarin

According to Wiedenhof's A Grammar of Mandarin, page 43,

The final spelled as -o is only combined with the initials b-, p-, m-, f-. This vowel matches the vowel part of the final -uo [wɔ].

However, on page 45 the author states

The final -uo [wɔ] is spelled as -o before the labial initials, b-, p-, m-, f-.

According to page 44,

"Weng" syllables rhyme with the final -ong [ʊŋ]

However, he'd specified weng as [wʌŋ], to add

Weng displays the same type of variation as the final -un: it may lose its rounding toward the end, [wəŋ].

Page 66 reads

there's free variation between [wʌŋ] and [ʊŋ] for both finals, with complementary distribution.

Can somebody clarify these contradictions? --Backinstadiums (talk) 20:05, 2 October 2019 (UTC)Reply

bound morphemes

Is it possible to graphically show bound lemmas just as we do, for example, for the English -able? --Backinstadiums (talk) 10:52, 12 October 2019 (UTC)

Mandarin tone contours along diphthongs

I'd like to add information about how tone contours are orally distributed along Mandarin rhyme diphthongs. I've tried to find a graphic with variables such as time, volume of speech, pitch levels, etc. to no avail --Backinstadiums (talk) 09:07, 22 October 2019 (UTC)Reply

This is more Wikipedia material than Wiktionary, since this is very specific phonetic information. — justin(r)leung (t...) | c=› } 01:16, 24 October 2019 (UTC)Reply

Enable searches in zhuyin

Every entry shows its zhuyin rendition, so it makes no sense not to use it in the searchbox. --Backinstadiums (talk) 10:29, 10 December 2019 (UTC)Reply

  Support. --Geographyinitiative (talk) 10:54, 10 December 2019 (UTC)Reply
The search box functions cannot be changed by users. Perhaps this can be proposed for next year's meta:Community Wishlist Survey. — justin(r)leung (t...) | c=› } 21:06, 10 December 2019 (UTC)Reply

zh-dial vs. including dialectal terms in the thesaurus

i don't like it. —Suzukaze-c 23:01, 16 December 2019 (UTC)Reply

@Suzukaze-c: I don't like it either. @Tooironic, Mar vin kaiser, Atitarev, Dine2016, any thoughts? — justin(r)leung (t...) | c=› } 23:50, 16 December 2019 (UTC)Reply
@Justinrleung If I understand what's going on here, I'd like to say that as long as you have the header "Chinese", the thesaurus has to be allowed to include words from every "form of Chinese" including any dialectal terms. To do anything else is dangerous: disallowing anything but Mandarin synonyms is to pretend that they are the only legitimate synonyms in "Chinese". I don't like putting all the dialects together like this, but it's this crazy "Chinese" header that's causing the problem. --Geographyinitiative (talk) 00:28, 17 December 2019 (UTC) (modified)
@Geographyinitiative: Thanks for your input and raising this issue, but the "Chinese" header is not going any time soon. That's why we need to define the roles of the two templates ({{zh-dial}} and {{zh-syn-saurus}}) so that they don't overlap in function. — justin(r)leung (t...) | c=› } 00:41, 17 December 2019 (UTC)Reply
I don't want to get kicked out of here again, so I will just say that I hold what appears to be minority viewpoints and I will continue to work with the community to create the best possible dictionary. Of course I believe my viewpoints, but I don't want to impose on everybody and I just want to put my thoughts out there for you to see and consider. Opinions differ sometimes. Thanks for all of your time and work on this basically wonderful resource we are making here. Of course, you are dead wrong, the "Chinese" header must obviously be removed one day, otherwise the dictionary can't be whole: no 'Old Chinese', no 'Middle Chinese' and no 'Cantonese' header makes this dictionary inconsistent with Ethnologue and Wikipedia's viewpoints, which is that Chinese is a group of languages and not a language. On Wiktionary, every language gets a header, like 'Middle English' etc. --Geographyinitiative (talk) 01:43, 17 December 2019 (UTC)Reply
@Geographyinitiative: The Chinese header's not going any time soon, even if it should be. There's no quick and dirty way to change existing framework to separate lects, whatever that looks like. Sometimes it's more about usability than "correctness", whatever that may look like. — justin(r)leung (t...) | c=› } 01:51, 17 December 2019 (UTC)Reply
I understand your viewpoint. We have all been trained to believe that "Chinese" is just one thing with some "unintelligible dialects" on the side. There is room for debate and discussion. It's an incredible linguistic phenomenon we are dealing with. Good luck to all, keep going. --Geographyinitiative (talk) 01:58, 17 December 2019 (UTC)
@Justinrleung: Thanks for pinging. Can we see some {{diff}}'s, please?
I strongly oppose unilateral actions by User:Geographyinitiative. He again knowingly violates the agreed format. It's not just a viewpoint, it's how it's done. You can express away your opinion, it's your actions in the mainspace, which is the problem. --Anatoli T. (обсудить/вклад) 10:07, 18 December 2019 (UTC)Reply
@Atitarev: See something like 妓女, where they both exist. — justin(r)leung (t...) | c=› } 16:30, 18 December 2019 (UTC)Reply
Including dialectal terms in the Thesaurus is an inferior duplication of zh-dial content. —Suzukaze-c 01:55, 20 December 2019 (UTC)Reply

Wade-Giles Issue

Hello all. The Chinese character name of the town 闊什塔格 / 阔什塔格 (Kuòshítǎgé) (a town in Xinjiang) starts with the character '闊 (kuò)', the Wade-Giles for which we currently give as 'kʻuo⁴'. But the map [9] (square 14 8, south of Pishan) I have and the GEOnet database give the Wade-Giles form as "K’o-shih-t’a-ko". Is there more than one form of Wade-Giles? What's going on here? Thanks for any help. --Geographyinitiative (talk) 13:07, 29 December 2019 (UTC)

(In the past I have noticed this same kind of problem in other contexts, for instance with the character 國 (Wade-Giles with or without the 'u'), but I didn't think too much about it at the time.) --Geographyinitiative (talk) 13:31, 29 December 2019 (UTC)

The translit without the “u” is not WG but another phonetic non-standard transliteration based on WG. For English and other speakers, it’s easier to make sense of eg “ko” than “kuo”, etc, hence "Komindang". Anatoli T. (обсудить/вклад) 14:23, 29 December 2019 (UTC)Reply
This is really interesting! I added Komintang as an alternate form on the Kuomintang page. --Geographyinitiative (talk) 16:04, 29 December 2019 (UTC)Reply

Jyutping /-a/, /-oet/

New rimes added in 2018 (see the link at the bottom of https://www.lshk.org/jyutping). —Suzukaze-c 06:33, 16 February 2020 (UTC)Reply

@Suzukaze-c: Nice, just saw this. Seems like we already support these. — justin(r)leung (t...) | c=› } 01:23, 15 May 2020 (UTC)Reply

WDL status

I noticed that WT:WDL lists "Chinese" as a WDL, which seems contrary to our usual practice (as far as I can tell) of treating dictionaries as sufficient for most topolects and for classical vocabulary. Does anyone object to changing it to "Standard Mandarin", to clarify that non-Mandarin Chinese only requires one use or reliable mention for attestation? —Μετάknowledgediscuss/deeds 23:22, 14 May 2020 (UTC)Reply

@Metaknowledge: Standard Mandarin should be good. That being said, I'm not sure if that would include Standard Chinese from Hong Kong (not really Cantonese proper, but usually read in Cantonese), which is sufficiently documented for sure. Written Cantonese may also be included, I think - it's pretty robustly attested. Other topolects would not qualify for WDL status as of now AFAICT. — justin(r)leung (t...) | c=› } 00:17, 15 May 2020 (UTC)Reply
(edit conflict) @Justinrleung: Hi. Do we really have enough attested material in Written Cantonese? I am surprised. It's always been talked about as a mostly spoken lect with comics and other informal writings occasionally using it. I may have been out of touch with the latest developments, though. --Anatoli T. (обсудить/вклад) 01:20, 15 May 2020 (UTC)
@Atitarev: I mean there are always newspaper/magazine articles (usually tabloids) that may mix in Cantonese or use Cantonese entirely. Would that be enough for it to be considered WDL? — justin(r)leung (t...) | c=› } 01:26, 15 May 2020 (UTC)Reply
I think that the massive body of spoken media produced by Hong Kong would allow HK Cantonese to be well-documented. —Suzukaze-c 01:30, 15 May 2020 (UTC)Reply
@Suzukaze-c: Thanks. Yes, if the spoken media is considered good for CFI, then yes, but we are a written dictionary, so if a word is written in MSM (in movie or news subtitles) but pronounced in Cantonese, how do you reconcile that with what we are doing here? --Anatoli T. (обсудить/вклад) 01:40, 15 May 2020 (UTC)Reply
@Justinrleung: Re: your question - I don't know if they are enough. In the past, from what I read and heard discussions about, it wasn't. The Cantonese version of diglossia makes it difficult to separate what is standard and written, since when it's written and is standard, then it's MSM, not Cantonese. --Anatoli T. (обсудить/вклад) 01:43, 15 May 2020 (UTC)Reply
Yes, I support adding Cantonese as WDL. Wide range of newspaper tabloids written in pure Cantonese (not MSM) in Hong Kong. User:Iambluemon 02:56, 15 May 2020 (UTC)Reply

Yes, I object because this policy favors Mandarin as the standard and disregards the presence of other languages, and also it makes it easier to create dialectal entries using only one mention which will introduce many errors to this site. Can I know why non-Mandarin Chinese has been omitted? How about Mandarin read using Cantonese? Will that be considered part of Standard Mandarin? I want to see other Chinese languages such as Cantonese treated the same as Mandarin. All languages equal, not one superior over the rest. Iambluemon (talk) 01:14, 15 May 2020 (UTC)Reply

Where do you see favouring Mandarin? They are talking about attestations. Standard Mandarin is easier to attest than other forms, since Chinese write much less in the varieties. Chinese contributors have put a lot of effort into actually improving the coverage of other Chinese varieties, not the other way around. --Anatoli T. (обсудить/вклад) 01:20, 15 May 2020 (UTC)
Non-MSM lects are often not "well-documented". I think it's fairly straightforward. Keeping stricter regulations will make it harder for non-MSM content to be on the site, which would lead to a fairly unequal picture on Wiktionary where MSM would dominate even more than it already does. —Suzukaze-c 01:24, 15 May 2020 (UTC)
From my perspective, this seems to be a good conversation to be having. --Geographyinitiative (talk) 01:34, 15 May 2020 (UTC)Reply
@Suzukaze-c: I am not encouraging stricter regulations, quite the opposite. That's why I think Cantonese shouldn't be considered WDL, so that more content is allowed. --Anatoli T. (обсудить/вклад) 01:40, 15 May 2020 (UTC)
I support adding Cantonese as WDL. If you put Standard Mandarin as the only "well-documented" variety it creates the impression that Mandarin is the dominant variety, that Standard Mandarin is Chinese, that other Chinese varieties are inferior to Mandarin. User:Iambluemon 02:56, 15 May 2020 (UTC)
Mh, no it doesn't? PUC 13:19, 15 May 2020 (UTC)
Iambluemon, it looks like you misunderstood my intention. This policy would favour the non-Mandarin languages by giving them more lenient coverage. You claim it will "introduce many errors", which seems like a straw man argument — I challenge you to find even a single unambiguous error in an otherwise reliable source that would be entered into Wiktionary as a result. —Μετάknowledgediscuss/deeds 01:54, 15 May 2020 (UTC)Reply
One example of error is the diglossia situation in Cantonese. If you read a Mandarin article using Cantonese, it doesn't mean that every Mandarin word is automatically transformed into a Cantonese word. We can also read Classical Chinese poems using Mandarin, Cantonese, Hokkien, etc., but it doesn't mean that all those words automatically become Mandarin, Cantonese, Hokkien. No, they are just readings, not actual words, not dictionary material. We need stricter criteria; don't have editors copy and paste individual character readings into compound words. Use material from spoken Cantonese (not MSM) that is available in written form. User:Iambluemon 02:54, 15 May 2020 (UTC)
@Iambluemon: Please don't make assumptions, you have made a few today. We know all that. We don't include Cantonese words simply because there is a word in Mandarin. I'm talking about a common situation when a newsreader speaks Cantonese while their teleprompters and subtitles on the screen are in standard Chinese (automatically converting what they say into the correct Cantonese). The written and pronounced words will mismatch and written words will not be added as Cantonese, if they don't have Cantonese readings and they are used. --Anatoli T. (обсудить/вклад) 04:03, 15 May 2020 (UTC)Reply
While MSC/MSM words are not Cantonese in a strict sense, they can be considered Cantonese in a broader sense. Cantonese is the main spoken language of instruction of Chinese classes in Hong Kong, so MSC/MSM texts are always read in Cantonese. Hong Kongers also write MSM texts that are meant to be read in Cantonese (although they can also be read in Mandarin). — justin(r)leung (t...) | c=› } 03:53, 15 May 2020 (UTC)
@Metaknowledge: There seems to be consensus here (at least for Standard Chinese/Mandarin). Does this need to be brought to WT:BP for wider/further discussion, and do we need a formal vote? — justin(r)leung (t...) | c=› } 02:39, 15 May 2020 (UTC)Reply
@Justinrleung: Consensus among the editing community (i.e. this discussion) suffices. That said, it's a bit unclear to me whether there's consensus regarding the status of Cantonese as a WDL, and we have to decide these together. (I don't know enough about the quantity of Cantonese material that is easily searchable and meets CFI to have an opinion.) —Μετάknowledgediscuss/deeds 04:07, 15 May 2020 (UTC)Reply

I forgot to say this, but back in 2012, when Cantonese and Hokkien had their own headers, Cantonese and Hokkien were treated as well-documented languages. Why downgrade their status now in 2020? User:Iambluemon 03:00, 15 May 2020 (UTC)

It's not a downgrade. It's less restrictions for providing quotes for a term to be allowed to be included. --Anatoli T. (обсудить/вклад) 03:18, 15 May 2020 (UTC)Reply
It's also about reality. Hokkien is definitely not well-documented (although there's an increase in Hokkien writing, especially in Taiwan). — justin(r)leung (t...) | c=› } 03:49, 15 May 2020 (UTC)Reply
Add Teochew to the list of Chinese dialects that has a growing amount of writing. And there's currently quite a fair bit of Teochew media that is produced in China. I'd probably consider it the third most well-documented dialect of Chinese after Cantonese and Hokkien. The dog2 (talk) 01:55, 24 May 2020 (UTC)Reply
This isn't a competition. We're essentially talking about languages with strong publishing industries, or else which have such an absurd amount of other durably archived media that it makes up for a lack of written material. —Μετάknowledgediscuss/deeds 02:08, 24 May 2020 (UTC)Reply
AFAICT, Teochew is nothing close to Hokkien when it comes to the amount of material out there (whether written or spoken). After some thought, although written vernacular Cantonese has a good amount of publication, it cannot compare to the amount of publication in Standard Mandarin / Standard Written Chinese. Thus, the only variety we should consider as a WDL should be Standard Mandarin / Standard Written Chinese. — justin(r)leung (t...) | c=› } 03:54, 24 May 2020 (UTC)Reply
For sure there is a lot less Teochew than Hokkien material, but it's still one of the better documented dialects of Chinese. And yes, none of the dialects even come close to Mandarin when it comes to written material. I've been to Hong Kong and Macau and even there, written material is mostly in standard Mandarin. The dog2 (talk) 04:26, 24 May 2020 (UTC)Reply
@The dog2: Teochew is irrelevant to this discussion. I don't know if you fully understand the implications of this discussion. If Teochew were to be listed as a WDL, a lot of Teochew entries would have to go for lack of attestation per WT:ATTEST. No one is doubting the existence of Teochew material, which is quite a lot compared to other dialects like Changsha Xiang, just as an example, but it's just not eligible for consideration as a well-documented language. — justin(r)leung (t...) | c=› } 05:08, 24 May 2020 (UTC)Reply

OK, Teochew certainly cannot be considered well-documented. Hokkien and Cantonese are somewhat on the fence, but I'd lean towards not considering them well-documented. The dog2 (talk) 05:18, 24 May 2020 (UTC)

This is better: make it more specific (Standard Written Chinese) rather than changing Chinese to Mandarin, which is a bad suggestion. Does Standard Written Chinese also include literary/formal Cantonese, the type of Cantonese used in official functions and ceremonies that is based on written Mandarin? User talk:iambluemon 08:42, 8 June 2020 (UTC)
@Iambluemon: I would not include formal spoken Cantonese (which would still have things like 嘅). — justin(r)leung (t...) | c=› } 08:54, 8 June 2020 (UTC)Reply

Unified Chinese revisited

Unified Chinese allows us to document the non-Mandarin languages faster: all you need to do is add the pronunciation, and the meaning if different from Mandarin. But this has the disadvantage that only the difference from Mandarin is documented. For example, the 寫字樓 entry currently has the following definitions:

  1. office building
  2. (Cantonese) office

Now it is clear that sense 2 is Cantonese only, but is sense 1 Mandarin only or both Mandarin and Cantonese? And does the absence of the "Hokkien" label mean "this sense is not in Hokkien" or "the editors have not yet considered that language"?

One way to solve the problem is to build {{zh-dial}} data. For example, although 都 has the definitions "all; both" and "(Cantonese) as well; also; too", the {{zh-dial}} tells us that the first sense is also in Cantonese. But this may not be feasible for the smaller entries. Another way is to add examples, but this is not intuitive (直觀) for the reader. It is much better to build the senses separately and in full for each language, while retaining the common pronunciation base. (A common pronunciation base is necessary or there will be no place for dialectal readings of Mandarin.) This can be achieved by the following entry layout:

==Chinese==
{{zh-forms|...}}

===Pronunciation===
{{zh-pron
|m=...
|c=...
...
}}

===Mandarin===
...

===Cantonese===
...

(If this is not allowed, one can always resort to

==Chinese==
{{zh-forms|...}}

===Pronunciation===
{{zh-pron
|m=...
|c=...
...
}}

===Definitions===
...

----

==Cantonese==

===Pronunciation===
{{yue-pron}} // transcludes the pronunciation in {{zh-pron}}, showing only |c=

===Definitions===
...

but this requires more typing and part of the advantages of Unified Chinese is lost.)

What do you think of this layout? I think it's better to adopt it incrementally, focusing on words with varying meanings across languages first. Most terms such as 粵語 surely don't need splitting unless they get lots of examples in a variety of languages.

[To answer another of Geographyinitiative's questions: Wikipedia has nine language editions because there's no ground for unification. You can't write encyclopedic text that is Mandarin and Cantonese and Wu and Classical Chinese at once. But you can write in Traditional and Simplified scripts at once, so Wikipedia unifies them. Wiktionary deals with words, so the unification is the other way round.]

(Notifying Atitarev, Tooironic, Suzukaze-c, Justinrleung, Mar vin kaiser, Geographyinitiative): --Nyarukoseijin (talk) 12:20, 5 June 2020 (UTC) Reply

Yes, I am very happy to read this suggestion. I want to be able to differentiate between Chinese words that are used in all Chinese varieties and Chinese words that are limited to certain dialects, and this is a very good solution to the problem. It only involves extra typing. I definitely support this proposal. User talk:iambluemon 08:55, 8 June 2020 (UTC)
I'm not a big fan of either proposal. I can see many edge cases where the first option would be problematic. In colloquial Cantonese, 奶奶 refers to "one's husband's mother" or "madam", but in the written language, Cantonese speakers may use this word to refer to "paternal grandmother" as well and it would be read in Cantonese (even though it would not be used in actual speech). I also can't imagine the mess it would be if we adopt the first option for long single character entries with multiple etymologies/pronunciations. The second option is essentially reverting back to disunified Chinese with a redundant Chinese section, which is even messier than before. — justin(r)leung (t...) | c=› } 09:09, 8 June 2020 (UTC)Reply
I like the first option. If it is messy for long single-character entries, then we only apply this format to compound entries. If there is different usage in spoken and colloquial Cantonese, we can use usage notes to explain the difference. Maybe the second option is not so nice. Iambluemon (talk) 09:17, 8 June 2020 (UTC)
Another problem with the first option is that even usage within the dialects of Mandarin or any other major grouping of dialects would have variation. Just look at 阿婆 or 阿公. Are we gonna split it up to every single dialect possible? I don't see how our status quo is much different from other languages like English, where there are lexical differences between different dialects of English. — justin(r)leung (t...) | c=› } 09:32, 8 June 2020 (UTC)Reply
if we didn't have chinese characters, we would have no choice but to do so —Suzukaze-c (talk) 10:14, 8 June 2020 (UTC)Reply
I don't mind usage differences within dialects of Mandarin, or variations within the same dialect group. At least people will be more careful when adding definitions. Right now people just copy and paste pronunciation without bothering whether it is literary or colloquial or which dialect group the word belongs to. Iambluemon (talk) 10:34, 8 June 2020 (UTC)Reply
Yet another problem with the first option is that we have language varieties in L3 headers. This would automatically push PoS to L4 headers (and when we have more than one etymology, they'd get pushed to L5). Also, if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together. — justin(r)leung (t...) | c=› } 10:44, 8 June 2020 (UTC)Reply
Easier to compare when pronunciation is grouped together. Iambluemon (talk) 10:48, 8 June 2020 (UTC)
@justinrleung:—
  • 奶奶 meaning "paternal grandmother" in literary Cantonese: is it a borrowing from Mandarin? Does it occur only in Mandarin contexts (奶奶的……) or does it also occur in Cantonese contexts (奶奶嘅……)? I think we need only cover the colloquial language in ==Cantonese==, ==Hokkien==, etc. The literary language based on Mandarin is already covered under ==Chinese==; the additional L2 headers are for those who want to study the colloquial language without Mandarin influence.
  • Messiness: I agree that the first option doesn't look good (which is why I reverted this proposal initially). Under the second option, you can still focus on Unified Chinese if you want to. The additional language headers are for those interested in the individual languages (like @Geographyinitiative and me), and they don't need to come with glyph origins, etymologies, etc. The main motivation for having additional language headers is because the current Unified Chinese format doesn't treat the individual languages well: is the "office building" sense of 寫字樓 also in Cantonese? If not, should I add "(Mandarin)" or "(not Cantonese)" if I don't want to research the other languages? etc.
  • How to split under the first option: Splitting by the so-called 一級方言 (Mandarin, Cantonese, Gan, etc.) should be enough. The current 阿公 entry looks fine and doesn't need splitting. But if it gets dozens of examples in a variety of dialects it might be useful to split by language.
  • PoS headers pushed to L4 under the first option: Wyang thinks that PoS headers should be abolished for Chinese, and they're currently replaced by the dummy ===Definitions=== header for single-character entries. The L3 language headers were intended to take that place. If this is not possible, one can use the following format instead:
===Definitions===
{{zh-hanzi}} {{tlb|zh|Mandarin}}

# ...

===Definitions===
{{zh-hanzi}} {{tlb|zh|Cantonese}}

# ...
Or with the second ===Definitions=== removed.
  • if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together: terms like 我們 may have more pronunciations than definitions.
The current Unified Chinese format is Modern Standard Written Chinese oriented. If we don't solve the problem with additional language headers, what about an extended version of User:Wyang/zh-def that displays a little matrix showing which senses apply to which languages? The cells of the matrix can be simple yeses and noes or they could contain labels like "dialectal" (as in "dialectal Mandarin") or "morpheme" ( is a morpheme in Mandarin but a noun in Cantonese). --Nyarukoseijin (talk) 12:17, 10 June 2020 (UTC)Reply
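The matrix idea in the comment above can be made concrete with a rough sketch. This is not User:Wyang/zh-def or any existing module; the names `Status`, `SENSES`, `LECTS` and `render_matrix` are hypothetical, and the cell values for 寫字樓 are placeholders for illustration only, not verified usage data. The one design point it tries to show is that a missing cell is rendered as "?" (not yet researched) instead of being read as "no".

```python
from enum import Enum


class Status(Enum):
    """Possible cell values in the hypothetical sense-by-lect matrix."""
    YES = "yes"
    NO = "no"
    UNKNOWN = "?"  # nobody has checked this lect yet


# Placeholder data for 寫字樓; the values are illustrative, not verified.
SENSES = {
    "office building": {"Mandarin": Status.YES, "Cantonese": Status.YES},
    "office": {"Mandarin": Status.NO, "Cantonese": Status.YES},
}
LECTS = ["Mandarin", "Cantonese", "Hokkien"]


def render_matrix(senses, lects):
    """Return a plain-text table with one row per sense and one column per lect."""
    rows = [" | ".join(["sense"] + lects)]
    for gloss, cells in senses.items():
        # A lect absent from the data is shown as unknown, never as "no".
        values = [cells.get(lect, Status.UNKNOWN).value for lect in lects]
        rows.append(" | ".join([gloss] + values))
    return "\n".join(rows)


print(render_matrix(SENSES, LECTS))
```

Cells could carry richer values such as "dialectal" or "morpheme", as suggested above, by extending the hypothetical `Status` enum.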
@Nyarukoseijin: 奶奶 is probably not a good example because I personally wouldn't use it in writing for "paternal grandmother". It's possible to see it in Chinese textbooks/books (taught in Cantonese) in Hong Kong though. It's definitely unlikely for people to say 奶奶嘅 to mean "paternal grandmother's". So under your proposals (especially the second option), does that mean we'll have Chinese as Standard Written Chinese and Mandarin, Cantonese, etc. as covering only the colloquial versions? This is very difficult to determine, as the dialects are in a continuum from colloquial to formal (often closer to Standard Written Chinese). Take the word 太陽 as an example. In Hong Kong, 太陽 is quite commonly used in everyday speech and has kind of replaced the more colloquial words (熱頭 or 日頭), but in Taiwanese Hokkien, it seems to be restricted to literary/poetic registers (like in a song). Would we have to split 太陽, and if so, what's the most appropriate way of doing so?
About 寫字樓, in Hong Kong, it's usually the "office" sense, but the "office building" sense is also possible (at least according to some Cantonese dictionaries). The usual implication of not labelling is that it's totally fine at least in Standard Written Chinese across regions. Whatever additional lects listed in {{zh-pron}} would be okay to use the words (to varying extents). Of course, we need to do a better job at labelling and writing usage notes so that we have a representation of all the lects that is as accurate as possible.
Back to your proposals. They seem to allow several formats to coexist (no splitting for 粵語 but splitting for 寫字樓 maybe). How do we decide which format to use? There are always edge cases that would be hard to define. As for the third option you just proposed, I don't think abolishing PoS across the board is the way to go. Chinese may be more "flexible" with PoS due to the lack of overt morphology, but that doesn't mean we should abandon PoS, especially for entries with more than one character. And where would we put literary Chinese (文言文) under your proposals?
About 一級方言, we would definitely need to define these properly. Do we group Min as just Min, or do we follow Ethnologue and split it as Min Nan (which includes Hainanese, Leizhou Min, and maybe Zhongshan Min), Min Dong, Puxian Min, Min Bei (which includes Min Bei proper and Shaojiang Min) and Min Zhong? Do we group Pinghua under Cantonese? Do we group Shehua with Hakka? What do we do about varieties that Ethnologue doesn't seem to deal with (Xiangnan Tuhua, Shaozhou Tuhua)?
I think many of the details need to be fleshed out before we can all make a good judgment as to what we should do. — justin(r)leung (t...) | c=› } 21:18, 10 June 2020 (UTC)
@Justinrleung:
  • 奶奶: If a text has Cantonese pronunciation but Mandarin vocabulary and grammar, I think it should be subsumed under the ==Chinese== header. The Taiwanese Min Nan and Hakka dictionaries by the Ministry of Education ROC have only 阿媽 / 阿妈 (a-má) and 阿婆 (â-phò), not 奶奶.
  • 太陽: # {{lb|yue|Hong Kong}} [[sun]] under ==Cantonese==?
  • Which words to split: My proposal isn't about splitting. It's about building separate dictionaries for the Chinese languages alongside a dictionary of the Chinese macrolanguage. This means that ==Chinese== still has the fullest coverage, just at a higher level. It's similar to the English dictionary market where we have both the Oxford English Dictionary and the Middle English Dictionary. You don't have to build ==Cantonese== if you don't want to.
  • PoS: I didn't say abolish PoS. Abolish PoS headers and use labels instead. Single-character entries need labels anyway.
  • Literary Chinese: ==Literary Chinese== of course. The primary meaning of 走 in Literary Chinese is different from that in Modern Chinese, but the current ==Chinese== entry doesn't mention it. --Nyarukoseijin (talk) 08:49, 17 June 2020 (UTC)
@Nyarukoseijin: Thanks for clarifying. So would it be fair to say that what you're proposing is close to how Arabic is treated as of now, i.e. Chinese is reserved for Modern Standard Chinese, and language headers for other varieties should coexist with the Chinese header? For example, would 太陽 be formatted like this: a Chinese header with "sun" and "sunshine" (and "greater yang"?), a Mandarin header with "sun", "sunshine" and "temple" (labelled as SW Mandarin), a Cantonese header with "sun", "sunshine" and "temple" (labelled as Guangxi), a Gan header with "sun", "sunshine" and "temple", a Jin header with "sun" and "sunshine", a Min Bei header with "temple", a Min Nan header with "sun" (labelled as formal/literary) and "temple" (labelled as Leizhou) and a Literary Chinese header with "sun" and "greater yang"? You said 阿公 doesn't need to be split unless we have examples in many dialects. I don't think this is a good criterion for splitting. Entry layout should not revolve around examples. Essentially, we need to refine the proposal by specifying the scope that each of "Chinese", "Mandarin", "Cantonese", "Gan", "Hakka", "Jin", "Min Bei", "Min Dong", "Min Nan", "Min Zhong", "Puxian Min", "Wu", "Xiang" and "Literary Chinese" covers if this is to be put through a formal vote. But of course, I still see the current layout as better (and would like to see Arabic follow suit). — justin(r)leung (t...) | c=› } 16:42, 17 June 2020 (UTC)
I shouldn't have used the word "splitting". It's not about splitting. It's about adding individual languages, not subtracting anything from ==Chinese==. The individual languages would probably be independent from ==Chinese==, like the Dictionary of Old English, the Middle English Dictionary, etc. from the Oxford English Dictionary. This means that (1) ==Chinese== won't be restricted to MSC due to the additional headers, just as the OED isn't restricted to Modern English due to the other dictionaries. (2) Additional headers don't have to be built all at once; they can have their own paces. (3) You don't have to work on them if you don't want to. Building ==Chinese== and {{zh-dial}} content is still the most efficient way to document those languages (and, in an age of language suppression, more ethical). The additional headers are for info which ==Chinese== doesn't handle well (e.g. 'run' being the primary sense of 走 in Literary Chinese), and are completely opt-in. --Nyarukoseijin (talk) 17:58, 17 June 2020 (UTC)Reply
This doesn't sound to me like a fully fledged idea. The OED covers both modern and Middle English (but not Old English), because they don't care that they overlap considerably with the Middle English Dictionary — they're different enterprises, and they have no interest in being consistent. At Wiktionary, we want to make one dictionary for all languages, and being consistent isn't optional — it's necessary. You're saying that if I want information about Cantonese, then sometimes I should look under a 'Chinese' header and sometimes I should look under a 'Cantonese' header, and there will be no way to predict which. That is antithetical to how Wiktionary is organised, and it makes it less usable for both humans and machines. —Μετάknowledgediscuss/deeds 18:13, 17 June 2020 (UTC)Reply
I totally agree with @Metaknowledge. We're not an anthology of dictionaries, but one dictionary with many languages. There would be significant overlap if we start allowing Chinese in addition to other language headers unless we define Chinese as something different from what it is now. — justin(r)leung (t...) | c=› } 21:31, 17 June 2020 (UTC)Reply
@Nyarukoseijin: Hello. You haven't presented a case where the CURRENT structure doesn't work. I don't understand why you want to change something that's not broken (other than in one of our troll's mind). If there are senses (or PoS) which are only specific to Mandarin (not applicable to Cantonese, etc.), they can be marked/labelled so. The current definitions of e.g. 走 are good. If you want to specifically say that the sense is only applicable to Mandarin (or maybe a bunch of other varieties), they can be labelled so, e.g. {{lb|zh|Mandarin|Jin|...}} ... --Anatoli T. (обсудить/вклад) 03:32, 18 June 2020 (UTC)
One of the disadvantages of adding labels is that it's all-or-nothing. If you add "(Mandarin)" to the "walk" sense of 走, it would suggest absence in Cantonese, Gan, etc. So you have to research all the languages and dialects at once. Also there are no filters that allow me to see only Cantonese senses and examples/quotations. --Nyarukoseijin (talk) 05:25, 18 June 2020 (UTC)Reply
@Nyarukoseijin: It is a fair point, thank you very much, but it's not a show stopper. If the number of such cases was really large, then the unified approach wouldn't work. The differences between lects are of interest to contributors and generally addressed as a matter of priority. The differences often show in very frequent but common words. The higher (more formal) the level of writing, the fewer differences you find (very true for Arabic varieties as well). There is no clear line there, anywhere, as dialects borrow from each other. One statement is generally true: the formal written variety of Mandarin is generally applicable to other Chinese lects. No, you don't have to know or research the usage in other lects. Editors either add what they know or find in dictionaries. --Anatoli T. (обсудить/вклад) 07:07, 18 June 2020 (UTC)
I am not a troll. This website is a troll. No multi-syllabic Wade-Giles in zh-pron? Give me a break people. I have made incredibly important contributions to this website trying to salvage it a little bit. Obviously people are waking up to the fact this website has some major flaws. Why is bullying against me ("one of our trolls") allowed on this website? I'm glad I'm "our troll" at least. I am documenting stuff that the CPC doesn't want us to know, which makes me an outsider by default. --Geographyinitiative (talk) 05:42, 18 June 2020 (UTC)Reply
Here's an example of my trolling: File:Inked_Ni-44-9-chushul-china-india_cropped_v2.jpg. This historical map of the disputed area currently under contention can be used by the English-speaking world on Wikipedia because I added the map it came from to Wikimedia Commons a few months ago. I am not a troll, I am making an encyclopedia/dictionary/etc. Just because I have different opinions does not make me a troll. --Geographyinitiative (talk) 05:45, 18 June 2020 (UTC)Reply
I am 100% sure that the current Unified Chinese paradigm will be undone, maybe after a decade or longer. The problem is the political desire to see them all as one language. Because that IS what we are doing: we are equating "Chinese" with "Danish" or "Norwegian" etc as if "Chinese" is an unbroken and unchanged language created by Mr. Cang Jie back during his stint in the Yellow Emperor's court. This is like using the Tower of Babel as the core of our theory for understanding language. I hope I will not be calling valuable contributors with different opinions trolls when we come to our senses and use a more scientific analysis in the coverage of languages in China on Wiktionary. --Geographyinitiative (talk) 05:51, 18 June 2020 (UTC)Reply
Your arrogance embarrasses me. —Μετάknowledgediscuss/deeds 06:12, 18 June 2020 (UTC)Reply
Even if Unified Chinese is undone, it has still done more good than bad. When several groups of people are under persecution, saving them at once is more ethical than saving them separately if it allows more people to be saved. And that's what happened: the Mandarin nouns : Cantonese nouns : Wu nouns ratio has changed from 20467 : 317 : 10 to 95,620 : 73,447 : 6,193. My proposal is about experimenting with new ways to document the languages, not abandoning the good old way. --Nyarukoseijin (talk) 06:17, 18 June 2020 (UTC)Reply
I wonder if a system where contributors can confirm/deny that X word is used in Y lect (or specify: it is literary, etc.), and the result would be calculated to produce {{lb}} is feasible. —Suzukaze-c (talk) 06:31, 18 June 2020 (UTC)Reply
It would be helpful for the entirety of Wiktionary, really. Regarding English, I don't know what people say in New Zealand. —Suzukaze-c (talk) 06:37, 18 June 2020 (UTC)Reply
I agree. We should always try to improve and fine-tune what we have in place. If someone from the UK labels an English word as British, another editor can add Oz or NZ labels if the same usage applies to Australia or New Zealand. --Anatoli T. (обсудить/вклад) 07:07, 18 June 2020 (UTC)
A note from last year that I found today by chance: no: "used in Hokkien" // yes: "Cantonese: no; Hokkien: yes; Teochew: unknown; ..." —Suzukaze-c (talk) 05:11, 20 June 2020 (UTC)
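The confirm/deny idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vote values, the function names `summarize_usage` and `label_lects`, and the "more yes than no" rule are all hypothetical, and nothing here reflects an existing Wiktionary module or how {{lb}} actually works.

```python
from collections import Counter

# Hypothetical vote values; a lect with no votes stays "unknown".
YES, NO, UNKNOWN = "yes", "no", "unknown"


def summarize_usage(votes):
    """Collapse per-editor votes into one status per lect.

    `votes` maps a lect name to a list of "yes"/"no" votes.
    A lect becomes "yes" or "no" only when that answer has more
    votes than the other; ties and empty lists stay "unknown".
    """
    summary = {}
    for lect, ballots in votes.items():
        counts = Counter(ballots)
        if counts[YES] > counts[NO]:
            summary[lect] = YES
        elif counts[NO] > counts[YES]:
            summary[lect] = NO
        else:
            summary[lect] = UNKNOWN
    return summary


def label_lects(summary):
    """Return the lects confirmed as "yes", sorted, as label candidates."""
    return sorted(lect for lect, status in summary.items() if status == YES)


# Made-up votes that reproduce the shape of the note above.
votes = {
    "Cantonese": ["no", "no"],
    "Hokkien": ["yes", "yes", "no"],
    "Teochew": [],
}
summary = summarize_usage(votes)
print(summary)
print(label_lects(summary))
```

Keeping "no" and "unknown" as separate states is the point of the note: a missing label then no longer has to mean "not used in that lect".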
All I am saying is: have you looked at the excellent work I have been doing with the neglected Wade Giles derived words? I am literally kicking butt every day, every way. With this, the dictionary can move on from hyper Hanyu Pinyin centrism into an actually good dictionary my friends. The closer we get to that point, that is to say, the better the dictionary becomes, the harder it will be to group all Chinese character related communication in China and Taiwan for the past several thousand years under one header friends. That is my opinion, yexu budui! But no matter what, I am enjoying working on this project, even if we stick with the current situation indefinitely. Geographyinitiative (talk) 10:36, 18 June 2020 (UTC)
If you are somehow still thinking I'm the troll and Wiktionary is mainstream, then take a glance at the history for Huang He and Xizang Zizhiqu. No no, I wouldn't want to be mainstream on this website if that's what's going on here. Geographyinitiative (talk) 11:57, 18 June 2020 (UTC)Reply

Language treatment: Only the macrolanguage is treated as a language?

Wiktionary:Language treatment says that for Chinese: "Only the macrolanguage is treated as a language". Are we sure about this? Does this mean other varieties are not treated as languages in Wiktionary? I think this contradicts current practice. In examples such as Hsi-ning https://en.wiktionary.org/w/index.php?title=Hsi-ning&type=revision&diff=59443856&oldid=59443845 the language code for "Mandarin" is preferred over the language code for "Chinese". The Unified Chinese vote is about treating Chinese varieties under a single header and using the "zh" language code. Does it abolish other Chinese varieties or disallow their language codes? Can someone explain? User talk:iambluemon 09:00 8 June 2020 (UTC)

@Iambluemon: "Treat as a language" in that context refers to how entries are made. The current practice is that we don't have Cantonese, Mandarin, Xiang, etc. entries, but Chinese entries with Mandarin, Cantonese and/or Xiang subsumed under it. (There are exceptions to this, but I digress.) This does not "abolish other varieties" as there are no other varieties if all Chinese varieties are treated as Chinese. This also has nothing to do with how lects are treated in etymologies. There are many "etymology-only" languages/lects/varieties. — justin(r)leung (t...) | c=› } 09:25, 8 June 2020 (UTC)Reply
The page for Wiktionary:Language treatment mentions that it is to "document cases where Wiktionary's treatment of lects deviates from that of the ISO/SIL". It doesn't mention that language treatment is in the context of how entries are made. And isn't there a header for Min Nan entry in Wiktionary based on Latin POJ? Maybe the description can be more specific, such as "only the macrolanguage is treated as a language for lects written in Han script". Iambluemon (talk) 11:16, 8 June 2020 (UTC)Reply
@Iambluemon To my mind, Iambluemon has definitely pointed out one of the "flaws in the matrix" concerning Chinese on this site. The "Mandarin/Hanyu Pinyin is the de facto Chinese" ideology is deeply ingrained into this website nowadays, but Wikipedia and Wiktionary were founded on a more liberal minded and rational view of the linguistic situation of China, so these "flaws" crop up everywhere. --Geographyinitiative (talk) 01:49, 9 June 2020 (UTC)
Please stop your baseless accusations about any ideologies deeply ingrained at Wiktionary. If anything is missing or is incorrect in a Chinese dialect, it means nobody has added it yet. The use of one L2 "Chinese" for all Chinese varieties only brought positive development to the varieties, otherwise miserably neglected. All the promises to provide a separate treatment for each individual word in any given Chinese lect have been kept. You can define not only pronunciations but usage, part of speech, which are specific to Cantonese, Min Nan, etc. Wikipedia, which you quote so much as superior to Wiktionary, uses zh-min-nan language code for Min Nan, we are fairer, we just use "nan". We provide all readings, Min Nan Wikipedia only uses POJ.
You only bring negative views to this site. If you hate it so much, just leave it. -Anatoli T. (обсудить/вклад) 02:03, 9 June 2020 (UTC)Reply
Another thing Geographyinitiative fails to notice is that the 中文 (= Chinese) version of Wikipedia is written in Written Standard Chinese, based on Mandarin. This isn't much different from us in treating Mandarin-based Written Standard Chinese as de facto Chinese. — justin(r)leung (t...) | c=› } 02:45, 9 June 2020 (UTC)Reply
@Iambluemon: Wiktionary:Language treatment is a "draft proposal", so there definitely needs to be more work done on it to specify what we mean by "language treatment" and how it relates to entry making and etymologies. — justin(r)leung (t...) | c=› } 02:47, 9 June 2020 (UTC)Reply
One big disadvantage of Wikipedia is that their Hokkien version is written in the Latin alphabet, when by far the most common way of writing Hokkien is with Chinese characters. I think this makes their Hokkien version far less accessible to the average Hokkien speaker than it could be. The dog2 (talk) 03:52, 9 June 2020 (UTC)Reply
Yes, I mentioned that above. It fits the agenda of those who prefer the separate treatment. At Wiktionary we provide both the Chinese characters and the romanisation (POJ for Min Nan). The infrastructure is there for editors. Editors are free to focus only on the terms written in Chinese or POJ. Editing the Chinese characters only doesn't exclude the use of romanisation (by providing manual or automated transliterations). --Anatoli T. (обсудить/вклад) 04:08, 9 June 2020 (UTC)

Heroes

I know that at times discussion between myself and other users concerning Chinese on Wiktionary has become heated and tense. I'm sorry I'm so dumb. But I want to let you all know that despite our differences, I view you all as heroes who are making our world and the world of our children a better place. Wiktionary is an absolute good, regardless of whether Chinese is all one header or not. That does not really matter. What matters is that what has been created here makes it easier for anybody who has access to the site via the internet to educate themselves about Chinese stuff. The more people that are more educated and informed, the better our world will become in the long run. Thank you for your service. Geographyinitiative (talk) 11:08, 9 June 2020 (UTC)Reply

Yellow River

I switched the Yellow River to the Yellow River page in accordance with the English language. --Geographyinitiative (talk) 08:01, 11 June 2020 (UTC)Reply

@Geographyinitiative: Alright. Seems to be fine since it's probably more common in English. It's not entirely relevant to this page though. — justin(r)leung (t...) | c=› } 08:07, 11 June 2020 (UTC)Reply
It is relevant to this page because it's a term derived from the Chinese character meaning and we need community awareness that you can't pretend Hanyu Pinyin equals English. --Geographyinitiative (talk) 08:14, 11 June 2020 (UTC)Reply
Alright, thanks for the notice. — justin(r)leung (t...) | c=› } 08:18, 11 June 2020 (UTC)Reply
Same goes for the Tibet Autonomous Region. Just switched it over from Xizang Zizhiqu. Please switch these to the correct names when you see them. --Geographyinitiative (talk) 06:32, 12 June 2020 (UTC)Reply

Mandarin Dialect Romanization

@Justinrleung Hey, I remember somewhere you talked about the possibility of an in-house romanization for the various Mandarin dialects. I was thinking that it's possible to adapt Sichuanese pinyin as a romanization for most of the Mandarin dialects available in 現代漢語方言大詞典. Maybe the only "complications" are checked tones and number of tones. For checked tones, an "-h" final can be added (like for Nanjing and Yangzhou), and for number of tones, most have 4, but a few have 3 (no problem I guess), and some southern ones have 5 (because they have checked tones). What do you think? --Mar vin kaiser (talk) 13:51, 24 June 2020 (UTC)

@Mar vin kaiser: We can definitely look into it, but we'll have to look at them one by one and check other sources. We should start with one representative from each major grouping:
  • Northeastern: Harbin
  • Jilu: Jinan
  • Jiaoliao: Muping
  • Central Plains: Luoyang, Wanrong, Xi'an, Xining, Xuzhou
  • Lanyin: Ürümqi, Yinchuan
  • Southwestern: Chengdu (done), Guiyang, Liuzhou, Wuhan
  • Jianghuai: Nanjing, Yangzhou
Harbin should be pretty straightforward - we could just use pinyin for it. For the other groupings, let's look at Jinan, Muping, Ürümqi and Nanjing for now. We already have coverage of Central Plains with Dungan and Southwestern with Chengdu, so we can probably worry about those later. What do you think? — justin(r)leung (t...) | c=› } 21:31, 24 June 2020 (UTC)Reply
This is probably a good basis for Nanjing - if it corresponds well with 南京方言詞典, we should probably just use it without much modification. — justin(r)leung (t...) | c=› } 21:34, 24 June 2020 (UTC)Reply
@Justinrleung: Yeah, that looks good for Nanjing. And maybe it can be used for Yangzhou also, their phonologies look almost identical. For Southwestern Mandarin, I was just looking into it, it looks like Sichuanese Pinyin can be used with Guiyang and Wuhan, except maybe it doesn't have "l". I'm gonna look into Jinan next. --Mar vin kaiser (talk) 10:47, 26 June 2020 (UTC)

Allowing Jyutping polysyllabic entries as non-lemmas

I started a discussion in Beer Parlour in April but no one has responded, so I post it here again:

Under the current policy, Jyutping transliterations for Cantonese are only allowed for monosyllables, such as zoeng1, but not polysyllables, while Pinyin transliterations for Mandarin are allowed for both, as in zhāng and jǐnzhāng. I propose that Jyutping be given equal status with Pinyin, so that polysyllables are allowed as non-lemma entries, since Jyutping has acquired the status of the standard phonetic transliteration for Cantonese in Hong Kong, considering that:

  1. it is developed by the w:Linguistic Society of Hong Kong;
  2. it is used in the Cantonese Read-Aloud Test; and
  3. recent linguistic papers written in English transliterate Cantonese in Jyutping.

There are no reasons for us to treat Pinyin and Jyutping differently. Jonashtand (talk) 07:57, 17 July 2020 (UTC)Reply

4,000 phono semantic compounds!

The Category:Han phono-semantic compounds has over 4,000 pages in it now. I hope that this achievement helps spread the word that CJKV characters have components that indicate a hint to pronunciation. I ask everyone interested (maybe nobody!) to take a look at the category and see if you can find one or more entries that has an error in it~~ or, just add one more character. Rather than pretend Chinese characters are ideographs and pictographs with perhaps only a few unimportant characters that are phono semantic, let's instead tell the world what the right-hand side of a Chinese character is for. Geographyinitiative (talk) 12:33, 17 July 2020 (UTC)Reply

Simplified forms and alternate forms

Does 璿 have a simplified form? If so, what explains "刘璿"? There are other characters that fall into this "crack" between simplified, traditional and alternate and I just don't know how to handle them appropriately. --Geographyinitiative (talk) 00:33, 16 September 2020 (UTC)

@Geographyinitiative: In general usage, it seems to be considered a variant of 璇. In the case of names, basically many variant characters may pop up, so it's not surprising that it's written as 刘璿 rather than 刘璇 (although we should always take Wikipedia with a grain of salt). — justin(r)leung (t...) | c=› } 00:55, 16 September 2020 (UTC)Reply
Given what you have said, isn't what I have done on the page in error? Or partially in error? --Geographyinitiative (talk) 01:02, 16 September 2020 (UTC)Reply
@Geographyinitiative: I'll have to look at what other sources say. My gut feeling is that 璇 should shouldn't be listed as a simplified form of 璿, but we should have the definition say "alternative form of 璇". It's probably even better to collapse it as {{zh-forms}} if it's an exact equivalent (which is what I need to check) so that everything is centralized at 璇. — justin(r)leung (t...) | c=› } 01:05, 16 September 2020 (UTC)Reply
"璇 should be listed as a simplified form of 璿"- this is my problem, my hang up. Is [10] the authority on simplified/traditional relationships? If it is, then since 璇 is not listed as simplified from 璿, I find it wrong to call 璇 the simplified form of 璿. There's no 璿 or 璇 here: [11]. I can not feel confident calling something simplified unless the PRC language committee has handed down a rule saying "this is simplified from this other character". It's like in some Baptist religious doctrine where you need to know the date on which you were saved, otherwise you are uncertain you are really saved~~ what day was 璇 made a simplified form of 璿? In which decree? (concerning 璿 itself: 璿 [12] 璿璣 [13] and 璿圖 [14]) --Geographyinitiative (talk) 01:18, 16 September 2020 (UTC)Reply
@Geographyinitiative: Sorry, silly me! "should" is a typo for "shouldn't" (which is totally opposite in meaning). — justin(r)leung (t...) | c=› } 01:36, 16 September 2020 (UTC)Reply
That said, if you look at the actual 第一批异体字整理表, 璿 is listed as a variant of 璇. — justin(r)leung (t...) | c=› } 01:38, 16 September 2020 (UTC)Reply
Okay, I see that- now how should this situation be depicted on the 璿 page? What's the relationship between 璿 and 璇? --Geographyinitiative (talk) 01:43, 16 September 2020 (UTC)
I've defined it as an alternative form of 璇. I think this should be a good way to deal with it. — justin(r)leung (t...) | c=› } 01:50, 16 September 2020 (UTC)Reply
(Also, I'm not sure what Baptist doctrine you're talking about, because that doesn't sound like what most Baptists believe.) — justin(r)leung (t...) | c=› } 01:59, 16 September 2020 (UTC)Reply

Classifiers

(Notifying Atitarev, Tooironic, Suzukaze-c, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris): How should we deal with measurement words for mass/non-count nouns, e.g. 管 for 牙膏, 杯/滴/盆/鍋/口/etc. for 水, 束/盆 (as opposed to 朵) for 花, 群 (as opposed to 個) for 朋友? It seems really messy when we have all these options in {{zh-mw}} without explanation. — justin(r)leung (t...) | c=› } 21:47, 28 September 2020 (UTC)Reply

Maybe we should have a table similar to the dialectal modules, then we can list the right classifier for each context. The dog2 (talk) 22:09, 28 September 2020 (UTC)Reply
@The dog2: We do have synonym tables for some of these, but they're not quite placed beside the definition like {{zh-mw}} is. — justin(r)leung (t...) | c=› } 22:16, 28 September 2020 (UTC)Reply
One thing that I guess can be done is to have a special table for uncountable words mass count words that can be expanded right next to the definition. The dog2 (talk) 22:20, 28 September 2020 (UTC)Reply
@The dog2: Hmm, I don't know how that will look. I imagine it might make the definition line really cluttered, which isn't ideal. — justin(r)leung (t...) | c=› } 22:31, 28 September 2020 (UTC)Reply
Maybe usexes? —Suzukaze-c (talk) 22:57, 28 September 2020 (UTC)Reply
@Suzukaze-c: I guess that's one way to do it. Another issue is that we could have a lot of arbitrary ones, like 克, 磅, 盒, 箱, 杯, etc., based on how we group things. What stops us from adding these to {{zh-mw}}? (In other words, should we have constraints on what we allow in the template?) — justin(r)leung (t...) | c=› } 23:11, 28 September 2020 (UTC)Reply

Hsiao & Xiao

Hello all. I just added Hsiao to the Xiao page and Xiao to the Hsiao page as alternative forms. Do I need to add an "alternative form of" definition [15] to each page as a second definition? Thanks for any guidance. --Geographyinitiative (talk) 19:15, 20 October 2020 (UTC)Reply

Here's the type of edit I'm talking about: [16]. Not sure if this is the correct way to do it or not, but it seems right to me. --Geographyinitiative (talk) 19:17, 20 October 2020 (UTC)Reply
@Geographyinitiative: No, don't do that. They're alternative forms of each other. (Also, this is probably not the right avenue for a question like this because it's about English entries.) — justin(r)leung (t...) | c=› } 19:29, 20 October 2020 (UTC)Reply

English Wikipedia's Xiehouyu page sux

English Wikipedia's Xiehouyu page sux. They list 活到老,學到老 and 江山易改 as Xiehouyu. I always knew these phrases, but I never thought they were Xiehouyu- pardon me if they are. I enjoin everybody to look at that page and help out later generations. --Geographyinitiative (talk) 23:11, 20 October 2020 (UTC)

@Geographyinitiative: You can always ask for citation using w:Template:Citation needed. — justin(r)leung (t...) | c=› } 23:25, 20 October 2020 (UTC)Reply

Positioning of Forms and images

Currently, there's an unwritten rule that {{zh-forms}} and images should be placed right under the Chinese header. However, this leads to two issues.

  • For entries with long pronunciations, the images are no longer visible once the user scrolls down to the definition. As Wiktionary keeps on expanding, all Chinese entries will have long pronunciation sections as they are filled out. For instance, see 鹿.
  • {{zh-forms}} breaks up the Etymology section making it harder to read. 蝴蝶 is a good example of this issue.

Proposal 1

  • {{zh-forms}} and links to Wikipedia should be placed at the end of the Etymology section.
  • Images should be placed directly under the part of speech section, e.g. under Noun

Proposal 2

  • {{zh-forms}}, links to Wikipedia, and Images should be placed under the part of speech section, e.g. under Noun

Feedback? Languageseeker (talk) 07:40, 2 November 2020 (UTC)Reply

The position of {{zh-forms}} shouldn't change. Images can be placed wherever it's logical. That would mean they could either be with definitions or right under {{zh-forms}} (and {{zh-wp}}, if it's there). They should not be placed under pronunciation because they are not part of the pronunciation. — justin(r)leung (t...) | c=› } 06:46, 3 November 2020 (UTC)Reply
@Languageseeker: IMO, images can go under {{zh-wp}} or under {{head}} ({{zh-noun}}, ...). Anything else is (probably) odd. Putting images under the Pronunciation header is probably not good, as Justin noted on his talk page. —Suzukaze-c (talk) 06:31, 11 November 2020 (UTC)

Dialectal modules for words that seem to be the same across dialects

We have data for 小 but not for 大, and we don't have data for 水, unlike water#Translations. Should we make these?

Personally, Support for clarity and explicitness.

@Justinrleung, Mar vin kaiser, 沈澄心, The dog2 —Suzukaze-c (talk) 17:16, 12 November 2020 (UTC)

In the case of 小, it is not the same across all dialects. Many southern dialects use 細. 大 seems to be the same across all dialects, but if you know any dialects that use a different word, go ahead and create the table. Ditto for 水. The dog2 (talk) 17:24, 12 November 2020 (UTC)
@The dog2 Oh, what I mean is that we have Module:zh/data/dial-syn/小, but not Module:zh/data/dial-syn/大; I presume that for the latter, it is because dialects use 大 across the board (or not! what do I know! but the lack of symmetry is odd). —Suzukaze-c (talk) 17:27, 12 November 2020 (UTC)Reply
I do think it'd be useful to have these even if there's no dialectal difference at all (although if we look hard enough, there may be some somewhere; maybe not in the synchronic data, but in the diachronic data). — justin(r)leung (t...) | c=› } 17:28, 12 November 2020 (UTC)Reply
@Suzukaze-c: Obviously, I can't speak for all dialects because I don't speak all of them, but yes, for the latter, all the dialects I know use 大. For 小, that is not the case, because Cantonese, Hokkien and Teochew all use 細. I'm not sure having a dialectal module is necessary if it's the same word across all dialects, though I'm not vehemently opposed to having one either. The dog2 (talk) 17:35, 12 November 2020 (UTC)
  Support for creating these data pages. -- 11:47, 13 November 2020 (UTC)Reply

Descriptive, not Proscriptive: The Variant Names like Wu Han

I am very, very proud to announce on this February 2021 day that I have documented the word 'Wu Han' on Wiktionary. It was a long time coming. No longer will this form of the name be cast aside as a mere mistake or ignorable vulgar form. I would like to encourage everybody to add all the non-standard forms of English language loan words from Chinese characters you have ever seen to Wiktionary, especially the ugly ones with hyphens, spacing and capitalization that hurt the eye. They may not seem like important words, but documenting these words is the essence of a spirit of descriptivism. With these words, we will make Wiktionary an honest dictionary vis-a-vis the truth of what's happening on the cutting edge between the two forms of language- the first dictionary anywhere to be that honest. --Geographyinitiative (talk) 18:26, 11 February 2021 (UTC)Reply

Intuitive Middle Chinese reconstructions

@Frigoris, Suzukaze-c, and others: while editing the Etymology sections of specific Sino-Japanese readings, I have somewhat intuitively reconstructed some Middle Chinese pronunciations for characters that have none in their modules, using patterns in the reconstructed Old Chinese pronunciations. Here are the characters that I have edited, with the notes: 樣 (*jɨɐŋH), 婿 (*seiH)

Does anyone know any others that have Old Chinese reconstructions but no known Middle Chinese attestation? There should be a category for that. Thanks, ~ POKéTalker 21:24, 23 February 2021 (UTC)

@Poketalker: There is Middle Chinese for 樣. What do you mean? — justin(r)leung (t...) | c=› } 21:30, 23 February 2021 (UTC)Reply
I've also moved MOD:zh/data/ltc-pron/壻 to MOD:zh/data/ltc-pron/婿, so there should be Middle Chinese for 婿 as well. — justin(r)leung (t...) | c=› } 21:32, 23 February 2021 (UTC)Reply
Slap in the face... last time I edited the shinjitai (new character form in Japan) (様) with the etymology section there was probably no MC pronunciation for 樣--how long was that...
There are some Chinese characters that have a reconstructed Old Chinese pronunciation but no Middle Chinese, most likely because the information has not been added to their modules here. Any recommended websites with such information? ~ POKéTalker 21:36, 23 February 2021 (UTC)
@Poketalker: There should theoretically not be such cases because the OC reconstructions should always have MC reflexes. There are two issues: (1) Zhengzhang sometimes reconstructs anachronistically without evidence from early (pre-Han) texts and perhaps reconstructs OC for "late" words, and (2) Guangyun uses different variants than the modern-day standard. In both cases you mentioned, it was the case that Guangyun used a different variant (㨾 and 壻) instead; in this case, we would simply have to move the module or copy the module over to the modern-day standard. The issue of a "late" word not found in Guangyun can't be solved because there probably isn't any "standard" way to reconstruct MC in those cases. — justin(r)leung (t...) | c=› } 21:47, 23 February 2021 (UTC)Reply
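The variant mismatch described above can be illustrated with a small Python sketch. It shows a lookup that falls back from the modern-standard character to the variant Guangyun uses. Note that this is a different technique from the fix described in the comment (moving or copying the data module), and everything here is an assumption for illustration: `MC_DATA`, `GUANGYUN_VARIANT` and `lookup_mc` are hypothetical names, and the stored values are placeholders rather than real readings.

```python
# Hypothetical stand-in for per-character Middle Chinese data pages.
# The values are placeholders, not actual reconstructions.
MC_DATA = {
    "㨾": "<data stored under 㨾>",
    "壻": "<data stored under 壻>",
}

# Modern-standard form -> variant used in Guangyun.
# Both pairs come from the discussion above.
GUANGYUN_VARIANT = {
    "樣": "㨾",
    "婿": "壻",
}


def lookup_mc(char):
    """Return MC data for `char`, falling back to its Guangyun variant.

    Returns None when neither form has data, which is what happens
    for a "late" word that Guangyun does not record.
    """
    if char in MC_DATA:
        return MC_DATA[char]
    variant = GUANGYUN_VARIANT.get(char)
    if variant is not None:
        return MC_DATA.get(variant)
    return None


print(lookup_mc("樣"))   # found through the 㨾 entry
print(lookup_mc("婿"))   # found through the 壻 entry
print(lookup_mc("淘"))   # no data under either form
```

A fallback table like this keeps one copy of the data, whereas copying the module keeps lookups simple at the cost of duplication. Neither helps with words absent from Guangyun altogether.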
@Justinrleung: 淘 (MC dɑuH?) appears to be suffering the same issue, perhaps? There might be some more... ~ POKéTalker 08:38, 26 February 2021 (UTC)
@Justinrleung: also 瑪瑙/玛瑙 (mǎnǎo). ~ POKéTalker 07:59, 27 February 2021 (UTC)
@Poketalker: For 淘, I'm not sure what the basis of Zhengzhang's reconstruction is. It might be based on Jiyun, but I can't seem to find this word in Guangyun. For 瑪瑙, it was written as 碼碯, so I've created the modules for 瑪 and 瑙 based on 碼 and 碯. — justin(r)leung (t...) | c=› } 17:05, 27 February 2021 (UTC)Reply
@Justinrleung: I see; but any progress for (MC *dɑuH)? (MC *d͡ziᴇt̚) is also missing. Should the MC parameter (or module of character in question) be based only on Guangyun? ~ POKéTalker01:08, 26 April 2021 (UTC)Reply

Fangcheng Dialect

 
趁~?~啦

Hey, can you tell me what the non-standard character is here? I believe it's the second character. I will add this to that Wikimedia Commons page so it's more clear for future users. --Geographyinitiative (talk) 17:51, 17 June 2021 (UTC)Reply

@Geographyinitiative . See 趁墟 (chènxū). RcAlex36 (talk) 18:04, 17 June 2021 (UTC)Reply
I was looking for 土+干 haha! I added this image at --Geographyinitiative (talk) 18:33, 17 June 2021 (UTC)Reply

Nanjing Dialect

@Justinrleung Can we use the Nanjing dialect module already? It looks ready lol. --Mar vin kaiser (talk) 11:12, 25 July 2021 (UTC)Reply

Oh nevermind, there's no module yet. I thought there was one already since a romanization system was listed. --Mar vin kaiser (talk) 11:16, 25 July 2021 (UTC)Reply
@Mar vin kaiser: It has been created already (Module:cmn-pron-Jianghuai), but it's not ready for multicharacter entries AFAICT. I don't have the time to figure things out yet. I'm not sure how familiar you are with Lua, but it would be nice to get some test cases to check if the module works well. — justin(r)leung (t...) | c=› } 19:12, 25 July 2021 (UTC)Reply

Sources for cites in permanently recorded media

@Justinrleung, 沈澄心 What are your sources/tricks for cites in permanently recorded media? —Suzukaze-c (talk) 19:24, 29 July 2021 (UTC)Reply

@Suzukaze-c: I have access to a database called 讀秀 through my university. It provides access to many books, journals and newspapers published in Mainland China. — justin(r)leung (t...) | c=› } 19:53, 29 July 2021 (UTC)Reply

Template:zh-der#Features: colons versus semi-colons

Discussion moved to Template talk:zh-der.

Incorrect Gwoyeu Romatzyh

Discussion moved to Template talk:zh-pron#Incorrect Gwoyeu Romatzyh.

Moral Equality between Cantonese and Mandarin

In the spirit of "I cannot breathe", I demand in the name of moral justice that Cantonese and Mandarin be treated as moral equivalents within zh-pron, if Wiktionary is a purveyor of linguistics and not politics. What form would that type of moral equivalency take? All the varieties, many of which have their own Wikipedia versions- Cantonese Wikipedia, etc- are listed alphabetically in zh-pron. That is: except for Mandarin, which is given top-billing. Besides the political factor (promotion by central governments in China), there is no scientific difference between Cantonese and Mandarin, and putting Mandarin out of order at the top of zh-pron is a dastardly moral and academic crime where politics masquerades as linguistics. Take the language headers on English Wiktionary: they are done alphabetically- that is, except for English, because English is the language of English Wiktionary. That I can understand or at least tolerate. But to put everything in zh-pron in alphabetical order except Mandarin is to say that Mandarin has a special status in the realm of linguistics that justifies a position above Cantonese, despite the letter C coming before the letter M in the alphabet. I see Cantonese and Mandarin as two instances of the same type of thing- in other words "equal"- equal in the same sense that 'All men are created equal' (not as understood by the Dred Scott case). I admit Mandarin has a special status in the political realm. No contest! But there is no justification within the science of linguistics for Mandarin supremacy. This is a grave crime. --Geographyinitiative (talk) 16:32, 6 April 2022 (UTC)

Standard Mandarin is much more widely spoken and studied than any other variety of Chinese, so it makes sense that it should appear first in the list of pronunciations. This is part of a broader principle that also applies to other languages: it makes sense to give more focus to the "standard" variety than other varieties. This is not because of "a special status in the realm of linguistics", but rather a practical consideration to help our readers. —Granger (talk · contribs) 18:50, 6 April 2022 (UTC)Reply
Category:English terms partially calqued from Chinese is empty and I challenge anyone here to fill it with actual specific Wiktionary entry pages instead of subcategories based on Cantonese, Mandarin, etc. Can't be done, demonstrating that Chinese is a language superfamily, not a language. Down with the man. --Geographyinitiative (talk) 19:38, 8 May 2022 (UTC)Reply
My bad if this discussion was taken elsewhere. From here it looks like @Geographyinitiative's concern wasn't addressed.
Russian is "much more widely spoken and studied" (in @Granger's words) than Bulgarian and may have more contributors, but the inequality — if we can call it that — ends there. And so it is for any number of language pairs. How does it "help the users of Wiktionary" that Cantonese is in effect treated as a dialect of Mandarin? Is there a non-circular justification for this? (talk) 01:45, 21 December 2022 (UTC)Reply

Why are many Chinese words that are from Japanese wasei-kangos not treated as wasei-kangos?

In Wiktionary, only part of the Chinese words that are from Japanese wasei-kangos are added with template "wasei kango" in their section "Etymology" (such as "電話", "進化", "宗教", etc.), while others are not. In fact, words like "階級", "社會", "文明", "主義", "獨裁", etc. are also wasei-kangos. Why are they not treated as wasei-kangos in this website? --NasalCavityRespiratory (talk) 09:44, 10 April 2022 (UTC)Reply

@NasalCavityRespiratory: Special:Contributions/49.179.157.161 —Fish bowl (talk) 09:46, 10 April 2022 (UTC)
@Fish bowl: So the templates in many entries have been removed because their etymologies have not been verified? So how do they make sure that some of the words like "電話", "進化" are verified to be wasei-kangos? (My native language is not English and I am a new user. Please forgive me.) —NasalCavityRespiratory (talk) 10:35, 10 April 2022 (UTC)Reply
@NasalCavityRespiratory (I'd like to know the answer to this as well, if you've found it.) (talk) 01:49, 21 December 2022 (UTC)Reply

Words with no traditional form

Does the requirement to have a Traditional Chinese form as the lemma still hold for words that exist only in Simplified Chinese form and don't have a Traditional Chinese form (in which case the Traditional-Chinese-as-lemma requirement would force us to invent a Traditional Chinese form out of whole cloth)? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:27, 15 April 2022 (UTC)Reply

Do you have any specific examples in mind? —Granger (talk · contribs) 23:00, 15 April 2022 (UTC)Reply

Category:Oxymorons by language

Hey, is "平谷 (Pínggǔ)" an oxymoron ("flat valley")? If not, that's fine, but I also want to mention that there is as yet no category for oxymorons in Chinese languages, which is saddening; see the above category. --Geographyinitiative (talk) 09:50, 3 May 2022 (UTC)

A common way to form compounds in Chinese is by combining antonyms: 大小, 东西, 多少, etc. These seem to be categorized in Category:Chinese antonymous compounds. --Granger (talk · contribs) 10:02, 3 May 2022 (UTC)
Thanks for your response. I had thought of that, but this one is the name of a location. --Geographyinitiative (talk) 10:12, 3 May 2022 (UTC)

Remove pinyin when linking to other entries?

Currently the synonyms, compounds, derived terms, etc. sections contain a large number of links to other entries, which are often accompanied by the pinyin of that entry. (This includes pinyin generated automatically by {{zh-l}}, {{zh-m}}, etc., as well as pinyin in plain wikitext or in {{zh-der}}.) This causes several issues:

  1. It belittles and ignores the non-Mandarin languages, which do not use pinyin. Note that the pronunciation/romanisation/transliteration of these is only listed on the entry pages and (rarely) in example sentences, but not under these sections (there may be some cases, but they are so rare that I haven't found one yet), even when that link is specified to be limited to that language (e.g. in 吹水#Synonyms).
  2. When generated automatically, it assumes that the entry has a Mandarin pronunciation, even when the entry is not (commonly) used in Mandarin (e.g. 靚#Compounds under Etymology 2).
  3. Editors sometimes do not check the correctness of the automatically generated pinyin, especially when there is a large number of links in these sections.
  4. This creates clutter, inflates the page size considerably, and uses relatively expensive functions to derive pinyin from the provided words/characters.
  5. This causes inconsistencies where only some links have pinyin (e.g. see 口/derived terms).
  6. The pinyin can be confused with the language names and/or glosses, since both are in the exact same font style (e.g. in 吹水#Synonyms).

Readers can already check the (often more detailed) pronunciation tables via the links, so I believe that removing the pinyin will not cause considerable usability issues. -- Wpi31 (talk) 14:52, 4 July 2022 (UTC)

CJKV Character list by Ideographic Description Characters

I miss an appendix of characters ordered according to the Unicode ideographic description character used in their description. Backinstadiums (talk) 13:07, 11 September 2022 (UTC)

"(This includes names derived at an older stage of the language.)"

At Category:English surnames from Chinese we see the parenthetical "(This includes names derived at an older stage of the language.)" With all due respect, FUCK this sentence. What "older stage"? Old Chinese? Middle Chinese? Yes, I seem to remember that some Chinese-Americans use the surname /*ʔsaŋʔ/, from the older stage of Chinese pronunciation for 蔣. Oh no, no, wait: when you say "an older stage of the language" you're actually referring to the modern languages the central government wants to eliminate, aren't you? There's Wiktionary again, hand in glove with 21st century digital despotism. I try to shake you awake, people, but you're still suffocating in the Iron House of CCP-compliant prescriptivism. Here, I double dog dare you to make a damn Tongyong Pinyin-derived entry like Cyuanjhou if you're still independent enough. But you're not. --Geographyinitiative (talk) 20:13, 3 October 2022 (UTC)

The parenthetical appears to be boilerplate, also used in categories such as Category:English surnames from Spanish. It presumably applies to Chinese too; surely there are, for instance, English surnames derived from Cantonese back in the 19th century when Cantonese phonemically distinguished place of articulation for sibilants. Not everything is a political conspiracy. --Granger (talk · contribs) 20:29, 3 October 2022 (UTC)

"Topolects" vs. "Unified Chinese"

Recently, I began to use "Min Nan" as the header for several Min Nan entries, based on the existence of the following cases:

  1. Min Nan lemmas of Japanese origin, e.g. 歐兜邁 should be more appropriately expressed in Pe̍h-ōe-jī as o͘-tó͘-bái.
  2. Min Nan lemmas with an uncertain etymon and with diverse choices of Han characters (as semantic or phonological loans), e.g. siâⁿ written in , , , , , , etc.
  3. Min Nan lemmas with an uncertain etymon and without clearly widespread use of Han characters, e.g. phián in phián-thô͘ (to scratch on the ground; to dig soil and turn it over).

But a senpai told me that

Min Nan should not be used as a heading for Chinese character entries, even if the word is exclusively used in Min Nan.

I don't completely object to this practice; I even support the usage of {{zh-see|xx|poj}} to direct Pe̍h-ōe-jī entries to Han character entries. What I object to is the attempt to "unify Chinese" to the extreme and suppress the subjectivity of various topolects in the meantime.

Although there have been discussions (Wiktionary:Votes/pl-2014-04/Unified_Chinese and Wiktionary talk:About Chinese#Unified Chinese revisited) about treating several Chinese languages as a single language "Chinese", neither WT:About Chinese nor WT:About Han script has laid down explicit guidelines that Chinese topolectal entries written in Han characters must be put under the header "Chinese". Furthermore, WT:LT doesn't prohibit the possibility that any topolect can be treated as a separate language and thus have its own headings.

I understand some people want to merge all entries of Chinese languages for the sake of simplicity or even unity. But it is well-known that languages themselves do not possess inherent simplicity, let alone unity. While the identification, naming and classification of languages are considered objective and scientific, the unification of languages should be considered subjective and arbitrary, and often with a political purpose.

Regardless of the motivation for unifying the language, its impact is too severe to ignore. If one attempts to "unify" a language, some varieties will definitely gain prestige and some will be marginalized, as blatantly pointed out and preached by the rationale in Wiktionary:Votes/pl-2014-04/Unified_Chinese. However, the fact that 99% of Mandarin lemmas are cross-topolectal is a consequence of Mandarin having long dominated the written corpus of Chinese languages and the national language policies of many Chinese-speaking countries. If the practice of "unifying Chinese" in Wiktionary continues but "Chinese" entries are always centered on Modern Standard Mandarin, then the diversity of Chinese languages will only worsen.

Taking a stance to preserve the diversity of language, not to eliminate it, I recommend encouraging any demonstration of the subjectivity of a topolect as a language. After all, since we allow lemmas written in Kana (see 大和 for example), what justifies prohibiting the Hakka language written in Han characters from having its own lemmas?

Based on the above reasons, I would like to propose some improvements to the current practice:

  • About heading
    • If a lemma belongs exclusively to one topolect listed in ISO 639-3, whether it is written in Han characters or in any allowed romanization, it should be placed under an L2 header specifying that topolect on its own. The "exclusive usage" should always be attested, of course. For example, 𢯭手 should be placed under the header ==Hakka==, and 鬥跤手 under ==Min Nan==.
    • If a lemma is used in two or more topolects listed in ISO 639-3, then it can be placed under the header ==Chinese==, in the sense that Chinese is a macrolanguage to which the lemma belongs. For example, 幫忙 (used in Mandarin, Cantonese, etc.) and 鬥相共 (used in Min Nan and Zhao'an Hakka) can be placed under the header ==Chinese==, as currently presented.
    • Using ISO 639-3 as a criterion is only advisory, not mandatory. More nuanced distinctions are also welcome.
  • About pronunciation
  • About category
    • Topolect-specific thematic categories (e.g. Category:nan:Technology) should be allowed for all topolects.
    • Template:zh-pron should be modified such that a term is automatically categorized into multiple topolects when the corresponding pronunciations are given.
  • About the Han character variants
    • Template:zh-forms, Template:zh-see and their respective modules should be modified so that the variants of Chinese characters in different Chinese topolects can be processed and categorized. For example, the recent addition of the available value trc in Template:zh-see aims to indicate the Taiwanese Southern Min Recommended Characters. Such a specification only appears in Min Nan terms. An expedient approach would be to add a parameter that specifies the language code and displays the name of the topolect.
    • Module:zh/data/glosses should include topolectal glosses, e.g. ["𢯭"] = "(Hakka) to help".
    • Creating modules or glosses specific to each topolect is also feasible.
  • About etymology
    • When a Han character is proven to be a phonological or semantic loan in a certain topolect and therefore has its own variant forms, it is better to handle the etymology separately. Multiple L3 headers labelling ===Pronunciation x=== or ===Etymology x=== ( ) can be used. See for an example.

Simply put, if a term is cross-topolectal, it can be treated as a "Chinese" term, taking into account the topolectal varieties. If a term is specific to just one topolect, it is treated independently.
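
The heading rule proposed above can be sketched in a few lines of Python. This is only an illustration of the proposal as worded, not existing Wiktionary code: the function name `choose_l2_header` and the small code-to-header table are assumptions made for the example, and the input is taken to be the set of ISO 639-3 codes in which a term is attested.

```python
from typing import Iterable

# Hypothetical, illustrative subset of ISO 639-3 codes and the L2 header
# each would get under the proposal; not an official Wiktionary table.
HEADER_BY_CODE = {
    "cmn": "Mandarin",
    "yue": "Cantonese",
    "nan": "Min Nan",
    "hak": "Hakka",
    "wuu": "Wu",
}


def choose_l2_header(attested_codes: Iterable[str]) -> str:
    """Return the L2 header the proposal would assign to a term.

    One attested topolect -> that topolect's own header.
    Two or more attested topolects -> the macrolanguage header "Chinese".
    """
    codes = set(attested_codes)
    if not codes:
        raise ValueError("at least one attested topolect is required")
    unknown = codes - set(HEADER_BY_CODE)
    if unknown:
        raise ValueError(f"unknown topolect code(s): {sorted(unknown)}")
    if len(codes) == 1:
        return HEADER_BY_CODE[next(iter(codes))]
    return "Chinese"


# Examples taken from the proposal text.
print(choose_l2_header(["hak"]))         # 𢯭手, attested only in Hakka
print(choose_l2_header(["nan", "hak"]))  # 鬥相共, Min Nan and Zhao'an Hakka
```

The sketch also shows the maintenance concern raised in the replies: as soon as a second code is added to a term's attested set, the result flips from a topolect header to "Chinese", so the entry would have to be moved.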

I am not trying to overturn the decision made in Wiktionary:Votes/pl-2014-04/Unified_Chinese. My suggestions are certainly not perfect and require more discussion. I just hope to inspire everyone to take the variety of Chinese languages more seriously.

Wikijb (talk) 21:14, 6 December 2023 (UTC)

  Oppose. The current practice based on the vote is to have all topolect terms under the Chinese header, including a large number of Cantonese-specific and Min Nan-specific terms. {{zh-pron}} takes care of categorisation, which won't include Mandarin, etc. if only |mn= (Min Nan) was specified. Besides, we have the {{lb|zh|Min Nan}} labelling technique to make it even more specific. ==Min Nan== L2 headers are only used for POJ soft redirects. The rationale on the vote includes an example of a Cantonese-specific word, which is never used outside Cantonese. Anatoli T. (обсудить/вклад) 23:31, 6 December 2023 (UTC)
  Oppose. It leads to confusion in formatting, giving editors an additional hurdle in learning entry creation/formatting. When a term is later found to be used not only in Min Nan but also in other varieties, there would also be several places where changes need to be made, leading to a higher probability of malformed entries; these are usually small changes that only affect categorization, which makes them hard to detect. I really warn you against implementing any of the ideas you mention above until consensus has been reached. -- justin(r)leung (t...) | c=› } 00:37, 7 December 2023 (UTC)
One area where we could do better if we are to continue with the unified Chinese approach is categorization with {{C}}. We should probably have both {{C|zh|X|Y|Z}} and {{C|nan|X|Y|Z}} when a term is used in Min Nan. -- justin(r)leung (t...) | c=› } 00:41, 7 December 2023 (UTC)
@Justinrleung: Thanks. Do you mean that {{C|nan|X|Y|Z}} won't categorise under Category:Chinese lemmas, or that it applies to topical categories only (Category:zh:All_topics)? I support the latter, not the former. Anatoli T. (обсудить/вклад) 01:06, 7 December 2023 (UTC)
@Atitarev: I'm talking about topical categories. -- justin(r)leung (t...) | c=› } 01:30, 7 December 2023 (UTC)
I just re-read the proposals above, and there are a few other things I would support. The points under pronunciation are definitely something we should pursue; I don't think there would be anyone opposed to having support for additional varieties. Hailu is already one of the things we're planning to implement in the future.
I'm not exactly sure what the point about etymology means, and how it differs from current practice in general. -- justin(r)leung (t...) | c=› } 01:35, 7 December 2023 (UTC)
  Partial oppose. I share basically the same views with Justin. While I am certainly in opposition to the "let's dump everything under Chinese" approach, I don't think the suggestions regarding "heading" would be feasible, and perhaps even worse than the existing approach.
On top of that, I think that the points under "Han character variants" are symptoms of the problems of the templates {{zh-see}} and {{zh-forms}}. I also find them rather problematic, but I disagree with the suggestions; they should instead be rewritten/replaced with better templates. – wpi (talk) 03:00, 7 December 2023 (UTC)
  Oppose. Given that you only make reference to Southern Min and Neo-Hakka, I would like to fill you in that (at the very least) a lot of ISO 639-3 groups are questionable at best, and some, like wuu, would just end up with the same problem we started with. Splitting headers, as wpi and justin have already said, would cause a big mess. While your ideas for zh-pron improvement are interesting, at the current point in time I don't think any form of what you have proposed would be ergonomic or even feasible in Northern Wu, due to how varied and complex the tone sandhi systems are. -- 義順 (talk) 08:08, 7 December 2023 (UTC)

too many label aliases?

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hi everyone. I just implemented functionality to display all the labels in all languages that categorize into a given category. See Category:Taiwanese Hokkien for an example. As can be seen in this category, we have a ton of aliases that produce the Taiwanese Hokkien category. Do we really need all of these aliases? It makes bot work a pain to have to account for all of them and it seems needless. What do people think of cutting down the number to something more reasonable? E.g. do we really need a Taiwanese Hokkien and Hakka label (with 20 or so aliases) at all? Why not just put Taiwanese Hokkien and Hakka separately? Benwing2 (talk) 04:09, 17 March 2024 (UTC)

@Benwing2: Agreed to split.
Please also decide on labels. I can see the former "Min Nan" label changing back and forth within a couple of days(?), and it is still inconsistent across translations (new, recent, old) and Chinese entries. I am confused between "Southern Min" and "Hokkien" for the "nan-hbl" language code.
If using a bot, please don't forget about the alphabetical order for nested translations. Anatoli T. (обсудить/вклад) 04:15, 17 March 2024 (UTC)
@Atitarev Yup, my script to fix up translation tables now sorts things alphabetically in nested translations as well as at the top level. Formerly it didn't change the order of nested translations but that's been fixed. In the former state I ran it on all translation tables but I haven't rerun it universally since the fix, only on pages where I renamed "Min Nan" to Hokkien. If you see any translation tables with Southern Min or Min Nan in them, please let me know; I suspect they've been added recently (i.e. after the Mar 1 dump I used to find translation tables with Min Nan translations). Benwing2 (talk) 04:20, 17 March 2024 (UTC)
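
For readers unfamiliar with the table layout, the two-level sort described above could look roughly like the following Python sketch. It is not the actual bot script, whose code is not shown in this thread. It assumes a simplified table in which top-level lines start with `* ` and nested lines with `*: `, and the function name `sort_translation_lines` is made up for the example.

```python
def sort_translation_lines(lines):
    """Sort a simplified translation table alphabetically at both levels.

    Assumption: top-level entries start with "*" and nested entries with
    "*:"; every nested entry belongs to the nearest top-level entry above.
    """
    groups = []  # each item is [top_level_line, [nested_lines]]
    for line in lines:
        if line.startswith("*:"):
            if not groups:
                raise ValueError("nested line without a parent language")
            groups[-1][1].append(line)
        elif line.startswith("*"):
            groups.append([line, []])
        else:
            raise ValueError(f"not a translation line: {line!r}")

    def language_name(line):
        # Drop the list markers, then keep the text before the first colon.
        return line.lstrip("*: ").split(":", 1)[0].strip().casefold()

    result = []
    for top, nested in sorted(groups, key=lambda g: language_name(g[0])):
        result.append(top)
        result.extend(sorted(nested, key=language_name))
    return result


table = [
    "* Dutch: {{t|nl|x}}",
    "* Chinese:",
    "*: Mandarin: {{t|cmn|x}}",
    "*: Hokkien: {{t|nan-hbl|x}}",
    "*: Cantonese: {{t|yue|x}}",
    "* Arabic: {{t|ar|x}}",
]
for row in sort_translation_lines(table):
    print(row)
```

Nested rows travel with their parent language when the top level is reordered, which is why the grouping step comes before either sort.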
@Benwing2 Just on the "Taiwanese Hokkien and Hakka" label, I think it's supposed to be used instead of having "Taiwanese Hokkien" and "Taiwanese Hakka" separately. I agree it's silly, though. Theknightwho (talk) 04:43, 17 March 2024 (UTC)
@Theknightwho Yeah I suppose it was added to avoid a bit of redundancy with separate labels Taiwanese Hokkien and Taiwanese Hakka displaying the word "Taiwanese" twice. But that seems hardly enough reason to have the label and if this is really an issue, we can add a capability in Module:labels to compress adjacent labels of certain sorts in certain ways. Benwing2 (talk) 04:47, 17 March 2024 (UTC)
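
One possible shape for such a compression step is sketched below in Python. It is purely illustrative: Module:labels is written in Lua and has no such function as far as this thread shows. The name `compress_adjacent_labels` and the idea of keying the merge on a shared leading word such as "Taiwanese" are assumptions made for the example.

```python
def join_with_and(items):
    """Join two or more items as 'A and B' or 'A, B and C'."""
    if len(items) == 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def compress_adjacent_labels(labels, prefixes=("Taiwanese",)):
    """Merge runs of adjacent labels that share one of the given prefixes.

    ["Taiwanese Hokkien", "Taiwanese Hakka"] becomes
    ["Taiwanese Hokkien and Hakka"]; non-adjacent labels are left alone.
    """
    out = []
    i = 0
    while i < len(labels):
        merged = None
        for prefix in prefixes:
            head = prefix + " "
            j = i
            rests = []
            # Collect the run of consecutive labels starting with this prefix.
            while j < len(labels) and labels[j].startswith(head):
                rests.append(labels[j][len(head):])
                j += 1
            if len(rests) >= 2:
                merged = head + join_with_and(rests)
                i = j
                break
        if merged is None:
            out.append(labels[i])
            i += 1
        else:
            out.append(merged)
    return out


print(compress_adjacent_labels(["Taiwanese Hokkien", "Taiwanese Hakka", "colloquial"]))
```

With a step like this applied at display time, editors would write the two labels separately and the combined label, along with its aliases, would no longer need to exist as data.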