The policies for the Chinese language are changing.
Following Wiktionary:Votes/pl-2014-04/Unified Chinese and the preceding discussions and agreements, the structure of Sinitic or Chinese (Mandarin, Cantonese, Min Nan, Wu, Hakka, etc.) entries is changing. The entries are being merged and new methods to display topolect information are being used. The body of this page needs to be updated to explain the new policy.

The Chinese or Sinitic language family includes a number of related lects which have very similar written forms, but different grammar, vocabulary and especially pronunciation. On Wiktionary, these lects are treated under the header ==Chinese== and the language code zh unless they natively use a non-Chinese script.

The Chinese lects

The many varieties of Chinese which are written using Chinese characters (Mandarin, Cantonese, Wu, Min Nan, Min Dong, etc.) are handled under a single header (==Chinese==) and language code (zh). Lects which use a different script, e.g. Dungan, which uses Cyrillic, have their own headers and language codes.

Terms are defined in relation to Standard Written Chinese. If pronunciations or senses are limited to certain dialects, topolects, or regions, this is indicated using context ({{label}} or {{lb}}) and qualifier ({{qualifier}}) tags. For example, 簡訊 uses {{lb|zh|chiefly|Taiwan}} to show that it is used chiefly in Taiwan, and 煠熟狗頭 uses {{lb|zh|Cantonese}} to show that it is exclusively Cantonese.

Character forms and romanization

Chinese may be written with either traditional characters or simplified characters. At Wiktionary, the content is hosted at the traditional form of an entry to avoid entries becoming out of sync. Traditional entries link to simplified forms using {{zh-forms}}, and simplified entries are stubs that point to traditional forms using {{zh-see}}.
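
As a rough illustration of this split, the wikitext of the two pages might look like the following. The pair 簡訊/简讯 reuses the term cited earlier on this page; the exact parameters shown are an assumption based on common usage and should be checked against each template's documentation.

```wikitext
<!-- On the traditional-form page 簡訊 (full entry) -->
==Chinese==
{{zh-forms|s=简讯}}

<!-- On the simplified-form page 简讯 (stub pointing back) -->
==Chinese==
{{zh-see|簡訊}}
```

Keeping definitions only on the traditional page means there is a single place to edit, which is how the two forms stay in sync.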

Wiktionary also has stub entries for the pinyin forms of Mandarin terms and full entries for the POJ forms of Min Nan terms; see #Mandarin and #Min Nan for details.

Entry format

Chinese entries should follow format guidelines in Entry layout explained. 朋友 is a good example of how Chinese entries should ideally be formatted. {{zh-new}} can be used to accelerate creation of entries.

{{zh-forms}} and {{zh-pron}} should rarely, if ever, be omitted; {{zh-pron}} in particular handles the categorization of entries, so do not leave it out. Other useful Chinese-specific templates include {{zh-l}}, {{zh-usex}}, {{zh-cat}}, and {{zh-compound}}.
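
A minimal sketch of how these templates might fit together in an entry is shown below, loosely modeled on 朋友. The parameter names, the pronunciation values, and the use of {{head}} for the headword line are illustrative assumptions; consult the live entry and the template documentation for the authoritative layout.

```wikitext
==Chinese==
{{zh-forms}}

===Pronunciation===
{{zh-pron
|m=péngyou
|c=pang4 jau5
|cat=n
}}

===Noun===
{{head|zh|noun}}

# [[friend]]
```

Here 朋友 is written identically in traditional and simplified characters, so {{zh-forms}} is shown without a simplified-form parameter.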



Cantonese

The standard romanization of Cantonese used at Wiktionary is Jyutping.

Stub entries are made for individual Jyutping syllables.

Tone change

Tone change is indicated using a hyphen (-), e.g. joeng6-2. (Other websites may use an asterisk, *.)



Mandarin

The standard romanization of Mandarin used at Wiktionary is Hanyu Pinyin.

For individual pinyin syllables, we have entries using tone numbers (zhang1), tone diacritics (zhāng), and no tones marked at all (zhang).[1] For pinyin romanizations of multisyllabic words, we have stub entries in only tone diacritics (yánlì, NOT yan2li4).[2]

Stubs for words spelled in pinyin are allowed; see yánlì for an example. These entries exist only to make searching for words easier; all substantive information (part of speech, definitions, synonyms, etc.) is contained in the hanzi entries and is not allowed in the pinyin entries.[2]

Tone sandhi

Some Mandarin dictionaries are inconsistent when it comes to depicting tone sandhi in Pinyin. For example, the character 不 (normally bù, fourth tone) changes to second tone (bú) when followed by another fourth-tone syllable. Some dictionaries spell it with the converted tone (búshì in this case; for example, HSK汉语水平考试词典, ISBN 978-756172078-3), while others use the root tone (bùshì; for example, 现代汉语词典, ISBN 978-962070134-4). Wiktionary uses the root tone for syllables when spelling words in Pinyin (bùshì). This is also more consistent with how tone sandhi is handled in other situations in Mandarin. For example, 可以 is spelled kěyǐ, even though the first syllable is pronounced with second tone (kéyǐ).



Hakka

The standard romanization of Hakka used at Wiktionary is Pha̍k-fa-sṳ.

Min Dong


The standard romanization of Min Dong used at Wiktionary is Foochow Romanized.

Hokkien Min Nan


The standard romanization of Hokkien used at Wiktionary is POJ.

Entries spelled using POJ are allowed. See put-khó-su-gī for an example.

Teochew Min Nan


The standard romanization of Teochew used at Wiktionary is the Teochew Romanization Scheme (not Gaginang's Peng'im).

Tone sandhi

Not indicated.


Wu

The Shanghai dialect of Wu is treated as the standard.


The standard romanization of Wu used at Wiktionary is an in-house romanization, WT Romanisation.

Historical languages



Historical Sinitic languages include the spoken languages Middle Chinese (ltc) and Old Chinese (och), the written language Literary Chinese (lzh), and the protolanguage Proto-Sino-Tibetan. Words in these languages have entries, except for those in Proto-Sino-Tibetan, which, being a protolanguage, belongs in the Reconstruction namespace. These terms can also appear in etymologies for entries in modern Sinitic languages, and in entries for languages that have borrowed from Chinese, notably Japanese, Korean, and Vietnamese.

Finer distinctions are possible, such as Late Middle Chinese and Early Middle Chinese for the spoken language, and Literary Chinese versus earlier Classical Chinese for the written language. These distinctions can be made in the text of etymologies, but these do not have ISO 639 codes, and thus are not used for level 2 headings.

The precise meaning and status of these “languages” are complicated: narrowly speaking, “Middle Chinese” and “Old Chinese” refer to various phonological reconstructions, notably those based on rime dictionaries, and do not necessarily refer to a specific historical dialect or common language. Nevertheless, they are useful designations for historical periods.

Most modern Sinitic languages descend from Middle Chinese. The notable exception is Min, which diverged earlier: Proto-Min descends directly from Old Chinese; see branching of modern varieties of Chinese. A notable example of this difference is 茶: English tea comes from Min, while chai comes from other varieties of Chinese.

Literary Chinese is significantly different from the spoken languages; this may be compared with Medieval Latin versus Romance languages. Literary Chinese (lzh) is the correct source language for literary terms in modern Sinitic languages, notably chengyu (four-character idioms), and in borrowings such as the corresponding Japanese yojijukugo.

Middle Chinese

As Middle Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *.

Old Chinese

As Old Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *. As sources differ, please carefully cite specific references (author and year) for any reconstructions.

References for Old Chinese phonology include:

Cognates and stubs

Across Sinitic languages, a single written form is very frequently shared across a long historical period and a wide geographical area. Thus cognate entries in different languages appear on the same page; this also occurs for cognates in closely related languages in other scripts, but to nowhere near the same degree as in Sinitic languages. Because of this, it is generally unhelpful, and possibly incorrect, to create an entry for one Sinitic language simply by copying the heading and definitions for Mandarin. It is unhelpful because it adds no information beyond what a reader could guess for themselves (a cognate probably has the same meaning), and possibly incorrect because words do differ between these languages; blindly copying without a reference is not reliable.

Thus, when creating a new Sinitic entry, please try to add some information distinctive to the particular language, particularly pronunciation, references, or citations.

For etymologies, each entry should include an Etymology section indicating its immediate ancestor term. For native words in modern Sinitic languages this is either Middle Chinese (most) or Proto-Min (thence Old Chinese) for Min languages. Per usual practice (see Wiktionary:Etymology), it is acceptable to include full etymologies back to Proto-Sino-Tibetan in modern entries. However, unless there is something specific to the etymology of a term in a given language, this is tedious to repeat for all modern languages. It is thus preferred (and sufficient) to only include the full history at representative languages, namely Mandarin and Min Nan (most used in each branch), with other languages just indicating the immediate predecessor and having a link reading “more at Mandarin/Min Nan”.

Similarly, it is tedious and unhelpful to list contemporary cognate terms unless some particular relationship or contrast is being shown. Instead, ancestral relationships can be given both backwards (in the Etymology section), to Middle Chinese, Old Chinese, and Proto-Sino-Tibetan, and forwards (in the Descendants section), from Middle Chinese, Old Chinese, and Proto-Sino-Tibetan to later forms. In these Descendants sections, listing the pronunciations of descendant terms along with the spelling allows easy comparison and avoids duplicating the same listing in all modern forms. These are more useful than sibling relationships between cognates.

Chinese characters

Chinese characters should not be conflated with Chinese words or morphemes. Information about the characters themselves appears in the Translingual section, which comes before all other sections. See Wiktionary:About Chinese characters for discussion of its format.

In general the Translingual section only includes information on the character form (in Etymology and script variations) and the meanings, which are widely shared. It does not include pronunciation information, except when necessary to understand the form. This occurs for example in phono-semantic compounds, where reconstructions of the pronunciations of the compound character and its phonetic are relevant to the form.

Specifically, discussion of the phonetic change of a character over time in Old Chinese, Middle Chinese, and various modern Sinitic languages belongs in the language-specific sections. However, information on when a meaning of a character developed (whether in some Sinitic language or a separate one, such as Japanese) is acceptable in the Translingual section.

Other entry sections


Etymology

If possible, it is desirable for entries to have etymologies, showing earlier pronunciations, spellings (if hanzi usage has changed), and semantic change (change in meaning).

For terms or phrases that can be traced back to Literary Chinese, you may wish to use the etymology template in the form {{etyl|lzh|cmn}} (where cmn is the Modern Chinese language in which the term is used).


References

Like other Wikimedia projects, Wiktionary is largely the work of anonymous volunteers. Therefore it is important to cite authoritative reference works such as dictionaries and encyclopedias. The {{pedialite}} template is a good choice if you want to cite a Wikipedia article. If matching articles in a Chinese language and English can be found on Wikipedia (particularly true for nouns), you can use the {{pedialite}} template in the following manner (example given for Mandarin):


In the references section, it will look like:


For external websites, you can use the {{cite-web}} template. Here is an example:


  • “剪刀”, in 國語辭典 [Guoyu Cidian On-line Mandarin Dictionary] (in Mandarin), accessed 9 April 2008.

Reference books

For books, you can use the {{cite-book}} template. For your convenience, the filled out templates for some authoritative reference works are provided (click on the blue edit button to copy):


Hanzi form templates

To display various forms of Hanzi, Chinese entries make use of {{zh-forms}}. Documentation is available on its template page.

Additional help

Help from the community

Sometimes we know there is a problem but do not know how to correct it. If you find a Chinese entry with a problem that you cannot fix yourself, there are several ways to approach the situation.

  1. Mark the page with {{attention}} with a language code. This template adds the entry to the cleanup category for that language (such as Category:Mandarin terms needing attention), where another user can then find and correct the problem. It helps if you include comments on the entry’s talk page explaining what the problem is or why you think the page needs attention.
  2. Raise the issue on Wiktionary talk:About Sinitic languages. Note that this approach is primarily for issues of style, formatting, categorization, and not for specifics of content.
  3. Mark the page with {{rfc}}. This is a more general cleanup tag, and it allows the user to include reasons or concerns as an argument in the template. Be sure to also add an entry to WT:RFC concerning the word so that other editors will be made aware of the problem.
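
For instance, the first and third options might look like this in the wikitext of an entry. The reason text is made up, and the exact parameters each template accepts are an assumption here; check the documentation of {{attention}} and {{rfc}} before use.

```wikitext
<!-- Option 1: flag the entry for attention, with a language code -->
{{attention|cmn}}

<!-- Option 3: general cleanup request, with the concern given as an argument -->
{{rfc|pronunciation section needs checking}}
```

In either case, a short note on the entry's talk page explaining the problem makes it easier for another editor to act on the tag.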

Translations into Chinese languages/dialects/topolects

  • All translations into Chinese languages must be grouped under * Chinese. Subdialects can be nested beneath it. Regional variations can be flagged with {{qualifier}}.
* Chinese:
*: Mandarin: {{t|cmn|肥皂|tr=féizào|sc=Hani}}
*: Min Nan: {{t|nan|雪文|tr=sat-bûn}} {{qualifier|Zhangzhou}}, {{t|nan|茶塊|tr=tê-kóe}} {{qualifier|Quanzhou}} ...
  • If the traditional and simplified forms differ, the traditional form precedes the simplified form, and the transliteration is given with the simplified form.
* Chinese:
*: Mandarin: {{t|cmn|心理學|sc=Hani}}, {{t|cmn|心理学|tr=xīnlǐxué|sc=Hani}}
  • If the simplified and traditional forms are identical, only one translation is given.
* Chinese:
*: Mandarin: {{t|cmn|三明治|tr=sānmíngzhì|sc=Hani}}

Other Chinese aids

  1. Wiktionary:Votes/pl-2009-12/Treatment of toneless pinyin syllables
  2. Wiktionary:Votes/2011-07/Pinyin entries
