Wiktionary:About Han script

There are a number of entries on Chinese characters, which are used in the People's Republic of China (simplified Chinese), the Republic of China (Taiwan) (traditional Chinese), Japan (kanji), and Korea (hanja).

Chinese characters were formerly used for Vietnamese (chữ Nôm) and have been used for minority languages in China such as Bai, Dong, Miao, and Zhuang (the last using significant variants). Siniform scripts were also used for the extinct Khitan, Jurchen, and Tangut languages.

In addition to laying out a standard format, this page recommends standards on writing (creating or editing) these pages.

Recommendations

  • Use {{lang}} to wrap Chinese character text, specifying the language and using Hant (traditional Chinese), Hans (simplified Chinese), Jpan (Japanese), or Hani (generic Chinese) as the script, or characters may display using inappropriate fonts and forms.
    • Due to Han unification, a given character does not have a language attached. As a result, if the language is not explicitly defined, browsers may use a set of fallback fonts that may be inappropriate for a given text. Chinese words may be jarringly displayed using a Japanese font with insufficient coverage for Chinese (resulting in two fonts with different styles being used at once), or Japanese terms may be displayed using a Chinese font whose character forms deviate from the Japanese norm. To avoid this, {{lang}} is used to tell the browser that "this is Chinese text, use a Chinese font" or "this is Japanese text, use a Japanese font".
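
As a minimal sketch of this recommendation, the same character can be tagged differently depending on the language it is written in. The parameter layout used here (language code first, text second, script in a named sc= parameter) is an assumption based on common usage of {{lang}}; check the template documentation before relying on it.

```wikitext
<!-- Traditional Chinese: language code zh, script Hant -->
{{lang|zh|經|sc=Hant}}
<!-- Simplified Chinese: same language code, script Hans -->
{{lang|zh|经|sc=Hans}}
<!-- Japanese: language code ja, script Jpan -->
{{lang|ja|経|sc=Jpan}}
```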

Entry layout

Chinese characters are both characters and the spellings of words in various languages. Thus entries for Chinese characters:

  • begin with a "Translingual" section on the character itself, and
  • then include entries in each language that uses them.
    • The language-specific sections themselves ("Chinese", "Japanese", "Korean", etc.) begin with a section on the character itself (except for Chinese), then include a part of speech section if the character can be used in isolation (for example, can be used as a normal noun in Japanese). The part of speech sections follow WT:ELE, but the character sections have specific formats (reading, eumhun, compounds, etc.), as detailed below.

Theoretical code for a most basic entry is shown below, using 經.

{{also|経|经}}
{{character info}}
==Translingual==
{{stroke order|type=animate}}
{{Han simplified forms|經|経|经}}

===Han character===
{{Han char|rn=120|rad=糸|as=07|sn=13|four=21911|canj=VFMVM|ids=⿰糹巠}}

====References====
{{Han ref|kx=0925.240|dkj=27508|dj=1360.320|hdz=53402.110|uh=7D93}}

----

==Chinese==
{{zh-forms|s=经}}

===Pronunciation===
{{zh-pron
|m=jīng
|c=ging1
|cat=noun
}}

===Definitions===
{{zh-hanzi}}

# <definition>

----

==Japanese==

===Kanji===
{{ja-kanji|grade=|rs=糸07|style=ky|shin=経}}

# [[classic]] work; [[canon]]

====Readings====
{{ja-readings
|goon=きょう
|kanon=けい
|kanyoon=きん
|kun=へ-る, たていと-
}}

===Noun===
{{ja-noun|きょう|shin=経}}

# {{kyujitai spelling of|経}}: <definition>

----

==Korean==

===Hanja===
{{ko-hanja|hangeul=경|eumhun=지날 경|rv=gyeong|ehrv=jinal gyeong}}

# [[classic]] work; [[canon]]

----

==Vietnamese==

===Hán tự===
{{vi-readings|hanviet=kinh|nom=canh, kinh|rs=糸07}}

# [[classic]] work; [[canon]]

Categories of characters

Chinese characters have been used in a number of languages and regions in the Sinosphere, and thus have a great deal of variation.

  • Many characters are used in the same form, across all languages;
  • some characters only exist in one language or another;
  • and other characters have different forms in different languages.
    In Unicode, many are considered variants not worth encoding separately (see Han unification), some are encoded separately to preserve backwards compatibility with legacy encodings, and some appear identical but have different stroke orders.
    There are also handwritten simplifications (略字; Japanese ryakuji, Korean yakja) which may or may not be encoded.

It is useful to indicate both:

  • What categories a character falls into
  • What variant forms, if any, a character has

Most basically, there are traditional Chinese characters, and two major simplifications: simplified Chinese characters (Chinese) and shinjitai (Japanese).

In more detail:

Variant forms

Traditional: 讀
Shinjitai: 読
Simplified: 读

A character may have multiple forms. One should:

  • Indicate the variant forms, both by an {{also}} hatnote and via a {{Han simplified forms}} (Translingual) or {{zh-forms}} (Chinese) template.
  • Not use inappropriate forms. For instance, do not write Japanese words in simplified Chinese (if the form differs from the shinjitai form).

For instance, the character for "reading" has three forms, as in the box above. This box is produced by the template {{Han simplified forms}}.
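
Following the pattern of the basic entry for 經 shown earlier, the variant forms of this character might be indicated as sketched here for the page of the traditional form 讀. The argument order (traditional, shinjitai, simplified) is assumed from that earlier example rather than taken from the template documentation.

```wikitext
<!-- Hatnote pointing to the shinjitai and simplified forms -->
{{also|読|读}}
==Translingual==
<!-- Box listing the traditional, shinjitai, and simplified forms -->
{{Han simplified forms|讀|読|读}}

==Chinese==
<!-- Simplified form given in the s= parameter -->
{{zh-forms|s=读}}
```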

Headings

There are a number of templates which help with the layout, which are listed below.

The only things that should come before the “Translingual” heading are, if necessary, an {{also}} hatnote for similar characters which may be confused (such as and ), and for variant forms; plus {{character info}}.

Translingual

(Stroke order)

A stroke order diagram may be displayed using the template {{stroke order}}, with the parameter strokes= for sizing. All existing diagrams can be found at commons:Category:CJK stroke order. Do not create a separate “Stroke order” section.

Caveat: Different stroke orders

Beware that, just as some characters have different forms in different languages, some characters have different stroke orders in different languages.

There are potentially 3 (or more) different stroke orders, but these very often coincide:

  • Traditional Chinese, used historically, and in Taiwan and Hong Kong
    Note that there are some differences between modern Taiwanese and Hong Kong standards and actual historical practice; for example, differs in Taiwan, while and differ in Hong Kong.
  • Japanese
    e.g. , , .
  • Simplified Chinese, used in mainland China
    Also known as Modern Chinese; some characters were not simplified, but their stroke order was changed.

There are also Korean and Vietnamese stroke orders and character forms, but modern Korea generally uses Japanese conventions, and Vietnamese is only of historical interest, hence relatively unimplemented.

For instance, is different in Chinese and Japanese, while the radical (and thus all derived characters) differs in simplified (and Taiwan) and (historical) traditional Chinese.

When simplified and traditional Chinese stroke orders differ, Japanese and simplified Chinese coincide. There are apparently no examples where all three share the same form but different stroke orders, though there are examples where the form differs in all three.

  • {{stroke order}} defaults to Simplified Chinese.
  • If there are multiple stroke order diagrams available, please include all forms.
  • To include Chinese and Japanese, use the parameter |strokes=jbw for the Japanese stroke order: see .
  • To include traditional and simplified forms, you must currently do so manually: see .

Etymology

If possible, include an “Etymology” section explaining the form of the character, listing earlier forms, and explaining the development of the character form.

Beware that there are many folk etymologies based on analyses of modern forms, with many dating to the 2nd century CE (when present forms largely stabilized)! Modern scholarship based on oracle bone script often provides different etymologies. See References.

Please do not include discussion of the etymology of the word (often Old Chinese) that the character was developed to represent; this belongs in the language-specific section. The Translingual "Etymology" section should not include pronunciation information, except when necessary to understand the form. This occurs for example in phono-semantic compounds, where reconstructions of the pronunciations of the compound character and its phonetic are relevant to the form, but sound is completely irrelevant to pictographs and ideographs. Reconstructed pronunciations should be cited and follow the usual rules for historical Sinitic languages – see About Old Chinese and About Middle Chinese for guidelines, and for an example.

Most characters were coined during the Old Chinese period; this needn’t be explicitly mentioned, but can be stated if helpful. If a character was not coined during the Old Chinese period (notably Middle Chinese or foreign coinages, especially Japanese, and some Korean and Vietnamese), this should be mentioned.

Simplified and Shinjitai

For simplified Chinese and shinjitai characters, the "Etymology" section should simply link to the traditional Chinese or kyūjitai form and explain the method of simplification, as in Simplified Chinese characters: Methods of simplification and Shinjitai: Methods of simplifying Kanji. This can be done using the {{Han simp}} template, which also categorizes.

Traditional and coinages

For traditional Chinese and country-specific coinages, the "Etymology" section should:

  • Classify composition (see Chinese character classification). One should provide the traditional 六書/六书 (liùshū, six writings) classification using the template {{liushu}} and break up compound characters via {{Han compound}}. Note that:
    • The overwhelming majority of Chinese characters (90%+) are phono-semantic compounds.
    • Beware of folk etymologies based on current forms (especially claims that a character is an ideogrammic compound) – the current form is often a simplification of an older form, which may not be related to the current components. For instance, the lower part of is cognate to , not to , which it more closely resembles.
  • Show previous forms. These are collected at Wikimedia Commons, and the template {{Han etyl}} will display them if they exist.
    • Note that older forms themselves had variants, which need not be exhaustively displayed.

Han character

The main section is the “Han character” section, using the {{Han char}} template, which includes radical, stroke count, and various input methods.

Previously, this was followed by definitions (still under “Han character”). This practice is deprecated and should not be used. Definitions should be placed under the relevant language heading (for example, “Japanese”).

This should also include a “References” section, using {{Han ref}}, which links to the character in various standard dictionaries and includes the Unicode number (linking to Unihan in the process).

==Translingual==
{{stroke order}}

===Etymology===
(Explanation of form; ideally shows earlier forms.)

===Han character===
{{Han char|rn=109|rad=目|as=03|sn=8|four=40716|canj=JBMM}}

====References====
* {{Han ref|kx=0489.010|dkj=13733|dj=0848.140|hdz=21482.010|uh=65E5}}

General considerations

Compounds

Compounds and idioms involving a character (熟語) are listed language by language, since they vary between languages.

List compounds using a suitable template from Category:Column templates, generally {{rel-top5}} or {{top5}} if only listing compounds, or {{rel-top3}} or {{top3}} if also providing a gloss. See is an excellent example.

Compounds should be collated by radical-and-stroke sorting; for order of radicals, see Appendix:Chinese radical. However, as per Wiktionary:About Japanese#Compounds, compounds that begin with the character should come first.

As per Wiktionary:About Japanese#Compounds, terms involving a character should be listed in an L4 section called “Compounds”. By contrast, in the entry for a compound of two or more characters, longer compounds should be listed under “Derived terms”.

A separate L4 section called “Names” should contain any common names constructed from the character, even if such names duplicate a compound word.

Note that some pages list compounds as “Derived terms” in the “part of speech” section: contrast 日#Mandarin and 天#Mandarin.

Compound entry

Some general considerations apply on the page for a compound (two or more Chinese characters).

As above, longer compounds (containing a given compound) should be in a section called “Derived terms”.

If one compound is obtained from another by re-arranging the characters, such as 会議 and 議会, it is useful to link these; the “Related terms” section fits best, presuming an etymological connection.

Chinese

See also: About Chinese

For the layout of the “Chinese” section, see Wiktionary:About Chinese#Entry format. The following is an example:

==Chinese==
{{zh-forms|s=...|alt=...}}

===Glyph origin===
...
===Etymology 1===
From ...

====Pronunciation 1====
{{zh-pron
|m=
|c=
|h=pfs=
|md=
|md_note=
|mn=
|mn_note=
|mn-t=
|w=
|mc=
|oc=
|cat=
}}

=====Definitions=====
{{zh-hanzi}}

# definition 1
# definition 2
# {{†}}  definition 3

======Usage notes======
...

======Synonyms======
...

=====Compounds=====
{{zh-der|...}}

====Pronunciation 2====
{{zh-pron
|m=
|c=
|mc=
|oc=
|cat=
}}

=====Definitions=====
{{zh-hanzi}}

# ...
===Etymology 2===
====Pronunciation====
{{zh-pron
|m=
|c=
|h=pfs=
|md=
|md_note=
|mn=
|mn_note=
|mn-t=
|w=
|mc=
|oc=
|cat=
}}

====Definitions====
{{zh-hanzi}}

# ...

Japanese

See also: About Japanese

In addition to L3 part of speech headings, Japanese entries for a Chinese character have an L3 heading called “Kanji”, which has an L4 heading called “Readings”, which can use the template {{ja-readings}}. This currently supports the usual on, kun, and (rarer) nanori readings, but also nazuke and 呉音 (go-on) readings.

See Wiktionary:About Japanese#Kanji_entries for more on the format of Japanese entries.

Korean

See also: About Korean

There should be an L3 heading for “Hanja”, beginning with the eumhun (meaning/reading), which can be obtained by the template {{ko-hanja}}. This also supports the following romanizations, via the respective parameters: Revised Romanization of South Korea (ehrv), McCune-Reischauer (ehmr), Yale Romanization of Korean (ehy).

Next there should be an L4 heading “Compounds”; in addition to the hanja form, it should also include hangeul forms for all words.
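
The romanization parameters of {{ko-hanja}} described in this section can be sketched as follows, extending the call from the basic entry for 經. The ehmr and ehy values are illustrative romanizations of 지날 경 supplied for this sketch, not copied from an existing entry, and should be checked before use.

```wikitext
<!-- ehrv: Revised Romanization; ehmr: McCune-Reischauer; ehy: Yale -->
{{ko-hanja|hangeul=경|eumhun=지날 경|rv=gyeong|ehrv=jinal gyeong|ehmr=chinal kyŏng|ehy=cinal kyeng}}
```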

Vietnamese

Currently, the vast majority of Vietnamese character entries indicate Hán-Việt readings and omit Nôm readings. The layout has not been standardized, though most have a single L3 heading, "Han character", with {{vi-readings}} below it.

Works in chữ Nôm are quoted in the part of speech section using the {{vi-ruby}} template. Any quốc ngữ works should be quoted in the corresponding quốc ngữ entry.

Note that most Nôm text includes characters not yet encoded in Unicode. Most Nôm sources make use of Private Use Area characters that are found in various Nôm fonts. Do not use Private Use Area characters, because they will be misinterpreted by readers with different Nôm fonts installed. Instead, use Ideographic Description Sequences. (See Template:vi-ruby for an example.)

Pronunciations and etymology generally belong in the quốc ngữ entry. Also in that entry, each headword line takes a list of characters (according to Nôm readings) as an additional parameter. Hán-Việt forms may be listed under an L3 "Readings" section using {{han tu form of}}.

Proposal

There is a proposal at Wiktionary:Beer parlour/2013/December#Nom character that would do away with the current layout in favor of the following structure:

  • Character – "Han character" is avoided because it appears to exclude Nôm readings or Nôm-only characters.
    • Readings – Specify any Hán-Việt and Nôm readings using the template {{vi-readings}}.
    • Compounds
  • part of speech (Noun, Verb, etc.) – Because Hán-Việt readings rarely differ from the definitions in the Translingual section, the parts of speech sections are for definitions of Nôm readings only. Use headword templates like {{vi-noun}} and {{vi-verb}}, listing Nôm readings in the first parameter.
  • References (if applicable)
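
As a rough sketch, the proposed structure might look like this in wikitext. The heading levels are inferred from the nesting of the list above and are an assumption, as is the choice of "Noun" as the part of speech. The {{vi-readings}} parameters are copied from the basic entry for 經, and text in angle brackets is a placeholder.

```wikitext
==Vietnamese==

===Character===

====Readings====
{{vi-readings|hanviet=kinh|nom=canh, kinh|rs=糸07}}

====Compounds====
<!-- list of compounds -->

===Noun===
<!-- The Nôm reading goes in the first parameter -->
{{vi-noun|<Nôm reading>}}

# <definition of the Nôm reading>

===References===
```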

Others

Chinese characters and similar scripts (see Chinese family of scripts) are also used for languages other than those primarily discussed above.

Ryukyuan languages

Chinese characters are used for some Ryukyuan languages, following the format for Japanese.

Minority languages in China

Some languages, such as Bai, Dong, Miao, and Zhuang, are (like Vietnamese) now officially romanized, but used Chinese characters in the past and still see limited use of them. These languages may use some significant variant characters that are not yet fully encoded in the Unicode Standard.

Although most entries in these languages are in romanized script, entries in Chinese characters may exist. They may be soft-redirected to entries in romanized script using an appropriate template (e.g. {{za-sawndip form of}}).

Other scripts

The extinct languages Khitan, Jurchen, and Tangut each used their own script, derived from Chinese characters. Characters in these scripts are not unified with Han characters, so the information on this page does not apply to entries in these languages.

Some other scripts used in China are not derived from Chinese characters, but often have borrowed from Chinese. These notably include Dongba script, Geba script, Sui script, and Yi script.

See also

References

K’s Bookshelf – useful lists for finding related characters:

Etymology

  • Xǔ Shèn 許慎/许慎. Shuōwén Jiězì “說文解字”/“说文解字” 100–121 CE – classic reference, but due to lack of access to earlier forms, has errors
  • Xu Zhongshu 徐中舒. “丁山說文闕義箋” [Commentary on the errors in Shuowen by Ding Shan]
  • 李孝定 Lĭ Xiàodìng (Lee Hsiao-ting, 1965). 甲骨文字集釋 Jiǎgǔwénzì jíshì, [Collected interpretations of oracle bone characters], 台北 Táibĕi, 南港 Nángǎng (Nankang): 中央研究院歷史語言研究所 Institute of History and Philology, Academia Sinica
    An authoritative modern reference.