Appendix:Vocabulary lists of Southeast Asian languages

Below are vocabulary lists for Southeast Asian language branches and reconstructed proto-languages.

Many of the lists are glossed in Chinese, with several also in Vietnamese, Russian, and French.


Welcome to Wiktionary's vocabulary lists series. This series aims to have representative word lists for all language families of the world.

  • Purpose: As lexicographical works, the vocabulary lists are designed with the research goals of historical-comparative linguistics in mind, such as classifying languages, reconstructing proto-languages, and identifying loanwords. Frequency lists and pedagogical resources are not included.
  • Glosses: Each list keeps the original glosses (definitions, meanings) as given in its source. When the original glosses are not in English, translated glosses are sometimes added in additional columns. Translations that are not in the original source are marked as such in the lists and do not replace the original glosses. Unlike Swadesh lists and other standardized lexicostatistical word lists, the vocabulary lists here are not built around predetermined glosses. Instead, they can serve as "raw building blocks" for compiling Swadesh lists.
  • Content: The lists typically contain 50 to 1,000 lexical entries. Definitions are usually concise and focus on basic vocabulary concepts such as numerals, body parts, and natural phenomena.
  • Scope: Emphasis is placed on divergent language isolates, families, and branches that would likely be crucial for etymological reconstruction and classification. Proto-languages are included whenever possible. Many of these language groups are sparsely documented and/or extinct. As a result, some of these lists may actually be the only extant documentation of a language or even language group.
  • Sources: The word lists are adapted from academic sources published by linguists. Thus, all lists must be properly referenced with adequate notes and metadata. Many of these sources are out of print, with highly limited distribution and accessibility.
  • Digitization: As with Wikisource texts, the lists are individually and painstakingly digitized using a variety of methods, such as optical character recognition (OCR), manual typing, and document conversion.
  • Encoding: Unicode.

Open-access online lexical databases that are similar in design, content, and research goals include STEDT, MKED, RefLex, Chirila, and Starling.

Navigation template

Vocabulary lists of Southeast Asian languages

p-Tibeto-Burman • Old Chinese (basic) • p-Southern Min • Greater Bai • p-Tujia • p-Naish • p-Ersuic • Guiqiong • Horpa • p-Lalo • Lalo • Akha • Woni • Axi • Nesu • Yi (Mihei) • Kathu • Gong • p-Karenic • p-Luish • p-Bodo-Garo • Kuki-Chin • Suansu • Mru • p-W. Tibetan • Tibetan (Lajiao) • Amdo Tibetan • Zakhring • Tshangla • Kho-Bwa • Mey • p-Puroik • p-Hrusish • Koro • Greater Siangic • Raji-Raute • Dhimalish • Baram-Thangmi • Bhujel • p-Kham • Dura • Bunan • (Nepal)


p-Austroasiatic • p-Munda • p-Khasian • p-Palaungic • Quang Lam • p-Khmuic • p-Pakanic • p-Vietic • p-Katuic • p-Bahnaric • p-Pearic • p-Khmeric • p-Monic • p-Aslian • p-Nicobarese


p-Hmong-Mien • Hmong-Mien • p-Hmongic • Pa-Hng • Xong • Pana • She • p-Mienic • Mienic • Mien (Gongcheng) • Biao Min (Shikou)


p-Kra-Dai • p-Kra • Laha • Qabiao • Gelao • p-Kam-Sui • Kam-Sui (Hunan) • p-Lakkia • Biao • p-Tai • Zhuang (Tiandeng) • Bouyei • p-Be • Jizhao • p-Hlai • Jiamao

Open-access online lexical resources for each Sino-Tibetan branch are listed below. Branches for which lexical data is available in the Sino-Tibetan Etymological Dictionary and Thesaurus (2015) are marked (STEDT).

Western Himalayas
Eastern Himalayas
Central Sino-Tibetan branches


See also

External links