Wiktionary:Language categories

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.

This page aims to explain the category system Wiktionary uses for languages, which is often misunderstood and can lead to miscategorization.

Top-level categories

There are two top-level categories for languages:

  • Category:Languages is a topical category which contains terms that are names of languages, in English and in other languages, for example German or Deutsch.
  • Category:All languages is a lexical category and should contain only language categories, i.e. categories containing all terms in a specific language, like Category:French language. The same applies to its subcategories (branches). Note that only independent languages should be categorized here. Dialects, like Category:American English, should instead be placed in the respective regional category of their language.


Category:All languages should contain every language category that exists on Wiktionary. The category is added automatically when {{langcatboiler}} is used (see below for usage instructions). It also has several side branches, which sort the same language categories in different ways, as described below.


Category:Languages by country contains all languages again, this time ordered by the countries where they are spoken. Categorization is first by continent, then by country. Individual categories should only be created for internationally recognized sovereign countries, although exceptions can be made on an individual basis (for example for Taiwan). Categories can also be created for clearly defined multicultural regions, for example Category:Languages of the Caucasus. Again, the exact details are decided on a case-by-case basis.

Language categories should be placed in the category of every country with a native population speaking the language. Immigrant communities should always be excluded: even though there may be speakers of Samoan in Berlin, Category:Samoan language should not be categorized into Category:Languages of Germany. Extinct languages should be categorized in the present-day country or countries that correspond to their former language area.

Constructed languages should not appear in country categories, since they do not have a native area in the traditional sense.
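The country rules above can be expressed as a small decision function. The following is a minimal Python sketch, not Wiktionary's actual implementation: the Language record, its field names and the "Category:Languages of ..." naming are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Language:
    """Hypothetical record for one language (not Wiktionary's real data format)."""
    name: str
    # Present-day countries with a native population; for an extinct language,
    # the present-day countries that cover its former language area.
    native_countries: list[str] = field(default_factory=list)
    # Countries with immigrant communities only; recorded but never used.
    immigrant_countries: list[str] = field(default_factory=list)
    constructed: bool = False


def country_categories(lang: Language) -> list[str]:
    """Return the country categories a language category belongs in."""
    if lang.constructed:
        # Constructed languages have no native area, so no country categories.
        return []
    # immigrant_countries is deliberately ignored: only native areas count.
    return [f"Category:Languages of {country}" for country in lang.native_countries]
```

With Samoan recorded as native to Samoa and present through immigration in Germany, this sketch returns only Category:Languages of Samoa; a language flagged as constructed gets an empty list.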


Category:Languages by family contains all languages yet again, this time ordered by genetic classification, that is, by language family. Categorization should generally follow the language family tree. It is automated by the {{family cat}} template, which adds each language family category to its parent family based on the list in Module:languages. Only language families recognized by the majority of linguists should be used; for example, there should be no category for either Altaic or Nostratic languages. Groups of languages that have no genetic relationship, such as Category:Amerindian languages or Category:Caucasian languages, may also have a category, but the template will not add anything to these categories automatically.

Due to their nature, sign languages should not appear in this category tree. They should instead be categorized into the special Category:All sign languages.
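The parent-family lookup that {{family cat}} performs can be pictured as walking up a table of families until a family with no parent is reached. This Python sketch is hypothetical: FAMILY_PARENT and family_path are illustrative names, the table is a toy stand-in for the list in Module:languages (a Lua module with its own data format), and the category names simply follow the "X languages" pattern.

```python
from typing import Optional

# Hypothetical parent table: family name -> parent family (None = top level).
# A toy stand-in for the family list in Module:languages, not its real format.
FAMILY_PARENT: dict[str, Optional[str]] = {
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
    "Indo-European": None,
}


def family_path(family: str, sign_language: bool = False) -> list[str]:
    """Categories from a language's family up to the top of the family tree."""
    if sign_language:
        # Sign languages stay out of the family tree entirely.
        return ["Category:All sign languages"]
    path = []
    current: Optional[str] = family
    while current is not None:
        path.append(f"Category:{current} languages")
        # Raises KeyError for a family missing from the table.
        current = FAMILY_PARENT[current]
    # Top-level families sit directly under the family tree's root category.
    path.append("Category:Languages by family")
    return path
```

Non-genetic groups such as Amerindian languages are simply absent from the table in this sketch, so nothing is ever routed into them automatically.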

There are also some special categories for particular situations:

  • Category:Language isolates contains languages which are not known to be related to any other language, like Sumerian. Even if an isolate has several historical stages with separate categories (like Old Korean, Middle Korean and Modern Korean), they should appear in this category, but it may be more practical to create a small family category and classify that as an isolate instead of the individual languages.
  • Category:Constructed languages is used for constructed languages, i.e. languages deliberately created by a person or group, for example for international communication.
  • Category:Unclassified languages is used for languages which cannot currently be placed in the category tree, in most cases because language data is scarce. In theory, any family without a parent family could be considered unclassified, but such families are not put in this category.
  • Category:Pidgins and creole languages is used for all pidgins and creoles. These should not appear in the normal language family tree, but may be categorized by source language.
  • Category:Mixed languages is used for mixed languages, which are a special case of creole languages.


Category:Languages by script contains all languages once again, this time ordered by the script they are written in. If a language is or was officially written in multiple scripts, it should be categorized into all the appropriate script categories.

If a language does not have an official script (which is often the case for small minority languages) but is written in a linguistic notation (in most cases based on the Latin script), it should be categorized under that script. Category:Undetermined script languages should be used only when a language is not written at all, i.e. when no written data exists whatsoever.
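The script rules above amount to a three-way fallback: official scripts first, then the script of a linguistic notation, and only then the undetermined category. Below is a minimal Python sketch of that order; script_categories is a hypothetical helper, and the "X script languages" naming is assumed by analogy with Category:Undetermined script languages.

```python
from typing import Optional


def script_categories(scripts: list[str], notation_script: Optional[str] = None) -> list[str]:
    """Hypothetical helper: pick script categories for one language.

    scripts: every script the language is or was officially written in.
    notation_script: script of a linguistic notation, used only when the
    language has no official script.
    """
    if scripts:
        # One category per official script, current or historical.
        return [f"Category:{script} script languages" for script in scripts]
    if notation_script is not None:
        # No official script, but written in a linguistic notation (usually Latin).
        return [f"Category:{notation_script} script languages"]
    # Only when the language is not written at all.
    return ["Category:Undetermined script languages"]
```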

Language category

The category of a language should always use {{langcatboiler}}.

The first parameter is the language code, which can be looked up at Wiktionary:Language codes. All further parameters list the countries the language is spoken in. For example, Category:English language could contain the following (this example is only illustrative and does not reflect the actual contents of that category):

{{langcatboiler|en|the United States of America|the United Kingdom|Australia|New Zealand}}
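To make the parameter layout of {{langcatboiler}} concrete, here is a minimal Python sketch that splits such a call into the language code and the country list. It is not how MediaWiki parses templates: it handles only plain positional parameters (no named parameters, nested templates or links), and the "Category:Languages of ..." naming in country_category_names is an assumption.

```python
def parse_langcatboiler(wikitext: str) -> tuple[str, list[str]]:
    """Split a simple {{langcatboiler|code|country|...}} call into its parts."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template invocation")
    name, *params = [part.strip() for part in text[2:-2].split("|")]
    if name != "langcatboiler" or not params:
        raise ValueError("expected {{langcatboiler|code|...}}")
    # First positional parameter: language code; the rest: countries.
    code, *countries = params
    return code, countries


def country_category_names(countries: list[str]) -> list[str]:
    # Assumed naming: each country parameter is appended verbatim,
    # so an article such as "the" stays in the category name.
    return [f"Category:Languages of {country}" for country in countries]
```

Applied to the English example, the sketch yields the code "en" and the four country parameters in the order given.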

To change the family or scripts of a language, edit Module:languages. The family is given by its full language family name, while scripts are given only as script codes, which can be looked up at Wiktionary:Scripts.