Appendix talk:Swadesh lists

Latest comment: 1 year ago by Alessandrofalconi in topic Reorder list?

Ordering.

I'd like to add Dutch translations, but I'm not sure where that column should go. Is it in alphabetical order (between English and French)? Should it be close to the closest related languages (English and German)? Or doesn't it matter and do I put it right at the end? D.D. 21:19 Mar 10, 2003 (UTC)

There's no established understanding on this matter, but add on a few more languages and we've got a problem. That's going to require some thought. ☮ Eclecticology 23:23 Mar 10, 2003 (UTC)
Could it be useful in this respect to create different pages with English as the base language and groups of related languages? Let's say a page with English - Afrikaans - Dutch - Frisian - German, another with English - French - Italian - Romanian - Spanish, etc.
On a more general level, I wonder if we should think about using subpages in certain cases (the Swadesh list and other lists, the interlanguage question, ...). I know that Wikipedia decided against it after a lot of discussion. But I'm getting more and more convinced that Wiktionary is something very different, with a content that needs much more structuring and hierarchy. And probably a number of changes in the software too. D.D. 09:27 Mar 11, 2003 (UTC)
I very much agree, and we very much need a place other than the "Swadesh list" to go into greater detail. Eclecticology
I've created [1] to do that. D.D. 19:56 Mar 11, 2003 (UTC)

Hashes.

What are these hashes for? Are they a remnant of an old numbering scheme (replaced by the HTML table)? May I remove them? Youandme 02:32 Mar 11, 2003 (UTC)

That's what it was. The numbering and order are as usually found on such lists, notably at the Rosetta Project, which has at least partial lists for some 1,200 languages. I can't be sure whether there is a significance to the order, or whether the entries should be hard numbered.

Expanded lists?

Does anyone know if there is another system in use, or a more expanded list than Swadesh, for a standard list of frequently used/important words? With the Swadesh list as it is, given that words as fundamental as "and" come toward the bottom of the list, and "or" is excluded, while "louse" is included, I'm really at a loss, even if lice are a common part of people's lives in the world. Personally, I am interested in a list for helping people teach multiple languages at once, where the most basic words essential for communication would be covered. Any ideas? - Brettz9 04:48 Mar 26, 2003 (UTC)

Possibly you want something like Basic English, see http://www.basiceng.com/words.html --Imran 20:20 Mar 27, 2003 (UTC)
Perfect, I think...thanks! - Brettz9 20:46 Mar 27, 2003 (UTC)

Stab... with a dagger?

Which meaning of stab is intended? Since the word is listed among the group hit, cut, split, scratch, dig I think we are dealing here with basic technological lexicon: how man can transform the environment with his hands and elementary tools. May I assume that stab means something like pierce, bore, make a hole, stick, sting, prick?

I cannot believe anyone has ever considered severely wounding or killing a man with a dagger a simple concept commonly used in everyday life. That's the meaning of Italian pugnalare, French poignarder, Spanish apuñalar, found in this appendix.

If I'm right, then the Italian word should be one of trafiggere, perforare, forare, bucare, fare un buco, pungere.

Is the emphasis on the result, the hole? Or on the process, thrusting a pointed tool into something? Should the tool and the hole pass through from side to side, or are we content with an indented mark?

A methodological question: is there a description of the meanings, or are we expected to guess the linguistic psychology of Swadesh? Did he write guidelines? For example, even if the exact meaning were trafiggere (‘transfix’ in the literal sense), this word hardly belongs to the fundamental vocabulary. Since this is a requirement, we should make a compromise and choose among forare and bucare (both ‘make a hole or opening through’) and pungere (‘sting with something as small as a needle’). In my opinion. But in Swadesh's?

The method for building the lists must be explicitly stated if we plan to make statistical investigations of any value. English bore and Italian forare are both inherited lexicon and both from the same IE root. If we choose to compare these two words we get a result, if we choose a synonym of either we get a different result. But the choice should not depend on the desired result. - Sprocedato 20:34, 21 September 2008 (UTC)

Usually the unmarked meaning of "to stab" that most people think of is to stab someone with a knife or dagger. How do you say "she stabbed her husband with a dagger" or "he stabbed me"? Precision of sense is not so critical as simply making sure that the languages being compared all have the same sense. The aim is to compare words in different languages that have the same meaning, as nearly as possible, which may mean shared roots if the words mean the same thing, are used in the same situations, and are about as common as one another. But if apuñalar is common and pugnalare is uncommon, then pugnalare should be replaced by another word that would normally be used to translate apuñalar. —Stephen 14:34, 22 September 2008 (UTC)

Hindi error.

Hindi transliteration is not correct. I thought of correcting but there'd still be confusion unless Devanagari or IPA or some such script is used, e.g. between short and long a, nasalized vowel & vowel followed by n/m, etc. -- 202.141.24.2 12:51 Jun 6, 2003 (UTC)

Go ahead and fix it. The software will recognize Devanagari; it will also allow macrons in translations to show long vowels. This is not the place for IPA; such a pronunciation scheme would belong on each individual word article. Eclecticology 06:40 Jun 7, 2003 (UTC)
You might also add it to Indo-Iranian_Swadesh_lists (instead of the main page, which seems to be hosting European languages mostly). I've been editing the Chinese here, but at some point, I should probably move that too. - Brettz9 18:54 Jun 7, 2003 (UTC)

Non-Unicode browsers.

I suspect that Brettz9 isn't currently using a fully Unicode-aware browser. So many changes were made that I can't point out where the errors started.
This table is the diff for words 8, 9, and 10 between the last changes by 141.76.1.121 (01:11 May 11, 2003 UTC) and Brettz9 (03:58 Jun 8, 2003 UTC) http://wiktionary.org/w/wiki.phtml?title=Swadesh_list&diff=13569&oldid=10526

#EnglishFrenchGermanItalian SpanishDutchEsperantoChineseHindi
8.thatcelàdaseso die, dattiuna3/na4/nei4woh
8.thatceldaseso die, dattiuwoh
9.hereicihierqui,qua aqui,acáhierĉi tieyahan
9.hereicihierqui,qua aqui,achier?i tieyahan
10.thereda,dortecco ahídaartiewahan
10.therelda,dortecco ahdaartiewahan

Petruk 04:44 Jun 8, 2003 (UTC)

My apologies...I'm using the new Safari browser for the mac, and maybe it (or my configuration) is somehow messing things up. I'm still not sure how things are messing up though. I don't see the doubles in my version. If the accents are the issue, would it be too much to have people use the HTML codes instead if that is the problem? If you think that is too much trouble to change, maybe I can request of someone else then to please remove the Chinese column entirely (and push the Hindi over or ideally add the latter to the Indo-Iranian languages page)? I've already added the Chinese words to the Sino-Tibetan template. Thanks! - Brettz9 15:58 Jun 8, 2003 (UTC)

I can understand your suggestion to use HTML entity names, for example &aacute; to produce á, instead of the single character itself, as you did on Indo-Iranian Swadesh lists, for compatibility reasons. But I'm afraid there are some characters that have no corresponding HTML entity name. And using the numerical format (&#333; etc.) would make it a nightmare, at least for me, to look up the codes in a list of probably thousands of kanji characters. Petruk 17:31 Jun 8, 2003 (UTC)
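For anyone worried about looking the numeric codes up by hand: they do not have to be searched for in a table, since the number is just the character's Unicode code point. A minimal sketch in Python, assuming the text is available as an ordinary Unicode string (the function name to_numeric_refs is made up for this illustration):

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # The "xmlcharrefreplace" error handler writes &#NNN; for anything
    # that cannot be encoded as plain ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    encoded = to_numeric_refs("aqui, acá")
    print(encoded)
    # html.unescape turns the references back into the original characters.
    print(html.unescape(encoded))
```

Plain ASCII text passes through unchanged, so only the accented or non-Latin characters are rewritten.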

Thou.

If you translate thou as Sie (formal) in German, why do you translate it as tu (informal) in French? Wouldn't vous be more appropriate? --81.217.38.108 20:07, 27 Aug 2003 (UTC)

"Thou" is relatively uncommon in English. Its presence here is primarily to emphasize that the singular is intended, rather than to deal with matters of formality. In those circumstances the French "tu" is correct. Perhaps the German should be changed to du, but I'll leave that to someone more familiar with German. Eclecticology 00:55, 29 Aug 2003 (UTC)
In French, vous is a formal reference and correct to use with both a singular and a plural subject. For an informal or intimate reference, as with family and friends, tu and vous would be the pronouns used for a singular and plural subject respectively.
In German, Sie is the formal pronoun, whether it is a singular or plural reference; du and ihr are the informal pronouns for singular and plural reference respectively. HodgePodge (talk) 01:12, 6 September 2016 (UTC)

Rosetta project.

The Rosetta Project is also done with volunteers... they've got Swadesh lists for all these languages. I don't know about the license, but they say

"The resulting Rosetta archive will be publicly available in three different media: a free and continually growing online archive..."
- fagan

The link to the interactive table seems to be broken. - Wendy

It seems the Rosetta Project no longer has its Swadesh lists. Locoluis 01:17, 2 August 2006 (UTC)

They do, but 1) it's no longer an interactive list, and 2) you have to agree not to use the knowledge for commercial gain before seeing the list. 201.132.169.145 17:02, 11 August 2006 (UTC)

Polite addresses.

I took the liberty of deleting the so-called "polite addresses" [German Höflichkeitsformen (Sie), French vous, Dutch jij, u etc.], assuming that the purpose of a Swadesh list is to list and compare very basic concepts, and not to provide full coverage of all the possible translations. If somebody wants to restore these forms, here they are: German polite you (sing.) = Sie; German polite you (pl.) = Sie; French polite you (sing.) = vous. I don't know the Dutch version for sure; this is what I deleted from the "you (sing.)" line: jij, je (s.); u, gij (obs. in Holland, commonly used in Flanders, Belgium).

Just because this difference is not relevant or important in English or Esperanto does not mean it is unimportant in many other languages. I don't know whether it belongs in the Swadesh list, as I don't really grasp what the purpose of such a list is besides comparing languages. When comparing languages it is important to realize that some languages do see a difference between polite forms and forms to be used among friends and family. So I don't see why it had to be removed from the list.

Latin.

I added Latin, because I think it is interesting to compare it with the languages of Latin origin. Hope nobody minds...

Person.

I am not sure about the "person" row. In the English list, person is in between woman, man, and child, which makes it seem that person as in "human being" is meant, not a legal person or a person in a play or any other definition. In this case, the correct Latin translation would be homo, not persona (which means "mask" or "character in a play"), and the correct German equivalent would be Mensch, not Person; somebody should check the other languages. Any comments? --209.179.245.121 03:59, 25 Apr 2004 (UTC)

This is probably true. —Muke Tever 12:51, 25 Apr 2004 (UTC)
I agree. The Rosetta Project shows Mensch. It's an excellent project for this sort of thing, and should be referenced when this kind of problem arises: http://www.rosettaproject.org:8080/live/search/ Eclecticology 17:23, 25 Apr 2004 (UTC)
Coming a little late to this conversation, I agree too. I'm sure what is intended is a translation for homo, ἄνθρωπος, Mensch, etc. I'm going to change it to "Man (human being)" as opposed to "Man (adult male)". --Angr 07:33, 17 May 2005 (UTC)

Help with Basic English pages.

How about we get on with developing the Basic English pages, now that the Swadesh ones are basically finished? We could use some help, not only with the other languages, but with coming up with higher-order categories for the words in order to be able to group words together within the list later on. Brettz9 16:33, 21 Aug 2004 (UTC) See Wiktionary:Basic English Word List

Wikipedia entry.

Hi. There's currently no Wikipedia entry for Swadesh list. Can someone please start one? The content of this article seems encyclopedic, and less suitable for a dictionary; maybe it should be moved? -- (anon) 08:17, 15 Oct 2004 (UTC)

Esperanto "you" (vi, ci).

Esperanto "you" = vi (singular and plural); Also note: the "ci" form is formal as in "thou" but rarely used (e.g. poetry). "Ci" is not even mentioned here: http://en.wiktionary.org/wiki/Vi#Esperanto therefore I am correcting the Swadesh list table to "vi" and removing "ci".

Iberian languages?

Since when was 'Iberian languages' a language family?

Reorder list?

I believe it would be useful to re-arrange the list so that closely related languages are listed next to each other, i.e. German next to Dutch, and Spanish next to Italian, Latin, and French; that way it would be easier to spot similarities and patterns.

I don't have the expertise in the code here to do this myself, but it would be worthwhile....

I would propose the order: English Dutch German Swedish Esperanto Latin Italian Spanish French

BTW: English seems to be missing from the list/tree of languages; I think it should go to the Germanic languages ;) but without link (English-English Swadesh list wouldn't make sense) [fvg 78.234.102.169 21:33, 26 January 2011 (UTC)]

First of all: thank you very much to ALL the people who contributed to editing and organizing the HUGE material around Swadesh lists! I agree with the objection, except about the order of languages, which should be better arranged around Latin and German. So something like this: French, Spanish, Latin, Italian, English, German, Dutch, Swedish. In any case, for personal use, it is very simple to copy the table and paste it into a spreadsheet. It would be nice if someone could point here to instructions on how to reimport the table. Thank you again to everybody! Alessandrofalconi (talk) 08:35, 17 November 2022 (UTC)
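On getting a reordered spreadsheet back into wiki markup: one possible approach is to export the sheet as tab-separated text and convert it to standard wikitable syntax. The sketch below assumes a plain wikitable with a header row; the actual appendix may be built from templates, in which case this would only be a starting point. The function name tsv_to_wikitable and the sample data are hypothetical.

```python
import csv
import io


def tsv_to_wikitable(tsv_text, column_order=None):
    """Convert tab-separated text (first row = headers) into wikitable markup.

    column_order is an optional list of header names giving the desired
    left-to-right order of the columns.
    """
    # Skip completely empty lines, which csv.reader yields as [].
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    if column_order is None:
        column_order = header
    indexes = [header.index(name) for name in column_order]

    lines = ['{| class="wikitable"']
    lines.append("! " + " !! ".join(header[i] for i in indexes))
    for row in body:
        lines.append("|-")
        lines.append("| " + " || ".join(row[i] for i in indexes))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "English\tGerman\tDutch\nI\tich\tik\nwe\twir\twij\n"
    # Put Dutch next to English, then German.
    print(tsv_to_wikitable(sample, ["English", "Dutch", "German"]))
```

The same column_order argument covers the reordering proposed in this thread: the sheet itself can stay as it is, and only the list of language names changes.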

Moving lists from Wikipedia.

Please see w:Talk:Swadesh_list#WP:NOT. -- Jeandré, 2006-04-16t12:14z

Obsolete Family Terminology

Hamito-Semitic? Malayo-Polynesian? My Gosh. This is ridiculous, and totally obsolete! Use <<Afro-Asiatic>> and <<Austronesian>> instead. The names Hamito-Semitic and Malayo-Polynesian come from a time when other branches of those families were ignored or neglected (i.e., the several Formosan branches of Austronesian / Chadic and Omotic branches of Afro-Asiatic). Remember, this is not the 1911 Britannica. Be scientific, by modern standards and with modern terminology, please. 200.208.131.106 15:28, 6 July 2006 (UTC)

Fair enough. I've just edited the names here and on the relevant articles. Kelilan 04:30, 28 July 2006 (UTC)

Some corrections to the Spanish

Yesterday, I spent about an hour going over this Swadesh list with a native Spanish speaker here in Mexico (near Mexico City). We went over the list and here is what we observed:

  • Some is either Algunos or unos in Spanish (I forgot to ask a native speaker what the difference between these is. I'll do so this afternoon)
OK, I asked a native speaker about this. Algunos is an unspecified small group of items of a given type (say, mangos); Unos (which literally translates to "ones", as in "the ones with green spots") is a specified small group of items (say, mangos with green spots). 201.132.183.230 01:41, 10 July 2006 (UTC)
  • Short actually means multiple things in English. I don't know if Swadesh meant "Short as in not long" or "Short as in not tall". The "not long" form of short in Spanish is corto and the "not tall" form of short is bajo.
  • Thin. Delgado is "not fat"; Flaco is thin to the point of being unhealthy.
  • Mother. Be careful with the word Madre in Mexican Spanish; this word is used as a cuss word in many contexts.
  • Fish. Pez is a fish in the ocean/sea/river; Pescado is a fish on your plate (think cow/beef or pig/pork in English)
  • Bird. Pájaro is a small bird; Ave better refers to all birds.
  • Snake. Vivora is far more common than Serpiente in Mexican Spanish.
  • Hair. Roughly: Cabello is human hair; Pelo is animal hair (I'll have to confirm this with a native Spanish speaker)
I talked to a native speaker about this difference today: Pelo used to mean "animal hair" but now also means human hair. Basically, Pelo is a more general purpose term which means any hair; Cabello emphasizes hair on one's head and is the word you usually use when talking about styling hair. Don't confuse this word with Caballo which means "horse" (memory key: Bello means "beautiful" in Spanish) 201.132.183.230 01:41, 10 July 2006 (UTC)
  • Breast. Seno (also means sine as in "sine wave") is the polite word for a human breast. Pechuga is an animal breast. Pecho means "Chest".
  • Scratch. Arañar is when someone else scratches you; Rascar is when you scratch yourself.
  • Warm. Cálido is warm weather; Tibio means warm as in "neither hot nor cold" (lukewarm)
  • Sharp. Agudo means more like "thorn-shaped" (it also means "accent mark" or "high-pitched"). Afilado is the "can cut you" form of "sharp".

I also took the liberty of clarifying that "bark" on this list is the stuff on a tree, not what a dog does, and that "louse" is the singular form of "lice".

While discussing this list, I also asked about the difference between Esposa/Mujer ("wife") and Esposo/Marido ("husband"). Basically, mujer is a somewhat cruder term for "wife" (somewhere around "old lady" in American informal English) and marido is the corresponding cruder term for "husband" (somewhere around "hubby" in American informal English). Esposo and Esposa are the polite words that one normally uses. 201.132.183.230 01:48, 10 July 2006 (UTC)

200.77.85.217 15:52, 9 July 2006 (UTC)

Also, "pull" is "jalar", not "tirar", confirmed both by a native speaker and by an images.google.com search. This looks to be a clerical error, and I've corrected it. 201.132.183.230 02:02, 10 July 2006 (UTC)

OK, since I saw, "in the wild", a case of "tirar" being used for "pull", I asked a native speaker about this. Basically, "jalar" is "pull" in Mexico (and probably other areas of the Americas); "tirar" is "pull" in Spain. 201.132.68.253 18:26, 26 July 2006 (UTC)
Another note: In Mexican Spanish, "echarse" is when an animal lies down; "acostarse" is when a person lies down. 201.132.68.124 16:03, 31 July 2006 (UTC)
Yet another note: I just confirmed this with a native speaker: Mexican Spanish doesn't have a "to fear" (where the subject feels afraid because of the direct object). "to scare" is "temer", "dar miedo", and "asustar". 201.132.169.145 16:59, 11 August 2006 (UTC)
Pull is definitely tirar. Jalar is colloquial and rather uncommon in many regions. Have a look at the RAE dictionary: http://buscon.rae.es/draeI/SrvltConsulta?TIPO_BUS=3&LEMA=jalar if you don't believe it. Matthias Buchmeier 15:37, 24 June 2007 (UTC)

Interactive Rosetta Link Broken

The link that reads "The Rosetta Project has an interactive version of the list." is broken. It leads to a page that says: "Search results - No results were found. Did you not find what you were looking for? Try the Advanced Search for more precise search options."

It would appear that The Rosetta Project has, for some bizarre unfathomable reason, eliminated the interactive Swadesh lists from its site. I've poked around their site for a bit with no luck. I could be wrong, but in any case, the above mentioned link is incorrect and should probably be removed.

State of Rosetta Project

That's why I don't trust the Rosetta Project anymore. Their website is often down, is difficult to navigate, and is a very little known site (low Alexa site ranking). I believe that the Swadesh lists would be far better off on Wiktionary than on a relatively unknown site where the lists can get taken off capriciously. The Rosetta Project is also putting a lot of effort into a disk which doesn't seem to have any practical use. I'd like to see a huge unified online database that allows wiki-style contributions, not a pretty disk with micro-etchings. — Stevey7788 01:24, 17 September 2010 (UTC)

IPA

Swadesh lists are supposed to be in IPA. Otherwise you really can't use them to compare languages. Hurry up and get the IPA info from Wikipedia before they delete it all. Ƶ§œš¹ IPA: [aɪm ˈfɻɛ̃ⁿdˡi] 09:25, 16 October 2006 (UTC)

Portuguese and Esperanto

I would like to include the Portuguese Swadesh list. Portuguese is a world language with over 200 million native speakers. Also, I think it is instructive to compare the Portuguese list to the Spanish one. Now my question:

Do you agree if I replace the Esperanto list with the Portuguese list, or should I create a new column?

No, don’t replace the Esperanto...and this list is already too full for any more columns. We already have a Portuguese list at Appendix:Swadesh lists for further Romance languages. —Stephen 16:46, 10 January 2007 (UTC)
Since Swadesh lists are used for glottochronology and comparative linguistics, they shouldn't be used on conlangs. Therefore, I'm removing Esperanto. Ƶ§œš¹ [aɪm ˈfɻɛ̃ⁿdˡi] 01:39, 16 January 2007 (UTC)
aɪm ˈfɻɛ̃ⁿdˡi, I agree that it was wise to replace Esperanto with Portuguese in this chart. But I take issue with your saying that Swadesh lists "shouldn't" be used for conlangs. Although the idea of the list was certainly developed with glottochronology in mind, it is informative and useful for conlangs as well, so long as they are heavily based on vocabulary from natural languages. Were it not for the lack of space, I would hope you would not object to including Esperanto. --N-k 16:40, 9 April 2010 (UTC)

Example languages

I notice that this list is mainly biased towards the Indo-European languages, in particular the Romance and Germanic branches. I propose that the example languages should include at least one from each of the major language groups, which would probably include English, French, Russian, Chinese, Malay, Hindi, Arabic & Turkish. 210.7.2.242 23:54, 30 March 2007 (UTC)

A Swadesh list is not meant as a sampling of a language, but to compare one language with other related languages. A Swadesh list with Spanish, Chinese, Arabic, and Turkish would be a total waste of time. You can go to Appendix:Afro-Asiatic Swadesh lists for the Semitic languages, or click on any of the other offerings. Each list must have English, in order to be understandable, and then all the other languages in the list must be relatively closely related. The similarities between Russian and Spanish will be few because of the distance between them. The only reason we have both Romance and Germanic languages in this one list is because English is a Germanic language with strong Romance influence via Old French. —Stephen 00:41, 31 March 2007 (UTC)
My mistake, I guess you always learn something new every day. 210.7.7.19 20:48, 31 March 2007 (UTC)
But is it possible to put French along with things like Spanish and Italian in a major Romance language list? I don't see a French list except the appendix here.--61.92.239.192 04:07, 8 April 2007 (UTC)
French is right here in this list (Appendix:Swadesh list), along with Spanish, Italian, and Latin. You could, and should, add French to Appendix:Swadesh lists for further Romance languages, in place of Interlingua. Constructed languages have no place in Swadesh lists, and Interlingua needs to be removed. French would be the logical replacement. —Stephen 14:54, 10 April 2007 (UTC)
One Romance language at most, and preferably Latin. Skip all Germanic languages except German and instead add Sanskrit, Lithuanian and Greek. Dutch and Swedish (I'm a Swede, I would be offended here if I cared to, but I don't) provide too small a difference from German and English to add any real insight to the article. The current list is not really interesting, containing too many European languages that influence each other in a sprachbund, making the Swadesh list a little dubious. Rursus 09:18, 13 June 2009 (UTC)

Snake in Spanish is not sierpe. It's vivora or serpiente

I have a native Spanish speaker right here next to me and I just asked her to make sure: "Snake" in Spanish is not sierpe. Sierpe is an old word that used to mean "Snake", but no longer. It's "Serpiente", but more commonly called "vivora". For the record, I'm in Puebla, Mexico, and the person I asked about this is a native Spanish speaker with a degree in Spanish grammar. That in mind, I'm reverting the change done in this edit. 209.172.32.214 23:27, 1 May 2007 (UTC)

I believe the point is to show linguistic/etymological cognates, not exact translations. -- Beobach972 23:39, 1 May 2007 (UTC)
Changing data in order to fit a given theory is not good science. The word in Spanish IS "Vivora". THE WORD IS NO LONGER "sierpe". There's a reason the list has 200 words, not 10 or 20: so people can see that, while some words have changed, other words are still cognates. Again, changing data to fit some linguistic theory is not going to make 100,000,000 Spanish speakers in Mexico suddenly use "sierpe" for "Snake" instead of "Vivora". Don't deliberately put incorrect data here to fit your little pet theory. Thank you. 189.131.24.59 19:27, 3 May 2007 (UTC)
Uh... "Víbora". Locoluis 19:35, 3 December 2007 (UTC)

Swadesh's theory is bunk.

His theory, known as Glottochronology or Lexicostatistics and probably other names as well, has been thoroughly proven to be false. It involves an assumption that basic vocabulary (= the words in the Swadesh list) is replaced in every language in the world at a fixed rate. This is not only obviously preposterous, but the method has also been shown to give completely arbitrary results. This should be mentioned in the entry concerning the list. The lists, while obviously amusing to the wider public, have little actual usefulness; they are a curiosity at best. The claim that "Sometimes it is even possible to learn basic communication with no knowledge of the target language syntax." is misleading in the extreme. First of all, "to learn basic communication" should be replaced with "to accomplish (very) basic communication". Learning basic communication would entail repeatability, and a word list alone cannot make repeatable basic communication possible. Some things can sometimes be communicated with only bare words; however, in languages with rich case morphology, bare words are less useful than in English, where "man eat pig" can be understood. This ultimately is an effect of the rigid word order in English: if the order were "eat man pig" or "man pig eat" (very common word order patterns in the world's languages, together covering over half the languages of the world), understandability would not be likely.

made changes --Axegern 20:48, 22 February 2008 (UTC)
Glottochronology is NOT "OR" Lexicostatistics! Please note the separate wiki entries. (Glotto"chronology" refers to time computations only and is a subfield of the much broader term "lexicostatistics".) 195.4.79.194 06:19, 9 April 2011 (UTC)
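For readers unfamiliar with the "time computations" under dispute: the fixed-rate assumption is usually written as t = ln(c) / (2 ln(r)), where c is the fraction of list items still cognate between two languages and r is the assumed share of the list a language retains per thousand years. The sketch below only illustrates that arithmetic; it takes no side on whether the assumption holds. The default r = 0.86 is a commonly cited figure for the 100-word list and is an assumption here, not something taken from this page.

```python
import math


def divergence_time(shared_fraction, retention_rate=0.86):
    """Estimate millennia since two languages split, under the constant-rate assumption.

    shared_fraction: proportion of list items that are still cognate (0 < c <= 1).
    retention_rate:  assumed proportion retained per millennium (0 < r < 1);
                     this constant is exactly what critics of the method dispute.
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    # Both languages are assumed to change independently, hence the factor 2.
    return math.log(shared_fraction) / (2 * math.log(retention_rate))


if __name__ == "__main__":
    for cognate_share in (0.9, 0.74, 0.5):
        print(cognate_share, round(divergence_time(cognate_share), 2))
```

Because the result depends entirely on the chosen r, a small change in that constant shifts every date, which is one way to see why the objection above targets the fixed rate.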

semantic field theory

This seems to be a popular subject, but not very challenging. I propose creating a new and more useful appendix using semantic field theory, as many of the problems listed above are exactly what that theory is about. In a simplified version, a semantic field set is a set of single-word translations between two languages, or between a language and a meta-language. The point is easiest to show by an example: the English word "light" can be translated to Danish as "lys" (like what is emitted by a lamp, or the opposite of dark) or "let" (the opposite of heavy, but also meaning the opposite of difficult). The same word can be translated to Finnish as "valo" (like what is emitted by a lamp), "vaalea" (as in light brown) or "kevyt" (the opposite of heavy).

The semantic fields then look like this

  • English: Light Light Light
  • Danish: Lys Lys Let
  • Finnish: Valo Vaalea Kevyt

And, for completeness, following the Danish branch-off:

  • English: Light Light Light Easy
  • Danish: Lys Lys Let Let
  • Finnish: Valo Vaalea Kevyt Helppo


The point here is that the semantic fields show us very useful things. First, they show us which homonyms are not synonyms, i.e. which concepts are not related although they share a name. Secondly, they show us directly which translation errors are possible. Finally, they let us glimpse the outline of a possible universal semantics, the basic meanings (here exemplified in the Finnish translations, one meaning = one word).

Such a volunteer project would be much more useful than mere Swadesh lists and should be based on the basic vocabulary lists instead of Swadesh lists. A meta-language for crosslinguistic comparison should be used as the base; for example, English "man" can be rendered as "male human" and "father" as "male parent". The point is that the metalanguage should always conform to the least common denominator, i.e. to the meaning equivalent of the most specific word in the sets.

I myself don't know how to make that kind of list look ok or even how to make the whole thing work, perhaps someone else does? Of course this would be relevant to Wiktionary only as far as dictionaries also translate between languages. Perhaps the scope is too wide and more fitting for, say, a standalone Wikimantics project. --Axegern 20:42, 22 February 2008 (UTC)
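As one possible way to "make the whole thing work", the semantic fields from the "light" example could be stored with one row per meaning and one word per language. This is only a sketch: the meta-language glosses used as keys ("illumination", "pale in colour", and so on) are labels invented for this illustration, and the function name ambiguous_words is hypothetical.

```python
from collections import defaultdict

# One entry per meaning (meta-language gloss), one word per language,
# following the "light" example given in this section.
SEMANTIC_FIELDS = {
    "illumination": {"English": "light", "Danish": "lys", "Finnish": "valo"},
    "pale in colour": {"English": "light", "Danish": "lys", "Finnish": "vaalea"},
    "not heavy": {"English": "light", "Danish": "let", "Finnish": "kevyt"},
    "not difficult": {"English": "easy", "Danish": "let", "Finnish": "helppo"},
}


def ambiguous_words(fields, language):
    """Return the words of one language that cover more than one meaning."""
    senses_by_word = defaultdict(list)
    for sense, words in fields.items():
        senses_by_word[words[language]].append(sense)
    return {word: senses for word, senses in senses_by_word.items() if len(senses) > 1}


if __name__ == "__main__":
    for language in ("English", "Danish", "Finnish"):
        print(language, ambiguous_words(SEMANTIC_FIELDS, language))
```

With this layout, the words that can cause translation errors are exactly those returned for a given language, and a language with one word per meaning (Finnish in this example) returns nothing.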

Start WikiMantics project?

Hello Aksel, interesting section you wrote. I and some others are working on an article called 'Language Purification', see Semantic Language Purification. See: [www.spirilogic.com], and especially the section on 'Debabling over multiple languages', and the linked figure. Other interesting concepts: language and word evolution, and required constructs (such as word creation (to wocon), etc.).

We also think continuing like this does not make sense. In the end it will destroy the wiki concept, or put it in idle status (article pollution, inconsistency of semantics and terms between languages, too large maintenance and storage costs, etc.). Even more interesting is that we also came to the conclusion that semantics should be stored centrally. And we came... honestly, to the same proposed project name: WikiMantics.

I'm looking forward to you reading our article and providing it with comments (good / bad, etc.) and/or modifications or enhancements. Feel free! (We prefer somebody editing once too often to no editing at all...) Do you know where we can get support for this huge (and very interesting! It's all about semantics) project? Bye, Towonderer 10:49, 5 September 2008 (UTC)

May we wikilink?

Hello, may we wikilink the individual words in each Swadesh list? For example, the Vietnamese language needs a lot of work at en:Wiktionary and wikilinking the words in the Vietnamese Swadesh list would identify which important words are still redlinks. Thank you, 24.29.228.33 19:48, 23 December 2009 (UTC)Reply

Yes, but you should link it with {{l|vi|}} and so on so that the language is identified. —Stephen 04:57, 24 December 2009 (UTC)Reply

Thank you, I already did it, and it allowed me to make almost 100 new Vietnamese entries, nearly all the terms now having rudimentary entries. I'm sorry I don't understand the template above or the exact reason it would be used? 24.29.228.33 06:52, 24 December 2009 (UTC)Reply

You use it like this: con (that is, {{l|vi|con}}), không, tôi. If there are also other languages that have that same spelling, then this template will link only to the Vietnamese. —Stephen 06:59, 24 December 2009 (UTC)Reply

OK, it should be used in the Swadesh lists only, or everywhere? I think you mean it will link to the internal link (usually "#" is used for this) for that language in an entry with multiple languages? 24.29.228.33 07:04, 24 December 2009 (UTC)Reply

Some of them, due to Vietnamese's distinctive diacritics, don't have definitions in other languages. 24.29.228.33 07:05, 24 December 2009 (UTC)Reply

It can be used in many places, such as "See alsos", "Related terms", and "Descendants". True, many Vietnamese words are unique to Vietnamese, but there are quite a few like con that are not unique. —Stephen 07:13, 24 December 2009 (UTC)Reply

OK, this template seems pretty easy to remember. I could never remember whether "lang" or "l" is used in the first part, but I see it's "l." For some languages such as Hebrew or Arabic, the template renders it in a much more beautiful font than the default. 24.29.228.33 07:21, 24 December 2009 (UTC)Reply

I've edited the templates so that all the English words have links. — Stevey7788 01:26, 17 September 2010 (UTC)Reply
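
For anyone who wants to apply the {{l|vi|...}} form to a whole column at once, here is a minimal Python sketch. It only builds the text shown in the thread above; the function names link_template and link_all are made up for illustration, and it assumes you already have the words as a plain list.

```python
def link_template(lang_code: str, word: str) -> str:
    """Wrap one word in the {{l|lang|word}} form quoted in the thread above."""
    return "{{l|" + lang_code + "|" + word + "}}"


def link_all(lang_code: str, words: list[str]) -> list[str]:
    """Apply link_template to every word, skipping blank cells."""
    return [link_template(lang_code, w.strip()) for w in words if w.strip()]


if __name__ == "__main__":
    # The three Vietnamese words are the ones used as examples in the thread.
    for line in link_all("vi", ["con", "không", "tôi"]):
        print(line)
```

The language code is kept as a parameter so the same helper could be reused for other lists.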

Footnote

There are asterisks in the table. Higher up in the page is a sentence that says:

    The words in the preferred 100-word list are designated by an asterisk (*). 

This is not sufficient. The convention is that the reader look at the bottom (of the table or page) to find a corresponding footnote. The information should also be given in a footnote under the table or at the bottom of the page, as:

    * Words that are also in the preferred 100-word list.

Deletion of Appendix:Swadesh lists for lesser used European languages

The object of a Swadesh list is (or was) primarily to establish the closeness of the relationship of genetically related languages. The languages assorted here belong in different linguistic branches or even families:

  • Faroese, Icelandic, Kölsch, Lëtzebuergesch and Low Saxon are Germanic languages and as such, they are already included here.
  • Greek and Albanian (along with Armenian) constitute the three independent branches of the Indo-European family. Albanian features under the section Assorted lists, but this makes no sense, since a single language cannot be considered an assortment. The Greek list included contains a large number of errors, of the sort one gets by entering words into a translation machine.
  • Finnish is a Finno-Ugric language and is included in both that family's list and the Baltic-Finnic languages.
  • Basque is a language isolate. A Swadesh list of the Basque language of its own already exists.

Since:

  1. an assortment of languages that are unrelated to one another and/or already included elsewhere serves no linguistic purpose,
  2. the corresponding Wikipedia article links to this appendix with the caption "grouped by language family, on the Wiktionary Appendix pages", and
  3. the description lesser used in the name of the appendix is both inaccurate and controversial,

I suggest the following:

  1. Deleting Appendix:Swadesh lists for lesser used European languages.
  2. Creating a new list, Appendix:Independent Indo-European languages Swadesh lists, where Greek (Ancient / Modern), Albanian (Gheg / Tosk) and Armenian (Eastern / Western) will be included, under the sub-header Language-family Swadesh lists in need of expansion.
  3. Removing the Appendix:Albanian Swadesh list from the current Appendix and adding it as an entry to the Category:Swadesh lists.

Jaxlarus 18:13, 23 February 2010 (UTC)Reply

Personally I also use these lists to learn the different translations of a word (regardless of the language roots); however, this purpose could probably be better served by a table built from an extraction of some Wiktionary:Frequency lists. JackPotte 19:46, 23 February 2010 (UTC)
Next part here. JackPotte 18:40, 27 February 2010 (UTC)Reply

Swadesh lists on main page — a "WikiVocab" project

Hello, I'd like to request placing Appendix:Swadesh lists (it contains vocabulary word lists for many world languages) along with "Appendices • Abbreviations • Thesaurus • Rhymes • Frequency lists • Phrasebooks," or on some other part of the main page. I'd just like to find some way to increase traffic to the area so that more people can know about this great resource. It's a tremendously important part of Wiktionary that many users have found to be really helpful. I'd also be OK if the Wiktionary community opposes; however, it's one of the best resources on the web for learning vocabulary, comparing/preserving/promoting languages — simply indispensable. There's a lot of potential for these lists.

My dream is for there to be a big database on the Internet where anyone can access the basic vocabulary words (in standardized topical lists) of all the world's languages. Wikipedia has information on the grammar and demographics of languages, but does not often include vocabulary, which is the core and essence of language. The closest things we have to a massive comparative database on world languages are the Austronesian Basic Vocabulary Database, Intercontinental Dictionary Series, and of course, Wiktionary's Swadesh lists. As a side note, even though this is basically the Rosetta Project's goal, the website is still quite unwieldy for ordinary users, has a very low Alexa site ranking, and does not allow wiki-style contributions. The Rosetta Project has also taken down the Swadesh lists that used to be there, and does not have any searchable vocabulary databases as of now. Such a database would help in language preservation, comparative linguistic studies, language learning, and more.

Or perhaps we can even create a separate "WikiVocab" website, similar in style to WikiSpecies! If we do create a big, unified, and searchable database for all the world's languages — all in one place — I believe it will be one of the greatest human achievements in modern times.

Thanks for your considerations! — Stevey7788 01:20, 17 September 2010 (UTC)Reply

Editing with Microsoft Word

Just a note that ^p is equivalent to a line break in Microsoft Word's find/replace box. You'll be able to add columns/wikilinks, edit/merge lists, and replace entries a lot more easily if you use that. — Stevey7788 09:18, 29 September 2010 (UTC)Reply

Open Office Writer does the same for free. JackPotte 17:21, 29 September 2010 (UTC)
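
The same line-break trick can also be done outside a word processor. Below is a minimal Python sketch, assuming the list has one entry per line; the function name wrap_lines and the default [[ ]] delimiters are only illustrative choices, not anything prescribed in this thread.

```python
def wrap_lines(text: str, prefix: str = "[[", suffix: str = "]]") -> str:
    """Wrap every non-empty line, much as a find/replace on line breaks would."""
    lines = text.splitlines()
    # Blank lines are passed through unchanged so spacing in the list survives.
    return "\n".join(prefix + ln + suffix if ln.strip() else ln for ln in lines)


if __name__ == "__main__":
    # Sample entries are arbitrary English words from the list.
    print(wrap_lines("water\nfire\nstone"))
```

Changing prefix and suffix lets the same function add a column separator or a template wrapper instead of plain wikilinks.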

Source???

Where on earth did Swadesh publish a 207-word list at all?? Not 1950 (165 words), not 1952 (215 words), not 1955 (switched from 215 to 100 words), not 1972 (final 100-words, listed in word fields). HJJHolm 06:49, 9 April 2011 (UTC)
Again: Swadesh never published a 207-word list. The whole article is a lie. This "Ishtar" (?) list differs in many points from the correct Swadesh list and is thus not compatible with it and of little help. The inventors of it confuse everything, in particular with lists for learning foreign languages. For that purpose, please refer to the numerous existing Wikitravel. HJJHolm 10:18, 30 December 2011 (UTC)

Original Swadesh list

It is a pity that people who obviously have not even read Swadesh swamp Wiktionary with outdated Swadesh lists. He explicitly wrote (Swadesh 1955:124), "... defects in the old list were repeatedly made evident. The only solution appears to be a drastic weeding out of the list, in the realization that quality is at least as important as quantity. The new list of 100 items includes 92 from the old list, those starred in Table 2, plus eight new ones: say, moon, round, full, knee, claw, horn, breast. Even the new list has defects, but they are relatively mild and few in number." Therefore I copied the final original Swadesh list, published posthumously in 1971 and 1972, to the main article "Swadesh list" in Wikipedia. Regarding an unlucky decision in the substitutions, I propose to remain with the old "fingernail" instead of "claw", in full agreement with well-known researchers such as D. Ringe and S. A. Starostin. HJJHolm 10:33, 4 February 2012 (UTC)

Follow-up: Additionally, the current list is cruelly inexact in the description and definition of too many items in question! The authors are obviously not able or willing to learn. HJJHolm (talk) 15:27, 26 April 2012 (UTC)Reply

What about Sumerian, Akkadian, Etruscan, Hurrian, Urartian, Hattic, etc.?

Böri (talk) 07:17, 12 April 2012 (UTC)Reply

If you want to, you can make lists for those languages, too. —Stephen (Talk) 07:23, 12 April 2012 (UTC)Reply

Intro confused

The introduction again has been confused by insertions not matching the whole context, and thus had to be changed. Swadesh has nothing to do with lexicostatistics!! HJJHolm (talk) 13:24, 10 September 2012 (UTC)Reply

What about North Caucasian languages?

Circassian(Adyghe), Abhaz, Chechen, Ingush, Avar, Lak, Lezgic languages Böri (talk) 13:36, 20 February 2013 (UTC)Reply

Those languages are needed, but they have to be written in the correct script. Chechen, for example, uses Cyrillic, and I think all of the other ones you named do as well. —Stephen (Talk) 17:10, 25 February 2013 (UTC)Reply
North Caucasian--Russian Swadesh lists at Russian Wiktionary: [2], [3] (Of varying quality. Numbers are the same as here). Mar1Qh 14:36, 15 May 2018 (UTC)


Chechen words:
  • all: derrig
  • ashes: juq\
  • bark: kewstig
  • belly: čuo
  • big: doqqa
  • bird: olχazar
  • bite: joχka
  • black: ʕӓrža / arj
  • blood: ċij / ziy
  • bone: dӓʕaχk
  • breast: naqa
  • burn tr.: dago
  • claw (nail): mʕara
  • cloud: marχa / marh
  • cold: šijla
  • come: dan
  • die: dala
  • dog: žʕӓla / jal
  • drink: mijla
  • dry: deʠa
  • ear: lerg
  • earth: latta
  • earth: moχk
  • eat: jaʔa
  • egg: hoa
  • eye: bʕӓrg / barg
  • far: genara
  • fat n.: moħ
  • feather: pelag
  • fire: ċe / tsel
  • fish: ḉara / cher
  • fly v.: lela
  • foot: kog
  • full: düzna
  • give: dala
  • good: dika
  • green: bӓccara / betzir
  • hair: čo
  • hand: küg
  • head: korta
  • heart: dog
  • hear: χaza / hazar
  • heavy: d-eza
  • horn: kur
  • horn: maʕa
  • I: so
  • kill: den
  • knee: gola
  • know: χaʔa / khaar
  • leaf: ʁa
  • lie: ʕilla
  • liver: doʕaχ
  • long: deχa
  • louse: meza
  • man: stag
  • many: duqa
  • meat: žižig / jijag
  • moon: butt
  • mountain: lam
  • mouth: baga
  • name: ċe / zce
  • near: gergara
  • neck: vorta
  • new: ċina
  • night: büjsa
  • nose: mara
  • not: ca
  • one: cħaʔ
  • person: stag
  • rain: doʁa / dogh
  • red: ċen / zi
  • road: neʠ
  • root: oram
  • round: gorga
  • salt: tüχa
  • sand: ʁum
  • say: len
  • seed: hu
  • see: gan
  • short: d-ōca
  • sit: ʕen
  • sit: χaʔa
  • skin: ċoka
  • sleep: jan
  • small: žima
  • smoke: ḳur
  • snake: lӓħa / leh
  • stand: latta
  • star: seda
  • stone: qera
  • sun: malχ / malh
  • swim: dan
  • tail: ċoga / zhog
  • that: dʕora
  • thin: d-uṭʠa
  • this: hara
  • this: i(-za)
  • thou: ħo
  • tongue: mott
  • tooth: cerg / zherg
  • tree: ditt
  • two: šiʔ / şia
  • walk (go): daχa
  • warm: mela
  • water: χi / ghie
  • we: tχo
  • we: vaj
  • what: hun
  • white: ḳajn
  • who: mila
  • wind: moχ / mouh
  • woman: zuda
  • worm: nʕӓna
  • year: šo
  • yellow: mož
also you can look at http://starling.rinet.ru/maps/maps28.php?lan=en and http://starling.rinet.ru/cgi-bin/query.cgi?root=config&morpho=0&basename=\data\cauc\nakhet Böri (talk) 10:02, 25 February 2013 (UTC)Reply
We only accept words that are spelled (correctly) in their proper script. Chechen is written in Cyrillic, so all the words must be written in Cyrillic. —Stephen (Talk) 17:06, 25 February 2013 (UTC)Reply
They are using the Cyrillic script because the Russians rule Chechnya. The Chechens are NOT a Slavic people! (so this script is not their own script!) Böri (talk) 07:21, 26 February 2013 (UTC)Reply
We know this. It doesn’t matter, it is the script that Chechen is currently written in. Cyrillic is not only for Slavs, it is used by many languages not related to Russian, including many Caucasian languages, Turkic languages, Paleo-Siberian languages and others. —Stephen (Talk) 11:45, 26 February 2013 (UTC)Reply
For example, this Chechen Dictionary is not using the Cyrillic script: http://ingush.narod.ru/chech/awde/ Böri (talk) 09:25, 27 February 2013 (UTC)Reply
In 1992, the Secessionist government introduced a new Latin alphabet, but when the secessionists were defeated, the Latin alphabet was dropped and the Cyrillic alphabet was restored. Anybody with an agenda can make a dictionary of any language using any alphabet or spelling scheme they wish. English could be written in Cyrillic or Arabic. Japanese could be written in Tibetan. It doesn’t matter, those are not in standard use and we don’t accept them here. Here, we only accept Chechen that is in Cyrillic. If you have documents in Chechen that were written in Arabic or one of the Latin alphabets, you could make a Chechen appendix with a list of those words, but in the main namespace, use Cyrillic for Chechen. —Stephen (Talk) 11:12, 27 February 2013 (UTC)Reply
on Swadesh lists for Afro-Asiatic languages, Wiktionary also used "Romanized Arabic"... so Latin alphabet for Chechen language is OK! Böri (talk) 14:22, 1 March 2013 (UTC)Reply
No, it’s not okay. Where is the link to the list you are talking about? If the Arabic is Romanized, it needs to be corrected or deleted. The only Arabic Swadesh list that I have seen on Wiktionary uses proper Arabic script. Perhaps you are thinking of the attendant transliteration? For words in languages that don’t use the Roman alphabet, we like to add transliterations. Transliterations are always encouraged, but the main (linked) word must be in the proper script for that language. —Stephen (Talk) 19:34, 1 March 2013 (UTC)Reply
"Transliteration" = You can use the Chechen words with Latin alphabet as transliteration! There's no Chechen Swadesh list on Wiktionary! (I'm saying this.) / You are saying: "Where is the link to the list you are talking about?" I already said: "Swadesh lists for Afro-Asiatic languages" Böri (talk) 14:51, 3 March 2013 (UTC)Reply
That list is here: Appendix:Swadesh lists for Afro-Asiatic languages. The Arabic is not romanized, the editor merely used that word to describe the transliterations. The Arabic in that list is all written in the correct Arabic alphabet. The transliteration is only for those who don’t know Arabic well enough and need some help with the alphabet. You can also (and indeed should) put transliterations along with the Chechen Cyrillic, but the correct Cyrillic spelling is a requirement here. Without the Cyrillic spelling, we will not accept it. If anyone makes an entry for one of these Chechen words in the Roman alphabet, it will be quickly deleted. Please stop arguing about this, it is the longstanding policy of Wiktionary and there is no way around it (other than making an appendix page as I suggested above...but if you do that, you still have to prove to our satisfaction that the particular Roman spelling was actually used at some time in the past in a durably-archived medium such as a book).
I am finished with this discussion. I have explained the policy to you as well as I can. Unless you can find the correct Cyrillic spellings of the Chechen words, you cannot make a Chechen Swadesh list on this site. If you do it, it will be deleted (not just by me, but by any Wiktionary admin who discovers its existence). If you still believe you have a valid argument, then make it at WT:BP. But I’m telling you honestly, you will not convince anyone there to change the rules on this for you. Good luck. —Stephen (Talk) 16:04, 3 March 2013 (UTC)Reply
Chechen words are written in Cyrillic in all the Chechen media and in books written in Chechen. The examples of the romanised Chechen words may be useful in some chats or for transliteration purposes, but not for learning the language as it's used.
The above Chechen words are actually written:

Hattic words

  • king = katte
  • queen = kattah
  • child = binu, pinu
  • children = lebinu
  • god = shapu, washapu (= gods), ashaf, shaf, fa-shaf
  • land = fur
  • wine = findu?
  • sun = eshtan / eştan
  • moon = kap
  • mountain = zish /ziş
  • year = lish
  • bread = fula
  • sea = han
  • leopard = prash, parash
  • to hear, listen = shama
  • woman = nimhu/nimhut
  • head = kash
  • wind = pezil
  • stone = pip
  • big = te
  • house = fel / fael
  • lord = tafarna
  • lion = takeha
  • soldier = aku
  • rain = tumil
  • I = fa
  • wine = karam (< from Semitic karm)
  • horse = tarish?
  • cheese = witanu
  • father = fafaya
  • copper = kinawar
  • sky = yah
  • iron = hapalki
  • bird = ashti
  • tongue = alef
  • to protect = kip
  • bright = paru
  • leaves = puluku
  • to blow on = puşan
  • priest = paraya (see father above)
  • lady = tawa-nanna
  • root = tup
  • gate = ştip
  • thousand = far
  • fear = tafa
  • sour = zipina / wet
  • to stand = anti
  • wind = pezil
  • to look = pnu
  • mortality = funa
  • to devour = puş
  • to lie, put = ti
  • long = fute
  • to fall = zik
  • when = anna
  • to open = han
  • to see = kun
  • to come, go = nu
  • wide = harki
  • heart = şaki
  • to eat = tu
  • wood = zehar/zihar
  • wife = zuwatu
  • ground = şahhu/tahhu
  • lord = şail/tail
  • barber = tahaya
  • to pour = tefu
  • to take = tuh
  • to build = teh
  • to step = tuk
  • horn = kaiş
  • to come (here) = aş
  • to be able = lu
  • light = leli
  • runner = luizzil
  • to strew = hel
  • to envy = le
  • to hide = her
  • arm, sleeve = hir
  • to seize = hu
  • spring, well = uri
  • this = ana
  • when = anna
  • woman = anna
  • upwards = akka
  • 5 = apa
  • earth = araz
  • human being = antuh
  • ritual functionaries = dudduşhijal
  • palace = halentiu
  • throne = halmaşşuit / kuşim
  • courage = haipinamul
  • among, between, through = ha-
  • on, to the = ka-
  • head = kaş
  • witchcraft, sorcery = katakumi
  • spy, messenger = kiluh
  • soul = kut/kud/psun
  • his = le-
  • his/her = te-
  • from = li-
  • good = malhip
  • apple = şawat
  • district = telipuri
  • rain = tumin
  • you = un- / wa
  • to you = ud-
  • we = uş-
  • bull, ox = milup
  • stones = munamuna
  • hammer = pakku
  • to you = par-
  • eagle = wapah
  • thousand = war
  • sheep = wazar
  • wine = win
  • bread = wulasne
  • grandson = zintu
  • wife = zuwatu

I used "ş" for "sh" sound... These words are Hattic words! (NOT Hittite!) Source: http://www.palaeolexicon.com/ (please click Languages, find Hattic and click Word Index) Böri (talk) 09:10, 21 February 2013 (UTC)

Chechen list - Cyrillic script

This is the list of Anatoli. May I write the "Romanized" forms? Böri (talk) 12:28, 5 March 2013 (UTC)Reply
Here's your template: Appendix:Chechen Swadesh list. I have only added one word at the top to show how this is done. For example, the first word is "I" (me), in Chechen it's со (so), so the template #1 is: {{l|ce|со}}
You can have a go. The Chechen words should match the English by numbers! - you can see them in the preview. You can also use w:Chechen_language#Alphabets for the transliteration, e.g. "гӀ" is "ġ". Good luck! Will have to think about a transliteration table as well. --Anatoli (обсудить/вклад) 12:48, 5 March 2013 (UTC)Reply
Thank you, Anatoli. I will try to write it within few days... I'll work on it. Böri (talk) 10:19, 6 March 2013 (UTC)Reply
You're welcome. Just to make sure you understand it: the first part of {{l|ce|со}} is "со", the Chechen word in Cyrillic; "so" is the transliteration. If you don't know how to transcribe, leave "tr=" empty. --Anatoli (обсудить/вклад) 10:45, 6 March 2013 (UTC)
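
To make the numbering and the "tr=" advice above concrete, here is a minimal Python sketch. It is an illustration only: the function names chechen_cell and numbered_rows, the row layout, and the placement of tr= as a named parameter inside the {{l|ce|...}} template are assumptions based on the comments above, not a description of how the appendix template is actually built.

```python
def chechen_cell(cyrillic: str, translit: str = "") -> str:
    """Build {{l|ce|WORD|tr=TRANSLIT}}; tr= is left empty when unknown."""
    return "{{l|ce|" + cyrillic + "|tr=" + translit + "}}"


def numbered_rows(entries: list[tuple[str, str, str]]) -> list[str]:
    """Number rows from 1 so each Chechen word lines up with its English gloss."""
    rows = []
    for number, (english, cyrillic, translit) in enumerate(entries, start=1):
        rows.append(f"{number}. {english}: {chechen_cell(cyrillic, translit)}")
    return rows


if __name__ == "__main__":
    # Only the first item is filled in, using the example given in this thread.
    for row in numbered_rows([("I", "со", "so")]):
        print(row)
```

Keeping the English gloss and the Chechen word in the same tuple is one way to avoid the two columns drifting out of step when entries are added later.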

How to consider clusivity?

"We" has two meanings, one is "we and not you" (exclusive), and the other is "we and you" (inclusive). Some languages do not distinguish between these two, but some do!

So which one is the intended meaning in the Swadesh list? BrightSunMan (talk) 19:27, 22 December 2020 (UTC)Reply

Perhaps someone can find a source document stating Swadesh' intentions regarding this and related semantic issues, but in the absence of that I'd be inclined to assume anglophone-centrism in this case. A possible solution might be to list both ex- and inclusive versions in the same row, distinguished by a parenthetical note. I did this in the Láadan table for the more finely-divided plurality rules that language has compared to English. Arlo Barnes (talk) 21:33, 28 August 2021 (UTC)Reply

Isn't Lojban a constructed language?

So should it be listed under "Constructed auxiliary languages" like Esperanto? Yanrs17 (talk) 10:10, 31 January 2021 (UTC)

@Yanrs17: I am pretty sure it would. I can't see any doubt on that. Go on! BrightSunMan (talk) 21:16, 2 February 2021 (UTC)Reply