Wiktionary:Etymology scriptorium/2017/February

Accent of Lithuanian vesti

@Benwing2 Slavic has a mobile accent here, with accent on the ending in the infinitive. However, Lithuanian has an accent on the stem instead. Assuming that Slavic reflects the original situation, what is the cause of the retraction in Lithuanian? Also, why does the present have ẽ while the infinitive has è? Does Slavic have root accent in the present? —CodeCat 21:34, 5 February 2017 (UTC)

@CodeCat I wish I knew enough about Lithuanian historical linguistics to answer this. The answer might be in Kortlandt's From Proto-Indo-European to Slavic but it's inaccessible right now. I do know that verbs in Lithuanian are rather less conservative than nouns. Benwing2 (talk) 21:57, 5 February 2017 (UTC)[reply]

Can you ping anyone else here who may know? —CodeCat 21:58, 5 February 2017 (UTC)

@Ivan Štambuk? Benwing2 (talk) 22:08, 5 February 2017 (UTC)[reply]

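Stang writes in "Vergleichende Grammatik der Baltischen Sprachen", page 449: "Im Litauischen sind heute alle Präsensformen barytoniert, abgesehen von den Fällen, wo das de Saussure'sche Gesetz gewirkt hat." ("In Lithuanian today all present-tense forms are barytone, apart from the cases where de Saussure's law has operated.")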

No idea about the ẽ/è variation, it's probably also explained in Stang's book. Based on this sentence from Lithuanian_accentuation#Root "Short vowels a, e in a root of a word lengthen when stressed and have a circumflex accent: ã, ẽ", I wager that it has to do with when the accent was placed on it. (by analogy?)

Crom daba (talk) 01:41, 6 February 2017 (UTC)[reply]

To me, it looks a bit like the distinction could be similar to that of Neoshtokavian, where ẽ reflects an originally stressed vowel while è reflects a retracted accent. I could be totally wrong though. I don't know how the acute-circumflex distinction could arise on short vowels anyway, but apparently they're a regular thing for Lithuanian. —CodeCat 01:45, 6 February 2017 (UTC)

Also, if I understand your quote right, does it mean that there are no more mobile verbs (accent classes 3 and 4) in Lithuanian, at all? —CodeCat 01:56, 6 February 2017 (UTC)

è was certainly unstressed at some point, but I really have no clue if the ictus getting there is phonology or morphology at work.

Circumflex "short vowels" ã/ẽ are actually long; the difference between them and historically long circumflexes is vowel quality.

Most probably the mobile accent classes still exist when comparing tenses other than the present.

My experience with this whole Balto-Slavic accentuation mess boils down to trying for years to square that linked Kortlandt pdf with my native language with no success. At this point my only hope is that @Benwing2 will figure it out for all of us and update the relevant wiki pages.

Crom daba (talk) 03:00, 6 February 2017 (UTC)[reply]

Let's try another approach then, since Lithuanian is not helpful here. If the accent was originally on the root in PBS, then there would have to be some kind of sound law that shifted it onto the ending. The main contender, Dybo's law, does indeed shift the accent to the ending in AP b verbs, but this law doesn't operate in mobile paradigms. So it seems that the final accent directly reflects the PBS situation, if the mobile paradigm itself dates to PBS. Are there any sound laws that could cause a fixed paradigm to become mobile in the history of Slavic? —CodeCat 21:03, 8 February 2017 (UTC)

krokodili

The etymology of krokodili is not sourced. Several suggestions have been put in to what the etymology of this word is but none of them have been sourced properly. Pkbwcgs (talk) 16:25, 6 February 2017 (UTC)[reply]

[1] is not a source?

"Krokodili (to crocodile) means to speak in your national language at an event where you should be speaking Esperanto (conjuring up the image of a reptilian beast flapping its big jaws)."

How exactly are you going to explain this away? You can't. —This unsigned comment was added by 76.28.243.251 (talk) at 18:36, 6 February 2017 (UTC).

@76.28.243.251 That is a story book, not a source. Story books are not sources. Pkbwcgs (talk) 18:00, 6 February 2017 (UTC)[reply]

@Pkbwcgs A story book. A story book? I don't suppose you have a source for that assertion, because it is most definitely NOT a story book simply because the first line of its description says "Here is the captivating story of humankind’s enduring quest to build a better language." So I suppose by that same measure every written biography also falls under the mantle of fiction? Now you are just being obstinate.

@76.28.243.251 It is basically a story about Esperanto. It has nothing about crocodiles and definitely nothing to do with your etymology. All you have done is provided me with a story about Esperanto and you are saying that it is a source and you are calling me obstinate. Please provide a better source and I will see whether it is valid or not. Pkbwcgs (talk) 18:25, 6 February 2017 (UTC)[reply]

@Pkbwcgs Is google blocking the preview for you or something, because page 113 shows the etymology clear as day. Do I need to quote it again? "To crocodile", look for it. It's right there. It is furthermore not "basically a story about Esperanto" as that makes up a fractional part of its over 300 pages devoted to conlangs of every variety. All you have done is display your willingness to distort the truth. —This unsigned comment was added by 76.28.243.251 (talk) at 18:36, 6 February 2017 (UTC).[reply]

@Pkbwcgs I am not convinced you are very well equipped to judge whether sources are valid or not. A brief Google of the author reveals that she 1. has some pretty legitimate credentials in the field of linguistics and 2. has written quite a lot about Esperanto specifically. This would make her claims on Esperanto etymologies authoritative enough in my book. The work may be aimed at a broader audience, but just because it's popular science doesn't mean it's unscientific. With that source, I'd definitely say the IP editor here is right and the claim could be mentioned at least in the etymology section. — Kleio (t · c) 18:39, 6 February 2017 (UTC)[reply]

Slang etymologies are going to be somewhat apocryphal no matter how much scientific scrutiny you bring to the table, no sense in making a big deal out of this. Crom daba (talk) 18:44, 6 February 2017 (UTC)[reply]

I am leaving the discussion here but I saw a lot of users reverting the etymology so this is why I brought it to here. Pkbwcgs (talk) 18:47, 6 February 2017 (UTC)[reply]

Special:Contributions/Helolo1

Please review / clean up this user's contributions. DTLHS (talk) 01:08, 7 February 2017 (UTC)[reply]

Please block him. --Barytonesis (talk) 01:15, 7 February 2017 (UTC)[reply]

Yeah, blocking makes sense to me, unless someone can reason with them. —JohnC5 02:51, 7 February 2017 (UTC)

I hadn't seen any constructive attempts to educate and correct them which is why I asked here. DTLHS (talk) 03:04, 7 February 2017 (UTC)[reply]

I just made a first stab at it on their talk page. Although they make lots of the usual kinds of newbie formatting errors, they also have problems with editing where they're clearly out of their depth, but don't seem to realize it, and they show other signs of poor judgment. I suspect this is compounded by moderately poor English comprehension. Chuck Entz (talk) 03:44, 7 February 2017 (UTC)[reply]

I'm as anti-Starostin as anyone here, but blocking someone who hasn't demonstrated malicious intent sounds like a terrible idea. It makes sense short-term, but long-term we need editors and even this one may develop into a great asset in the future. Crom daba (talk) 11:21, 7 February 2017 (UTC)[reply]

𓆷𓈎𓄜

This entry needs some cleanup and an assessment of the likelihood of the descendants actually descending from it. The former can quite possibly only be done by a competent Egyptian editor, most of whom seem to have left the project, unfortunately. The latter can probably be handled by any of our editors who have experience in these sorts of things — @Ivan Štambuk, JohnC5, and feel free to ping others. —Μετάknowledge^{discuss/deeds} 02:55, 7 February 2017 (UTC)[reply]

So I can find references to this word, but Beekes and earlier Walde prefer to relate it to a Semitic source, citing Hebrew שָׂק (saq, “sack”). These lemmata may well be related, but I don't know whether the Ancient Greek can be directly traced to the Egyptian. I have not the knowledge of Afro-Asiatic to make such a determination. —JohnC5 04:05, 7 February 2017 (UTC)

Indo-European buck word

Could we coordinate the stories on Proto-Germanic *bukkaz, Proto-Celtic *bukkos (“goat”), Old Armenian բուծ (buc, “lamb”), Persian (boz, “goat”) (ESIYa gives Proto-Iranian *būźa), Sanskrit बुख (bukha, “male goat”) and maybe make a PIE entry? Currently we have *bʰuǵno-, *bʰuǵ- and *bʰuǵos. I'd like to link it as a potential source/relative of Proto-Turkic *buka (“bull”), Proto-Mongolic *bugu (“deer”), South Tungusic *bụcan ("deer"), and I'd like a single handle for the etymon.

And as an aside, how secure is the reconstruction anyway? How do we explain Celtic gemination (could it be a Germanic loan?) or Sanskrit aspiration (are these two connected somehow?) or Iranian long vowel (and why is it shortened again in Persian?). Also what about Proto-Slavic *bykъ (“bull”)? It doesn't look that far off from the rest of the crew.

Crom daba (talk) 17:05, 7 February 2017 (UTC)[reply]

On some of these words see Witzel 2003, pages 21–22. --Vahag (talk) 08:25, 8 February 2017 (UTC)[reply]

I note also Hungarian bika (“bull”) and Proto-Turkic *buka (“bull”). ‑‑ Eiríkr Útlendi │^{Tala við mig} 20:40, 8 February 2017 (UTC)[reply]

Did anyone try reconstructing *bʰuǵ(ʰ)-kos on a PIE level (as Matasović does for Celtic)? Could PII *ĉk ~ *ĵk ~ *ĵʰgʰ give Av. z, Sa. (or Prakrit substrate) kk(h) ? What about Armenian? How does it treat clusters with different voicing? Crom daba (talk) 22:09, 8 February 2017 (UTC)[reply]

Armenian requires PIE -ǵ-. A -ǵʰ- would yield Arm. -ձ- (-j-). PIE -ǵk- would probably yield Arm. -ծք- (-ckʻ-) or -սք- (-skʻ-), although we do not have data. The two stems of Arm. բծ-ա- (bc-a-) and բծ-ո- (bc-o-) point to PIE *bʰuǵ-eh₂- and *bʰuǵ-o- respectively.--Vahag (talk) 08:31, 9 February 2017 (UTC)[reply]

@Crom daba, Vahagn Petrosyan, Eirikr: I was cleaning up the PGmc and PCelt entries and found this discussion while adding PIE *bʰuǵos. I was really conservative with the descendant tree, but I did some of the same head scratching as you. Sanskrit बुख (bukhaḥ) points to a laryngeal extension form *bʰuǵ-H-os, but I was wondering if the -kh- could be a result of some consonant cluster simplification, which are all over the place in PII. Perhaps PIE *bʰuǵ-k-os > pre-PII *bʰuĵkas > PII *bʰužkʰas > बुख (bukhaḥ). --Victar (talk) 00:42, 26 April 2017 (UTC)[reply]

阿芙蓉

I don't understand how to edit this in a way that doesn't mix it up but does get it out of the category "Persian twice-borrowed terms". Please help if you can! Kolmiel (talk) 18:53, 8 February 2017 (UTC)[reply]

I see no issue. {{bor}} works this way. --Anatoli T. ^{(обсудить}/^вклад) 20:49, 8 February 2017 (UTC)[reply]

When you looked, the problem had probably already been solved by the friendly user who did. Admittedly, it was a simple edit. But when I tried to fix it, I seem to have done something wrong, so I got confused. The issue was that the word was in the category mentioned above, in which it cannot be because it is a Chinese word, not a Persian one. Kolmiel (talk) 22:14, 8 February 2017 (UTC)[reply]

Yes, User:Crom daba has fixed it. The entry had {{bor|fa|zh|افیون|lang=fa|notext=1|tr=afyūn}}, which means a Persian term borrowed from ~~Persian~~ Chinese. --Anatoli T. ^{(обсудить}/^вклад) 03:56, 9 February 2017 (UTC)[reply]

Kolmiel: The first arguments for {{bor}}, {{der}}, and {{inh}} are 1) the language of the headword term itself, and 2) the source language of the etymon. This is the opposite of the older {{etyl}} template, which had the etymon language first and the headword language second.

However, if you're using {{bor}}, {{der}}, or {{inh}}, and you also include a lang= argument, that overrides the behavior to match the older {{etyl}} template instead -- the lang value is the headword language, the first unnamed argument becomes the etymon language, and the second unnamed argument is the etymon itself. See [[Template:borrowing#Old-style_parameters]], for instance.

So in the previous version of the entry, {{bor|fa|zh|افیون|lang=fa|notext=1|tr=afyūn}} parses out to a Persian headword (the lang=fa part) from a Persian etymon (the fa in the first unnamed parameter), and that etymon was the term "zh" (the second unnamed parameter). ‑‑ Eiríkr Útlendi │^{Tala við mig} 21:39, 10 February 2017 (UTC)[reply]

Sorry all - my fault initially. I don't even know why I used {{bor|fa|zh|افیون|lang=fa|notext=1|tr=afyūn}}. Wyang (talk) 21:45, 10 February 2017 (UTC)[reply]
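The parameter behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this thread (new-style order versus the old-style order triggered by lang=), not the template's actual implementation; the names `Borrowing` and `parse_bor` are hypothetical, and the sketch does not handle nested templates or links inside arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Borrowing:
    headword_lang: str      # language of the entry being described
    etymon_lang: str        # language the term was borrowed from
    etymon: Optional[str]   # the source term itself, if given


def parse_bor(template: str) -> Borrowing:
    """Resolve the language roles of a {{bor|...}} call as described in this thread."""
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *args = text[2:-2].split("|")
    if name != "bor":
        raise ValueError("this sketch only handles {{bor}}")

    named = {}
    positional = []
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value
        else:
            positional.append(arg)

    def unnamed(index: int) -> Optional[str]:
        return positional[index] if index < len(positional) else None

    if "lang" in named:
        # Old style: lang= is the headword language, the first unnamed
        # argument is the etymon language, the second is the etymon.
        return Borrowing(named["lang"], positional[0], unnamed(1))
    # New style: headword language, then etymon language, then the etymon.
    return Borrowing(positional[0], positional[1], unnamed(2))
```

Under these rules the call quoted above resolves to a Persian headword with a Persian etymon "zh", whereas dropping lang= and writing {{bor|zh|fa|افیون|notext=1|tr=afyūn}} gives a Chinese headword borrowed from Persian.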

incandesce

Does anyone know if "incandesce" is a back-formation from "incandescence" or "incandescent"? It is consistent with the expected meaning of that (which would be some form of glowing), and when I first used it I had never heard it before and had back-formed it from incandesc- words. 50.107.129.57 08:03, 9 February 2017 (UTC)[reply]

Oxford Dictionaries says that it's a backformation, but OED and Merriam-Webster say that it's from Latin incandescere. — justin(r)leung { (t...) | c=› } 06:07, 10 February 2017 (UTC)

Odd how Oxford Dictionaries and OED contradict each other. — Cheers, JackLee ^–talk– 06:41, 12 February 2017 (UTC)[reply]

RFV PIE>PSlav

Reconstruction:Proto-Indo-European/gʷerH- ("to approve, to praise") says that *gʷr̥H-yé- (zero-grade ye-present) gives žerti, but Reconstruction:Proto-Slavic/žerti ("to devour, to glut") says that it comes from *gʷerh₃- ("devour"). And of course Reconstruction:Proto-Indo-European/gʷerh₃- mentions žerti. A mistake of aspirants that we can distinguish (H-h₃)? Sobreira ►〓 (parlez) 10:12, 9 February 2017 (UTC)[reply]

LIV gives from *gʷr̥H-ye- > "OCS (+)" žьrjǫ, žrъti; "secondary thematic" žьrǫ 'to offer', separate from the *žerti group. I can't seem to find anything fitting together with this from {{R:Derksen 2008}}, though. --Tropylium (talk) 04:09, 10 February 2017 (UTC)[reply]

This is just a case of homonymy, *žьrti is the verb derived from *gʷerH- and *žerti is from *gʷerh₃-, the point of confusion is that *žьrti had an alternative form in *žerti (according to our page) which was homophonous with the devour verb. Crom daba (talk) 06:44, 10 February 2017 (UTC)[reply]

باران, baran

Which of these are actually borrowed from Persian? I'm not a huge expert on Iranian, but Kurdish and Pashto are suspect. For the former, there's the variant "waran", which looks more expectable; but I suppose both might be native. The Pashto, however, looks very borrowed. Initial *b- becomes w-, not the other way round. Actually, b- can only arise from intervocalic *-p- by procope according to this [2]. Kolmiel (talk) 14:07, 9 February 2017 (UTC)[reply]

Proto-Iranian *w- → Kurdish b- is regular. See Asatrian G., Livshits V. (1994) Origine du système consonantique de la langue kurde, pages 94–95, here. --Vahag (talk) 15:50, 9 February 2017 (UTC)[reply]

Apparently Southern Kurdish, in many cases at least, has retained w- ([3]). But okay, you're right, it's a native development per se. So Kurdish is out of the list. Can you say anything about Pashto? Kolmiel (talk) 18:00, 9 February 2017 (UTC)[reply]

Usually, the Pashto ووریدل (warēdəl, wōrēdəl, “to rain”) is taken to be the descendant of this root. I agree with you that Pashto باران (bārān) is probably a borrowing. --Vahag (talk) 06:19, 10 February 2017 (UTC)[reply]

Danicize

[Copied from User talk:Smuconlaw.]

Excuse me, but if you’re going to go around reverting substantial edits you at least have to pay attention to the actual content of the edit. Nothing in my edit implied that Danish is borrowed from Latin. Rather, it very clearly indicates that Danicize is derived from Latin danicus and NOT from Danish. Anyway, from the form of the word, Danicize /ˈdeɪnɪsaɪz/, it can be seen that it cannot be from Danish + -ize; then it would be Danishize /ˈdeɪnɪʃaɪz/. Like most of the -ize words, it is either a learned Latinate construction or borrowed and adapted from Neo-Latin or another language that uses such constructions. The word’s spelling and pronunciation both confirm this. – Krun (talk) 12:54, 11 February 2017 (UTC)[reply]

Perhaps we can take this discussion to the Etymology scriptorium. — SMUconlaw (talk) 13:04, 11 February 2017 (UTC)[reply]

Editors' input on the above matter is most welcome. — SMUconlaw (talk) 20:24, 11 February 2017 (UTC)

I would think it's more from Dan(ish) +‎ -icize. I don't think these new constructions are actually hearkening back to old Latin forms, but are created on a model now ingrafted into and natural to English. Leasnam (talk) 22:05, 11 February 2017 (UTC)[reply]

@Krun, Smuconlaw, Leasnam: The OED doesn't list Danicize, but it does list Danic (“Danish”), which it states is an “ad[aptation of the] med[iaeval ]L[atin] Danic-us, f[rom] Dania Denmark.” It also lists Danicism (“a Danish idiom or expression”) as a derivation of Danic. IMO, Danicize was most probably formed as Danic +‎ -ize. FWIW, the OED doesn't list *Danishize either. — I.S.M.E.T.A. 23:44, 26 February 2017 (UTC)[reply]

Thanks. Should we indicate that as the (probable) etymology? — SMUconlaw (talk) 11:08, 27 February 2017 (UTC)[reply]

@Smuconlaw: Yes; I've just done that. — I.S.M.E.T.A. 15:04, 27 February 2017 (UTC)[reply]

OK, much obliged. — SMUconlaw (talk) 16:03, 27 February 2017 (UTC)[reply]

Latin penuria

Does anybody have more details on this word? De Vaan doesn't mention it; and penury has another etymology. --Barytonesis (talk) 11:46, 14 February 2017 (UTC)[reply]

My bad, it does mention it as a derivative of paene, precisely. --Barytonesis (talk) 11:49, 14 February 2017 (UTC)[reply]

I think it's the same root but not directly from paene, compare Greek cognates listed at spero. Crom daba (talk) 12:47, 14 February 2017 (UTC)[reply]

French à peine

I don't know what to think of this edit. If it's right, we have to remove the descendants from paene. --Barytonesis (talk) 12:12, 14 February 2017 (UTC)[reply]

The TLFI says poena [4]. Also, can someone check the translations, they seem a bit off to me (especially #2). — Dakdada 15:24, 14 February 2017 (UTC)[reply]

If indeed paene means almost and poena means pain, then it could very well have been a corruption between the two, since they are pronounced similarly in Vulgar Latin (/pe.ne/ vs /pe.na/). --kc_kennylau (talk) 02:51, 4 March 2017 (UTC)[reply]

I thought ae developed into an open-mid vowel /ɛ/ and oe into a close-mid vowel /e/. If so, then paene would have been /pɛ.ne/. — Eru·tuon 03:04, 4 March 2017 (UTC)[reply]

I modified the etymology according to French Wiktionary. — TAKASUGI Shinji (talk) 08:17, 9 March 2017 (UTC)[reply]

A new Labs Tool to visually explore etymological relationships extracted from the English Wiktionary

Hi all! I have developed a tool to visualize etymologies. Please check it out at tools.wmflabs.org/etytree. My work is funded by an IEG grant. Please leave your feedback on the interactive tool here. It will help improve it.

[Image: a screenshot of the graph for the word coffee]

It is impressive how well automatic extraction of data works. This is because etymology sections are written using well-defined standards. I would like to get some feedback about some difficulties I have encountered while extracting data and some ideas I have about new templates. I wrote some notes here. Please add your comment there if you have any.

Looking forward to your comments! Epantaleo (talk) 17:25, 14 February 2017 (UTC)[reply]

I took a look at a few basic words.

"Door" generates, in addition to the expected chain of inheritance (< OE duru < PGmc *duriz ← PIE *dʰwer-) also an extensive tangent on completely unrelated words meaning "beetle". It seems that these have tagged along because of Middle English dor (a word that we do not have yet!) being used as a spelling for both.
@Tropylium: Thanks for looking so much in detail. It's a great feedback. I'm going to reply to each point. Regarding dor, in the visualization it's dor, i.e., English dor (some kind of beetle) and not Middle English dor. Apparently (from the etymology in Wiktionary) both English door and English dor derive from Middle English dore. So the extraction is correct. Epantaleo (talk) 11:52, 15 February 2017 (UTC)[reply]
So, the operation was a success, but the patient died... Homographs are definitely a problem, especially when there are a number of etymology sections, and with languages such as Middle English, where there's so much orthographical variation that unrelated words overlap. Chuck Entz (talk) 14:59, 15 February 2017 (UTC)[reply]
@Chuck Entz: I see your point. You are right. The problem is intrinsic in the current way templates are used if two homographs link to the same lexical entry. This kind of experiment (the visualization) can help set new standards that improve how Wiktionary works in terms of infrastructure. If when people write etymology sections, they think about homographs and specify which sense they mean in the template, the graphs will link to the correct thing, a posteriori. This is not implemented yet though. To use your words, hopefully with a new setup (where the link points to the word with the correct sense) the next patient won't die. Does it make sense? Epantaleo (talk) 15:30, 15 February 2017 (UTC)
I doubt we will be "fixing" this just for the sake of accommodating your tool. People with brains can figure out that dor for "beetle" and dor for "door" are probably not the same word. And, as I said, you can fix this easily enough by not conflating homographs for which we have no entries. After all, you already manage to do this with homographs for which we do give multiple etymologies, correct?

A fully machine-readable formatting standard for etymology would be quite a nice thing to have, but this is not really a project for Wiktionary to hash out (such a project would probably need a different underlying database structure entirely). --Tropylium (talk) 21:05, 15 February 2017 (UTC)[reply]
@Tropylium: Maybe I'm too optimistic but how about users see that there is an incorrect link in etytree and add an entry for homograph "dor"? I think a major problem though is that in etymology sections there is no standard for homographs, i.e., even if there was an entry for "dor" etymology sections would anyway link to {{m|enm|dor}}, which is ambiguous. Maybe there could be something like {{m|enm|dor#Etymology_2}} or [[dor#Middle_English#Etymology_2]]? Epantaleo (talk) 16:09, 1 March 2017 (UTC)

"Air" for some unfathomable reason gives only definition line 9 (i.e. "(informal) Nothing"). Only three derived terms are given (airworthy, castle in the air, phlogisticated air), out of dozens of possible ones. Two different PIE roots turn up (apparently because we have a faulty etymology at Modern Greek αέρας). Interestingly, one of them gets rendered with on-line numbers ("*h2weh1-"), despite our originals having the proper subscripts.
@Tropylium: Good catch, those are small bugs that I'm going to fix: printing word definition in the tooltip seems to only print one of the definitions, and this change in superscript... not sure why this last thing is happening. Regarding derived terms, I am filtering those (I am working on filtering derived terms that are compounds at the moment, so they still show up) otherwise the visualization is overpopulated. Maybe I'll just have a button that visualizes all of the derived words if the user is interested. Epantaleo (talk) 11:57, 15 February 2017 (UTC)
"Cheap" generates a generally correct-looking web of related words, but for some reason splits Old English cēap into two nodes, one of them labeled "ceap", the other "cēap". Is the tool failing to cope with words that we cite differently from the lemmatization?
@Tropylium: ceap has been extracted from its own page https://en.wiktionary.org/wiki/ceap#Old_English, and there is no place (at least in etymology sections) where ceap is said to be etymologically equivalent to cēap. Again the extraction is working correctly. Epantaleo (talk) 12:02, 15 February 2017 (UTC)
You may have failed to register correctly what the problem is. Ceap is not "etymologically equivalent" to cēap; they are one and the same word, which we spell in two different ways depending on the context. Our entry for ceap indeed notes this, in giving the word in its definition line as ċēap and not ceap. (You can think of ceap as a standardized broad transcription, cēap and ċēap as more narrow transcriptions.) It is highly typical for older stages of languages to have unstandardized orthography, which we here at Wiktionary solve by standardized lemmatization. Your tool definitely needs to be able to work with this. Although you could manually infer them from headword lines, rules for these operations can be generally found listed at Wiktionary's language considerations pages, in this case Wiktionary:About Old English. --Tropylium (talk) 21:05, 15 February 2017 (UTC)
@Tropylium: Great, thanks for the pointer! I didn't know that. I will try to implement this, i.e., collapse different transcriptions of the same word. Epantaleo (talk) 16:11, 1 March 2017 (UTC)[reply]
"Lunch" gives an extensive list of words borrowed from English, and also an undergrowth of words meaning "long", that I don't see where it's pulling them from. I imagine they have something to do with the possibility we give that Northern English lunch might be from Spanish lonja — but the Spanish word itself appears nowhere in the chart! A second unrelated tangent adds in various "loin" words, again due to Old French longe having both meanings.
@Tropylium: The etymology section in Portuguese lanche says: Borrowing from English lunch, shortened form of luncheon, probably from Old French longe, from lonc, from Latin longus. And Portuguese lanche is in the visualization. Epantaleo (talk) 12:12, 15 February 2017 (UTC)
I see. That's perhaps taken care of easily enough: it might be a good idea to disregard claims of the form "word B derives from word C" in some other word A's entry, whenever word B's own entry has an etymology but fails to corroborate derivation from C. --Tropylium (talk) 21:05, 15 February 2017 (UTC)[reply]
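The rule suggested here (ignore "B derives from C" when it is only claimed in some other entry A and B's own etymology does not agree) could be sketched roughly as follows. This is a minimal illustration under assumed data shapes, not etytree's code: `corroborated_edges`, the triple format and the sample data are all hypothetical.

```python
def corroborated_edges(claims, own_etymology):
    """Filter derivation claims by the rule suggested above.

    claims: iterable of (stated_in, word, ancestor) triples, meaning that the
        entry `stated_in` says `word` derives from `ancestor`.
    own_etymology: dict mapping a word to the set of ancestors that its own
        entry names; words without an etymology section are simply absent.
    """
    kept = []
    for stated_in, word, ancestor in claims:
        own = own_etymology.get(word)
        # A second-hand claim is dropped only when the word has an etymology
        # of its own and that etymology does not mention the same ancestor.
        if stated_in != word and own and ancestor not in own:
            continue
        kept.append((word, ancestor))
    return kept


# Illustrative data only, loosely modelled on the "lunch" example above.
claims = [
    ("lanche", "lunch", "longe"),    # claimed about "lunch" inside another entry
    ("lunch", "lunch", "luncheon"),  # claimed in the word's own entry
    ("lanche", "lanche", "lunch"),
]
own_etymology = {"lunch": {"luncheon"}}
edges = corroborated_edges(claims, own_etymology)
```

With this sample data the second-hand link from "lunch" to "longe" is discarded while the two first-hand links are kept. Second-hand claims about words that have no etymology section of their own still pass through, which matches the wording of the suggestion.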

So far this looks like kind of random performance; I have doubts about how much an uneducated reader is going to get out of it currently. In particular, it seems that the tool is making a bit too many assumptions about the relatedness of words whenever an entry is lacking. But I'm interested in seeing what direction this develops in.

@Tropylium: Hopefully I have shown with your examples that there is no random behavior.

As far as the interface goes, so far the generated graphs look far too "wiggly" to me, with the nodes grouped randomly and no clear way to identify different chronological levels. Attempting to manually re-organize the nodes, while possible, also seems to pull the entire graph along, with a distressing flickery effect (at least on Firefox 51.0.1 for the Mac).

@Tropylium: I agree. The thing is that, because of inconsistent etymologies (which I have to say I was expecting), I find loops in the graphs (which should not be!) otherwise I would use the much nicer visualization I used in the demo which uses trees that go from left to right following time evolution (with no loops!). Epantaleo (talk) 12:12, 15 February 2017 (UTC)[reply]

--Tropylium (talk) 19:23, 14 February 2017 (UTC)[reply]

I agree with Tropylium. The current state of the project seems more like a demonstration of the author's d3 skills. Seriously, I can comprehend text better than this infographics-wannabe. --Dixtosa (talk) 19:36, 14 February 2017 (UTC)

@Dixtosa: Hope my explanations above answer your doubts too. In any case it was a lot of work for only 6 months. Hopefully my grant will be renewed so that I can improve the visual interface. It is noteworthy that the visualizations point out to inconsistent etymologies, which can be fixed. Once fixed the database extraction can be updated and the visualization will look fine. Epantaleo (talk) 12:14, 15 February 2017 (UTC)[reply]

This has made a ton of progress since I last looked at it, and it's really cool to see it coming along! However, there is certainly a lot of work to be done yet. The current moving web of words that is displayed is not as clear as the more linear layout I remember seeing before, which was far, far clearer. I'm not sure what your reasons for the change are (or if I'm just looking at incomplete work). I notice also that the search bar does not recognize diacritics, and does not even search when I use them (I tried searching "café" and "leçon" and nothing happened).

Again, I'm really thrilled about the idea, and the progress that has been made, but there's a lot of work to be done to make it more accurate and visually clear. Andrew Sheedy (talk) 23:20, 14 February 2017 (UTC)[reply]

@Andrew Sheedy: Thanks! That's very encouraging! :) I totally agree with you that the linear visualization was much clearer. I wish I could use it as it was so much work. However, as I have written before, in the current state, there are a lot of inconsistencies in etymologies which cause loops in the visualization, and loops cannot fit in the previous visualization (trees don't have branches that merge back with the tree). Epantaleo (talk) 12:17, 15 February 2017 (UTC)

@Andrew Sheedy, @Epantaleo: As there are inconsistencies in the extracted data, with different nodes merged into a single one, the extracted data structure is a graph. It is possible to overcome this limitation by "correcting" the structure on the fly (using a spanning tree algorithm, or by trying to get a DAG (Directed Acyclic Graph), which should be the correct structure). If the grant is to be continued, I'll be happy to help you find a correct algorithm for this. Dodecaplex (talk) 12:32, 24 February 2017 (UTC)

@Dodecaplex: Great, thanks! I thought about correcting the data on the fly too. However there is a fundamental question: do we want to manipulate what editors write in Wiktionary or do we just want to show what they write, including loops like the one described above? If we want editors to use this tool to check inconsistencies in Wiktionary then we want to plot what is in Wiktionary and let them fix inconsistencies. Maybe we can have both visualizations, one with loops and one without loops? Using both might be a lot of work. Epantaleo (talk) 16:20, 1 March 2017 (UTC)
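One way to get both views discussed here is to keep the extracted graph as it is and derive a loop-free version from it, while recording which links were cut so editors can review them. The sketch below does this with a depth-first search that drops every back edge. It is only an illustration: `break_cycles` and the edge direction (word to claimed ancestor) are assumptions, and it uses recursion, so very deep chains would need an iterative version.

```python
def break_cycles(graph):
    """Return (dag, dropped) for a directed graph given as {node: [targets]}.

    dag is the same graph with every depth-first-search back edge removed,
    which leaves a directed acyclic graph; dropped lists the removed edges
    so that they can be shown to editors as possible inconsistencies.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    state = {node: "unvisited" for node in nodes}
    dag = {node: [] for node in nodes}
    dropped = []

    def visit(node):
        state[node] = "in progress"
        for target in graph.get(node, []):
            if state[target] == "in progress":
                # The target is still on the current search path, so this
                # edge closes a loop: record it instead of keeping it.
                dropped.append((node, target))
                continue
            dag[node].append(target)
            if state[target] == "unvisited":
                visit(target)
        state[node] = "done"

    # Sorting only makes the result reproducible; any order gives a DAG.
    for node in sorted(nodes):
        if state[node] == "unvisited":
            visit(node)
    return dag, dropped
```

Which edge of a loop gets cut depends on where the search starts, so the acyclic view is a display convenience rather than a judgment about which etymology is wrong. The dropped list is what an editor would want to inspect.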

I like the idea of visualizing the derivational relationships between words. However, as shown in the image for coffee, the tool doesn't recognize variant spellings of the same word: specifically the Arabic قَهْوَة (qahwa) with diacritics and قهوة without. Those should be a single node in the diagram; they are variant spellings of the same word, just as Old English ċēap, cēap, and ceap are. Ideally, the tool would show the form with the most diacritics, as it has the most information (that is, قَهْوَة (qahwa) and ċēap); and for Arabic, this is doubly necessary, because spellings without diacritics are extremely ambiguous in certain cases. (Arabic letters indicate consonants or long vowels, while diacritics indicate vowels, the lack of a vowel, and consonant doubling.) — Eru·tuon 01:54, 16 February 2017 (UTC)[reply]

@Erutuon: Thanks! Actually your note is similar to Tropylium's note above for Old English. I hope I'll have more time to work on etytree and implement this change. Epantaleo (talk) 16:24, 1 March 2017 (UTC)[reply]
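Collapsing spelling variants as requested in the two notes above could start from Unicode decomposition: strip all combining marks to get a shared key, then display the variant that carries the most marks. The following is a rough sketch under that assumption; `strip_marks`, `count_marks` and `collapse_variants` are hypothetical names, and a real implementation would need per-language rules (such as those on the "About" pages mentioned above), since in many languages a diacritic distinguishes separate words.

```python
import unicodedata
from collections import defaultdict


def _is_mark(char):
    # Unicode general categories Mn, Mc and Me are all combining marks.
    return unicodedata.category(char).startswith("M")


def strip_marks(form):
    """Key used for grouping: the form with all combining marks removed."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if not _is_mark(ch))


def count_marks(form):
    """Number of combining marks in the form after decomposition."""
    decomposed = unicodedata.normalize("NFD", form)
    return sum(1 for ch in decomposed if _is_mark(ch))


def collapse_variants(forms):
    """Map each mark-free key to its most fully marked variant."""
    groups = defaultdict(list)
    for form in forms:
        groups[strip_marks(form)].append(form)
    return {key: max(variants, key=count_marks)
            for key, variants in groups.items()}
```

Applied to ceap, cēap and ċēap this yields a single node keyed "ceap" and displayed as ċēap, and the vocalized and unvocalized Arabic spellings of qahwa likewise fall under one key.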

Category:Proto-Germanic given names

A lot of the reconstructed given names we have are attested only in one or two (often closely related) early Germanic languages; e.g. *Grīmaz, *Audawarduz, *Mērijawīgą, *Andaswaraz, *Hrōþilandą, and so forth. On what basis do we assume that names like these were in fact in use in Proto-Germanic and were not just later regional innovations? — Kleio (t · c) 16:51, 15 February 2017 (UTC)[reply]

Grim is attested as a nickname for the same God in Old Norse and Old English (Grīm and Grímr). There's also scanty evidence the Franks may have had a God they called Grīm. I'd say two (possibly three?) languages having the same nickname for the same Common God is enough to keep Grīmaz. The other few do seem a bit tenuous, especially Mērijawīgą, as only Mariwig is an "attested" Frankish name ("attested" since only through Latin Merovechus). Llacheu (talk) 17:04, 26 February 2017 (UTC)[reply]

Where is Grim attested as a given name, and do these attestations predate intensive contact with ON speakers? Same with the Frankish attestation you speak of. My Google-fu is failing me, and I would like to verify. — Kleio (t · c) 18:41, 26 February 2017 (UTC)[reply]

Latin conjugation vs. present active participle ending

Is there anything for 3rd?

conj	1st	2nd	3rd	4th
verb	-o#Latin (denominal or 3rd-conj.-deverbal compound)	-eo#Latin (causative)	??	-io#Latin (causative)
pres.act.part.	-ans#Latin	-ens#Latin	??	-iens#Latin

Sobreira ►〓 (parlez) 17:38, 15 February 2017 (UTC)[reply]

Yes, -ō, -ēns, though as the descendant of the default thematic or athematic conjugation, they aren't really derivational suffixes. —John C5 18:13, 15 February 2017 (UTC)[reply]

@JohnC5: The same -ō, -ēns as for the 1st and 2nd? Should the definitions then be changed, respectively:

from "suffixed to third-conjugation verbs in composition, forms regular first-conjugation verbs"
to "suffixed to third-conjugation verbs in composition, forms regular first-conjugation and third-conjugation verbs"; and
from "Used to form present active participles from second conjugation verbs."
to "Used to form present active participles from second and third conjugation verbs"? Sobreira ►〓 (parlez) 13:19, 16 February 2017 (UTC)[reply]

Yes and no. For -ēns, that is fine, but for -ō, this isn't correct. It's only inflectional morphology, not derivation for 3rd conjugation. Verbs don't really become 3rd conjugation. They are just inherited into that conjugation. On the other hand, verbs do switch conjugation to 1st. @CodeCat, could you confirm this? —John C5 15:32, 16 February 2017 (UTC)[reply]

Every conjugation has a characteristic underlying formation, which may or may not be still productive. The issue is that the conjugations include multiple formations that have been subsumed into one group. The 1st conjugation, for example, is made up of denominatives formed from ā-stems with a ye-suffix, factitives formed from thematic adjectives with a h₂ye-suffix, a few primary root verbs, mostly from laryngeal-final roots, frequentatives in -tā-, deverbal verbs created by prefixing a preverb, and various analogical derivations. The second conjugation contains at least statives in -ē- and o-grade causatives in -eye-. So to say that a verb is converted to another conjugation is misleading; you should specify which formation was used in the conversion. —CodeCa t 15:38, 16 February 2017 (UTC)[reply]

done Sobreira ►〓 (parlez) 21:01, 16 February 2017 (UTC)[reply]

3rd conjugation could have two forms, compare for example rego (reg-o) and capio (cap-io). -Slœtel (talk) 21:10, 16 February 2017 (UTC)[reply]

Taipeh

RFV of the etymology. @Prisencolin, not really sure what you mean by "court dialect". — justin(r)leung { (t...) | c=› } 00:35, 16 February 2017 (UTC)[reply]

It appears to follow the romanization scheme seen in An English and Chinese Vocabulary, in the Court Dialect. --Prisencolin (talk) 02:47, 16 February 2017 (UTC)[reply]

It seems to be referring to the Nanjing dialect. — justin(r)leung { (t...) | c=› } 17:08, 18 February 2017 (UTC)[reply]

I've made changes to the article to reflect this, but it's hard to tell whether it actually comes from the Nanjing dialect. — justin(r)leung { (t...) | c=› } 17:12, 18 February 2017 (UTC)[reply]

schots

schots, RFV of etymology 2: This etymology is also given by De Vries [5], but other dictionaries don't follow him and call the etymology uncertain. Lingo Bingo Dingo (talk) 11:49, 16 February 2017 (UTC)[reply]

RFV etym ergosterol

Following Houaiss (DHLP) for Portuguese, and Wikipedia, I changed the etymology from Latin < Ancient Greek ergo- + sterol to English/French ergot + sterol. Any idea whether English or French should be preferred? (BTW, Houaiss says the Portuguese comes from French, not from English.) Sobreira ►〓 (parlez) 13:10, 16 February 2017 (UTC)[reply]

tuba

RFV of the etymology.

Many sources list this as originating from Latin. Merriam-Webster and Oxford also mention that it came into English via Italian.

Our entry currently says this originated in French. Anyone have any clarity on where this etymology came from, and whether this is a valid alternative theory or just a mistake? ‑‑ Eiríkr Útlendi │Tala við mig 17:23, 16 February 2017 (UTC)[reply]

Whoever added it may have got it from here [6]. Leasnam (talk) 05:44, 18 February 2017 (UTC)[reply]

I have removed that part pending further sources. The OED says "Italian and Latin", but it looks to me that the instrument was invented by Germans and that the name was simply taken directly from Latin for all major European languages involved. It referred to a specific sort of Roman instrument, for which our English entry lacks a sense. —Μετάknowledge discuss/deeds 07:47, 18 February 2017 (UTC)[reply]

Indeed. "Worüber man nicht sprechen kann, darüber muss man schweigen." We need some Italian, German, and French etymological research. DCDuring TALK 15:39, 18 February 2017 (UTC)[reply]

Robert has tuba from a 1767 dictionary referring to a Roman instrument, doesn't mention Italian, but (of course) refers to Latin origin. The correspondence of the early use of the French term to the modern instrument is not clear to me. DCDuring TALK 15:48, 18 February 2017 (UTC)[reply]

For German, Pfeiffer says "borrowing (second half of 18th century) from Latin tuba". Kolmiel (talk) 19:23, 19 February 2017 (UTC)[reply]

Philippa (Dutch) may be interesting: "Borrowed from German Tuba "tuba" (1845), earlier already Baß-Tuba (1835), a generalization of Tuba "Roman war trumpet" (1768), which is borrowed from Latin [...]. Before that, Dutch tuba "Roman war trumpet" had already been borrowed [...]. The German musician Wilhelm Wieprecht and the instrument maker Gottfried Moritz obtained a patent for the tuba in 1835, a bass wind instrument developed by them, under the name of Baß-Tuba." — So it seems that the word as such had already spread and that the use for the modern instrument is from German. Kolmiel (talk) 19:31, 19 February 2017 (UTC)[reply]

Vandalic orthography for reconstructions

For the interested (@Anglom, Angr, CodeCat, Leasnam?) I started a discussion on this user page about Vandalic reconstructions after the entries *Gaisareiks and *Hildireiks were created recently. Input may be nice, since I've not done a whole lot of reading about Vandalic yet. — Kleio (t · c) 18:35, 16 February 2017 (UTC)[reply]

soda

Online Etymology Dictionary claims that "Proposed Arabic sources in a name of a variety of saltwort have not been attested and that theory is no longer considered valid." Do we know something they don't, or is our Arabic etymon bunk? Crom daba (talk) 11:50, 18 February 2017 (UTC)[reply]

三味線

According to Wikipedia, the shamisen originated from the Chinese sanxian. If that is the case, why do we claim here that the word shamisen in Japanese comes from Okinawan? ---> Tooironic (talk) 15:15, 19 February 2017 (UTC)[reply]

Is the etymology at 三味線#Japanese confusing? I would be happy to rework it if needed.

The term's history is relatively clear. The instrument may indeed have originated in China, but the term came from Okinawan, albeit based on constituent roots originally borrowed from Chinese. The relevant section of the WP article (at w:Shamisen#History_and_genres) notes the transmission of the instrument from China to Japan via the Ryūkyū Kingdom. The Japanese WP article section here notes that the Chinese instrument was known in the Ryūkyū Kingdom from 1392. That article also states that this was then transmitted to Sakai in mainland Japan in 1558 or 1559.

FWIW, the Japanese cognate for Chinese 三弦 (sānxián) is 三弦 (sangen). ‑‑ Eiríkr Útlendi │Tala við mig 19:24, 20 February 2017 (UTC)[reply]

I'm just confused how something which originated from China could be originally derived from a Japonic language. Perhaps the etymology does not go back far enough? ---> Tooironic (talk) 09:42, 21 February 2017 (UTC)[reply]

Perhaps we're talking past each other?

One example of a Chinese "thing" with a non-Chinese name is English bean curd instead of tofu. Things and the terms for those things don't always share origins.

“something which originated from China” -- The "thing" is the instrument, which did originate in China. No concerns there.

“could be originally derived from a Japonic language” -- The Japanese term 三弦 (sangen) for (one variety of) this instrument originates in Chinese. However, Japanese 三味線 (shamisen) derives from an Okinawan-coined compound of Chinese-derived elements 蛇皮 (jabi, “snake skin”) + 線 (sen, “line; string”). The choice of the 線 (sen) final character was influenced by the xián phonetics of the Chinese 弦, but the Japanese 三味線 (shamisen) etymon 蛇皮線 (jabisen) as a single term was decidedly not from Chinese, to the best of what I've been able to find.

The alternative name of the more-traditionally Okinawan form of the instrument, 三線 (sanshin), is definitely borrowed from Chinese 三弦 (sānxián) phonetically, with the Japanese term using 線 (shin, more commonly sen, “line; string”) as a phonosemantic replacement for Chinese 弦 (xián, “string of an instrument”), as the Japanese reading of this character is gen, which doesn't match the Chinese pronunciation at all.

Does that answer your concerns? ‑‑ Eiríkr Útlendi │Tala við mig 19:23, 21 February 2017 (UTC)[reply]

Thank you for that detailed explanation, I think I understand now. Cheers! ---> Tooironic (talk) 03:09, 22 February 2017 (UTC)[reply]

mërajë

G. Meyer 1891: 259 (G. Meyer. Etymologisches Wörterbuch der albanesischen Sprache. Strassburg. 1891) — This unsigned comment was added by Mattbeets (talk • contribs).

@Mattbeets: Is that supposed to be the source for the substrate origin proposed at mërajë? Meyer claims no such thing. According to him the Albanian word is a Romance borrowing. Feel free to readd the substrate theory (which is plausible) with a proper reference. --Vahag (talk) 14:25, 20 February 2017 (UTC)[reply]

@Vahagn Petrosyan: No, just a source for the attestations, since there was none. But I must admit I did not have the original source, so it was a proxy reference from Berneker's "Slavisches etymologisches Wörterbuch" (second volume), p. 73, but I thought the original reference was best. The substrate theory is mostly mentioned for the Greek word, e.g. Beekes 2010: 903-904, copying from Schwyzer 1953 (first volume), p. 61 (and the same in Chantraine's and Frisk's respective dictionaries). The Greek dictionaries, however, do not mention the Rumanian and Albanian, as I recall. — This unsigned comment was added by Mattbeets (talk • contribs).

I have moved the etymological discussion to maraj, with references. --Vahag (talk) 15:02, 20 February 2017 (UTC)[reply]

Russian очень

Vasmer speaks of a relationship with о́ко (óko, “the eye”). What's the semantic link? --Barytonesis (talk) 23:02, 25 February 2017 (UTC)[reply]

Presumably visibly -> obviously -> prominently -> very Crom daba (talk) 16:51, 26 February 2017 (UTC)[reply]

He draws a semantic relationship with очну́ться (očnútʹsja, “to awaken; to come to one's senses”) and очути́ться (očutítʹsja, “to find oneself (suddenly) in a place”). He also compares it with очеви́дно (očevídno, “apparently, obviously”). You can see the phonetic change /k/ -> /t͡ɕ/ even in о́ко (óko, “eye”). --Anatoli T. (обсудить/вклад) 12:28, 30 June 2017 (UTC)[reply]

Benjamin

RFV of the etymology.

User:Hekwos changed the etymology for this English biblical name to assert that the original Hebrew term from which it came literally means "son of days" (referring to Jacob's age at the time of the birth), rather than "son of (my/the) right hand" or "son of (the) south", based on the assumption that it contains the plural of the Aramaic word for days. I disagree, and have reverted them twice. Since Semitic linguistics isn't one of my stronger areas, I would like a second opinion before taking further action. See the talk page for the points made so far and the edit history for the details of the reverted edits. Thanks! Chuck Entz (talk) 23:18, 25 February 2017 (UTC)[reply]

Reconstruction:Old English/Loca

A user has created this entry but initially neglected to provide any descendants, so I marked it for speedy. This was reverted and a descendant added. However, I'm still not convinced that any of this stands up to scrutiny. —CodeCa t 15:50, 26 February 2017 (UTC)[reply]

Then scrutinise it. CodeCat has neglected to provide my full explanation and the fact I added etymologies. Llacheu (talk) 16:56, 26 February 2017 (UTC)[reply]

I smell yet anUther sock. —Aɴɢʀ (talk) 20:48, 26 February 2017 (UTC)[reply]

lol clever ;) Leasnam (talk) 06:54, 27 February 2017 (UTC)[reply]

I have upgraded Equinox's block to a permaban, as even if this isn't Uther, this user threatened other editors quite violently. I have also deleted most of the reconstructions he created. —Μετάknowledge discuss/deeds 21:32, 26 February 2017 (UTC)[reply]

polo

I would like to ask someone to check the etymology of the English term polo. My Czech etymological dictionary says that the Czech word pólo comes (obviously) from the English term, which comes from the Balti expression polo, which is cognate with Tibetan pulu. Something similar can be seen, e.g., at the Online Etymology Dictionary. However, our entry says that the Balti term is pulu and the Tibetan term is po lo. I am not able to correct it since I cannot write the scripts of the two languages. Thanks. --Jan Kameníček (talk) 23:31, 26 February 2017 (UTC)[reply]

The Balti is pulu, Tibetan is polo. Both written the same way: པོ་ལོ (po lo). པོ་ལོ (po lo) is definitely polo in Tibetan, pulu in Balti. That little ོ ( o) on top of the letters is the Tibetan ‘o’. —Stephen ^(Talk) 02:38, 28 February 2017 (UTC)[reply]

Well, I do not understand why all the dictionaries have it the other way around. Another one is here: [7] --Jan Kameníček (talk) 18:16, 28 February 2017 (UTC)[reply]

@Wyang:. Could you help? DCDuring TALK 18:45, 28 February 2017 (UTC)[reply]

English etymological dictionaries:

Hobson-Jobson; a glossary of colloquial Anglo-Indian words and phrases, and of kindred terms, etymological, historical, geographical and discursive:
Page 1, Page 2
The Concise Dictionary of English Etymology:
Polo, a game. (Balti.) 'It comes from Balti; polo being properly, in the language of that region, the ball used in the game;' Yule. Balti is in the high valley of the Indus.
An Etymological Dictionary of Modern English
polo. Balti (dial. of Indus valley); cf. Tibetan pulu. First played in England at Aldershot in 1871.

Evolution of the Wiktionary entry:

2004: From the Tibetan word polo, meaning ball.
2008: From the Tibetan word pulu, meaning ball.
2008: From the Balti word pulu, meaning ball.
2011: From the Balti word (deprecated template usage) པོ་ལོ (po lo).
2012: From Balti (deprecated template usage) པོ་ལོ (po lo).
2014: From Balti (deprecated template usage) པོ་ལོ (po lo). Cognate with Tibetan པོ་ལོ (po lo), ཕོ་ལོང (pho long), སྤོ་ལོ (spo lo, “ball”).

Tibetan resources:

Tibetan writing is conservative (i.e. does not reflect modern pronunciation). The usual word for “ball” is པོ་ལོ (po lo) or སྤོ་ལོ (spo lo), pronounced differently in different places, including po:lo· in Balti (Rangan, K. 1975. Balti phonetic reader).
Dialectal pronunciations of this term: list. Only one showing pulu is Wenlang dialect of the Cuona language, an East Bodish language (not Tibetic).
Balti-English / English-Balti Dictionary: polo s (i) ball; (ii) polo [T. (Ladakhi) bo-lo 'ball']
པུ་ལུ (pu lu) is found in Written Tibetan, but it means “hut made of stones”. ཕུ་ལུ (phu lu) is “a unit for tax assessment”.

Conclusion:

Balti pulu in the current entry is likely a mixup amongst the many edits. It should be corrected to po lo (spelt པོ་ལོ (po lo) in Tibetan script, or in the Perso-Arabic script).
Status of Balti (as either a Tibetan dialect or a separate language) should be discussed, and should be checked. It was created by a possibly non-native anon.
Tibetan pulu is so far not verified. It may be a word from a western Tibetan dialect.

Wyang (talk) 23:45, 28 February 2017 (UTC)[reply]

@Wyang Could you please correct the entry in the way you have suggested above? --Jan Kameníček (talk) 16:25, 22 March 2017 (UTC)[reply]

@Jan.Kamenicek No worries, I've corrected the entry. Wyang (talk) 06:11, 23 March 2017 (UTC)[reply]

@Wyang Thank you. Originally I thought that the Tibetan script needs to be corrected too. Jan Kameníček (talk) 07:49, 23 March 2017 (UTC)[reply]

I don't know anything about Balti, but I suspect should be spelled , if it is indeed a real word. --Wiki Tiki 89 14:22, 23 March 2017 (UTC)[reply]

Oh I see that was already mentioned. I'm going to move it if no one objects. --Wiki Tiki 89 14:24, 23 March 2017 (UTC)[reply]

I tried to search the last database dump for any pages with Arabic presentation forms in their titles, but AWB found none, not even this page, although the dump was from three days before you moved it, so it should have shown up. - -sche (discuss) 17:46, 23 March 2017 (UTC)[reply]
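
For what it's worth, a check like this can also be done outside AWB. The sketch below is only an illustration, assuming the titles are already available as plain strings (how they are read from the dump is left out). It flags any title containing a character from the Unicode blocks Arabic Presentation Forms-A (U+FB50 to U+FDFF) or Arabic Presentation Forms-B (U+FE70 to U+FEFF), and suggests a replacement via NFKC normalization. The function names are hypothetical, and NFKC also changes other compatibility characters, so the suggestion would need a manual check before any page move.

```python
import unicodedata

# Unicode blocks: Arabic Presentation Forms-A and Arabic Presentation Forms-B.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFF))


def presentation_forms(title):
    """Return the characters in `title` that are Arabic presentation forms."""
    return [
        ch
        for ch in title
        if any(low <= ord(ch) <= high for low, high in PRESENTATION_RANGES)
    ]


def suggested_title(title):
    """Suggest a title with presentation forms folded to ordinary letters.

    Assumption: NFKC is an acceptable folding here; it affects other
    compatibility characters as well, so review the result by hand.
    """
    return unicodedata.normalize("NFKC", title)


def find_suspect_titles(titles):
    """Yield (title, suggestion) for every title with presentation forms."""
    for title in titles:
        if presentation_forms(title):
            yield title, suggested_title(title)


# Hypothetical input: U+FE8F is ARABIC LETTER BEH ISOLATED FORM.
suspects = list(find_suspect_titles(["\ufe8f", "\u0628", "polo"]))
```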