Wiktionary:Beer parlour/2013/September


Term only citable with different spellings counting

I can find one hit for Copenhagenisation and two for Copenhagenization (meaning “(sociolinguistics) the process of Danish speakers beginning to use the dialect of Copenhagen”). Not enough citations for either, but they’re just different ways of spelling the same word, so should they be included? — Ungoliant (Falai) 12:35, 2 September 2013 (UTC)

Our entries are for spellings. DCDuring TALK 13:41, 2 September 2013 (UTC)
There is support for it (Wiktionary:Information_desk/Archive_2012/July-December#Request for clarification: How strict is WT:CFI regarding attestation of spellings which vary slightly?). — Ungoliant (Falai) 14:05, 2 September 2013 (UTC)
Why do you call that support? DCDuring TALK 15:52, 3 September 2013 (UTC)
Support creating both entries. I don't think there is much point in gerrymandering the CFI to exclude terms merely because of spelling differences. It's the same word. —CodeCat 13:58, 2 September 2013 (UTC)
Agreed. If it were a regional term or an alternative spelling, where the spelling is what's in question, it might be different, but -ize and -ise are substituted into words by an extremely regular and mechanical process analogous to inflection (most of the time, we don't even notice we're doing it). If we accept plurals for singular lemmas, or past for present lemmas, we should accept these. Chuck Entz (talk) 14:41, 2 September 2013 (UTC)
Yup, though not without exception. (Consider compromise and exercise and advertise, whose counterparts in <-ize> are quite rare by comparison. And, for that matter, consider matrices and hypotheses and phalanges, whose regularly-backformed singulars matrice and hypothese and phalange are, similarly, quite rare compared to the standard singulars. So we do need to exercise caution.) —RuakhTALK 21:49, 2 September 2013 (UTC)
I agree with CodeCat. —RuakhTALK 21:49, 2 September 2013 (UTC)
Yet another step that means increase in quantity and decrease in quality of entries. DCDuring TALK 15:52, 3 September 2013 (UTC)
You’re just being a concern troll. — Ungoliant (Falai) 18:34, 3 September 2013 (UTC)
I'm not sure what "step" you're referring to. Are you implying that hitherto we have not allowed entries in cases where a word meets the CFI but has not had any individual spellings/forms that do? —RuakhTALK 20:29, 3 September 2013 (UTC)
Exactly. Am I wrong? I know I am not wrong about the poor quality of our definitions, both English and other. It's hard to say whether they are getting worse or not as we have no metrics (not that we could readily develop any, except on a sample basis). I'm quite sure that our definitions are not rapidly improving and that we are constantly adding FL terms with ambiguous glosses. DCDuring TALK 23:58, 3 September 2013 (UTC)
Support including the term, but I would like to see one proper entry for a lemma, plus a form-of page.
Our pages are for spellings, but many of our full entries are for lemmas, with form-of references for inflections and spelling variations. The latter is a much better arrangement for the reader, and also for integrity of the dictionary, per the w:DRY principle. Our citation practices encourage me to think that we cite terms, not spellings: “Unlike the main space, inflected forms and alternate spellings should be redirected to the primary entry. Variations in case should be on the same page, with the other(s) redirecting, even if the definitions are distinct” (from WT:CITE#Naming).
Some of these are also citable: Copenhagenize/Copenhagenise, Copenhagenized/Copenhagenised, Copenhagenizes/Copenhagenises, Copenhagenizing/Copenhagenising. Michael Z. 2013-09-06 04:33 z
Lightly object. To generalize from this example, we're talking about words that have two spellings in English, which means you can't come up with 5 examples in English--with Google Books, that's not usually a huge hurdle. You're also usually talking about words that are predictable variations on words we should have; if we have Copenhagen, Copenhagenization should be pretty clear. I don't see the benefits as being huge.--Prosfilaes (talk) 05:52, 6 September 2013 (UTC)

Wiktionary's definition of a word is spelling-based, and I don't see why we should make an exception for Copenhagenisation and Copenhagenization. If both can be independently cited they both deserve a separate entry, with one being a lemma and the other an alternative form, misspelling, or whatever. --Ivan Štambuk (talk) 17:22, 12 September 2013 (UTC)

Wiktionary:Manual of style

I'd've thought this was a good idea for things that Wiktionary:Entry layout explained either doesn't mention or doesn't give an unambiguous verdict on. For example, definitions may be formatted as sentences, or not. There's very little consistency. Even with two consecutive definitions in a single entry, the first may have an initial capital and a full stop while the second has neither. Mglovesfun (talk) 11:11, 3 September 2013 (UTC)

Good or featured articles?


Is there a system of good or featured articles here, like on Wikipedia (Wikipedia:Featured articles/Wikipedia:WikiProject Good articles)?

Thanks in advance, Automatik (talk) 13:18, 3 September 2013 (UTC)

  • No - mainly because we haven't got any articles, only words. SemperBlotto (talk) 13:21, 3 September 2013 (UTC)
    Sorry, I meant a similar system. Thanks. Automatik (talk) 15:35, 3 September 2013 (UTC)
But we do have WT:WOTD. DCDuring TALK 13:25, 3 September 2013 (UTC)
Thanks for your answer. Automatik (talk) 15:35, 3 September 2013 (UTC)
But WT:WOTD isn't really comparable. There are no quality requirements on WOTD and no process for "bringing an entry up to WOTD level". The non-English WOTD must have a pronunciation and at least one citation (at least one mention for a limited-documentation language), but the requirements on English WOTD all have to do with the nature of the word itself, not with the quality of the entry. —Angr 21:46, 3 September 2013 (UTC)
That's true, but I think it's in part because we can improve an English entry quickly once it's announced as an upcoming word of the day. —RuakhTALK 04:14, 7 September 2013 (UTC)

Hello, I came from the French Wiktionary too. We are trying to create a system for quality evaluation, and it seems no other Wiktionary has a system like that. Do you want to join the discussion? If so, we have to work on it for a few more weeks, and then we can translate it into English to share the ideas with you. Eölen (talk) 22:52, 3 September 2013 (UTC)

Can you please provide a link to the relevant discussion on French wiktionary? --Ivan Štambuk (talk) 17:16, 12 September 2013 (UTC)

Wiktionary:About Polish

Since I saw it on a "needed badly" list somewhere, I decided to start this page. It has been brewing on my disk for some time. I loosely based it on WT:ACS, while trying to explain some grammatical features, and highlight a few gaps in current practices. Please tell me what you think, whether anything is missing, needs change or an explanation. Keφr 10:34, 6 September 2013 (UTC)

Good work! Not much to criticise or suggest at this point. I'll watch this project and may use something to add to Wiktionary:About Russian. I'd like to see more treatment of verbs, including perfective/imperfective (not just entries but translations), abstract/concrete, semelfactive. Also interested in the policy for reflexive verbs, which seem to be handled differently across languages (separate entries or separate senses?). Polish could perhaps use more etymology info, which can often be looked up at Serbo-Croatian (or sometimes Russian) entries with Proto-Slavic derivations. --Anatoli (обсудить/вклад) 23:52, 8 September 2013 (UTC)
I do not remember ever encountering a semelfactive aspect which would be distinct from perfective. Translations - noted, will write something up. I tried to be descriptive of current practices rather than prescriptive, so if you people want to discuss how the policy ought to be, feel free. Not sure what you mean by the abstract-concrete distinction. Remember, this is not a complete guide to Polish grammar, just a quick summary to explain how it is relevant to presenting terms in Wiktionary. Keφr 07:25, 9 September 2013 (UTC)
Re: semelfactive vs simply perfective, example: "krzyknąć" and "pokrzyczeć" are both perfective, the former is semelfactive (instantaneous, momentive), the latter is not. Abstract vs concrete (verbs of motion only): "chodzić"/"iść". I've added some categories for a few Slavic languages other than Russian before. Your project page doesn't have to describe all that, of course. --Anatoli (обсудить/вклад) 12:24, 9 September 2013 (UTC)
And I used to think that aspect is an easy language… aspect. Did you notice the mention of frequentatives? Any idea whether and how this aspect mess should be handled? (I think I remember Russian having a similar feature.)
The abstract-concrete distinction gave me some idea, but I am not sure I got it right. I think you will not find a good translation of the verb go in all its generality. The verb iść still sort-of refers to using feet, even if the main focus is on something else.
And this page is not "mine" by any standard. If you think you have something to add, go ahead. In the worst case you will get reverted once or twice. Keφr 13:54, 9 September 2013 (UTC)
Although I would not mind having the former type of page somewhere in here, to be honest. There are a few languages I would like to learn, but have something of a hard time finding good resources. A brief grammar reference would be helpful. Keφr 07:28, 9 September 2013 (UTC)

Interwikis and translation-links for languages without Wiktionaries, or whose Wiktionaries are closed.

As far as I can tell, Wiktionaries can be classified into four groups:

  1. Regular Wiktionaries, like fr.wikt and es.wikt. Except for various annoying edge cases that aren't the subject of this discussion, these work just fine, and exactly as you'd expect.
  2. Nonexistent Wiktionaries that redirect to the Wikimedia Incubator, like vep.wikt. I'm not sure quite how we should handle these, but I think we can basically do whatever we want; we just need to decide what we want to do with them, and then do it. Interwiki-links to [[:vep:...]], for example, work fine, linking to vep.wikt URIs that redirect to Incubator URIs.
  3. Nonexistent Wiktionaries that don't redirect to the Wikimedia Incubator, like zza.wikt. With these we can do whatever we want for translation-links (we just have to link directly to the Incubator entry if we want that), but interwiki-links are uglier (we'd have to add them JavaScript-ically).
  4. Closed/locked Wiktionaries, like aa.wikt and dz.wikt. (I suppose these could be considered a subset of the previous.) These are annoying, because they have some existent pages, and they have database-dumps, but redlinks to them are rather pointless (since content can't be added), even bluelinks to them are rather dubious (since problematic content can't be fixed or removed), and in some (most? all?) cases there's at least as much content on Incubator as on the Wiktionary domain itself.

Group #1 needs no discussion, but how do we want to handle each of groups #2–4?

RuakhTALK 06:21, 7 September 2013 (UTC)

Since no one's weighed in yet, here are my own views:
  • we should never link to closed/locked Wiktionaries — not as interwiki-links, and not as translation-links.
  • we should never link to non-existent pages on Incubator — not as interwiki-links (obviously), and not as translation-links.
  • when a translation has an appropriate-language Wiktionary entry on the Wikimedia Incubator, we should link to it using {{t+}}. (Note: since e.g. [[zza:...]] and [[aa:...]] don't work properly, this will require a change to the translation-templates. Actually these templates are already a bit broken when it comes to languages without Wiktionaries — {{t|zza|foo}} links to a page named zza:foo on en.wikt — so we'll want to make some sort of change to them regardless.)
  • when an interwiki-link would appropriately link to a redirect to an existent entry on the Wikimedia Incubator, we should use it. For example, [[April]] should include [[vep:April]] among its interwiki-links.
  • when an entry exists on the Wikimedia Incubator, but an interwiki-link wouldn't work, should we hack up some JavaScript to make it work? I'm not sure.
RuakhTALK 19:47, 7 September 2013 (UTC)
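The proposal above amounts to a small decision table, which can be sketched roughly as follows. This is only an illustration, not existing template or module code: the names `WiktStatus`, `may_add_translation_link` and `may_add_interwiki_link` are hypothetical, and `entry_exists` is assumed to mean that the target entry exists on the Wiktionary itself (group 1) or on the Wikimedia Incubator (groups 2 to 4).

```python
from enum import Enum


class WiktStatus(Enum):
    """The four groups from the classification above (names are hypothetical)."""
    REGULAR = 1             # e.g. fr.wikt, es.wikt
    INCUBATOR_REDIRECT = 2  # nonexistent, redirects to Incubator, e.g. vep.wikt
    NO_REDIRECT = 3         # nonexistent, no redirect, e.g. zza.wikt
    CLOSED = 4              # closed/locked, e.g. aa.wikt, dz.wikt


def may_add_translation_link(status, entry_exists):
    """Return True if a translation should get a {{t+}}-style link."""
    if status is WiktStatus.CLOSED:
        return False  # never link to closed/locked Wiktionaries
    # Never link to non-existent pages; existing Incubator entries are fine.
    return entry_exists


def may_add_interwiki_link(status, entry_exists):
    """Return True if a plain interwiki link should be added."""
    if status in (WiktStatus.REGULAR, WiktStatus.INCUBATOR_REDIRECT):
        return entry_exists  # e.g. [[vep:April]] on [[April]]
    # NO_REDIRECT would need JavaScript (left undecided in the proposal);
    # CLOSED Wiktionaries are never linked.
    return False
```

For example, `may_add_interwiki_link(WiktStatus.INCUBATOR_REDIRECT, True)` allows the [[vep:April]] case, while any link to a closed Wiktionary is refused regardless of whether the page exists.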
Sounds all reasonable to me on the face of it. As for the Javascript question in the last item, my instinct would be to avoid adding Javascript unless it generates significant added value, which does not seem to be the case. --Dan Polansky (talk) 20:14, 7 September 2013 (UTC)
  • Does anyone object to my beginning to implement these changes? —RuakhTALK 03:51, 10 September 2013 (UTC)

IMHO, apart from top-X (where X < 5), other Wiktionaries are so much inferior in quality that linking to them in both interwikis and translation tables seems like a waste of time, database space and edit counts. --Ivan Štambuk (talk) 17:10, 12 September 2013 (UTC)

"Magic mirror in my hand, who is the fairest in the land?" --flyax (talk) 10:21, 22 September 2013 (UTC)

Number forms

Based on the Category:Inflections, I believe Wiktionary needs a new category called Numeral forms because some languages have inflections for their cardinal numbers. I hope this isn't a difficult suggestion. --KoreanQuoter (talk) 18:08, 7 September 2013 (UTC)

But we already have one? —CodeCat 18:26, 7 September 2013 (UTC)
I tried to make a separate page for одно (neuter form of один) and I think Numeral forms is more appropriate for a category. --KoreanQuoter (talk) 18:51, 7 September 2013 (UTC)
I still don't understand. What is wrong with the existing numeral forms category? —CodeCat 19:24, 7 September 2013 (UTC)
Wait. There was an existing numeral forms category? --KoreanQuoter (talk) 05:47, 8 September 2013 (UTC)
…yes? Keφr 06:02, 8 September 2013 (UTC)
Oh. Silly me. Thank you. --KoreanQuoter (talk) 06:18, 8 September 2013 (UTC)

CFI and Wiktionary is not an encyclopedia

I have created the vote Wiktionary:Votes/pl-2013-09/CFI_and_Wiktionary_is_not_an_encyclopedia. I propose to remove or at least trim the WT:CFI#Wiktionary is not an encyclopedia section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 08:08, 8 September 2013 (UTC)

Let's keep the comments on the talk page of the vote. Mglovesfun (talk) 09:06, 8 September 2013 (UTC)

CFI and trimming the Idiomaticity section

I have created the vote Wiktionary:Votes/pl-2013-09/CFI and trimming the Idiomaticity section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 09:00, 8 September 2013 (UTC)

Underestimating idiomaticity of Finnish translations

While going around fixing translation lists, I noticed that very often, the Finnish translations are marked up as if they were sum-of-parts. At first I thought, "well, I guess Finnish is weird", but recently I started to doubt the accuracy of this characterisation. Take door-to-door. The Finnish translations listed look like simple inflections of the Finnish word for "door" (ovi). The English meaning of "door-to-door" is apparently idiomatic, so I have quite a hard time imagining how the Finnish entry, which breaks down into constituents pretty much the same way, would be sum-of-parts. I am also suspicious of entries where Finnish translations are broken into roots and affixes.

Do you think we should go over these? Keφr 11:54, 8 September 2013 (UTC)

"ovelta ovelle" translates literally as "from door to door". ovelta is the ablative case of ovi and means "(away) from a/the door", while ovelle is the allative case and means "to/towards/onto a/the door". —CodeCat 12:19, 8 September 2013 (UTC)
I've noticed this too and I think it's just the way they've been added (by a human editor) and nothing to do with the language itself. Mglovesfun (talk) 19:58, 8 September 2013 (UTC)
So in that case, should we not have an entry for the whole phrase and link to it in the translations list? Keφr 20:55, 8 September 2013 (UTC)
I would trust Hekaheka's judgement on how to translate into Finnish. We should invite her if in any doubt. Translations may be done as "solids" (if they are idiomatic in the target language) or using "sum of parts" methods. Linking "ovelta" and "ovelle" separately allows you to see the individual components and the grammar, whereas linking "ovelta ovelle" as a whole requires the entry to exist, or at least an interwiki. --Anatoli (обсудить/вклад) 22:30, 8 September 2013 (UTC)
I agree (w/Anatoli). Even if it is just a case where the Finnish-speaking editors have made a different decision than most of the non-Finnish-speaking editors would have made . . . well, that doesn't seem like a big deal to me. There are a lot of things that it's important to be consistent about across languages, but I'm not sure this is one of them. —RuakhTALK 23:16, 8 September 2013 (UTC)
There are pros and cons to both approaches, but it's often safer to use the "SoP" approach, even if it's more time-consuming. It usually attracts less criticism. This is the first time I've seen criticism of the SoP approach.--Anatoli (обсудить/вклад) 04:32, 9 September 2013 (UTC)
Safer? Maybe. Though note that [[ovelta ovelle]] does exist in this case. The specific problems I see with overmarking as sum-of-parts are that 0) this discourages creation of entries for terms which may be non-trivial to translate into English; 1) these terms will not be picked up by Yair rand's gadget on the search page, and it does not seem obvious to me how it could be extended to do so; leading into 2) the translation of such a term will be harder and more time-consuming, especially when the entries for the constituent words are missing some meanings. So, marking translations as idiomatic can be beneficial even when that makes them redlinks. Keφr 07:09, 9 September 2013 (UTC)
By "safer" I mean in terms of someone disputing idiomaticity. I've translated double-decker bus and death camp as SoP двухэтажный автобус m (dvuxetážnyj avtóbus) and лагерь смерти m (lágerʹ smérti) as tomorrow someone may dispute both the English terms and the Russian translations. It's still educational to see the grammar of the translations, showing the individual parts and how the translation is made. --Anatoli (обсудить/вклад) 13:00, 9 September 2013 (UTC)
Educational — I am not denying that. Having these translations listed as SoPs is helpful, even if only by the virtue of it being better than having no translations at all. But the grammatical structure of a multi-word term can be also analysed in the whole term's entry, and can also be inferred by hand when the constituent word pages are reasonably complete, so this is not hugely relevant. Creating pages for these I find rather easy. Though granted, I do not always create these myself.
All I wanted to know is whether everyone is okay with treating such terms as SoPs. There are many other examples, they often land in Category:Translations to be checked (Finnish), because I usually just leave them there when xte suggests reviewing the translation. (Hekaheka would probably like to call me a perkeleen vittupää because of that, but oh well. Cannot please everyone.) Keφr 14:10, 9 September 2013 (UTC)

Translating is a complicated business. An idiom is not always translated by an idiom, a word in one language may take a sentence in another language to translate, a concept may not exist in every language, the grammatical structure used to express ideas may be entirely different etc. One problem is that if an expression is unidiomatic in one language (say, Finnish), it never occurs to anyone to look it up in a dictionary, even if there's an idiom in another language (say, English) to express the same idea. Instead, one would look for the components. My way to circumvent this problem is this:

  1. If there is an unidiomatic English synonym (e.g. in the case of live up, to fulfil the expectations), I translate this synonymous expression. One can always use this verbatim translation of täyttää odotukset in lieu of "live up".
  2. Often (like in the case of dip, in the sense "to switch to low beams", vaihtaa lähivaloille) I also write a usex or sense in one or more of the entries for the Finnish components (see lähivalot, ojentaa in the sense lend, or piittaamaton in the sense "lawless"). I have no clear rule as to when to do it, and I believe it would be difficult to formulate one, too.

--Hekaheka (talk) 17:42, 20 September 2013 (UTC)

If we are to have entries in English for encoding idioms that are not decoding algorithms, then we need to make it possible for all users in all languages to find them, not just those knowing languages that happen to have a common corresponding idiom, but without having entries in every language for the corresponding SoP entries. How could that be achieved using existing search boxes? DCDuring TALK 18:09, 20 September 2013 (UTC)
I have trouble parsing this passage. Keφr 18:11, 20 September 2013 (UTC)
Kefir, how do you like the method I described above? --Hekaheka (talk) 07:40, 22 September 2013 (UTC)
It does make some sense. But it strikes me as odd that Finnish would not have a single-word translation for detoxification or disconnected. And the only way to describe those concepts is through SOP phrases. Or that "päästää irti", "päästää käsistään", "hellittää otteensa", currently listed as translations of let go, are actually used in common speech, with both words in place and have no added meaning apart from the meaning of their constituent words. Why would simple "hellittää" not be appropriate, for example? One of the definitions for that one is "to loosen one's grip". Is it common/natural to further disambiguate it with "otteensa"?
There are also phrasebook entries, which are not subject to idiomaticity criteria, and yet the Finnish translations are often broken apart into constituents, unlike every other language listed. Keφr 08:06, 22 September 2013 (UTC)
It's because I'm not going to write entries like tarvitsen pyyhkeen (I need a towel), minulla on astma (I have asthma) and so on for umpteen things. The only reason for me to write even the little I do is that I want them off the trreq list. Bluelinking them to constituent parts makes them easy to verify for anyone who is interested and is more informative than just a red link to a non-existent phrase entry. This is a wiki and anyone is welcome to continue from that point by writing an entry for the full phrase. --Hekaheka (talk) 04:03, 23 September 2013 (UTC)
Well, believe it or not, there's no one word in Finnish which would cover all instances of "detoxification". There are a number of methods to detoxify (neutralizing, removing, diluting...) something and a different verb is used for every one of them and for those the structure is "verb + myrkky". The gloss in the entry "detoxification" is "removing toxins" and for that the only available equivalent is "myrkyn poistaminen". Another problem is that the English definition is probably too narrow in this case - and believe me, this is not the only occasion where the formulation of English gloss makes the translator ready to lose his rag.
I insist that the core problem is that the languages do not have a 1:1 correspondence and that there's no one-size-fits-all solution to it. Let's take the verb "to score", sense 1 (to earn points in a game) as an example. There's no single verb in Finnish to express that, one either "makes points" (tehdä pisteitä/tehdä piste), "makes goals" (tehdä maaleja/tehdä maali) or even both, depending on the game. Both "make points/make a point" and "make goals/make a goal" are acceptable expressions in English, but there are no entries for them (except for the idiomatic expression "make a point"), as they are obvious SOPs. Why should there be an entry for tehdä maali, which is equally SOP? Maybe we should just accept that a dictionary alone is never going to be a sufficient guide for mastering a language. --Hekaheka (talk) 11:26, 22 September 2013 (UTC)
Of course there is no such correspondence. I am well aware of that. And you may be right about the "score" example. But what about the other examples I gave? In the "door-to-door" example, the word "door" refers specifically to house doors, in the context of doing some business, which is not apparent from the word "door" alone. Or the "hellittää" example. Right now it looks like most of the time you are translating glosses, when you should be translating the words themselves. Sometimes of course there is no other way, but I suspect that you miss the opportunities to do so when there actually are some. But I barely know that Finnish exists, so I might be wrong. Maybe Finnish actually is that weird. (Not that this is a bad thing.)
Also, funny that you mention lose one's rag: the Finnish translation currently listed is "repiä pelihousunsa". Reading the entries for the words, I understood that the meaning of the whole expression here is non-literal. For me that makes it an idiom, which is not marked as such. Precisely what I am complaining about. Keφr 12:07, 22 September 2013 (UTC)
A further complication is that the Finnish editors are a very scarce resource. Sometimes I have the feeling that there's me, and then there's no one, and somewhere far behind comes an occasional editor. The formerly active users Jyril, Jaaari and a few others seem to have retired. I agree that repiä pelihousunsa is idiomatic. But I also think that there are lots of more entry-worthy items that need to be worked on. While I'm working on them, it's better to have "lose one's rag" blue-linked to pelihousut than to have it red-linked to repiä pelihousunsa. --Hekaheka (talk) 19:11, 22 September 2013 (UTC)
BTW, it may be a quick judgement to say that I would be "most of the time translating glosses". There are some 80,000 Finnish entries in the English Wiktionary, of which I have created more than 10,000 over the six years that I have been an active Wiktionarian, many of them phrasal verbs and idiomatic expressions. I have not been able to locate statistics on the number of English-to-Finnish translations, but there must be more than 100,000 of them overall. Probably a few hundred of them are such that they are questionable when evaluated with your idiomaticity indicator, but I would say most of them are so for an acceptable reason. Some of those which you have tagged have actually been in want of redefining and I appreciate that you have pointed them out. Keep up the good work, but don't get frustrated if everything does not go your way. --Hekaheka (talk) 23:32, 22 September 2013 (UTC) --PS. Check updated entry for hellittää. I still don't think it could be translated as "let go" without some sort of qualifier.

Merging Mari and Buryat varieties

Can we or should we merge some varieties of Mari and Buryat at Wiktionary?

  • Mari (chm, mhr, mrj):
Hill/Western Mari (mrj) can probably stay separate, it has a few more letters than the standard or Eastern (Meadow) Mari. Extra Cyrillic letters in Western Mari: Ӓ, ӓ and Ӹ, ӹ and they don't use standard Mari letter Ҥ, ҥ. This variety has about 30 thousand speakers. It's still possible to merge if Western Mari has context labels and the additional letters are handled. Anyway, chm and mhr can be merged safely.
Language codes with names:
  • chm - "Mari", "Standard Mari"
  • mhr - "Eastern Mari", "Meadow Mari"
  • mrj - "Western Mari", "Hill Mari"

  • Buryat (bxr, bxu, bxm, bua):
Russian and Mongolian Buryat use the same alphabet. The Mongolian and Vagindra scripts are hardly used. The overwhelming majority of Buryats live in Buryatia, some in Mongolia, even fewer in China.
Language codes with names:
  • bxr - "Russia Buryat"
  • bxm - "Mongolia Buryat"
  • bxu - "China Buryat"
  • bua - "Buryat", "Buriat"
(it's obvious that at least one is redundant)

--Anatoli (обсудить/вклад) 04:53, 10 September 2013 (UTC)

Meadow and Hill Mari have separate written standards so they should be kept separate (and the chm code deleted). Buryat, easy thing, merge them. There is no difference between these 'lects, except perhaps for some loanwords. -- Liliana 07:39, 10 September 2013 (UTC)
Seems like we have an agreement on Buryat. So we can delete bxr, bxm and bxu and make bua the only code for Buryat.
With Mari, I would rather delete mhr and leave the name "Mari". Standard Mari is "Eastern Mari" or "Meadow Mari" and chm is more common. OK, let's leave mrj but I'll make a transliteration page and a module, which works for both alphabets. --Anatoli (обсудить/вклад) 22:42, 10 September 2013 (UTC)
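The Buryat merge agreed on above can be pictured as a simple code-to-code mapping. The following is only a sketch: `MERGED_CODES` and `canonical_code` are hypothetical names, not part of any existing Wiktionary module, and the Mari codes are deliberately left out because the thread does not settle which of chm and mhr to keep.

```python
# Hypothetical table of retired language codes and the code that replaces them.
# Only the Buryat merge is included, since that is the one agreed on above.
MERGED_CODES = {
    "bxr": "bua",  # Russia Buryat   -> Buryat
    "bxm": "bua",  # Mongolia Buryat -> Buryat
    "bxu": "bua",  # China Buryat    -> Buryat
}


def canonical_code(code):
    """Map a retired language code to its replacement; other codes pass through."""
    return MERGED_CODES.get(code, code)
```

With this table, `canonical_code("bxr")` gives `"bua"`, while `canonical_code("mrj")` is returned unchanged.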

Overriding manual transliteration

This has been discussed in other pages, but no consensus was reached.

Automated transliteration works perfectly for several languages, such as Armenian. Some suggest always overriding manual transliteration for these languages, because many manual transliterations are incorrect due to human error, or inconsistent (due to changes to the transliteration system, etc.). Others say we should always let editors use |tr=.

Another solution is to remove the old manual transliterations for terms in these languages, and not override manual transliterations after that. (We can put the pages that use |tr= for terms in these languages in a category to keep track of them.) --Z 13:29, 10 September 2013 (UTC)

Wouldn't we want to be able to let that be decided on a language-by-language basis? What about also allowing the overriding of everything with tr=, but allowing tr0= to override bad automatic transliterations, also on a per-language basis? DCDuring TALK 14:47, 10 September 2013 (UTC)
It is being decided language-by-language. See the override_translit section of Module:links. --Vahag (talk) 15:15, 10 September 2013 (UTC)
I support overriding manual transliteration for languages whose automatic transliteration works perfectly, e.g. Armenian, Georgian. For such languages manual transliteration will be redundant in the best case and wrong in the worst case. --Vahag (talk) 15:15, 10 September 2013 (UTC)
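The per-language override being discussed can be sketched as below. This is not the code of Module:links; the function name, its parameters and the `override_langs` set are assumptions made for illustration, and `transliterate` stands in for whatever automatic transliteration routine a language has (returning `None` when none is available).

```python
def choose_transliteration(lang, text, manual_tr, transliterate, override_langs):
    """Pick the transliteration to display for a term.

    lang           -- language code of the term
    text           -- the term in its native script
    manual_tr      -- value of |tr= if the editor supplied one, else None
    transliterate  -- callable (lang, text) -> str or None (hypothetical)
    override_langs -- codes whose automatic output always wins (hypothetical)
    """
    automatic = transliterate(lang, text)
    # For languages trusted to transliterate perfectly, ignore |tr= entirely.
    if lang in override_langs and automatic is not None:
        return automatic
    # Otherwise the editor's |tr= wins, with automatic output as a fallback.
    return manual_tr or automatic
```

Under this arrangement a wrong |tr= on an Armenian term would be ignored if `"hy"` is in `override_langs`, while languages outside the set keep honouring |tr= as before.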

Great language game

I know that this isn't a forum, but there isn't really anywhere else to put it. And it would be a shame not to share it because I think many people on Wiktionary will like it. There's a new website called the Great Language Game where you can see how well you can tell different languages apart by ear. I seem to do pretty well with it, I hope it's fun to others as well. —CodeCat 12:32, 12 September 2013 (UTC)

Love it, thanks. Only 800 for me... And I got lucky, I kept ending up with Slavic languages. --Fsojic (talk) 20:16, 12 September 2013 (UTC)

What can be done to improve quality?

The more my wanderings take me to visit a wide range of non-English entries, the more I think that the English-language entry quality problem is not our only quality problem.

For non-English entries the problems range from the near-incoherent terseness of our copyings of a 110-year-old Sanskrit dictionary to the frequent presentation of non-idiomatic calques as glosses and the use of terms that simply don't belong in a definiens of a contemporary dictionary due to the age, rareness, or unglossed polysemy of the term or terms used.

For English entries the quality problem includes the obsolete language of definiens and the poor coverage of polysemic terms, especially uses that developed in the 20th century and remain common today. The entries for polysemic terms contain many important definitions that are buried and lost in visual clutter. The definiens of many terms includes words that are rare and/or technical when neither characteristic is necessary.

Are there technical means that could help? An example might be processing the dumps to identify uses of terms labeled rare, obsolete, archaic in definiens. Or words used only once in any definiens.

What can we do to get more effort by existing and past editors devoted to entry improvement?

Are there helpful ways to more actively recruit or develop contributors? DCDuring TALK 12:56, 12 September 2013 (UTC)
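
A minimal sketch of the dump-processing idea raised above (flagging definiens words that carry a rare/obsolete/archaic label, and words used only once in any definiens). It rests on assumptions that are not from the discussion: entries arrive as a title-to-wikitext mapping already extracted from a dump, definition lines start with "#" while "#:" and "#*" lines are examples and quotations, and the set of labelled words is supplied by the caller. The function names are hypothetical, and template markup inside definitions is not stripped.

```python
import re
from collections import Counter

# Lowercase word tokens, allowing an internal apostrophe (e.g. "don't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def definition_lines(wikitext):
    """Yield definition text from lines such as '# A small boat.'

    Assumption: '#:' and '#*' lines are usage examples and quotations.
    """
    for line in wikitext.splitlines():
        if not line.startswith("#"):
            continue
        body = line.lstrip("#")
        if body.startswith((":", "*")):
            continue  # not a definition line
        yield body.strip()


def words_used_once(entries):
    """Return, sorted, the words occurring exactly once across all definitions.

    `entries` maps a page title to its raw wikitext (hypothetical input shape).
    """
    counts = Counter()
    for wikitext in entries.values():
        for definition in definition_lines(wikitext):
            counts.update(WORD.findall(definition.lower()))
    return sorted(word for word, n in counts.items() if n == 1)


def flagged_words_in_definitions(entries, flagged):
    """Map each title to the flagged words (e.g. obsolete ones) its definitions use."""
    hits = {}
    for title, wikitext in entries.items():
        found = set()
        for definition in definition_lines(wikitext):
            found.update(w for w in WORD.findall(definition.lower()) if w in flagged)
        if found:
            hits[title] = sorted(found)
    return hits


sample = {
    "skiff": "==English==\n===Noun===\n# A small boat.\n#: ''He rowed the skiff.''",
    "dinghy": "==English==\n===Noun===\n# A small boat, oft used as a tender.",
}
print(words_used_once(sample))
print(flagged_words_in_definitions(sample, {"oft"}))
```

On a real dump the list of once-used words would be long and noisy, so it is better read as a worklist for human review than as a list of errors.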

I do think that we should avoid using obscure terms in definitions, but sometimes there just happens to be that one word that describes it so much better than anything else. In such cases I usually prefer to show both. I often include multiple glosses if it helps to narrow the meaning down more.
I'm not sure if there is much we can do to increase the effort. People will work on what they feel like working on. We can raise awareness, but that's about all we can do. Wiktionary is pretty decentralised and we have no central announcement system that everyone is guaranteed to see, except for WT:NFE which a lot of people ignore regardless. So if we want to raise awareness of issues we first need some kind of global platform to raise them on to begin with. Beer Parlour isn't really enough.
As for visual clutter, I think this is a real problem and I think it could be improved substantially by adopting a visual style similar or identical to the French Wiktionary. Their use of colours, borders and icons is far easier on the eye and does a lot to direct the user's attention to certain parts of the page. It makes things stand out more and gives visual structure to the page which is pretty much essential. —CodeCat 13:19, 12 September 2013 (UTC)
For foreign language entries, heavy use of glosses or listing several possible translations is a must. For example, in the Serbo-Croatian entry vez, a definition given is “binding”. The reader is left to guess which sense of binding it refers to. When I add a Portuguese entry, I always try to add enough information via glosses and possible translations so the user won’t need to follow any link nor rely on guesswork to understand precisely what the term means.
Shortcut glosses like “(all senses)” should be avoided as well, IMO, as they can lead to error. — Ungoliant (Falai) 13:28, 12 September 2013 (UTC)
"For foreign language entries, heavy use of glosses or listing several possible translations is a must." Very much this. I already do this when adding entries in Polish, and I put a similar recommendation at WT:APL#Definitions. This should be a project-wide policy, because the reasons I gave there are not exclusive to Polish at all. This is a no-brainer for anyone who deals with translations, really.
As for the visual side, I will disagree. I actually like our, shall I say, ascetically colourless style. I think the best solution would be to convert pages to some kind of semantic markup so that we do not have to enforce any particular style at all. Dislike a style? Switch your skin.
Regarding the lack of a central propaganda tube, I would add a "add N4E to watchlist" link to the welcome template, and maybe streamline the template to the most essential bits. Could help. And the Beer parlour does not cut it, I presume, partly because the main BP page is just too damn big, loads slowly, and you have to keep adding and removing per-month pages to your watchlist to keep being updated, which is tedious. Wikipedia's archive pages system is better in this regard, although it has its own flaws. I cannot wait for mw:Flow to solve all our wiki discussion problems. In the meantime, why not convert the central discussion pages to LiquidThreads? Keφr 15:29, 12 September 2013 (UTC)
By comparing other meanings it is obvious that the binding sense of the Serbo-Croatian noun vȇz refers to "A finishing on a seam or hem of a garment". Sometimes using a dictionary requires a minimum amount of intelligence on the reader's part. Ditto for what DCDuring calls "near-incoherent terseness" of Monier-Williams Sanskrit dictionary, the most comprehensive Sanskrit dictionary compiled by the most authoritative Sanskrit lexicographer in the West. --Ivan Štambuk (talk) 17:04, 12 September 2013 (UTC)
Easy to say that when you’re a native speaker and already know the word. And the best one can do is figuring out that “A finishing on a seam [] ” is the most likely meaning, but without a gloss there is no certainty. Even then, people expect a dictionary, not a test on their figuring-out-the-most-likely-meaning-of-words skills. — Ungoliant (Falai) 17:14, 12 September 2013 (UTC)
@Ivan: I don't think that those providing support for this project intend that it be usable only by an intellectual elite. The intellectual elite that uses and contributes to this wiki needs to also serve the general population of those who need dictionaries.
I don't doubt the underlying quality of the Sanskrit dictionary in terms of its coverage of Sanskrit. It seems like an outstanding basis for good Sanskrit Wiktionary entries. I just don't think that it is very usable for a non-specialist, partially because the style and wording of the Wiktionary entries resulting from the copying is not similar to that of other Wiktionary entries. The problem is not unlike the problem of copying Webster 1913 definitions, except the stylistic difference is even more dramatic. As it stands our Wiktionary Sanskrit entries are, in many ways, worse than the underlying dictionary because of omissions. DCDuring TALK 17:37, 12 September 2013 (UTC)
But Sanskrit and other extinct and classical languages are only used by intellectual elite. The only way a common person is going to come across a Sanskrit entry is through some etymology.
I don't recall having seen Sanskrit entries that are formatted radically different than entries in other languages. The only problem is the abundance of meanings that words have, and which sometimes get grouped by eras or sources, and not by semantic closeness as they are normally - but that's a particular issue of classical languages that have been used over a long period of time, which other "normal" languages don't have. But users looking up Sanskrit words expect such layout which enables them to quickly isolate set of meanings appearing in a particular work that they are reading.
I don't understand what omissions you are referring to. If you have some constructive proposal of how to change some user-unfriendly entry of your choice I'd be happy to hear it. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)
With other meanings being embroidery and needlework, I can't imagine anyone construing the binding sense in any definition of [[binding]] other than the one that has sewing context label attached. I don't think that the average reader is that stupid. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)
That's not fair. That same "stupid" reader would be right in many other cases; for example, if we defined Hebrew לִפְנֵי as "In front of" and "Before", you would consider a reader "stupid" for correctly guessing that we mean the temporal sense of "before" rather than the spatial sense that you would have assumed. Words in unrelated languages often develop along parallel lines, and it's not reasonable to expect readers to have a Ph.D. in semantics to be able to decide whether a guess is plausible. We shouldn't be asking readers to guess to begin with. —RuakhTALK 06:05, 15 October 2013 (UTC)
Yes, but prepositions in all languages have special meanings and can't directly translate to other languages, and need many examples, context labels, usage notes and so on. Words are not looked up outside of context - I really doubt you'd encounter the word vez "binding" in ambiguous usage, where for an English-language speaker it becomes unclear which meaning of binding is referred to. But I admit that adding glosses is necessary for other purposes. --Ivan Štambuk (talk) 04:01, 16 October 2013 (UTC)
This is more of an observation than a suggestion, but when I was experimenting with using links to senses with sense IDs using {{senseid}} when writing definitions for foreign-language terms, I found that the sense that I wanted to link to was missing--about half the time, actually--and that there were many senses of the English term that I hadn't thought of, and which forced more disambiguation on the foreign language end than I had realized. The process of matching senses exposed gaps on both ends, so if that could be integrated in the editing process, it would be a big aid to editors. That would really tap into the collaborative power of the project. This is not to push or oppose sense IDs, just my experience with them. --Haplology (talk) 16:05, 12 September 2013 (UTC)
I would expect contemporary non-English terms to often need well-worded, contemporary senses of common English terms that the English entries lack. That is one of the biggest problems for English entries. {{rfdef}} helps us identify the need, but a note in the template to explain the need would be helpful in prioritizing work on the English entry. Do we have a tag to mark the FL definition as waiting for a suitable English definition to be provided? Would a use of {{sense-id}} in the English and FL entries help by providing a way to find the original FL entry problem (missing gloss)? DCDuring TALK 16:36, 12 September 2013 (UTC)
We also have {{gloss-stub}}. I add it to entries whenever I find that the definition doesn't identify the meaning specific enough. —CodeCat 16:46, 12 September 2013 (UTC)
That presumably goes in the FL entry and {{rfdef}} goes in a new line at the English L3/L4 section. How could those be linked more or less automagically by the use of {{senseid}} in each? DCDuring TALK 17:25, 12 September 2013 (UTC)
Install Wikidata. DTLHS (talk) 20:30, 12 September 2013 (UTC)
And I realize that saying "install wikidata" isn't helpful- just pent up frustration about trying to implement features of a database in something that very much isn't. DTLHS (talk) 20:49, 12 September 2013 (UTC)
Speaking of Wikidata — I often see statements in the Wikipedia metaspace that there are plans to deploy Wikidata on Wiktionary in some form. At the same time I am yet to see anybody from Wikidata approaching the community here about this. Trying to explain how it would work, how to handle existing dictionary content, and such. I smell a disaster. Keφr 20:55, 12 September 2013 (UTC)
d:Wikidata:Wiktionary. --Yair rand (talk) 22:06, 12 September 2013 (UTC)
Huh, I just realized I'm the only Wiktionary admin who's also a Wikidata admin. We're probably going to need some more Wiktionarians paying attention to Wikidata's progress if WD use is going to turn out well here. ... --Yair rand (talk) 22:20, 12 September 2013 (UTC)
@Ivan. I think we would want to be more than a wikisource for a 110-year-old dictionary, no matter how good that dictionary may be, especially as it is already available: e.g., [1]
We split proper nouns senses from common noun senses, but many, many Sanskrit sections do not. Not all Sanskrit sections include the link to the underlying dictionary, which is itself an omission, and not every bit of explanatory note in the original dictionary seems to have survived. A glossary for the abbreviations used does not seem to be included. The language used is not contemporary English and the definitions lack glosses. DCDuring TALK 23:23, 12 September 2013 (UTC)
For an example of a problem see WT:RFC#सह and join the discussion there. DCDuring TALK 23:25, 12 September 2013 (UTC)
MW dictionary is perfectly valid even today (because we're dealing with an extinct language, doh), some of the entries were created before the online version of MW dictionary was available, Sanskrit grammar tradition doesn't make the distinction between proper and common nouns, 98% of the words in its definitions are perfectly valid contemporary English as far as I recall, and anyone studying Sanskrit doesn't need a meaning gloss. In other words, there are no problems with Sanskrit entries. --Ivan Štambuk (talk) 16:02, 13 September 2013 (UTC)
The problem with the dictionary is not the definienda, it's the definiens. We need to convert the definitions to a more contemporary English, at least removing the archaisms and obsolete terms. We also need to format the entries to Wiktionary standards, e.g., Proper noun sections. Including references to the underlying dictionary to aid the work wouldn't hurt. Having excellent coverage of Sanskrit is certainly an important goal for Wiktionary, which perhaps subsequent contributors will achieve. DCDuring TALK 16:42, 13 September 2013 (UTC)
I haven't read this thread, but pretty much every entry I read has problems, from the very minor to the very major. English entries are usually the worst; incomprehensible, overly formal or using difficult language, using ambiguous or overly informal language, just plain wrong. It would be quicker to list the problems I haven't encountered, rather than the ones I have. We could try and be more organized if we had enough people to do it; go through the 10,000 most common English words, assign them to different people and reassign them as people quit and new people join. That's pretty much all I have to say. Mglovesfun (talk) 15:32, 16 October 2013 (UTC)
Re: complexity and formality. The words that are hardest to define are those that are essentially grammatical/functional. Many basic words that are not function words are also hard to define accurately using words that are as simple as they are or simpler. It is a bit easier to define complex terms in simpler ones, but sometimes the complex terms represent more complex concepts rather than being mere long-winded synonyms for basic words. A monolingual dictionary will inevitably have to use circumlocutions for common words and be able to rely on simple one-word definitions only if the definienda have been virtually replaced in normal contemporary discourse by that simple word. DCDuring TALK 18:22, 16 October 2013 (UTC)

Pages with protolanguage information?Edit

CodeCat and I have discussed a couple of times the question of reconstructed forms without references in Etymology sections (the most recent discussion is here). One conclusion now seems to be that it would be a good idea to have pages (perhaps in the Appendix) with more detailed historical information, including perhaps original research by Wiktionarians, on specific topics, which could then be linked to from individual words. Case in point: Proto-Baltic vs. Proto-Balto-Slavic. The current tendency goes in the direction of Proto-Balto-Slavic, but there are not many published reconstructions of words out there, whereas Proto-Baltic has clearer sources. Now, if Wiktionarians want to add Proto-Balto-Slavic etymologies, or simply replace the Proto-Baltic label ({{etyl|bat-pro|LANG}}) with the Proto-Balto-Slavic one ({{etyl|ine-bsl-pro|LANG}}) on the assumption that most PB reconstructions will be acceptable PBS reconstructions as well, wouldn't it be nice to have a page (called, say, "Appendix:Proto-Baltic and Proto-Balto-Slavic") that discusses this in detail, with correspondences, derivations, and clear statements of what things in PB we think will remain the same in PBS, and why? In this way, any changes of PB to PBS can be referred to this page: it will be the basic source for the reconstruction, and the interested reader can read it to see on what grounds we have an (as yet unpublished) PBS etymology rather than the (already published) PB one. Also, Appendix pages with reconstructed PBS words could be linked to it. One objection is that this page would contain "encyclopedic" information. Yet I feel that this kind of information is quite vital for someone who is navigating the thorny area of Indo-European etymology and wants to feel sure the etymological information given at Wiktionary is correct and accurate -- as vital as having, say, a page on IPA and its symbols, or a page with the definitions of all grammatical terms used to tag words. What do you guys think? 
--Pereru (talk) 20:25, 12 September 2013 (UTC)

I think you missed a few important parts of the original discussion. To me, the point of having a special page for this is to act as a repository of sourced knowledge related to the reconstruction of a given language, but it's also the place that we as Wiktionarians would use to collect our own conclusions about certain minor issues surrounding them. Specifically I called it a way to allow original research, while keeping it both contained and publicly accessible as a reference for etymologies and reconstructed entries on Wiktionary. Basically, to enable peer review of Wiktionary's reconstructions. I also think that the last two posts in our conversation are important:
Me: My main objection with really big stuff is that it is the kind of area where even professional linguists get things very wrong, so that makes it even more likely for amateurs to miss things. I don't have any professional schooling in linguistics, just a lot of curiosity that made me want to look for things and learn more. So I know a bit I think but what I know is not at a professional level and I don't think it is for anyone else here either. The limitations are mainly there to protect ourselves, Wiktionary and its users from our own incompetence. :P
Pereru: I am a professional linguist, though not an Indo-Europeanist (I work on South American indigenous languages). But one of the things I've learned is to stick to logic and good arguments, because (a) big stars with famous diplomas often think their fame is all they need to justify something, and (b) non-big-stars, without any diplomas, surprisingly often contribute really intelligent, insightful ideas that deserve recognition.
CodeCat 21:03, 12 September 2013 (UTC)
That is an interesting thing, and I certainly support it. But I do see the main point of having such pages in a dictionary (as opposed to a research journal) in being able to add references to specific reconstructions -- be they in ===Etymology=== sections, be they independent pages on PBS reconstructed forms. --Pereru (talk) 06:48, 13 September 2013 (UTC)
No, we cannot add original research by Wiktionarians in etymologies. Neither as reconstructions nor as speculations on word origins. Etymologies are like small encyclopedic articles and all of the Wikipedia policies on no OR and maintaining NPOV apply to them as well. If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources. We might as well restore H&M's Chinese phonosemantic interpretations. If you want to make up theories on word origins go write a blog or paper. It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness.
Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. I don't think that there are linguists today (apart from some Russophobic Baltic nationalists) that dispute PBSl. There is no "tendency", it's a settled matter. There are many details that need to be settled, but the grouping itself is not a point of contention. --Ivan Štambuk (talk) 21:26, 12 September 2013 (UTC)
But where is it actually cited as policy that we don't allow original research? We constantly do original research when we document definitions, why is this different? If Wiktionary editors can be lexicographers, why not also etymologists? Pereru explicitly encourages the matter and he is a professional linguist himself, so he understands what is involved. I understand that you want to differentiate reliable theories from bogus ones and that is exactly what this proposal is supposed to prevent, as the idea is just that: to collect all of the competing theories and build up a body of peer reviewed research that can be used to support reconstructions in Wiktionary articles. I wonder if you even understand what has been suggested? —CodeCat 22:10, 12 September 2013 (UTC)
Writing definitions on the basis of attestations is "original research" in much the same way that writing Wikipedia articles based on cited sources is. We do not invent new meanings, but rather collect the ones attested in usage on the basis of our CFI (which are really "criteria for attestation"). The original part there is to word the definition in a manner that doesn't coincide with any of the existing dictionaries (unless they are out of copyright). No original research is one of the pillars of Wikipedia that protects users from obscure theories, and the project itself from being a propaganda machine for every fringe group that sees the lack of an editorial or peer-review process as an opportunity to present its fringe view.
Pereru is just someone nicknamed "Pereru" (what is a "professional linguist" BTW? Somebody paid by taxpayers to produce work hidden from general populace behind paywalls and costly volumes?). I don't care if he is de Saussure reincarnated.
You seem to be conflating two separate points: 1) etymologist as somebody writing a paper on a word origin, postulating reconstructions and speculating on word origins 2) etymologist as somebody writing an etymological dictionary, which is usually done by every single headword having references to various scholarly opinions, with etymologist then choosing what he thinks is the "best" explanation. We can only do OR in the second sense, by being a synthetic work of the most recent scholarship. Not invent reconstructions and deep theories of word origins based on our own opinions of how languages evolved. Which is what you have been doing and seem to be keen on getting a community approval. --Ivan Štambuk (talk) 08:26, 13 September 2013 (UTC)
When we apply existing principles known to linguistics to come to a reconstruction that nobody has published before, is that not just applying the same science that linguists do? My intention was specifically to allow our own reconstructions while at the same time have every detail of that reconstruction accounted for by sources. This is currently an area that is lacking like Pereru points out below; we either have references to the whole reconstruction verbatim, or none at all. For example take *dūmas. I don't need a source to tell me that it's a sound reconstruction, because I can see that it fits perfectly with all the relevant sound laws in Balto-Slavic and its descendants. Yet it has no source because no source happens to attest this word in Balto-Slavic, even though every single phoneme of the reconstruction can be accounted for by established and sourced sound laws. Also I'm not sure why you think there would be a lack of peer review. I specifically noted that the whole point of this is peer review, and wikis as a whole are founded on the principle of peer review. So fringe theories would be rejected because there is no consensus for them on Wiktionary. As long as we assume that Wiktionary editors are knowledgeable about the area, there would be peer review of new reconstructions to ensure that the science has been applied correctly according to the most mainstream theories. I have done this with many Germanic reconstructions in the past, and it has worked well. —CodeCat 12:09, 13 September 2013 (UTC)
By creating a reconstruction you are ipso facto making statements "these words are inherited" and "this is the proto/ancestral form" and "these are the sound laws that have occurred". These statements constitute true original research. Specifically, Proto-Balto-Slavic *dūmas "smoke" is by Kortlandt, Derksen and others from Leiden reconstructed as *dúʔmos, with segmental laryngeal merger as glottal stop, and without the change PIE *o > PBSl *a. *dūmas is far from being a sound reconstruction if radically different alternatives are given by reputable authorities in the field. And I think that I saw forms *duHmos or *duHmas as well in the literature.
Wikis are based on the peer review of content that is itself based on solid evidence. There is no peer review of original research. What is susceptible to discussion are issues such as "is this wording neutral" or "is that prominent opinion or theory sufficiently represented". Not completely new and original interpretations of ex-wiki facts that are repeatedly revised by wiki editors.
That you have done such original research with Proto-Germanic - i.e. postulating reconstructions not found anywhere - only demonstrates that urgent action is needed to stop you from turning this project even more into your personal playground. I don't care about Germanic languages much, but some Balto-Slavic reconstructions and paradigms that you've been making are nothing but original research.
I do support however going beyond traditional etymological dictionaries which are constrained by space, by making extended etymologies describing every sound change that has occurred, and have even proposed how these should be formatted the last time PBSl. was discussed in the BP. But not creating our own reconstructions that cannot be found anywhere. There are thousands of published works that deal with proto-forms, and if no reference can be found for a particular reconstruction that doesn't necessarily mean "this reconstruction is unreferencable not because it's implausible, but because no linguist has yet studied it" but rather "this reconstruction is unreferenced because it is implausible, and nobody authoritative has wasted time with it". Formally there is no way to distinguish the two cases, absence of evidence and of counter-evidence. You can combine countless theories on the development of particular properties in proto-languages, yielding dozens of equally "valid" reconstructions that individually cannot be attested, but with each sound change within being attested. --Ivan Štambuk (talk) 16:29, 13 September 2013 (UTC)
A "professional linguist" is someone who writes the sources you cite here -- you should have more respect for them, young man, since all you do here is copy and paste their work (plus a few comments). On "original research": I have absolutely nothing against Wiktionary being a place to introduce new research -- as long as it is clearly labeled as such. Etymological dictionaries do that (clearly marking their new suggestions and theories as the author's opinion) and that is OK. All your surprisingly uninsightful remarks above don't seem to even come close to address this simple fact. (Now, I am certainly against reconstructed protoforms without references; it seems you are, too. So here's the big question: why aren't you deleting them? Or at least proposing to do so, here or in some other discussion page?) --Pereru (talk) 08:42, 20 September 2013 (UTC)
I don't have much respect for people whose work is paid by other people's taxes, and they re-sell it at high price, but I digress... Yes, all that we are supposed to do is to synthesize other people's research in wiki format ("copy/paste" as you disdainfully call it, but it more often than not requires lots of research on the editor's part to decide what proto-form or framework is the most generally accepted). No Wikimedia project does original research. It is a fundamental policy. We're just collecting facts of the external world and presenting them with respect to several fundamental policies such as NPOV, NOR and V. For meanings, attestations are the equivalent of encyclopedic references, restricting the possible semantic value of a word or phrase in question. We do not invent new words or meanings. The only place where I've actually seen OR going on is that huge list of protologisms in the appendix namespace, which should be deleted IMHO.
Yes etymological dictionaries make their own suggestions - but in that case that is the original research by the author of that dictionary. On the other hand, there are many etymological dictionaries that do not make any kind of such suggestions with respect to proto-forms - they are simply a carefully chosen compilation of other people's opinions, according to the dictionary author's view what is the most representative and up-to-date scholarship. We can only be this other type of etymological dictionary.
I don't have anything particular against proto-forms without references, but unfortunately lax policy on them has caused a lot of pretty bad, and sometimes irreparable, appendices to be created. There are dozens of Proto-Indo-European appendices created on the basis of obsolete reconstructions in the 1960s (Pokorny). Fixing them is time-consuming, and nobody is doing it because it's boring and everyone wants to deal with words they are interested in and not clean up other people's junk. The other problem is CodeCat's (and perhaps others'?) proto-forms which, according to them, are in many cases original research, and cannot be cited. That is very problematic, because postulating a reconstruction is equivalent to making a very strong statement: that a certain set of words is inherited (as opposed to being independently derived through normal derivational morphology, borrowed, or having some other origin), that certain sound changes have occurred, and that a certain proto-form existed in a particular shape.
Why am I not deleting proto-forms without references? Well, why should I? First of all, we don't have a policy page that requires that. Second, I usually list all of the references in one place, e.g. appendix on PIE or PSl. proto-form, and copy/pasting them to individual entries seems unnecessary and wasteful of space. I've seen you deleting Proto-Balto-Slavic reconstructions on Latvian entries simply because they lack references, and replacing them with obsolete Proto-Baltic reconstructions. Proto-Baltic reconstructions are based on an etymological dictionary representing the 1980s scholarship, whereas the Proto-Balto-Slavic reconstructions are based on the scholarship of 2000s and 2010s. Now, I understand your concern, that PBSl. is still a matter of extensive research, and that there is still no widespread consensus on how the proto-language is supposed to look, but it also doesn't appear to me that you're assuming good faith in such replacements (your assumption is that those PBSl. reconstructions are being made up, as opposed to being valid but lacking references), and furthermore, that you're subtly pushing an agenda (of Balto-Slavic stage not existing at all).
Also, you're simultaneously in favor of OR and in favor of all proto-forms being referenced. You don't feel any kind of cognitive dissonance with that? It makes no sense to me.
To sum this up: if you want original research be done with proto-forms, and have a proposal how they should be marked, and on what kind of arbitrary criteria should they be reconstructed, write a proposal and a vote page. We can continue there, and I'd like to also involve some other editors that are not seeing this discussion to further discuss on why this should not be done and what are the stakes. --Ivan Štambuk (talk) 09:37, 20 September 2013 (UTC)
I'll have to agree with CodeCat here: where is it said that there can be no original work? To me, it seems every time you add new definitions to words -- definitions not previously published in other dictionaries --, you are doing original work. Where is it said that original work is not OK on Wiktionary, and why? (All I've seen is references to "Wiktionary is not Wikipedia".)
On your objections:
(a) If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources -- Why not? All you need to do is make accurate references. If you're taking something from a published source, by all means refer to it! (Shall we make it official policy that reconstructed protoforms are only allowed here with references?) If you're proposing one, write a page here with the details not yet found in published sources and refer to it! In what way is this confusing, and how would this make it impossible to differentiate among credible sources? If anything, references would make it easier to differentiate among these sources... (On the subject of original research, I refer to published etymological dictionaries, in which the authors often advance original contributions and ideas for specific words, always carefully labeling them -- in the LEV, with a letter "K" at the end -- as the author's own work).
(b) It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness -- I agree fully. But note that most etymologies thus far presented here at Wiktionary are not like that: they are given without a source, and the casual reader has no way of judging whether they were presented "appropriately", with attention to "neutrality, acceptance, and care". It seems to me that adding a page in which things like PBS vs. PS etymologies could be explicitly discussed would be a great step forward in the direction of achieving precisely the goal you state. (In fact, here is another suggestion: how about a page, maybe in the Appendix, discussing precisely the good and bad points of all published sources for PIE etymologies that are used at Wiktionary, and why we trust some of them more than others? In the interest of full transparency and disclosure, wouldn't this increase the level of precision, as well as trustworthiness, of Wiktionary etymologies as a whole?)
(c) Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. -- If they can be cited, why is (almost) nobody doing that? I've seen a couple of good citations of PBS forms (usually by you, actually), but most PBS forms proposed here have no support in published sources and, as per your own policy (in (b)) above, should not be here at all. So why are they, and why is it wrong to remove them and replace them with sourced ones?
I don't care how "well established" you think PBS is (and a couple of Leiden specialists I've talked to -- both Dutch, not "Russophobic Baltic nationalists", whatever that is -- would beg to differ from you): the issue here is "what published source does a given reconstructed form come from"? Currently, almost nobody is adding sources to reconstructions here. If you have a good, published source for PBS etymologies, by all means refer to it! Heed your own advice! But when I see PBS forms being added without supporting evidence, and that in a world, no matter how well established you think PBS to be as a hypothesis, in which published PBS reconstructions are still few and far between, I think that the best policy is -- as you yourself propose! -- to trust the published sources, in which PB is still much more frequent. And, to follow this policy -- which, again, you yourself explicitly subscribe to! -- I delete, and will go on deleting, unsourced PBS etymologies and replacing them with sourced PB ones. After all, in a dictionary, sourced should always defeat unsourced. If a PBS etymology is sourced, it stays. If it isn't, it doesn't. I honestly don't see how you can subscribe to the "honesty and neutrality" policy you described above, and still disagree with that. Unless you simply want to push your personal vision of "what's right" in PBS reconstructions -- in which case, how is this NPOV?
Alternatively, you can do what CodeCat suggests: write a page in which YOU say why it is that PB reconstructions should be relabeled as PBS even in the absence of a published source that explicitly states that PBS = PS. You can sketch arguments, give examples, correspondences, etc... and then cite this page as your source.
How on earth would this be confusing, and how would this create trust problems for Wiktionary? Please riddle me that! If anything, what we're recommending is that things be done more responsibly, and with more references. Don't you think that the current etymologies-without-references bonanza creates a much, much worse trust problem than any PBS-vs-PS page would?
I end up having to agree with CodeCat above: I think you didn't understand what it is you're disagreeing with. There is no contradiction between what is proposed here and any of the principles you espouse. Please read it again. --Pereru (talk) 06:48, 13 September 2013 (UTC)

Just out of curiosity, because I still don't understand what is at stake here: what's exactly the difference between Proto-Balto-Slavic and Proto-Baltic? Does Proto-Balto-Slavic theory say that there was simply no Proto-Baltic language, but that Latvian and Lithuanian evolved from Proto-Balto-Slavic exactly the same way that Proto-Slavic did? I've just drawn this (sorry for the probably simplistic view) so... which tree represents best the actual Proto-Balto-Slavic theory? The second or the third one? --Fsojic (talk) 13:18, 13 September 2013 (UTC)

[Image: What's Proto-Balto-Slavic.png]
The first has been more or less discredited, although some still hang on to it, maybe for political reasons. The second is how linguists generally saw it in the past. Newer research suggests that there are really three branches of Balto-Slavic (not the same as your third image): East Baltic, West Baltic, and Slavic. Each of those, it is supposed, had its own proto-language, but the proto-language of East and West Baltic together (what is called "Proto-Baltic") is not demonstrably different from Proto-Balto-Slavic itself. That is, if you try to find out what the common ancestor of all Baltic languages was, then you end up with a language that Slavic can also descend from. —CodeCat 14:30, 13 September 2013 (UTC)
But West Baltic evidence is limited. If one reconstructs a word from Latvian and Lithuanian - or East Baltic in general - alone because there is no known corresponding word in Old Prussian - or West Baltic in general - and labels it as Proto-Baltic rather than Proto-East-Baltic (and I suppose some do this; well, I don't know), can we be sure it's the root for Proto-Slavic as well? --Fsojic (talk) 15:17, 13 September 2013 (UTC)
It's a matter of applying knowledge of how each language evolved, and then making all the ends fit together. Linguists formulate the phonetic evolution of a language through a series of ordered rules called "sound laws", which each act to change the pronunciation of words in some specific way according to certain rules. The sound laws for the Balto-Slavic languages are all more or less known, with some difficulty in the details still, but the general picture is clear. This means that it's fairly easy to find out if a given form can be an ancestor for a given Slavic term. All you need to do is apply all the Balto-Slavic-to-Slavic sound laws and see if the result you get matches what is actually found in attested Slavic or in reconstructed Proto-Slavic. An example: you start with Proto-Balto-Slavic *dūmas. There are two sound laws that apply in this particular case. The first is Balto-Slavic *ū > Slavic *y, the second is masculine nominative singular Balto-Slavic *-as > Proto-Slavic *-ъ. Applying these two rules together gives *dūmas > *dymъ. And that is the form that is actually found in Slavic (see *dymъ). Thus, the reconstruction is correct for Slavic. The same can then be applied to all the other Balto-Slavic languages, and if it matches all of them, then you have successfully reconstructed a Proto-Balto-Slavic term. —CodeCat 15:27, 13 September 2013 (UTC)
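The check described above can be sketched in a few lines of Python. This is only an illustration: it encodes just the two sound laws named in the comment, the rule list and the function name `apply_sound_laws` are hypothetical, and the *-as rule is simplified to "any word-final -as" rather than being restricted to masculine nominative singulars.

```python
import re
import unicodedata

# Ordered (pattern, replacement) pairs. Only the two laws mentioned above are
# included; a real rule set would be far longer and more conditioned.
SOUND_LAWS = [
    ("ū", "y"),     # Balto-Slavic *ū > Slavic *y
    ("as$", "ъ"),   # simplified: word-final *-as > Proto-Slavic *-ъ
]

def apply_sound_laws(form, laws=SOUND_LAWS):
    """Apply each sound law in order to a reconstructed form such as '*dūmas'."""
    # Drop the reconstruction asterisk and normalise so that 'ū' is one code point.
    form = unicodedata.normalize("NFC", form.lstrip("*"))
    for pattern, replacement in laws:
        form = re.sub(unicodedata.normalize("NFC", pattern), replacement, form)
    return "*" + form

print(apply_sound_laws("*dūmas"))
```

If the output matches the attested or reconstructed Slavic form, the candidate passes the test for that branch; the same check would then have to be repeated with the rule sets of the other branches.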
Except that not everybody accepts PIE *o > Proto-Balto-Slavic *a. You can get both Baltic and Slavic forms independently from Post-PIE *d(ʰ)ūmos. What is important here is w:Hirt's law yielding Balto-Slavic acute accent with fixed (columnar) paradigm on the root, and which is an exclusive Baltic-Slavic isogloss not found in other branches. Superficially, Lithuanian dūmas is more similar to Sanskrit धूम (dhūmás), but "under the hood" it's really not. --Ivan Štambuk (talk) 16:41, 13 September 2013 (UTC)

Proposal: eliminate {{t-}} and {{tø}}; only link to FL-wikt entries that are known to exist.

Vote: Wiktionary:Votes/2013-09/Translation-links to other Wiktionaries

I'm starting to think that maybe our Translations sections should only link to target-language-Wiktionary entries that are actually known to exist (just like how we only have interwiki-links to existent pages). Under such an approach:

  • {{t}} would behave like {{tø}} does now.
  • {{t-}} and {{tø}} would redirect to {{t}}, and presumably eventually be eliminated.
  • various tools and bots (Conrad's translation-editor, Kephir's {{t}}-ifier, Rukhabot, etc.) would only deal in {{t}} and {{t+}}.
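As a rough sketch of what a tool would do under this proposal, the choice between the two remaining templates reduces to a lookup against a list of entries known to exist on the target Wiktionary. The function name `translation_template` and the `known_entries` set are hypothetical; how such a list would be obtained (dumps, API queries) is not specified here.

```python
def translation_template(lang, term, known_entries):
    """Return the wikitext for one translation under the proposed scheme.

    known_entries is assumed to be a set of (language code, term) pairs for
    entries known to exist on the foreign-language Wiktionary.
    """
    # {{t+}} links to the FL-wikt entry; {{t}} would no longer link there at all.
    name = "t+" if (lang, term) in known_entries else "t"
    return "{{" + name + "|" + lang + "|" + term + "}}"

known = {("fr", "mot")}
print(translation_template("fr", "mot", known))
print(translation_template("fr", "xyzzy", known))
```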

If y'all are on board with this, I think we'd probably want some sort of vote — the current system, give or take, has been endorsed by votes — but I figured I would start a discussion first, to see (1) if y'all are on board, and (2) if y'all have any alternative/additional ideas.

So . . . any thoughts?

RuakhTALK 20:18, 13 September 2013 (UTC)

It will not simplify anything for the tools for the same reason the move to {{g}} will not simplify anything until it is completely done, which will not be very soon: in the meantime, we have to deal with both unconverted and converted pages. Complexity in fact at best stays at the same level. My tool always generates {{t}} anyway and will have to recognise existing uses of {{t-}} and {{tø}} (which it currently does not touch at all). For other tools, including bots, it should be similar.
I am mildly opposed, actually. Contributors from foreign Wiktionaries might be actually looking for redlinks into their native Wiktionaries simply to create the missing entries. With the current approach, it takes two middle-clicks, two keyboard shortcuts, some typing and tab switching to copy our entry into their native Wiktionary, or just two clicks to start the entry from scratch. Although now they would have a somewhat hard time actually finding these. Categorising usages of {{t-}} would be useful for this. Maybe not the best use case, but… I can see some value in this.
So why, really? I fail to see any advantage in the above-mentioned… characteristics of this approach, for lack of a better word. Keφr 20:57, 13 September 2013 (UTC)
I'll start with your third paragraph ("So why, really? […]"), since I think that's the crux of your comment. (I didn't actually give my reasons for thinking we shouldn't link to nonexistent FL-wikt entries; I guess I should have.) The reason is, I think such links are useless clutter:
  • In the case of {{t-}}, they're bright red, like redlinks within en.wikt, but unlike redlinks within en.wikt, there's little chance that readers and editors here will be able to help with them, and they're likely to be not-very-useful for en.wikt readers even once they exist. Note that we don't add red interwiki-links, for example, because the goal is to indicate what FL-wikts information can be found in.
  • In the case of {{t}}, the links aren't bright red, but in a way, that's even worse: it's hard to tell at a glance that it's linking to a non-existent FL-wikt entry (because the external-link blue is so similar to the bluelink blue), so it's a link to trick readers into thinking they're going to get more information, when in fact they're not.
That out of the way . . .
Re: first paragraph ("It will not simplify anything […]"): I'm not sure I completely agree with your literal statement, but I think we can agree on a key point, say, "we shouldn't do this because it's a simplification": you because you don't think it is a simplification, me because I don't think a small technical simplification (even if real) can justify a much-larger functionality change.
Re: second paragraph: Thanks for weighing in. For the specific use-case you mention (contributors from an FL wikt looking for our redlinks to them), I'd be happy to generate language-specific lists, which I think would work better for that use-case than searching for entries with {{t-}}. (And of course, even that use-case doesn't recommend {{t}}'s current behavior.) But if you can think of any other relevant use-cases, I'd be interested to hear about them.
RuakhTALK 03:26, 14 September 2013 (UTC)
Okay, I am fine with that. You can go ahead as far as I am concerned. Keφr 08:37, 14 September 2013 (UTC)
I support this. —CodeCat 01:51, 14 September 2013 (UTC)
  • Support. As for "Contributors from foreign Wiktionaries might be actually looking for redlinks into their native Wiktionaries simply to create the missing entries": I don't think it is en.wikt's job to act as a worklist for other Wiktionaries, presenting the editors of en.wikt with redlinks that they cannot turn blue by editing en.wikt. --Dan Polansky (talk) 08:28, 14 September 2013 (UTC)

X-system and H-system in Esperanto

Discussion moved to Wiktionary talk:About Esperanto#X-system and H-system.

Block of User:MewBot

Ruakh blocked MewBot for updating {{it-noun}} quite profoundly without any prior discussion. This seems to violate WT:BOT#Policy. CodeCat has been unblocking her own bot. Since both the blocking and the unblocking are unilateral, I thought I'd bring it here. I also support an indefinite (but presumably not infinite) block both for this issue and the fact that CodeCat can't always act alone on updating things in her grand vision of things without discussing it first. Mglovesfun (talk) 21:10, 14 September 2013 (UTC)

What are you talking about? You even took part in the discussion, and it wasn't even the only one that took place, there's more on SemperBlotto's talk page. —CodeCat 21:13, 14 September 2013 (UTC)
I must admit that CodeCat can be very annoying, particularly when modifying heavily-used modules/templates without testing them. But in this case, the modifications were discussed with me (the major editor of Italian nouns) in advance, and they seem to work OK. SemperBlotto (talk) 21:21, 14 September 2013 (UTC)
Thanks for bringing this here. Personally, I actually don't support any long-term block: CodeCat obviously enjoys running a bot, and as long as she's using it to do things that the community has agreed should be done, I think that's great. My blocks were under the assumption that she would quickly fix the issue and then unblock it (I think my block-summary even said as much); I had no idea how much of a trial this would be. She's taken it personally, so has started making it personal herself, casting aspersions on my intentions, so now I'm annoyed enough that I'm half-tempted to support a long-term block, :-P   but my best current judgment is that I should trust my earlier, non-annoyed judgment. —RuakhTALK 02:43, 15 September 2013 (UTC)
  • I support temporarily blocking User:MewBot for bot actions made without first gaining consensus for them via an appropriate channel such as Beer parlour. Whenever a dispute over there being a consensus for bot actions arises, CodeCat should provide links that show there is consensus for their actions. Only after the blocking admin is satisfied that the actions are supported by consensus can User:MewBot be unblocked, on a case-by-case basis. --Dan Polansky (talk) 09:48, 15 September 2013 (UTC)
    • And what is consensus? Yes, this is a redlink. As far as I know, we never practised "consensus" here. Keφr 10:32, 15 September 2013 (UTC)
    • Dan, I did in fact show that there was a consensus, but Ruakh wasn't satisfied. Not much I can do then. —CodeCat 12:01, 15 September 2013 (UTC)
      • Ruakh said: "I want the changes to stop until they are discussed on a page such as Wiktionary:Beer parlour, Template talk:it-noun, or Wiktionary talk:About Italian." I support his request. --Dan Polansky (talk) 14:28, 15 September 2013 (UTC)
        • And you think anyone else would read it and care to respond? This wiki has like, twenty regular contributors, most of them admins (which I think is telling something), and the areas of their interests hardly ever overlap. We cannot afford being bureaucratic here. Keφr 14:42, 15 September 2013 (UTC)
          • Re: "And you think anyone else would read it and care to respond?": Then there wouldn't be a problem. You just post, with a question like "Does anyone object?" or "Any objections?", and if no one cares to respond, you just go ahead after a day or two. CodeCat refuses to do even that much. —RuakhTALK 15:13, 15 September 2013 (UTC)
            • One question first. Do you want to object to these edits, given their content? Keφr 15:29, 15 September 2013 (UTC)
              • I don't think that's relevant. Suppose that someone ran a bot to delete hundreds of entries in a language that you don't know, but that several editors contribute in. Suppose that this bot-task wasn't discussed or mentioned anywhere that you can find; you only find out about it because you happen to see one of the deletions. You wouldn't (couldn't) object to the deletions themselves, because you don't know the language — for all you know, the deletions are perfectly correct — but then again, for all you know, the deletions might be enforcing one editor's idiosyncratic or prescriptivist views. Wouldn't you want to make sure that the other editors in the language were aware of what was going on? Wouldn't you be annoyed that someone took it upon themselves to do this without any discussion? (That may sound like a reductio ad absurdum, but given CodeCat's other recent mass actions, such as deleting all Slovene translations whose gender was given as "masculine", I expect to see something like this any day now. Maybe she'll change her mind about the script Gothic should be in, and no one else will realize it until it's a fait accompli. (N.B.: In all fairness regarding the Slovene thing, I should mention that she did intend the deletions to be somewhat temporary: she hoped to restore the translations herself using a bot. Dunno how well that would have worked; if the change hadn't been reverted, it's almost certain that at least a few translations would still be gone today, but it's hard to say how many. Actually it's quite possible that a few still are gone, and there's no way to tell.)) —RuakhTALK 06:46, 16 September 2013 (UTC)
        • It also concerns Italian templates. How many people beside Semper do you think would care for that? My resistance to all of this is from Ruakh saying "I'm not convinced of your way of forming consensus, do it my way or I'll keep blocking your bot". —CodeCat 14:53, 15 September 2013 (UTC)
          • I certainly never made reference to "your way of forming consensus", because until now it never even occurred to me that you thought you were forming consensus! You've often taken infrastructural actions unilaterally, with no pretense of consensus-building — once, recently, when I called you out on making breaking changes to {{support}}, your entire reply was "So being bold is not ok anymore?" — and I assumed that this was more of the same. So, just to be absolutely clear about this: as far as your bot-edits regarding {{it-noun}} are concerned, "your way of forming consensus" was simply to ask SemperBlotto (talk • contribs) about it, and to leave it at that? —RuakhTALK 06:46, 16 September 2013 (UTC)
If we don't have a quorum of knowledgeable and/or interested people to decide on something of broad impact, i.e., in a language, then we shouldn't be doing it. If it is so obvious that language-specific expertise is not required, then there should be no problem getting some kind of consent from parties with less expertise. If language expertise is needed, can't contributors from other Wiktionaries be solicited for advice or help?
In any event, there is plenty to do at the level of cleaning up messes and long-standing problems. One could even get involved in individual entry improvement, or with some kind of support for welcoming new contributors and making it easy for them to make contributions that we value. If this seems hard or vague, that might be an indication that it is the kind of task that is neglected and might have a substantial payoff. DCDuring TALK 16:16, 15 September 2013 (UTC)
Expertise isn't the issue, it's people affected by it. People who don't use Italian templates will generally just go "I don't care because I am not affected by it". So is it that surprising that I went directly to one of the few people who would be affected? —CodeCat 16:35, 15 September 2013 (UTC)
With that attitude, we are going to stagnate. And get feedback like this: Wiktionary:Feedback#Why Wiktionary sucks. And no one will be able to do anything. If there is only one person capable of identifying a problem, able and willing to fix it, they should solve it, quorum be damned. People will not join if we neglect to address issues because of insufficient number of, well, people; they will just go elsewhere. There will never be a quorum at all. I repeat, we cannot afford to be bureaucratic.
Where are these tasks? I want to see them. (Coincidentally, I have been thinking for a while about creating a global to-do list page, named like Wiktionary:Open tasks). Keφr 16:44, 15 September 2013 (UTC)
I like how Linus Torvalds put it: "[...] don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project." Keφr 16:46, 15 September 2013 (UTC)
We won't stagnate because someone's favorite technical project can't proceed. And we can't rely on the would-be problem-solvers to find the right problems to solve. We may stagnate because we don't have the resources to maintain and improve what we have. I am not so sure that it is advisable to add to the maintenance burden by attempting to maintain a font system that fights with our host software.
Does anyone have any ideas about how to make it fun and easy for new users to make useful contributions? DCDuring TALK 17:04, 15 September 2013 (UTC)
Ideas how to make it easy? Having ideas is simple, implementing them is harder. Make an editor which abstracts away the markup syntax, everywhere. Something to replace the NEC and WT:EDIT. It should also meaningfully support inflection tables and requests for etymology/pronunciation/verification/etc. But for that, we need to harmonise markup and template usage in some way — pages which render similarly may have wildly different markup, the semantics of which will not always be obvious, and therefore it will be hard to parse. Or even completely migrate Wiktionary to some kind of semantic database, because MediaWiki markup is quite lousy for our purposes. (Though I am a bit sceptical about Wikidata — I never saw a discussion between regulars here and the Wikidata people. There might be some friction.) The "fun" part may be harder to have ideas about. Perhaps we could ask the WMF to enable the WikiLove extension and the "thanks" feature (and create an opt-out or even opt-in for ULS and WebFonts by the way), and encourage people to use them. Although I am not particularly enthusiastic about these, because I do see barnstars degenerate into meaninglessness on Wikipedia, so… On the other hand, I do miss "thanks". I will probably use it somewhat often if it be here. Keφr 17:23, 15 September 2013 (UTC)
What wikis should be good for is capturing input from users. If we can bring format bit by bit to a certain level of consistency so data can be extracted from the dumps, I think we will have done our job. I think that means keeping at least one level simple; having bots run to identify non-conforming entries; having easy, form-based ways to add definitions, citations, usage examples, even in-line comments, which can be easily flagged by patrollers for specific kinds of further review. I don't know if we have any statistics about how many definitions and usexes are added using the specific tools. They aren't available by default to unregistered users, which might be where they do the most good (and have the most risk of abuse). Having thumbs-up/down rating for individual definitions and sections might be nice and give us a way to capture some feedback.
"Fun" has a lot of degrees to it. "Satisfaction" is a level. And thank you for being you! ;-}} DCDuring TALK 19:48, 15 September 2013 (UTC)
…which is pretty similar to what I had in mind, yes. And I like the thumbs-up/down for definitions idea. In addition to creating some gratification to editors, it would be a somewhat useful feedback tool. (Better than our current one, anyway. Buried deep next to interwiki links. Ugh.) And compared to barnstars, it would better fit our until-now unwritten (to an extent) philosophy that this project is ostensibly a dictionary and not a circlejerk. (Wikipedia has it written down, which is probably why they never follow it.) But it seems we hijacked the thread. Time to go back to lynching CodeCat, Ruakh, or whoever deserves, or whoever does not deserve, but we just feel like lynching. Keφr 20:39, 15 September 2013 (UTC)
  • My main point was the procedural one that if there isn't broad support for, rather than merely a lack of opposition to, a reform of how bots support a language, the reform should not proceed. The "broad" support could be among en.Wiktionarians knowledgeable about the language, among en.Wiktionarians as a whole (which usually means the reform is obviously beneficial and doesn't seem to require a lot of specific knowledge), or either type of grouping supplemented by support from those active on the language's Wiktionary. I could imagine relevant opinion coming from other wikis.
It just isn't a very good idea for powerful technical means to be unleashed without being fairly sure that the ends to be achieved are on balance desirable. DCDuring TALK 22:20, 15 September 2013 (UTC)

Eliminating adjective PoS for Ainu

Although John Batchelor includes adjectives for Ainu in his works about 100 years ago, according to scholars such as Tamura and Kumagai [2], Ainu has no adjectives; that category of speech is best characterized as intransitive verbs. Wiktionary has four adjectives, listed at Category:Ainu adjectives. Are there any objections to changing all of these to verbs? Since these words include the inchoative sense (become X), a possible way to gloss them is "To be/become X." BB12 (talk) 08:19, 15 September 2013 (UTC)

No objection to changing them to verbs. As to the rest: I would say a better way to categorize them is as stative verbs. See Category:Hawaiian stative verbs for one way of handling these without resorting to "be/become" in every definition. Chuck Entz (talk) 09:56, 15 September 2013 (UTC)
Thank you for the reference. There's a description of this at wt:About Hawaiian, but I don't understand it all. So, for example, does keʻokeʻo = white, clear, including the meaning of "become white, become clear"?
I found minor problems with the following Hawaiian stative verbs: makahiki and wikiwiki (no "stative" label, probably safe for me to add), kea (it's not clear why White Mountain has quotes and is italicized), and luahine (I think the comma just needs to be deleted from inside the template). BB12 (talk) 21:20, 15 September 2013 (UTC)
Stative, not inchoative. The meanings are "be white, be clear". —Μετάknowledgediscuss/deeds 21:22, 15 September 2013 (UTC)
Right. So what would be a good way of showing the user that these Ainu words have the inchoative meaning as well? I don't understand how Hawaiian really makes the stative issue clear, either, as a stative label does not seem very user-friendly. BB12 (talk) 22:09, 15 September 2013 (UTC)
Comment: English words like "white" and "clear" (and "brown", etc) are verbs, yet the basic meaning of the Hawaiian verbs currently defined/translated as "white", "clear" etc is not necessarily white#Verb, clear#Verb, etc (unless the English verbs happen to have stative senses). The meaning of the Hawaiian verbs is "be white#Adjective", etc. Hence I agree with BB12 that it's not user-friendly to omit "be" from the definitions. Someone could make a pass over the entries with AWB to insert it anywhere it's missing. - -sche (discuss) 04:47, 16 September 2013 (UTC)
I have no particular expertise with regard to Ainu, but do know a bit about stative verbs. It seems likely that, Batchelor notwithstanding, ピリカ is a stative verb. Batchelor even glosses it as "to be good" in addition to calling it an adjective glossed as "good". This seems similar to Lakota, where missionary linguists identified the stative verbs as adjectives since their English or French translations are usually adjectives. I'm not sure about アィヌ, though. It seems clear that it exists as a noun; that Batchelor calls it an adjective might be faulty reasoning from a stative verb (if one exists), but could equally be false reasoning from use of the noun to modify other nouns, as in アィヌモシㇼ (Ainu land; Hokkaido). In either case, though, it's not an objection to eliminating the adjective POS for Ainu. (I'd also echo Chuck Entz in pointing out that statives are not necessarily inchoative.) Cnilep (talk) 05:23, 16 September 2013 (UTC)
I have sent the adjective アィヌ to wt:RfV.
The fact that Batchelor calls words such as "white" adjectives seems reasonable. If I were creating a glossary for myself, I would probably go with adjective as well, just to make it easier in my mind. But Masayoshi Shibatani in "The languages of Japan" (1994, p. 19) says: "Forms corresponding to adjectives in meaning and function of other languages function as predicates in exactly the same way as intransitive verbs. Not only do they share the same personal affixes, but they both function as nominal modifiers in exactly the same way (section 3.3). Furthermore, these forms can have an inchoative reading, as well as their basic stative one... Thus, there does not seem to be any need to set up an independent category for adjectives in Ainu."
There seems to be little objection to erasing the adjective category. Would it work to have a template that generates "(stative) To be" for these Hawaiian adjective/verbs, and another template that generates "(inchoative, stative) To be/become" for Ainu? Other languages could use them as well, of course.... BB12 (talk) 08:31, 16 September 2013 (UTC)
Supplemental: I forgot to check the Japanese Wiktionary for the PoS. They have nine adjectives for Ainu, and the Japanese Wikipedia article on Ainu does not discuss the issue. BB12 (talk) 08:37, 16 September 2013 (UTC)

I asked in the Japanese Wiktionary Editors' Room (編集室) and was directed to a style manual for Ainu at ja:Wiktionary:スタイルマニュアル/アイヌ語, where it notes that 田村 and 中川 do not propose a category of adjectives for Ainu, but instead propose intransitive verbs. It also says that it hasn't been verified what 知里 proposes to do with them. Although the Ainu project terminated prematurely, they have done some nice work on organizing the parts of speech and some other aspects for Ainu, so I will adapt that and change all the Ainu adjectives to intransitive verbs.

Deleting list of protologisms

I have created vote Wiktionary:Votes/pl-2013-09/Deleting list of protologisms.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 12:47, 15 September 2013 (UTC)

Appendix talk:List of protologisms#Deletion debate looks like a good place to start. Note that Wiktionary:Criteria for inclusion no longer has a section 'Protologism', and also Conrad.Irwin's comment of "an institution, albeit a terrible one" (and he voted to keep). Mglovesfun (talk) 13:35, 15 September 2013 (UTC)

A new way of formatting definitions I saw someone use

I just came across the entry paraconsistent logic, and noticed that the whole definition has been wrapped into {{l|en|...}}. I had not thought of this way of using our templates but it's quite a nice idea. If the text contains links, then the template only processes the links that exist in the text, it does not add any new ones. So this is quite a simple and effective way to ensure that all of the links in the definition point to the #English section. I think it's worth considering turning into normal practice, but I would prefer to use a dedicated template name such as {{def}} (currently a language code template, which might be deleted) or the even shorter {{d}} (currently a redirect to {{delete}}, so we could usurp the name). Using a dedicated template would let us avoid all the superfluous logic that {{l}} has such as language tags, script detection, gloss annotations and so on. Would others be interested in adopting this practice? (I can already guess one person who probably isn't) —CodeCat 19:54, 15 September 2013 (UTC)
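To make the described behaviour concrete, here is a minimal Python sketch of the link-processing step: it rewrites only the wikilinks already present in a definition so that they point to the #English section, and adds no new links. It is not the actual code of {{l}} or of any proposed {{def}} template; the function name `anchor_links` and the regular expression are assumptions made for illustration, and links that already carry a section anchor are deliberately left alone.

```python
import re

# Matches [[target]] and [[target|label]], but not links that already contain '#'.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:\|([^\[\]]*))?\]\]")

def anchor_links(definition, section="English"):
    """Point every existing wikilink in the definition at the given section."""
    def repl(match):
        target = match.group(1)
        label = match.group(2) if match.group(2) is not None else target
        return f"[[{target}#{section}|{label}]]"
    return LINK.sub(repl, definition)

print(anchor_links("A [[logic]] that is [[paraconsistent|not explosive]]."))
```

Text without links passes through unchanged, which mirrors the point that the template "does not add any new ones".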

Yes I quite like it because it marks everything in the definition as English, which it is. Words don't exist independently of language, so there is no reason not to mark all words in a sentence written in English as English. Mglovesfun (talk) 20:01, 15 September 2013 (UTC)
Well, all text on Wiktionary is marked by default as English. The language of each web page has lang=en on it at the top level, so we don't really need to mark the language for definitions. The benefit of this idea is mainly to add #English to the links, but it is also nice that definitions are tagged specially with their own template. That might have some future use for bots or other kinds of semantic parsing. —CodeCat 20:22, 15 September 2013 (UTC)
Yes, {{d}} looks like it is free to repurpose. I have also seen {{senseid}} being put forward a few times earlier. Why not merge its functionality into the new {{d}} template? Keφr 20:04, 15 September 2013 (UTC)
{{dfn}} is also free. We are probably not going to introduce any more language code templates? Keφr 20:13, 15 September 2013 (UTC)
You could also merge it with {{label}} by passing everything after the first parameter as label arguments... DTLHS (talk) 20:19, 15 September 2013 (UTC)
To Kephir's first post: We could do that, and I definitely think we should, but what about context labels? {{context}}/{{label}} is a bit too complex to combine neatly into another template, so we would probably want to keep it separate. Template calls can be nested, but we should avoid it if at all possible because it means that bots using regular expressions can no longer parse them; recursion would be required. (MWParserFromHell parses recursively, but not everyone can use it.)
To the second post: either {{def}} or {{dfn}} is fine, but the latter might cause some confusion with the HTML element <dfn> which has a very different purpose.
To DTLHS: That can be done but it means that the order that the labels and definitions appear in in the wikicode is the opposite of the way it appears on the page. It's somewhat counterintuitive. —CodeCat 20:22, 15 September 2013 (UTC)
As for parsing, the API (mw:API:Properties#revisions / rv, mw:API:Expandtemplates) has a "generate parse tree XML" feature (the same as used in Special:Expandtemplates), which may help with the issue, although I presume it puts some load on the servers, so it would be nice to avoid it. I agree that putting {{label}} into the mix would complicate things; never mind the template logic, the cognitive load of editing the entry using that all-in-one template (I imagine it would be something like: {{d|lang|senseid|label|label|...|definition}}) could be a problem, while the expressiveness of the syntax would stay at pretty much the same level. Keφr 20:48, 15 September 2013 (UTC)
It would make the wikicode hard to parse for humans as well, although that might be in part because we're just not used to it. A distinct label template stands out as much visually in the wikitext as it does on the page, which helps in quickly finding the part of the source you want to edit. And despite some of the protests, I think it helps too that the labels now always begin with {{context|, it makes them stand out more to the editor and leaves less room to guess about the nature of that particular piece of code. Where do we currently place {{senseid}}? Before the label or after it? —CodeCat 20:54, 15 September 2013 (UTC)
{{senseid}} has to be placed immediately after the # for it to work. --Yair rand (talk) 20:59, 15 September 2013 (UTC)
I do think that's the best practice, because the sense ID would "apply" as much to the label following it as to the definition itself. The label is a part of the definition, in a sense. However, it sounds like this is a technical restriction. What reason is there for that? —CodeCat 21:03, 15 September 2013 (UTC)
The template is essentially overwriting the # in order to attach an ID to the element. If there's anything else before the template, you'll just get an extra list item. --Yair rand (talk) 21:07, 15 September 2013 (UTC)
  • That looks pretty horrible, IMHO. --Dan Polansky (talk) 21:04, 15 September 2013 (UTC)
  • I saw it as well and thought "What a dreadful waste of time, space, resources etc". It exemplifies all that is wrong with this Wiki and probably goes some way to explain why it's getting slower and slower. SemperBlotto (talk) 21:13, 15 September 2013 (UTC)
    • I think that such strongly polarised opinions are also a cause. There has been this schism developing with one side wanting to progress towards a more functional, semantic and manageable practice, and another side preferring the old Wikipedia-style markup. Or said another way, one side sees our current software as a limitation and wants to develop ways to overcome it, while the other side thinks it's fine. Because people hold such strongly differing opinions on what Wiktionary should be, a lot of time is spent arguing over even relatively small things, and progress just grinds to a halt because many attempts to make any significant changes are blocked. So you get a situation where nobody is happy, but nobody is able to do anything about it either. —CodeCat 21:58, 15 September 2013 (UTC)
WP recently introduced a "visual editor". Personally I hate it, but I'm the kind of person who prefers to hand-code HTML instead of using a tool, and I'm a minority. Maybe we should introduce a "visual editor" without removing the ability to write markup if preferred. Equinox 01:27, 16 September 2013 (UTC)
No, enabling that would be horrible. It would just invite people to violate WT:ELE, misuse templates and misformat definitions. Besides, the implementation is buggy, and until quite recently it was very obnoxious. If we are going to implement an editor, it should understand our entry formatting practices. Something more like WT:EDIT. Though WT:EDIT also has missing features and is somewhat unintuitive to use. Keφr 07:42, 16 September 2013 (UTC)
  • If it is going to be used only for linking to the "English" section, I don't think it's a good idea. Note that there are alternative ways to do that with JS; maybe not neat ones, but still better overall than doing it with a template and module. If it is really supposed to be used for semantic purposes, or if it would make operations significantly easier for bots besides linking to the "English" section, then it's a good idea. By the way, the current code of {{senseid}} is hackish: it leaves an unfinished tag. Browsers fix this error, but still. If we merge it into the proposed template for definitions, we can add 'id's neatly. On the other hand, if we want to put the labels in the element too (which I think we should do), we have to somehow merge {{label}}/{{context}} into the proposed template as well. --Z 09:24, 16 September 2013 (UTC)

Changes to Template:en-noun

I have gradually been working towards converting this template to Lua, specifically Module:en-headword. {{en-adj}} and {{en-adv}} were already converted a while ago, but they were far easier to convert and did not have such intricate parameter usage. With this template things have not been so easy so I've been trying to untie the rather confusing mess of parameters that this template used to support, and also to fix any errors that might have crept into existing entries. In the process I have made some changes to the templates that made certain old uses no longer work. I converted the existing entries but I realise now that I should have discussed these changes more widely before applying them, for which I apologise.

The current situation for the parameters is now relatively simple. The first parameter gives the plural form, or it can be given as "s" or "es" which are interpreted specially by the template. In the past, you could also give the stem and the ending as separate parameters, but this did not seem to have any benefit for English nouns so I removed this feature (again I apologise for not discussing this first). You can also give the first parameter as "-" or "~", which indicates that the noun is usually or partially uncountable. In that case, the parameters shift up by one, so the second parameter gives the plural then. If a noun has more than one plural form, the additional plural forms are given with pl2=, pl3= and so on.

What I would like to do is convert the pl2= (etc.) parameters into positional parameters. So {{en-noun|first plural|pl2=second plural}} would become {{en-noun|first plural|second plural}}. I also want to add support for the shorthand "s" and "es" to these additional plurals; currently this is not supported and you always have to give the whole word. Module:en-headword is not currently used for this template, but after these changes are done, we should be able to change the template to use it. The code for English nouns is already there and should work; please check it to be sure.

Once the conversion to Lua has been done, we can look at ways to make the default plural form that the template shows a bit more useful. Rather than just adding -s onto the end of the word, it could look at the last consonants and decide what to add, so that we would not need to specify "es" anymore. It could also be made to change the ending, like convert -y to -ies. Such changes have been made to the adjective and adverb templates already and they work well. But although I am a native speaker of English, I am not really all that familiar with the intricacies of the spelling and grammar, so it would be helpful if someone could make a list of the most common rules for forming the plural in English. Keep in mind that these should be sensible default rules (rules of thumb), so they don't need to work all the time, only enough of the time for the rule to be worthwhile.

Please let me know what you think of this proposal. If it's agreed, I will make the changes to the template and convert the plural parameters. Then the changeover to Lua can be made. —CodeCat 20:50, 16 September 2013 (UTC)
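As a rough illustration of the kind of "rules of thumb" asked for above, here is a minimal sketch. It is written in Python rather than Lua purely for illustration and is not taken from Module:en-headword; the function name `default_plural` and the particular rule set are assumptions, and the rules are deliberately incomplete (they would wrongly give "stomaches" for "stomach", for example).

```python
VOWELS = "aeiou"


def default_plural(word: str) -> str:
    """Guess a default English plural from the spelling of the singular.

    These are rules of thumb only: they are meant to be right often
    enough to be worthwhile, not to cover every noun.
    """
    # Sibilant-like endings usually take -es: bus, box, quiz, church, dish.
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + y usually becomes -ies: city -> cities, but day -> days.
    if len(word) > 1 and word.endswith("y") and word[-2] not in VOWELS:
        return word[:-1] + "ies"
    # Everything else just takes -s.
    return word + "s"


for noun in ("cat", "box", "church", "city", "day"):
    print(noun, "->", default_plural(noun))
```

Any entry where such a guess is wrong would still need an explicitly specified plural, so the override mechanism matters as much as the rules themselves.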

That makes sense to me. One difficulty, after we've Luacized and want to start supporting things like <-y> → <-ies>, is that we can't change a default-generated plural without identifying beforehand all the cases where this will actually affect the entry. (For example, if the pagename ends with <-y> and we're currently using the default-generated plural in <-ys>, as we currently are for one of the nouns spelled <why>, we don't want it to suddenly become <-ies>. Even when the default-generated <-ys> is actually wrong, we'll want a human to take a look, if only because there's a good chance that we'll need to delete a plural entry that was autocreated off the mistaken version.) Also, we'll need to make sure that the documentation is very clear about how to override the default-generated plural when necessary, for the benefit of less-savvy editors: IME, such editors tend to find it frustrating when computers do complex-but-mistaken things. (Actually I personally think it would be better never to autogenerate plurals at all — we can just always use {{en-noun|s}} and {{en-noun|-|es}} and so on — but I know that a lot of editors would never accept that, so I won't try to push for it. :-P   ) Lastly — thank you for bringing this here. I realize that it can be frustrating, when you have a great idea for how to improve things, to have to restrain your excitement and wait for feedback before beginning work. —RuakhTALK 21:09, 16 September 2013 (UTC)
Actually, come to think of it, we might want to detect nouns that end in <-y> (or <-ss> or whatnot) and don't specify a plural, not so we can try to autogenerate the correct plural, but just so we can assume |? and tag them for human examination. In cases like <whys>, we can require an explicit |s or |whys. This way we can supply a default in only the (very common) case that we can be reasonably sure it's not totally wrong, while still having a simple and easy-to-understand behavior for other cases. —RuakhTALK 21:14, 16 September 2013 (UTC)
We can do this the same way I did it in the past for the adjectives, and for {{it-noun}} currently. We can add code to the module which generates default plurals internally both the "old" way and the "new" way, and then categorises depending on whether they match or not. The new default would become whies, the old would be whys, and so the entry why would end up in the "does not match" category, where we can take measures to fix it, by explicitly adding the full plural form "whys" into the entry. Once we fix all entries in that category, we can be sure that no entry will be affected by changing from the old default to the new. —CodeCat 21:28, 16 September 2013 (UTC)
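The old-versus-new comparison described here could look something like the sketch below. It is a Python illustration only, not the module's actual Lua code; `old_default`, `new_default` and `needs_review` are hypothetical names, and the "new" rule shown covers just the -y case for brevity.

```python
from typing import Optional


def old_default(word: str) -> str:
    # The old behaviour: always append -s.
    return word + "s"


def new_default(word: str) -> str:
    # An assumed new behaviour: consonant + y becomes -ies, otherwise -s.
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def needs_review(pagename: str, explicit_plural: Optional[str] = None) -> bool:
    """Return True if the entry belongs in a "does not match" category.

    Entries that give their plural explicitly are unaffected by a change
    of default, so only entries relying on the default are compared.
    """
    if explicit_plural is not None:
        return False
    return old_default(pagename) != new_default(pagename)


print(needs_review("why"))          # relies on the default, defaults differ
print(needs_review("why", "whys"))  # plural given explicitly
print(needs_review("cat"))          # both defaults agree
```

Once every flagged entry has an explicit plural, the category is empty and the default can be switched without changing what any entry displays.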
Yup, though I think it's best to first examine a database dump to look for any identifiable instances. The problem with relying on template-edits and categorization is that MW is not 100% reliable, and when you edit a widely-transcluded template in a way that affects the categories it generates on a given page, it will often happen that the page itself is updated, but still doesn't actually show up in the category. (Database-dumps are not 100% reliable either, firstly because they're out-of-date by up to a few weeks, and secondly because the code used to examine them can never perfectly match the code that MediaWiki uses to parse wikitext — it's difficult-to-impossible to catch all edge-cases and intertemplate magic — but by using database-dumps for the first pass, and categorization for the second pass, you can minimize the chances of undetected breakage.) —RuakhTALK 22:03, 16 September 2013 (UTC)
We could always counter that by doing a null edit on all transclusions to make sure they're updated. That's what I usually do. —CodeCat 00:35, 17 September 2013 (UTC)
I assume that the reason MW is unreliable is that the system is overtaxed, and that performing additional mass edits (null or otherwise) simply exacerbates the problem. (But perhaps I assume wrongly. It may be worth consulting the developers.) —RuakhTALK 05:13, 17 September 2013 (UTC)
I would say that the job queues are long right now because so many changes are being made to widespread templates. The software isn't getting enough time to catch up. But I don't think that has anything to do with the actual CPU usage or anything like that. As far as I know, doing null edits would only tell the system to prioritise the page you edit, it shouldn't affect things in general. And actually, I think that each view or edit also causes the system to process a small part of the job queue, so maybe doing many edits actually helps the system a little bit. In any case, I don't think that editing regular pages, which don't transclude anything, would make the job queue longer. —CodeCat 12:55, 17 September 2013 (UTC)
I've made the changes and did the switchover to Lua a few minutes ago. So far I see no flood of script errors, which is promising. :) It seems to work ok. I updated the documentation as well, but it could still be improved further I think. The documentation seems to be styled as a howto-guide but does not do so well as a reference of all the features. —CodeCat 17:19, 19 September 2013 (UTC)
Thanks! —RuakhTALK 18:23, 19 September 2013 (UTC)

TTBC and language names

I propose that language names used in {{ttbc}} are left alone rather than being replaced with language code like in diff. The language names are used in translation tables, so they make it easier to switch a ttbc entry to a verified one. --Dan Polansky (talk) 16:29, 17 September 2013 (UTC)

I have a different proposal. Rather than using {{ttbc}} to replace the language name, we just write the language name and place {{ttbc}} after it instead. Then there is no need for the template to take a language name at all. Actually, why not convert it into an actual translation link template like {{t}}? Then we would be able to mark specific translations to be checked, like {{ttbc|nl|gedrag|n}}. —CodeCat 16:32, 17 September 2013 (UTC)
My current solution is just to replace {{ttbc|xyz}} with {{subst:xyz}} when checking a translation. That will work as long as we don't delete the language-code templates. —Angr 16:36, 17 September 2013 (UTC)
I like that last idea. Keφr 16:38, 17 September 2013 (UTC)
Replacing ttbc| with subst: is easier because you only have to remove one piece of text instead of two. Keφr 16:38, 17 September 2013 (UTC)
You can subst module invocations too. You could replace {{ttbc|xyz}} with {{subst:#invoke:language utilities|lookup_language|xyz|names}}. But if the language code does not exist, then this will substitute the script error into the page instead. —CodeCat 16:41, 17 September 2013 (UTC)
Yes, imagine me typing that every time. (Conclusion: We need a JS tool for that. Or maybe temporarily extend {{ttbc}} so that substituting it will return just the language name. Or change ttbc to mark specific translations like you proposed. Then it will be just a question of "ttbc" → "t".) Keφr 16:45, 17 September 2013 (UTC)
  • How many newbie editors know that they can do a thing like {{subst:cs}} to get "Czech"? Why is it easier to delete "ttbc" and type "subst" than just deleting "ttbc", in terms of keystrokes? Why do you take something that works flawlessly and is obvious and replace it with something unobvious? --Dan Polansky (talk) 16:42, 17 September 2013 (UTC)
    • What about my idea above? It would mean we don't need to substitute anything at all. —CodeCat 16:44, 17 September 2013 (UTC)
      • Then, why is it easier to type "{{subst:#invoke:language utilities|lookup_language|xyz|names}}" than to delete "ttbc"? And again, why do you take something that works flawlessly and is obvious and replace it with something unobvious? --Dan Polansky (talk) 16:45, 17 September 2013 (UTC)
        • Not that idea, the first one. —CodeCat 16:48, 17 September 2013 (UTC)
  • ┌─────────────────────────────────┘
    So you mean this idea, I guess: "Rather than using {{ttbc}} to replace the language name, we just write the language name and place {{ttbc}} after it instead." So instead of "{{ttbc|Czech}}", we are going to write "Czech {{ttbc|cs}}" right? For what benefit? And this leaves the key question unanswered: why do you take something that works flawlessly and is obvious and replace it with something unobvious? --Dan Polansky (talk) 16:58, 17 September 2013 (UTC)
    • I suppose we disagree on the assumption that the current setup works flawlessly and is obvious? I would not have proposed it otherwise. —CodeCat 17:01, 17 September 2013 (UTC)
      • Transferring "ttbc|Czech" to "Czech" is obvious; the burden of proof is on you to show that what you are doing is more obvious, IMHO; the editor has to guess that "subst:cs" is going to work. If you claim the current setup does not work flawlessly, then what are its flaws? --Dan Polansky (talk) 17:04, 17 September 2013 (UTC)
        • It is documented, guessing is unnecessary. On the other hand, manuals are written to be never read. How about the other CodeCat's proposal? Keφr 17:11, 17 September 2013 (UTC)
        • In the case of my proposal, it would involve changing * Czech: {{ttbc|cs|chování|n}} to * Czech: {{t|cs|chování|n}}. I don't think that is really any more difficult than your idea. But I think it's far more intuitive to do it this way, because it's clear which translation needs checking. And you also avoid all issues with mixing language names and codes, because the name is always there. It makes it much easier for tools like XTE to parse it too. —CodeCat 17:18, 17 September 2013 (UTC)
          • Two things. What are the flaws of the current setup?
          • Now that I see it fully exemplified, I like your proposal, and I can confirm that it is no more laborsome than the current setup. --Dan Polansky (talk) 17:23, 17 September 2013 (UTC)
            • Aren't the flaws more or less why you posted this discussion? {{ttbc}} allows both language names and codes, which is exceptional and somewhat strange in itself. When a code is given, then that doesn't work neatly with the other translations because they begin with a name. This could be solved by changing {{ttbc}} to use a language name at all times, but that is also unusual (all of our other templates use codes), and it still means that the language name portion of the translation line has to be parsed separately, because it can either be a plain language name or {{ttbc|language name}}. My proposal increases consistency by saying that all translation lines begin with the language name, no exceptions. It also allows you to tag individual translations for checking, which the current method does not allow; the best we can do now is marking all translations for a given language to be checked and hoping that others will figure it out. And because my proposal takes the form of just another translation template, existing tools such as XTE only need to be adapted by adding "ttbc" to the list of recognised translation templates (which currently contains "t", "t+", "t-", "tø", "t0"), so it does not make translations harder to parse. —CodeCat 17:31, 17 September 2013 (UTC)
            • Oh and just to be clear, my idea affects {{trreq}} as well. It would be placed after the language name, rather than replacing it. So a request for a Czech translation would look like this: * Czech: {{trreq|cs}}. —CodeCat 17:34, 17 September 2013 (UTC)
              • I like the idea, but I think it would be better to use a new template; for example, we could use {{t?}} (both for translations-to-be-checked and for translations-requests, the difference just being that the former includes a provisional translation while the latter does not). And we'd probably want to still support the ability to link to the FL wikt, since that's actually very helpful in checking a translation. Maybe {{t?+}}? (Or is that starting to become inscrutable?) —RuakhTALK 20:06, 17 September 2013 (UTC)
                • We can use {{t?}} or something that's a bit longer, {{t-check}} just to make it clear and stand out more. I'm not sure what to do with the interwiki link. I suppose it should be included, but then we just end up adding lots more templates, which doesn't exactly make things easier to follow. —CodeCat 20:27, 17 September 2013 (UTC)
                  • All of {{ttbc}}, {{t?}} and {{t-check}} seem okay to me: it does not need to stand out, as it is located in a dedicated "translations to be checked" section of the Translations section. As for {{t?+}}, I don't know; it could be useful. --Dan Polansky (talk) 08:33, 18 September 2013 (UTC)
                    • Not all translations to be checked, or even most of them, appear in a separate section. Most appear among the regular translations. XTE tags translations to be checked that way, but it was only copying existing practice which existed long before then. —CodeCat 13:13, 18 September 2013 (UTC)
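To illustrate the parsing argument made in this thread, here is a minimal sketch of how a tool might read translation lines if every line begins with a plain language name and "ttbc" is simply one more recognised translation template. It is written in Python for illustration and is not XTE's actual code; the function name, the regular expressions and the returned structure are all assumptions.

```python
import re

# Template names from the discussion above, plus the proposed "ttbc".
TRANSLATION_TEMPLATES = {"t", "t+", "t-", "tø", "t0", "ttbc"}

# "* Language name: rest of line"
LINE = re.compile(r"^\*\s*([^:]+):\s*(.*)$")
# "{{name|arg|arg|...}}" without nested templates (a simplification).
TEMPLATE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def parse_translation_line(line: str):
    """Return (language name, list of translations), or None if no match."""
    match = LINE.match(line)
    if match is None:
        return None
    language = match.group(1).strip()
    entries = []
    for tmpl in TEMPLATE.finditer(match.group(2)):
        name = tmpl.group(1).strip()
        args = tmpl.group(2).split("|")
        if name not in TRANSLATION_TEMPLATES or len(args) < 2:
            continue  # not a translation template, or no term given
        entries.append(
            {"code": args[0], "term": args[1], "to_check": name == "ttbc"}
        )
    return language, entries


print(parse_translation_line("* Czech: {{ttbc|cs|chování|n}}"))
print(parse_translation_line("* Czech: {{t|cs|chování|n}}"))
```

Because the language name is always plain text at the start of the line, the same two patterns handle checked and unchecked translations alike; only the `to_check` flag differs.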

Wikisaurus and attestation

I have created vote Wiktionary:Votes/pl-2013-09/Wikisaurus and attestation.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 17:14, 17 September 2013 (UTC)

Translations pairs, copyrighted translation dictionaries, and copyright violation

I believe that copying translation pairs from copyrighted translation dictionaries with an incompatible license into Wiktionary is a copyright violation. As at least two editors claim otherwise, I'd like to discuss this in the Beer parlour. --Dan Polansky (talk) 07:54, 18 September 2013 (UTC)

To be specific, I cannot take the bulk of http://eudict.com/ and copy their translation pairs to Wiktionary. Likewise, I cannot take the translation pairs from http://slovnik.seznam.cz/ (provided by Lingea company) and copy them to Wiktionary. And I cannot take the multilingual translation pairs from http://www.thefreedictionary.com/ (a multilingual dictionary popular per Alexa rank) and copy them to Wiktionary as I see fit. --Dan Polansky (talk) 08:01, 18 September 2013 (UTC)

Can someone please help me find the topic, also in Beer parlour, where I asked about the necessity of references in entries? For some reason I think Dan Polansky also took part in the discussion. Anyway, the outcome was that we can use published dictionaries for separate words, if I remember correctly. --Anatoli (обсудить/вклад) 08:05, 18 September 2013 (UTC)
I believe that copying even a single translation pair from a single copyrighted translation dictionary constitutes a copyright violation. Entering translation pairs such that each of them is present in several independent copyrighted translation dictionaries seems not to be a copyright violation. Granted, one infringed translation pair is not a real issue; the issue is the use of distributed editorship to copy a translation dictionary while each of the copying editors would only copy a small fraction. --Dan Polansky (talk) 09:26, 18 September 2013 (UTC)
Translations are facts. Facts by themselves are not copyrightable under US copyright law. Please see the w:Feist v. Rural ruling for more information. -- Liliana 09:31, 18 September 2013 (UTC)
Translation dictionaries show originality in their choice of target terms for source terms. That originality is protected by copyright law. If all translation dictionaries ended up with the same target terms for each source term, it would be true that translation pairs are not protected by copyright as being a straightforward unoriginal obvious expression of a fact, but such is not the case. Strictly speaking, translation pairs cannot be facts; they can at best be a straightforward unoriginal obvious expression of facts. Likewise, a sentence is not a fact; it is the meaning of a sentence that captures a fact. A fact is a state of affairs; in the case of a translation pair, the state of affairs is that one of the meanings of the source term is identical or similar to one of the meanings of the target term. --Dan Polansky (talk) 09:55, 18 September 2013 (UTC)
Corrected myself by inserting "unoriginal" and striking "un". --Dan Polansky (talk) 10:17, 18 September 2013 (UTC)
Translations are not facts; they are opinions and are potentially copyrightable. However, as far as I understand it, there has to be a minimum level of creativity before a definition is copyrightable. For example, French casser translated as break is not copyrightable; indeed, that's why so many dictionaries have it. If there were a sentence of usage notes, that IMO would be copyrightable. Mglovesfun (talk) 10:03, 18 September 2013 (UTC)
I am talking about translation pairs, not definitions and not usage notes. Again, the choice of translation pairs shows originality. They do not need to show creativity in any artistic sense. I believe that a set of, say, 10,000 genuinely random numbers that a person publishes is subject to copyright because of the originality shown in that set, albeit not artistic originality.
The claim that "Katze" is a valid German translation of English "cat" is a fact, not an opinion. --Dan Polansky (talk) 10:08, 18 September 2013 (UTC)
We define foreign words with one-word translations where we can; also I believe you are correct because of the derivative work rule (regarding your first paragraph). Mglovesfun (talk) 10:11, 18 September 2013 (UTC)
I believe that translation pairs of single words themselves cannot be copyrighted. Translation pairs of single English words are facts. It is difficult to show originality in translation pairs when it comes to translation dictionaries. Originality in translation pairs does not apply to single-word translations. These translation rules you are imposing are getting out of hand. Relying on only one source is what may constitute a copyright violation. Tedius Zanarukando (talk) 02:40, 19 September 2013 (UTC)
I generally think that translation pairs in a published dictionary cannot be simply copied into Wiktionary, but if the pair exists in two dictionaries (or better, three) with different copyright holders, then there is no copyright issue because it is public knowledge. However, if I, as a competent speaker of language X, see that dictionary ZZZ has translated a word a certain way and I agree with that translation, it seems reasonable that I can use that translation because I can objectively look at it and say it's basically a fact. --BB12 (talk) 05:09, 19 September 2013 (UTC)
Translation pairs are not copyrightable in the US. "Copyright does not protect names, titles, slogans, or short phrases."[3] Larger collections as a whole probably are copyrightable. (I don't see the relevance of 10,000 genuinely random numbers, but you can only copyright human creativity, not some 10,000 numbers your computer spit out.)--Prosfilaes (talk) 06:20, 19 September 2013 (UTC)
Re: "Copyright does not protect names, titles, slogans, or short phrases." I don't see any relevance to translation pairs: a translation pair is neither a name nor a title, slogan or a short phrase. On the other note, the point with random numbers is that copyright does not protect human creativity in any deep sense; it protects that which is original in a very weak sense of original. --Dan Polansky (talk) 08:55, 19 September 2013 (UTC)
Translation pairs that are common to several sources are not a copyright violation. A pair has to be unique to the dictionary for it to constitute a copyright violation. The said Estonian translation for legislator is not unique to EUDict.com. I am 95% sure that the translation is correct. Tedius Zanarukando (talk) 16:55, 19 September 2013 (UTC)
So "I think Katze is a valid German translation of English cat" is not copyrightable, but "Katze -- cat" is? Do you have a citation for that? The counterpoint with the random numbers is if we disagree about whether they're copyrightable, then using them to argue about the nature of copyright is pointless. The logic runs "A implies B; A; then B", but we disagree about A.--Prosfilaes (talk) 22:57, 19 September 2013 (UTC)

Dictionary authors copy a lot from one another. I have seen some Serbo-Croatian dictionaries having hundreds of headwords shamelessly copied (with errors in them) from some older dictionaries which are still in copyright (perhaps they got permission, but I doubt it). In general, it's almost impossible to prove that a translation pair was taken from a particular dictionary. Google Translate produces statistics-based translations, so it is safe to copy from there (provided that you check that the translations are correct). Another approach is to compile an entry by combining translations from several dictionaries, with some of your own creativity. --Ivan Štambuk (talk) 11:52, 19 September 2013 (UTC)

For reference, the subject was discussed at Wiktionary:Beer_parlour/2011/July#Where_exactly_does_copyright_violation_begin_when_copying_dictionaries.3F. --Dan Polansky (talk) 09:17, 22 September 2013 (UTC)

At the risk of getting philosophical, translations are not facts, they are opinions. Even if four billion people agree on something, a lot of opinions is never a substitute for a fact. In reply to above "The claim that "Katze" is a valid German translation of English "cat" is a fact, not an opinion." Yes I suppose the claim is verifiable so it's a fact that this is a claim, but it doesn't mean the claim itself is a factual one. Mglovesfun (talk) 11:16, 23 September 2013 (UTC)
@Mglovesfun: Some questions, if you like:
  • Are there any linguistic facts, that is, facts about language?
  • Is the claim that '"cat" is an English word' a fact?
  • Is '"cat" is an English word' a true statement?
  • Is the claim that '"cat" is an English word that in one of its senses refers to things among which is the following animal: [image: Cat03.jpg]' a fact?
  • What distinguishes opinions from facts?
--Dan Polansky (talk) 20:12, 23 September 2013 (UTC)
“Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.”[4] I don’t believe copyright protects opinions or other abstract things that exist in one’s mind. It protects the published expression of facts, opinions, art, and anything else. The question isn’t so much whether our set of translations is the same as that in some other dictionary, but whether we copied it from there.
We can protect ourselves by referring to and citing multiple sources, by writing according to our house style, and, yes, by changing the wording when we repeat the verifiable facts provided by a source. But sometimes the irreducible expression of a singular fact – like a simple definition or translation – is going to be identical to its expression in a hundred other dictionaries. This has always been the case, and dictionaries have always had to resign themselves to it. There aren’t too many ways to say “French le = ‘the’.” Michael Z. 2013-09-23 17:21 z

Etymology policy, original research, aliaque Wiktionarii conturbata

The above discussion with Stambuk has made it painfully clear to me that there is a real, urgent need for a discussion on a good policy concerning etymological information and reconstructions here; WT:Etymology or WT:About Proto-Indo-European are clearly not enough. People are apparently doing their own thing without much regard for accuracy -- and that is a source of concern in a dictionary project with Wiktionary's scope. So, here is my proposal:

  1. (a) etymologies are hypotheses, not words; as such, they need justification from sources, since there is no other way to ascertain their existence and plausibility; therefore
    • (a-1) unsourced reconstructions should not have a page in the Appendix; if a source cannot be provided, the page should be deleted;
    • (a-2) etymology sections should also refer to sources, or at least link to pages with reconstructed protoforms that refer to sources; if there are no sources, the etymology should be deleted;
      • (a-2.1) the exceptions are obvious derivational etymologies (reader = read + -er), which need no sources
        • note: this implies the problem of defining "obvious", which people can meaningfully disagree on; but this shouldn't be a worse problem than deciding when a certain term is SoP or not -- i.e., commonsensical ad hoc solutions are possible
  2. (b) a source is a published work; this includes:
    • paper publications (etymological dictionaries, articles on etymological / diachronic questions, etc.); e.g., the OED
    • electronic / internet publications (the way to go, it seems, in science publication anyway); e.g., etymonline.com
    • personal electronic / internet publications (webpages, viewpoints, blogs, and Wiktionary pages set up to defend a viewpoint; see (c) below)
    • (b-1) sources differ in trustworthiness and acceptance: some published ideas are near-consensus, others are almost universally repudiated, most are somewhere in-between; because of that:
      • (b-1.1) Wiktionary should have somewhere (in the Appendix, in one of the About... pages, or elsewhere) a discussion of said sources to point out which ones are accepted here, and why.
        • note: the purpose is not to engage in scholarly debate about the ideas and their merits, but to state, as clearly and succinctly as possible, the sources which are accepted here, and why
      • (b-1.2) should a source be considered untrustworthy, then any etymologies that cite it should be considered unsourced; in which case, a new source should be provided, or else the etymology should be deleted
  3. (c) since Wiktionary is not Wikipedia, original research is not necessarily bad; it is therefore possible to create pages discussing and/or judging new or controversial theories (e.g., the current version of Proto-Balto-Slavic; Glottalic Theory; Austronesia Root Theory; Nostratic; Amerind etc.) or proposing new ideas (e.g., a certain reconstructed Proto-Baltic protoform as being ascribable to Proto-Balto-Slavic; or a general rule for assigning a set of Proto-Baltic protoforms to Proto-Balto-Slavic), as long as the reasons for the position and/or suggestion are explicitly given, with all relevant details.
    • (c-1) Wiktionary is, however, a dictionary, not an encyclopedia; so such original research pages should be limited only to those that are relevant to specific dictionary needs. For instance, a page on Nostratic would be acceptable only if Nostratic etymologies were cited; if no Nostratic etymologies are cited (for the reasons given in the general discussion about sources mentioned in b-1.1 above), then there should be no page on Nostratic. A page on Proto-Baltic vs. Proto-Balto-Slavic (or a text on them as part of some other page) should exist if both Proto-Baltic and Proto-Balto-Slavic etymologies are currently cited in the etymology section of some word.
    • (c-2) Such "original research pages" should be treated as sources and cited as such, perhaps with a standard format ("original Wiktionary contribution", especially if it is the result of collaborative work, as is the usual case in wikis), in the same way that new contributions or suggestions by the author of an etymological dictionary are clearly marked as such in most published etymological dictionaries.
    • (c-3) Like other sources, such "original research pages" can and should be criticized, and, given consensus, can be changed or even abandoned, in which case they should be deleted, and the pages that refer to them should be updated accordingly (just as they would if a given published source were to be considered untrustworthy; see b-1.2 above).
  4. (d) When citing a source, the citation must be accurate. This means:
    • (d-1) Reconstructed protoforms should be cited as they are written in the source; if a more recent reconstruction is also available, then its source should either replace the old source (and the old protoform should be deleted), or else both forms should be cited with their respective sources as footnotes (e.g., "X is from proto-Y *A1 (or maybe from *B2)", where "1" and "2" refer to sources in the References section), so that it is always clear which author proposes which form
    • (d-2) The same is true for diachronic derivations: *A > *B > *C > *D can be given only if all steps are present in the source being cited. If one step is not present but is taken from some other source, it must have a footnote (e.g., *A > *B > *b1 > *C > *D) referring to this other source.

So... what do y'all think? --Pereru (talk) 09:28, 20 September 2013 (UTC)

On point c) "No original research" applies to all Wikimedia projects. We are describing the external world, not inventing our own. Compiling other people's original research is one thing, but making our own original research is something entirely different. Wiktionary editors should not be making up reconstructions as they see fit, based on some theories they have in their heads. Even for "obvious" derivational etymologies (such as reader that you mention) you can in many cases reconstruct proto-forms, because those kinds of patterns are usually inherited. If we allowed original research by Wiktionary editors we would have to restore Chinese phonosemantic interpretations as well. --Ivan Štambuk (talk) 09:47, 20 September 2013 (UTC)
We also have WT:PROTO, which seems to contain at least parts of what you propose. Keφr 09:48, 20 September 2013 (UTC)
Yes, indeed I hadn't seen it... But if there are such policies already here, why aren't they being consistently enforced? I mean, I don't see anybody (except me) deleting unsourced etymologies... --Pereru (talk) 10:36, 20 September 2013 (UTC)
Insufficient manpower? Keφr 12:56, 21 September 2013 (UTC)
"No original research" does not apply to all Wikimedia projects. Wikiversity explicitly permits original research; Wikinews permits original reporting; Wikibooks, as a collection of user-written textbooks, says, "In principle, Wikibooks discourages original research. In practice, however, Wikibooks allows material based on repeatable information from personal experiences or from common knowledge when published literature might reasonably support it, or consensus might reasonably agree with its inclusion." Wikivoyage doesn't seem to have a policy on OR either way, but I can't imagine that it's free of users' personal experiences and relies entirely on previously published sources. —Aɴɢʀ (talk) 10:16, 20 September 2013 (UTC)
Wikibooks doesn't allow OR by editors. See: [5], quoting: Wikibooks is not a place to publish primary research. Examples of things not allowed on Wikibooks include proposing new theories and solutions, presenting original ideas, defining new terms, and coining new words. In short, primary research should be published elsewhere, such as a peer-reviewed journal, or our sister project Wikiversity. Reconstructions invented by one Wiktionary editor can hardly be argued to be "based on repeatable information from personal experiences or from common knowledge when published literature might reasonably support it, or consensus might reasonably agree with its inclusion."
Wikiversity seems to be the place where all kinds of OR should be done. I suggest that CodeCat, Pereru and other interested parties publish their theories on Wikiversity, and then perhaps we can add a box in the proto-appendices that says "Wikiversity has some original research on this subject".
Publishing news and travel experience is not a scientific discipline, so what Wikinews and Wikivoyage do doesn't really matter. --Ivan Štambuk (talk) 10:24, 20 September 2013 (UTC)
I know nothing of etymology (I have always assumed it was just guesswork) but original research is essential here. We see how they are used in real-world texts and devise our own definitions. Or are we just supposed to copy definitions from other dictionaries? SemperBlotto (talk) 10:29, 20 September 2013 (UTC)
A lot of science is just guesswork - e.g. many theories in physics cannot be tested in practice. What is important is who is doing the research: is it published, peer-reviewed research, or someone publishing on their web page? What is being argued here is that Wiktionary editors should not be allowed to make their own theories, because every reconstruction carries in itself all kinds of details of a proto-language theory, what kinds of sound changes might have occurred and similar. That would set a dangerous precedent, because due to the lack of external validation mechanisms such as peer-review there would be no way to distinguish "educated guesses" from synthetic and imaginative work such as the controversial Chinese phonosemantic interpretations (which were deleted but I thought they were interesting and should be preserved in appendices). The community cannot be allowed on its own both to devise theories and to verify their plausibility. Only the rest of the world, i.e. the community of historical linguists, can validate such reconstructions, and theories on which they are based. But for them to do so such theories must be published in respective journals, and not here.
Inferring meanings from attestations of words (i.e. making up definitions for them) is not OR. OR would be coining a new word and a new meaning. Simply providing a definition is describing a fact of the external world. Reconstructions are not such facts. --Ivan Štambuk (talk) 10:48, 20 September 2013 (UTC)
You're entering philosophy here. All theories in physics must be testable; you're probably thinking of M-theory, but then the controversy about it is exactly about whether it is a theory (instead of philosophy), since it's not testable.
A community can be allowed on its own to both devise theories and verify their plausibility; this is what the scientific community does. Again, I mention that etymological dictionaries often add new ideas to their discussions, clearly labeled as the author's. The difficulty of distinguishing "educated guesses" from good ideas is not bigger than that of distinguishing good, acceptable published sources from bad ones. You speak in theoretical terms, not in practical ones; the best argument I can think of for not doing that is that probably nobody here would care to check if a certain idea is good or bad (and, I'm afraid, nobody is also going to check if a given source is good or bad).
Because here is the main point: if all ideas are clearly labeled, no information is lost, and all information can be criticized. You keep saying "there would be no way to..." but I don't see that this would be any more difficult than many things Wiktionary already does. --Pereru (talk) 11:19, 20 September 2013 (UTC)
  • There is absolutely no difference between making up new words, and making up new reconstructions. Neither has attestation. We don't allow the former because they cannot be attested (which means that their meaning cannot be inferred from the context), and we shouldn't allow the latter because they cannot be referenced (which means that they are not supported by reputable scholars). --Ivan Štambuk (talk) 10:32, 20 September 2013 (UTC)
    • Etymologies are not words, but hypotheses; they don't need attestations, they need an author, and arguments. If the author is a Wiktionarian, then so be it -- as long as this is clearly labeled as such, and as long as there is a source (a Wiktionary page, for instance) to be cited and inspected if needed, I don't see a problem. Again, I say: etymological dictionary writers do this all the time -- they add new ideas to their discussion of published ideas and hypotheses, clearly labeling them as "the author(s)' ideas". Why shouldn't Wiktionarians imitate them? What is the problem?
    • If the question is simply "Wikimedia policy", what exactly would happen if original research were placed here -- would Wiktionary be shut down?
    • Having said all this, I could agree with what Stambuk is suggesting: put the original research somewhere else (Wikiversity, Wikibooks) and then cite it here. That would work, I think. Do y'all agree? --Pereru (talk) 10:42, 20 September 2013 (UTC)
And, now the million-dollar question: if we have a policy on etymologies as always needing sources, shouldn't we all now be deleting those that don't? Or at least adding sources to them? Including in this the Appendix pages with reconstructed protoforms? And shouldn't we be writing a page in which we discuss which sources we accept and which we don't? --Pereru (talk) 10:45, 20 September 2013 (UTC)
It should be specified in the policy if it works retroactively or not. I'd suggest that it does not, and that, once the policy is successfully voted in, doubtful etymologies (I've noticed that you extended the whole thing from reconstructions to general etymologies) be marked with a template such as {{rfv-etymology}} and be given a month or so to be referenced; otherwise they should be deleted. This should only apply to etymologies containing borrowing or inheritance chains, and not simply derivational morphology (i.e. reader). --Ivan Štambuk (talk) 10:57, 20 September 2013 (UTC)
- Why shouldn't it work retroactively? If something is a good idea, it is a good idea. Now maybe there isn't enough manpower to enforce it, but that's a different question. (Yes, I did extend it, because, of course, etymologies and reconstructions are both hypotheses, and what is true of one is true of the other, and for the same reasons.)
- What you're suggesting sounds OK, but considering the huge number of unsourced etymologies here, it sounds to me like a task for a bot.
- And, pardon me for insisting, but again: what is the problem with original research (protoforms, the suggestion that PBS = PB in most cases, etc.) in Wiktionary -- with arguments, etc. -- since this is done in good, respected etymological dictionaries? I'm OK with the discussion being somewhere else (as long as there's a link to it -- I simply want the information to be retrievable, so if a reader wants to know where a given idea comes from, then s/he can trace it), but what is the problem with it being here? Would Wiktionary be shut down? --Pereru (talk) 11:04, 20 September 2013 (UTC)
It shouldn't work retroactively because thousands (perhaps tens of thousands) of etymologies would then fail, and would have to be deleted. Most of them are, in fact, perfectly fine and could be referenced if tagged. Only doubtful etymologies should be tagged, and particularly doubtful simply removed. Most of the Proto-Slavic appendices do not have references, I've been very slowly adding them, but most of them are fine. It takes time to check and expand entries. Tagging them all at once, and giving some fixed amount of time for references to be provided would make us lose lots of valuable material. (Which could then be restored with citations in the future, but still.)
I've already listed several problems with OR by Wiktionary editors: 1) no way to validate plausibility of reconstructions. Chinese phonosemantic interpretations would be perfectly valid Wiktionary OR and should be restored in the appendix. 2) we could be making dozens of different but equally valid reconstructions by taking bits and pieces from various referenced theories, with this sound change occurring per this author, and that sound change occurring per that author, creating a mess of reconstructions and being completely useless to the interested reader 3) we would attract all kinds of etymology enthusiasts who would use Wiktionary as a venue to channel their creative efforts, wasting other people's time overseeing them and assessing the quality of their contributions. 4) simply mixing theories by reputable etymologists and Wiktionarians would make us look amateurish and a laughing stock of the Internet. Think about the hypothetical etymology: "According to Beekes ..... Jasanoff argues for .... User:Pereru thinks that...". It would look ridiculous. --Ivan Štambuk (talk) 11:39, 20 September 2013 (UTC)
For borrowings there are in many cases attestations, where we can even see who was the first to use the term and from what language. Or, in case of coining the word, who coined it and when. For reconstructions that is not the case, and that makes them the most controversial part of the proposal. The same goes for cases when a word was borrowed in the prehistoric period, i.e. from a proto-language to an attested language, or from a proto-language to a proto-language. Everything involving proto-stuff should be mandatorily referenced. Normal borrowings from attested language to an attested language are much less controversial and they should be referenced only if tagged by a template such as {{rfv-etymology}}.
Wiktionary shouldn't imitate authors of etymological dictionaries because that would constitute true original research. Of course, the community can decide to allow OR by editors regarding etymologies, but as I said - a vote is needed for that. It should be carefully explained what the proposal entails, and have all of the pros and cons laid out, because its consequences are far-reaching. --Ivan Štambuk (talk) 11:15, 20 September 2013 (UTC)
WT:OR currently redirects to Wiktionary:Wiktionary for Wikipedians. I think we need WT:Original research. Mglovesfun (talk) 11:23, 20 September 2013 (UTC)
Mglovesfun, I thoroughly agree. --Pereru (talk) 11:35, 20 September 2013 (UTC)
Ivan, even in the case of borrowing, it is still a theory -- it may be that language A borrowed it from language B, or maybe from language C, or... and the discovery of a new old source with an earlier attestation of the word in question can suddenly make a certain borrowing path seem more (or less) likely to be true. So, in principle, it's the same problem for all of them: what is the evidence, and what do you conclude from it?
I think that everything that is a theory or a hypothesis (borrowing, protoforms, etc.) should be referenced. And I'm always surprised that this seems so controversial. After all, anyone here who is aware of a hypothesis (a borrowing path, a reconstructed protoform, a diachronic sequence) has read about it somewhere; why does it seem to be so difficult to agree on making it obligatory for editors who want to write about it to also write down where they saw it?
I agree that a vote is needed for that. I wouldn't mind seeing it happen. Where should that be? (Personally, I don't have any new theories to add here; since I'm not a Balto-Slavist, I'm perfectly happy with repeating the ideas of others. But I see people like CodeCat making unpublished, but apparently possibly true, claims, like "many Proto-Baltic reconstructions can be proposed as such also for Proto-Balto-Slavic"; what would be wrong with defending this idea here somewhere, and then citing this defense when a given published Proto-Baltic protoform is cited here as being Proto-Balto-Slavic? To me, as long as nothing is hidden and everything is clearly marked, this would not be a problem. Besides, if it is possible to publish things at Wikiversity or Wikibooks, and then cite them as sources here, doesn't it all boil down to the same -- are we, or aren't we, going to allow proposals from this source, published on paper or at Wikiversity, in Wiktionary? And shouldn't there be a page somewhere making explicit what the policy for choosing sources here ought to be?) --Pereru (talk) 11:35, 20 September 2013 (UTC)
It just isn't true that "anyone here who is aware of a hypothesis (a borrowing path, a reconstructed protoform, a diachronic sequence) has read about it somewhere". When I wrote the etymology sections of Lower Sorbian words like bom, šula, and šołta, stating them to be loanwords from (Low) German, I wasn't following anything I had read anywhere; I was using common sense and my expertise as someone who studied historical linguistics for many years. As it happens, these etymologies can be backed up by a 100-year-old dictionary whose content is available online ([6], [7], [8]), but I didn't know that when I wrote the sections, as I only found those links just now. —Aɴɢʀ (talk) 11:58, 20 September 2013 (UTC)
Yes it's all a theory, but I wouldn't classify them all as involving the same amount of controversy, and potential for mistakes. If some borrowing is "obvious", the only reason why it should be deleted is if there is another source claiming otherwise, or the editor inspecting it doesn't agree that it is that much obvious. But we have many such "obvious" but unfortunately unreferenced etymologies. There is no harm in having them.
Yes it would ultimately boil down to whether we include OR from Wikiversity or not. But transferring the OR to Wikiversity would enable us to make a clear distinction between 1) having OR 2) including such OR in normal etymologies (or perhaps only in appendices of proto-forms). Currently OR that we do have is also automatically used. Personally, I don't want Wiktionary either hosting or using OR in etymologies. Whether we should include OR from Wikiversity as a source of etymologies should also be sanctioned by a vote.
I propose that a vote on these etymology issues has several independent points 1) retroactively being applied 2a) references mandatory for all etymologies except from morphological derivations 2b) references mandatory for all etymologies involving proto-forms 3) Wiktionary hosting OR by Wiktionarians, in some particular format (with clearly marked authorship) 4) Wiktionary referencing OR in etymologies - in what format, and where (in the mainspace entries, or just in appendices). --Ivan Štambuk (talk) 13:28, 20 September 2013 (UTC)
  • Nobody cares about OR by Wiktionarians. The rest of the world, and I suspect most of the editors here, only care about theories on word origin that are backed up by scholarly sources. They want opinions of experts who have their name and prestige in the academic community at stake, and not the opinion of User:XYZ. Having original research on Wiktionary would compromise its integrity and make it an etymological equivalent of Urban Dictionary. We might as well write a computer program that has a set of words as input, sound changes defined in Backus-Naur form, and generate thousands of proto-form appendices, with the computer code that generated them as the "author". The prospects are appalling to say the least. --Ivan Štambuk (talk) 12:05, 20 September 2013 (UTC)
    • I agree that we need to be careful about OR in etymologies, and that scholarly research is far preferable to laymen's speculations, but completely eliminating all OR from Wiktionary would mean deleting all of our definitions that haven't been copied from PD dictionaries. We don't want to throw the baby out with the bathwater. —Aɴɢʀ (talk) 12:25, 20 September 2013 (UTC)
      • OR would be creating new words and new meanings without attestations. Describing the meanings of existing, attested words in the form of definitions is not OR. We are describing external facts (usage of words) that are beyond our control. How we describe them is up to us. Wikipedia uses references and citations to back up a particular formulation of some fact, which must be written in a manner different from copyrighted works. Wiktionary uses attestations to describe meanings of words, in a manner different from copyrighted works. Just as WP has many articles on topics that pass its notability criteria, and are found in any other encyclopedia, so does WT include words that pass its criteria for inclusion, and are not found in any other dictionary. --Ivan Štambuk (talk) 13:34, 20 September 2013 (UTC)
  • Reading this discussion I have to wonder, why is original research by Wiktionary editors treated so differently from original research by external sources, no matter how bad or outlandish those sources may be? If we want to completely eliminate all judgement by Wiktionary editors concerning etymologies and their reconstructed forms, then we'd have to admit Nostratic or Altaic etymologies too. After all, they're published sources too. Do you see what I am getting at here? What makes the sources so holy compared to our editors' own understanding of the subject, that we'd willingly say "we don't know anything, let's trust them blindly"? There are many papers that are self-published by well-known professional linguists such as Kortlandt, which are nonetheless relatively controversial in their field. Yet it seems like many people here would favour referencing such a source. So if a source published by Kortlandt on his own web page is considered valid, then what's wrong with publishing something on our own web page (Wiktionary)? The only difference I see is that Kortlandt has a diploma and Wiktionary editors do not, but it would be ridiculous to say that having a diploma suddenly makes you a credible scientist. Anyone who understands the scientific method knows that scientific hypotheses are judged on their merit and not on their source and that source's credentials. We should do the same here. —CodeCat 13:00, 20 September 2013 (UTC)
If the sources are indeed bad or outlandish, then there is somebody out there describing them as such. That someone out there is not a Wiktionarian - it should be a reputable scholar. For example, the statement "Pokorny's PIE reconstructions are badly outdated" can be corroborated by such scholars.
I see no harm in including Nostratic or Altaic etymologies. But they should be clearly marked as controversial and not having a widespread acceptance, and referenced of course. There is no reason to ignore them altogether. I have an etymological dictionary of Croatian from 1992 (the only one written, although a replacement has been in the works for a long time), which lists Proto-Slavic, Proto-Indo-European, and Proto-Nostratic etymologies and cognates. The author is an etymologist (though not a "long ranger") and there is no reason to doubt his judgement. Readers might be interested in Proto-Nostratic reconstructions, they can be abundantly cited, there are dictionaries published, conferences organized etc. There is no reason to ignore it just because it is controversial. However, it should undergo the same kind of scrutiny as normal reconstructions, but of course with a big question mark and a clear indication that we're dealing with a theory that has not gained widespread acceptance among historical linguists.
What makes them holy is that people like Kortlandt have a job called historical linguist, that they have been studying their field for half a century, and that their opinions are accepted by other people in the field (even if those other people happen to be their close associates). Nobody cares about Wiktionarians' original research. Wiktionary is not "our web page" - it's a project with a high degree of autonomy, but still under the auspices of the WMF. We cannot put any kind of nonsense here just because we like it. We have an obligation to our readers and the project's goals to become a reliable dictionary (normal dictionary, etymological dictionary, all kinds of dictionary...) or else nobody will trust its definitions and etymologies.
We cannot discuss merits of particular theories (including our own) because it is not up to us to discuss such merits. (We can do it on talk pages though, but not in the articles). We can only discuss merits of opinions of real linguists. We are nobody. People with diplomas are there to discuss merits, make reconstructions and speculate on word origin. We can discuss their opinions and compile them in a user-friendly format, taking into account what is more and what less accepted, what is newer and what old or obsolete research. We cannot simply say "The reconstruction *xyz is one true form, the rest are wrong". --Ivan Štambuk (talk) 14:01, 20 September 2013 (UTC)
I happen to care about Wiktionarians' original research, because if it can be peer reviewed and tested against reality and existing theories, then it's as scientific as anything. And a wiki, by definition, is peer reviewed. I am strongly opposed to deleting content just because it can't be sourced. In the main namespace, we don't source terms, we show their existence by giving evidence. The same should be done for etymologies: we should give evidence to back up the claim, not just throw our hands in the air and point to someone else to do the arguing. I find it very worrying that you would hold external authorities in higher regard than our own ability to reason and judge content, on principle alone. You say that we have no way to judge the validity of a reconstruction without an external source to back it up, but to me it's the opposite. I don't find anything credible unless it makes sense to me, and just saying "X said so" doesn't really tell me anything or make me more likely to understand it. On the other hand, a well-formulated explanation of the process involved would make something credible to me. So if an etymology says "reconstruction is based on sound laws X, Y and Z (with the sound laws themselves referenced)" then that would be much more useful to me than "reconstruction based on the judgement of person P". I can judge the sensibility of sound laws, but I can't judge the sensibility of some person's ideas. —CodeCat 14:15, 20 September 2013 (UTC)
We are not a scientific community. It is not our duty to verify other Wiktionarians' theories. Replacing an etymology referenced with a work written by a scholar, with an etymology that is a Wiktionarian's original research should never happen. We can provide a more detailed interpretation of reconstructions made by scholars, such as listing the sound changes that have occurred or how the reconstruction is made (the stuff which is "obvious"), but making reconstructions of our own, as well as making frameworks for reconstructions of our own (what the proto-language looked like), seems to me a bit too far. Making judgements on the plausibility of sound laws is subjective and to a large degree arbitrary - it's far from certain that what seems OK and credible to you, will also look the same to somebody else. --Ivan Štambuk (talk) 14:38, 20 September 2013 (UTC)
So you'd support deleting WT:RFV? —CodeCat 15:13, 20 September 2013 (UTC)
After reading what Ivan Štambuk wrote, I don't understand why you would think he'd want that. The purpose of this page is precisely to verify claims ("this sense exists") that have no sources and seem dubious: if there are none, the claim is usually deleted. It's the same with etymologies: if we can't substantiate claims as speculative as reconstructions, then we can't write them here. Wiktionary, like the other projects, is not a place for original ideas. Dakdada (talk) 15:38, 20 September 2013 (UTC)
I do agree with you, but what I meant to show was that there is no difference between verifying a sense and verifying a reconstruction. In both cases you are testing a hypothesis and seeing if it holds up to scrutiny. In the case of RFV, the hypothesis is "term X means Y" and in the case of a reconstruction, the hypothesis is "the origin of X can be reconstructed as Y". RFV then looks for evidence to corroborate the claim, and if none is found, it's deleted. Verifying a reconstruction should follow the same process: look for evidence that corroborates the claim. Evidence in this case would be descendants, along with the known sound changes that applied to them. Someone else's reconstruction is not evidence, any more than someone else's dictionary can satisfy an RFV. Just as we don't take it for granted when others say "term X means Y" we also should not take it for granted when others say "the origin of X can be reconstructed as Y". I don't see any difference here. —CodeCat 15:59, 20 September 2013 (UTC)
Did you perhaps mean "deleting by means of WT:RFV" rather than "deleting WT:RFV"? —Aɴɢʀ (talk) 16:05, 20 September 2013 (UTC)
re "Wiktionary, like the other projects, is not a place for original ideas." Perhaps not ideas, but definitions most certainly.
In a small way every definition we have that is not in a dictionary with identical wording is original research, sometimes based solely on introspection of one's own idiolect. Definitions worded identically to those in dictionaries still in copyright with no citations are probably copyright violations. Those with citations and non-identical wording have a good defense against copyright infringement. There is a kind of "original research" required for this. And one of the things that sets us apart from other dictionaries is that we have words and definitions that none of them have: original research. DCDuring TALK 16:10, 20 September 2013 (UTC)
Writing a definition may be original, but it is not research: one does not invent a new idea just by writing a sentence that describes a sense. Etymology, on the other hand, clearly requires research to reconstruct a story, and this can't be invented by us. Dakdada (talk) 17:16, 20 September 2013 (UTC)
  • After reading through the above, I see that OR is a controversial issue here. Perhaps an independent discussion should be started on this topic -- I will go ahead and add a new heading for it below. --Pereru (talk) 16:59, 20 September 2013 (UTC)

I'm personally fine with letting what we have stand.--Prosfilaes (talk) 19:16, 20 September 2013 (UTC)

I'll note that in contrast to all this stuff about Proto-Nostratic, a lot of etymologies are obvious to the adder yet useful. I've added at least two English words that have come from the local Spanish community, with the obvious etymology. A bunch of slang and jargon may have known etymologies but never be important enough to appear in an etymological dictionary.--Prosfilaes (talk) 19:30, 20 September 2013 (UTC)
See my comments below: Wiktionary:Beer parlour/2013/September#Original_research_at_Wiktionary. - -sche (discuss) 22:06, 20 September 2013 (UTC)
  • I oppose original research in etymologies. While definitions are attested by finding actual uses of words, etymologies have to be sourced. When entering etymologies, I state my source in the edit summary. --Dan Polansky (talk) 08:41, 22 September 2013 (UTC)

Han Character Pages

On the Han Character Pages (e.g. [沐]), I am curious about the language order. With Mandarin being the primary language (see en.wikipedia.org/wiki/Standard_Chinese) of the People's Republic of China, shouldn't Mandarin and Cantonese switch places on the pages? The order should thus be:

  • Translingual
  • Mandarin
  • Japanese
  • Korean
  • Cantonese
  • Vietnamese

—This unsigned comment was added by (talkcontribs).

Language order is alphabetical, with the exception of "Translingual" and "English", which come before everything else, in that order. That makes finding a particular language easy (binary search algorithm). Ordering by some ill-defined "primarity" or "popularity" of a language would counter that. Imagine having to find a language in a list ordered like that in an entry with dozens of languages listed, like A. Keφr 13:43, 20 September 2013 (UTC)
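As a rough illustration of the ordering rule described above, here is a minimal Python sketch. It assumes only what the comment states: "Translingual" and "English" come first, in that order, and everything else is alphabetical. The function names (`sort_key`, `order_sections`, `find_section`) are hypothetical and are not part of any Wiktionary tool or MediaWiki API.

```python
from bisect import bisect_left

# Sections that precede the alphabetical list, in this fixed order
# (taken from the rule stated in the discussion).
PRIORITY = ("Translingual", "English")


def sort_key(name):
    """Return a key that puts priority sections first, then A-Z."""
    if name in PRIORITY:
        return (0, PRIORITY.index(name), "")
    return (1, 0, name.casefold())


def order_sections(names):
    """Order language section names the way an entry would list them."""
    return sorted(names, key=sort_key)


def find_section(ordered, name):
    """Binary-search an already ordered list; return the index or -1."""
    keys = [sort_key(n) for n in ordered]
    i = bisect_left(keys, sort_key(name))
    if i < len(ordered) and ordered[i] == name:
        return i
    return -1


# The languages proposed in the question, in the proposed order.
sections = order_sections(
    ["Translingual", "Mandarin", "Japanese", "Korean", "Cantonese", "Vietnamese"]
)
```

With this key, the six languages from the question come out as Translingual, Cantonese, Japanese, Korean, Mandarin, Vietnamese. Because the order is defined by a simple key rather than by a judgment about which language is "primary", a lookup can rely on binary search, which is the point made in the reply.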

Original research at Wiktionary

Basically: what are the pros and cons of original research at Wiktionary (or, what I consider to be a notational variant of it, original research at Wikiversity/Wikibooks with links to Wiktionary)? Here's my personal take on it: It all depends on what Wiktionary (and perhaps dictionaries in general) are supposed to be. If you see Wiktionary as a repository (= deposit) of existing information to which nothing new is to be added, then of course any original research is anathema. If you see it as a source of good, accurate information, then there is nothing in principle against original research (which can be just as good as "copied" information from other sources). In the latter case, the basic question is how the new information added -- be it original or non-original -- is to be judged, and under what circumstances it is to be allowed to stay, corrected, or deleted. So, Wiktionarians, what is your opinion: is Wiktionary a repository of words and information about words ("we cite, we don't suggest") or is it a source of good information about words ("we inform accurately, we remove wrong stuff")? What do you think, Wiktionarians? --Pereru (talk) 17:07, 20 September 2013 (UTC)

I prefer the second approach. Doing anything else would kind of dumb it down and turn editors into nothing more than information-collectors. It would also make it impossible for us to define anything without citing other sources that define things. How do you describe the meaning of a word without either making up your own description, or copying someone else's? I think the crucial difference is that Wikipedia, being a collection of general knowledge, can afford to relegate its responsibilities to other sources. There is generally a wide variety of usable sources. But the only sources that give the information that we give are other dictionaries (general or etymological). And we're already trying to be a dictionary, so how can we hope to offer anything new that other dictionaries don't, if we rely on them for our content? I think it all depends on what you see as the general mission of Wiktionary. Are we trying to be the best dictionary, or simply a free alternative to existing dictionaries? —CodeCat 17:16, 20 September 2013 (UTC)
I think that there is one kind of "OR" that is not very controversial, or at least shouldn't be, though perhaps it warrants review: citation-based definitions.
In a small way every definition we have that is not in a dictionary with identical wording is original research, sometimes based solely on introspection of one's own idiolect. Definitions worded identically to those in dictionaries still in copyright with no citations are probably copyright violations. Those with citations and non-identical wording have a good defense against copyright infringement. There is a kind of "original research" required for this. And one of the things that sets us apart from other dictionaries is that we have words and definitions that none of them have: original research.
By extension, grammatical and context labels could require similar OR and mostly escape objection. Given the lack of documented uncopyrighted sources, it seems to me that our pronunciations often require some "OR" and, however good they are, have less inherent objectivity than citation-based definitions. We are probably saved only because very few folks add pronunciations and not too many users rely on them.
Usage notes have been more controversial, not because of OR, but because they often include matters of stylistics or prescriptivism.
Generally, the norm of "description, not prescription" plus the effort required to find and format citations limits the potential for abuse of original research in definitions and the associated matters above.
Etymologies seem different to me, more conjectural. I am not alone: Anatoly Liberman seems to not be impressed with many published etymologies. DCDuring TALK 17:32, 20 September 2013 (UTC)
There must not be any kind of original research (OR), for etymologies or whatever. If we did OR, we would be useless to the reader. Adding pronunciations based on the established (prescriptive or descriptive) grammar rules on how words are pronounced is not OR. Some of our Ancient Greek pronunciations generated by {{grc-ipa-row}} could possibly be OR. Yes, we are "dumb data collectors". WikiData is similarly "dumb data storage" (though they are indeed trying to change that by building some kind of multilingual semantic dictionary, which will no doubt fail because such a thing cannot be done). I personally don't want to read theories of etymologies written by User:XYZ. I want to read etymologies which are backed up by hard evidence, i.e. references by scholars whose job is to research etymologies. People who have a name and surname, and a diploma in the subject. Those who want to make up new theories of word origin should do it on their own personal piece of the Internet, and not ruin the credibility of the entire project. --Ivan Štambuk (talk) 17:59, 20 September 2013 (UTC)
I think you still don't see that you are implicitly objecting to the inclusion of any definitions at all on Wiktionary. Because all of them are original research, and very few of them are referenced to external sources. Note that I explicitly mean sources that contain the same information, like Wikipedia references; I don't mean citations as they are not sources of information, they are corroborating evidence. So you cannot object to all original research without objecting to Wiktionary users writing their own definitions and other information (like inflections). You also cannot object to all OR without objecting to the process of RFV, which is pretty much entirely OR, as it revolves around Wiktionary editors testing the hypotheses (senses) against real evidence (citations) and trying to find the best ways to describe the evidence that is available. —CodeCat 18:07, 20 September 2013 (UTC)
[9] - As I've explained here, adding definitions is not OR. OR would be coining new words or meanings for existing words that are not found in outside-world usage. Attestations are the corroborating evidence of their definitions. Each of them represents a small semantic context of the real-world usage, and their intersection must be covered by the definition. By describing a word's usage through a definition, we do not exert any kind of influence on the outside world. It's a purely descriptive endeavor. But by creating reconstructions or proto-language theories of our own, we do. Senses are not hypotheses, they are (and must be) descriptive in character. The RfV process is there to process senses which appear to be invented, and not backed up by usage. We cannot make a similar RfV for our own reconstructions, because there is nothing we could compare them with. We could only speculate on the plausibility of the particular reconstruction, whether this sound change or that analogical leveling seems likely or not. But given that each reconstruction is not merely a sequence of proto-phonemes, but a convergence point of various assumptions on what the proto-language looked like, and that there is no unique reconstruction of a proto-language (even for well-researched ones such as Proto-Indo-European), such a discussion would always turn out inconclusive because its outcome depends on the personal preferences of the editors discussing them, which do not have to agree. In other words, we can only collect reconstructions done by historical linguists, reflect on them, and describe them in a manner that is more accessible than a typical etymological dictionary (which are usually written in a dense, barely legible manner, with many abbreviations and similar), or, in case of conflicting etymologies, collect all of them from various sources and present them to the reader in a neutral manner.
Not making judgements of our own why this reconstruction is "better" than that reconstruction, or this theory than that theory. --Ivan Štambuk (talk) 18:29, 20 September 2013 (UTC)
I am a bit confused about what you say about senses being "descriptive in character". Reconstructions are also descriptive, but on a different level; they describe the relationship between the descendants. Formulating a hypothesis that fits the evidence is the same in both cases, and science as a whole is aimed at formulating ways of describing the reality that we see. In the case of senses, we hypothesize that they have a certain meaning, but that hypothesis is free to be tested against real usage/attestations through RFV. Reconstructions can be tested the same way, but they require more background knowledge. That doesn't mean that the process of testing in itself is a fallacy, it only means that we need to make sure we understand the way in which the reconstruction matches the attestations. So the only real difference between the two, that I can see, is that verifying senses requires less specialised knowledge and is thus "easier" to do than verifying reconstructions. So it's a job that "almost anyone can do", while reconstructions have a lot more theory behind them. The only reason that I can think of for allowing one but not the other is that, because reconstructions are difficult, there is much more room for error and so we don't want to "trust" users with them. But I really don't know if that's a good reason. —CodeCat 19:23, 20 September 2013 (UTC)
There is a big difference - words are attested in contexts, and inferring a definition from a set of contexts is descriptive work. By providing a reconstruction you are making a stronger statement: you are claiming that the words in question are related in a particular manner (by inheritance from an ancestor language), of which you don't have any real proof at all. We have proofs of the contexts of a word's usage (citations). Verification of a proto-form by checking its structure (phonological restrictions) or the list of sound changes from/to it is not unique because a proto-language is rarely uniquely defined - there are many theories and reconstructions can take many forms. Depending on the evidence, some authors would prefer one proto-form and other authors another proto-form. The Proto-Balto-Slavic forms that you added are reconstructed in other ways by Kortlandt, Derksen and the rest of the "Leiden School". Why is their form better/worse than that of Kim, Matasović or somebody else? We cannot and we shouldn't judge that. We also cannot judge whether a form invented by a fellow Wiktionarian is plausible or not, regardless of how "strong" the evidence seems. --Ivan Štambuk (talk) 20:13, 20 September 2013 (UTC)
Not at all. Reconstructions don't claim relationship, although it's not surprising that someone might think that. A reconstruction only says "given the sets of regular sound changes that we have, we can derive all these words from this one". That's a subtle difference but quite a crucial one. If sound laws can be given that allow words in several languages to descend from a certain reconstructed term, that doesn't imply at all that those words are related. Relationship is not absolute, there are degrees of confidence, and confidence grows as more successful reconstructions can be made using the same set of theories. So there is no such thing as "proof": nobody can "prove" that English is related to Russian via common ancestry. All we can do is suggest hypotheses and test them against data, and even after we think our hypotheses have stood up against scrutiny, they can still be wrong. This happens all the time in science. A notable linguistic example would be when linguists thought that Armenian was an Iranian language.
The same can be said about definitions as well. Definitions on Wiktionary are hypotheses too, they cannot be tested against an absolute "truth" because there is no such thing. How do I know that apple means what our definition says it does? Even if we find citations for it, there won't be a whole lot that actually describe what an apple is, compared to citations that just refer to an apple expecting it to be understood. And this is my point. If none of our citations actually tell us what the word "apple" refers to, we can only go by our own understanding of the word (Original Research) and formulate a description (hypothesis) that fits with the way the word is used in the attestations we find (evidence). But just try seeing what happens when you define "apple" to actually describe a pear. You'll find that many attestations of "apple" can be interpreted that way without problems. So there is no "truth" in attestations at all, only evidence that we can use to either corroborate or disqualify our definitions. In the end, a lot of our definitions are still founded in the personal understanding of our editors, and in their ability to describe the semantic associations that go on in their head. —CodeCat 20:49, 20 September 2013 (UTC)
(edit conflict) Most of what goes into an entry isn't research, but organization and presentation of information. Etymologies are different, they're syntheses of historical data and theory that require expertise to do properly, and should be referenced, where possible. We already have enough problems with single-purpose contributors who insist on massively adding terms in their favored language to lists of cognates based on their subjective hunches. When we don't have editors with both the expertise in the linguistic history of the languages in question and the time to review the edits, a lot of really bad information just sits there in the etymologies. "Peer review" was mentioned in another discussion: peer review here is different from peer review in the literature, because "peers" in the literature are linguists and other scholars, but here they're basically anyone who knows how to edit and has an opinion.
As to the topic that precipitated this discussion: is there some way we can tag "in-house" reconstructions differently from those that are obtained from sources? It would be unrealistic to require enough consistency in referencing to make simple lack of a reference an indicator that a reconstruction isn't from a source. I'm not talking about adjusting for changes in notation and theoretical framework such as replacing ei with ey, or even a long vowel with e+laryngeal (though which laryngeal might be a judgement call), but new reconstructions. I think these are worth having, but they need to be labeled. Chuck Entz (talk) 18:34, 20 September 2013 (UTC)
  • At least no one is challenging what we do with definitions and other semantic and grammatical matters, whether called OR or not. I think matters connected to "correct usage", style, and the farther reaches of pragmatics and discourse analysis have similar problems of evidence and quality to those of etymology. DCDuring TALK 19:44, 20 September 2013 (UTC)
  • This discussion and the preceding ones have touched on two distinct things:
(1) allowing users to propose etymological relationships between words which we can find in use or (in the case of reconstructed terms) in dictionaries.
(2) allowing users to reconstruct terms, e.g. to reconstruct Proto-Germanic *foo (despite it not being found in any other reference work) on the basis of English foo, German Foo, etc.
I strongly agree that we should continue to allow (1). As Prosfilaes notes, much etymological information that is as obvious from its context (which e.g. in the case of a borrowing might be: that speakers of two languages have a history of contact, and the word exists in the suspected loaner language and refers in both languages to something found in the loaner culture, etc) as words' definitions are from the words' context (the sentences in which they are used) may nonetheless go unrecorded in other dictionaries.
With regard to both (1) and (2), the concern that allowing users to produce etymological theories means accepting all such theories (or accepting phonosemantic theories) seems unfounded. We already use our discretion at RFV to evaluate how well a user's claim that the word x means "y" is supported by evidence (examples of the use of x), and we already use our discretion in the WT:ES to evaluate how well various etymological claims are supported by evidence. We even use our discretion at RFD to decide how well users' claims that multi-word phrases are idiomatic hold up against "evidence"/"our gut feelings" (take your pick). We can continue to use our discretion.
I admit that (2) feels dicier than (1), though logically, I suppose it isn't. I'm not sure whether (2) should be allowed or not. I'll think about it some more. In the end, I may agree with Chuck Entz: "these are worth having, but they need to be labeled."
PS, I strongly agree that when different dictionaries use different systems for representing/recording reconstructions (e.g. some reference works use macrons to indicate long vowels, some use dots after the vowels; some works use ei where others use ey, etc), it is acceptable and good to standardise on one system.
- -sche (discuss) 22:05, 20 September 2013 (UTC)
I tend to agree with CodeCat and with -sche above. Ivan Stambuk is apparently claiming that allowing Wiktionarians to propose reconstructions is tantamount to allowing any crazy idea be kept. I don't see why this is the case. If Wiktionarians can be trusted to decide on sources, so that "fringe research" like e.g. glottalic theory or Amerind is kept out, why shouldn't the very same Wiktionarians be trusted to weed out bad original ideas? If we have discussions in which we convince each other that etymology A from source 1 is better than etymology B from source 2 (so that etymology A stays here and etymology B is deleted), why on earth could this not be the case for OR etymologies?
Maybe what Ivan is afraid of is that, once the gates of OR are open, all kinds of people will flood to Wiktionary to add crazy etymologies (monkey < French "mon coeur", says I!) to all words, so much that the end result would be worthless for the casual reader. To prevent that, here's a solution: before etymological OR is allowed here (either with its own page here, or with a link to a Wikiversity/Wikibooks page), let there be a discussion somewhere (the BP? The etymology scriptorium? A "OR Etymologies" page?), with a vote, so that new OR etymologies are only allowed after succeeding in a vote. Any unsourced etymologies/reconstructions which were not voted on will be assumed to be wrong and deleted (or at least tagged with RfVs and later deleted). Would that solve the problem? --Pereru (talk) 12:26, 21 September 2013 (UTC)
Deciding on whether a theory or a reference represents a mainstream or fringe opinion is one thing, but deciding on whether a particular theory or a part of it (as represented by a particular reconstruction) is "correct" or "better" than another, is an entirely different thing. The former can be done by scanning the relevant literature, the second involves value judgments, reasoning, arguments - the type of stuff that legitimate scientists are doing. Not User:XYZ.
It's not about trust but about verifiability. We shouldn't be evaluating ideas. Only how other people's ideas (outside the project, that publish true research) are to be presented, with respect to NOR, NPOV and V. Whether their opinion merits inclusion or not. Not creating information but describing information. Stuff like canonicalization of reconstructions, or filling the blanks (intermediate steps of reconstructions) is somewhere in between, and not that controversial. --Ivan Štambuk (talk) 12:41, 21 September 2013 (UTC)
Indeed that is a very noble position, Ivan. But (a) I am not sure from what I read above that Wiktionarians in general agree that we shouldn't evaluate ideas (most of the examples of "acceptable OR" that you listed include that); and (b) I see that this is done in etymological dictionaries, which are also repositories of ideas; and yet they also introduce new hypotheses, without this apparently destroying their value as descriptions of other ideas. As long as they are clearly labeled, so we know whether a certain idea is Pokorny's or User:XYZ's, I don't see the problem. If a user doesn't like Wiktionarian ideas, he skips over them and reads only Pokorny's, or Kortlandt's, or the OED. You keep saying "we shouldn't do X", but doesn't it all boil down to a labeling problem? I fail to see the problem created by allowing OR here at the level of suggested protoforms and/or derivational/borrowing sequences: if everything is clearly labeled, and discussed somewhere before being allowed here, how can anyone be confused? How exactly would the quality level drop? --Pereru (talk) 13:01, 21 September 2013 (UTC)
Canonicalization of reconstructions or giving a more detailed description of reconstructions (such as listing all of the sound changes involved, or how the form is reconstructed) is not OR. It's simply an extended, user-friendly description. Etymological dictionaries usually omit that kind of detailed description due to space constraints, and on the assumption that the reader possesses enough knowledge to work it out by themselves. Which often makes them barely legible. Just because something is not specifically mentioned in the reference doesn't mean it's original research.
Etymological dictionaries published by etymologists introducing new ideas are themselves original research by those etymologists. It makes no difference whether such new and original etymology was published in a dictionary or a paper. There are also etymological dictionaries which do not introduce new and original etymologies, but simply compile existing theories and opinions. We can be that kind of dictionary.
Of all the possible issues with OR done on Wiktionary (some of which I listed), the thing that bothers me the most is inconclusive etymologies where just about anything can be provided as an explanation. There are many cranks out there with elaborate "proofs" that Hungarian has hundreds of words of Sumerian origin, Albanian words of "Illyrian" origin, substratum words with various fancy explanations, with POV-pushers usually endorsing the one that does not involve borrowing from a language belonging to an "undesirable" group. The big problem with them is that those kinds of OR etymologies can rarely be proven formally (i.e. on the basis of the sound changes involved, and similar). Just about anything goes, depending on the author's imagination and cultural preferences. The second thing that bothers me the most is that bits and pieces of various theories can be cherry-picked into some monstrous original theory of a proto-language, which is different from anything else found in the literature. For an example of this type of development see here. --Ivan Štambuk (talk) 13:34, 21 September 2013 (UTC)
Another possibility that just occurred to me: Maybe Ivan Stambuk is afraid of the overall low qualification level of Wiktionarians to make etymological suggestions (He wrote: "we are nobody"; "I don't want to read the etymological speculations of untrained people", or something like that). Well, as a practicing historical linguist (in the field of South American languages) with 12 years of professional experience, over 30 articles in linguistics journals and 2 scientific books (Mouton de Gruyter; plus a third one coming, also from Mouton), let me say this: there is a lot of "etymological speculation" (= junk) that comes out of the highest academic sources as well. Which is why, in the end, I distrust the "argument from authority". Yes, a well-trained historical linguist is much more likely to produce a good, sound, trustworthy contribution; but if you stop checking because of that, you're likely to end up very, very disappointed. So: checking is vital. It's the peer-review system that makes science science, not the fact that there are people who worked for decades in it and manipulate complicated theories (alchemy and astrology have that, too). So, I suggest: the most important thing here is how you decide which ideas to include and which ideas to exclude. Who proposes the ideas should certainly be an important criterion, but not the only one.
Test case: both Ivan Stambuk and CodeCat seem to agree that Proto-Balto-Slavic is the modern consensus. I see dissent in recent published sources -- the "Baltic Languages" article in the Elsevier Encyclopedia of Languages and Linguistics (2006), written by an American (S. Young), disagrees, tagging Proto-Balto-Slavic as "still controversial". So, what do we do?
(a) We don't take a position, we just compile the sources. So, if a source has a Proto-Baltic reconstructed word, we can enter it into the Appendix as such; if another source has another (or even the same!) word listed as Proto-Balto-Slavic, we enter it as a separate word into the Appendix; in both cases the source is mentioned at the end;
(b) We discuss and agree that one of these positions -- say, Proto-Balto-Slavic is the scientific consensus -- and keep it, eliminating the other position; Proto-Baltic words are not allowed in the Appendix (if there are any, they are deleted), nor in etymologies of specific words.
(b-1) We keep strictly to the sources: only those Proto-Balto-Slavic words that were reconstructed in some (trustworthy) source are allowed here;
(b-2) We adapt Proto-Baltic sources: at least some Proto-Baltic reconstructions are clearly also acceptable Proto-Balto-Slavic reconstructions, so we can relabel them accordingly and add them to Wiktionary.
I claim that (b-2) clearly involves original research (or at least original ideas), and probably also (b-1) and all of (b) at some level. Which option is more in agreement with the spirit of Wiktionary? In passing: if you agree that Wiktionarians can decide what the "best theories" are, then (it seems to me) you are implicitly disagreeing with the idea (that I think Ivan Stambuk espouses) that "Wiktionarians can't be trusted to do OR or decide on such matters". --Pereru (talk) 12:51, 21 September 2013 (UTC)
The controversy seems to be mostly political in nature, mainly within the Baltic states themselves who don't want to associate themselves with their former Russian conquerors. I experienced that first hand myself when an anonymous editor on Wikipedia repeatedly tried to remove all references to Balto-Slavic from the w:Baltic languages article, claiming that it was "false propaganda". It seems that there is consensus for Balto-Slavic outside the Baltic states but there is consensus against it within them, so when the two ideas meet there is controversy.
The question of Proto-Baltic is somewhat separate from that, though. It's about whether Balto-Slavic (or Indo-European) split into Baltic and Slavic, or whether Balto-Slavic split into Slavic, East Baltic and West Baltic. In the latter case, there is no Proto-Baltic language, only Proto-Balto-Slavic, so those that discard Balto-Slavic as a whole follow the "Baltic node" hypothesis. Those that did subscribe to Balto-Slavic also tended to believe in "Baltic is a node" and that was the mainstream idea. It's only recently, with linguists getting a deeper understanding of the sound changes of Balto-Slavic, that many people are questioning whether Baltic really is a node. They find that when they reconstruct Proto-Baltic with a modern understanding, what they get doesn't differ substantially when they add Slavic into the mix. In other words: if you reconstruct Proto-Baltic, then in the majority of cases you can make Proto-Slavic descend from it too. Which means that you really reconstructed Proto-Balto-Slavic. —CodeCat 13:30, 21 September 2013 (UTC)
I haven't read this, but usually on Wikipedia original research means something you can't back up with reliable sources. It's a pejorative term that we wouldn't apply to someone who's put time and effort into sourcing a word that isn't in any published dictionary. Mglovesfun (talk) 21:10, 21 September 2013 (UTC)
  • I oppose original research in etymologies. I object to Wiktionary editors inventing etymological hypotheses not backed by reliable sources and entering them into Wiktionary. On this, I seem to generally agree with Ivan Štambuk. As far as definitions, I support the current practice of editors providing original phrasing of definitions based on the attestation evidence of words as found in use, regardless of whether this practice is considered original research. Given the practice relies on primary sources, it seems to be original research as defined by Wikipedia at W:Wikipedia:No original research, especially W:Wikipedia:No_original_research#Primary.2C_secondary_and_tertiary_sources: "All interpretive claims, analyses, or synthetic claims about primary sources must be referenced to a secondary source, rather than to an original analysis of the primary-source material by Wikipedia editors." Our definitions are at least in part a result of original analysis of the evidence found in primary sources rather than being verified by referencing to a secondary source. See also Wiktionary:Wiktionary is a secondary source. --Dan Polansky (talk) 08:59, 22 September 2013 (UTC)
  • The problem with etymology is that some of them are so obvious people don’t even bother publishing. If you’re an author of a Portuguese etymological dictionary, there is little point in wasting space to say that casa, meaning house, descends from Old Portuguese casa, meaning the same thing. If we want to be completionists (and we do), some degree of original research is necessary.
The opposers of this practice are committing a false dichotomy, implying that either no OR whatsoever be allowed, or any nonsense someone comes up with be allowed. This is not the case; naturally, OR shouldn’t be accepted when it comes to controversial or uncertain etymologies/reconstructions/whatnot. — Ungoliant (Falai) 13:14, 24 September 2013 (UTC)
  • Anything involving protolanguages is controversial and uncertain. There is not a single protolanguage that has a unique reconstruction accepted by all historical linguists. Had you actually followed the discussion you'd see that this OR proposal primarily refers to protolanguages.--Ivan Štambuk (talk) 15:28, 24 September 2013 (UTC)
Nonsense. The discussion encompasses all sorts of OR. Etymology, reconstruction, pronunciation and definitions have been mentioned so far. There is little controversy and uncertainty when it comes to the reconstruction of Vulgar Latin etyma, for example. — Ungoliant (Falai) 15:45, 24 September 2013 (UTC)
Correct me if I am wrong, but etyma is a word that is actually used. We are talking about reconstructions in etymologies. Those are highly speculative (contrary to pronunciations or definitions) and need to be sourced. Dakdada (talk) 16:04, 24 September 2013 (UTC)
I know, that’s why I’m using it. As I said, not every reconstruction is highly speculative. Vulgar Latin *oclus, for example, is not speculative, it’s a simple matter of knowing the slightest thing about sound change and Romance languages. — Ungoliant (Falai) 16:17, 24 September 2013 (UTC)
Vulgar Latin is really not a typical example of a protolanguage. Definitions and pronunciations are irrelevant. Let's stick to real protolanguages and not various intermediate forms between attested ancestor and daughter languages. --Ivan Štambuk (talk) 16:30, 24 September 2013 (UTC)
A complete ban on OR would also prevent OR in atypical protolanguages. As I said before, there is a false dichotomy. It’s not a choice between no OR at all and total OR anarchy. Different protolanguages, even different reconstructions in the same protolanguage, should be treated differently. Vulgar Latin is a protolanguage (and it is a real protolanguage) that doesn’t really need references for many entries, while Proto-Indo-European is one that does. — Ungoliant (Falai) 16:41, 24 September 2013 (UTC)
There is no false dichotomy. You keep repeating that the arguments applied to dismissing OR in protolanguages are being applied to dismiss OR in general, even when it comes to obviously plausible and conclusive etymologies. This is not the case. The issues that I listed with OR etymologies pertain to controversial and inconclusive etymologies, regardless of whether they involve speculative borrowings or reconstructions.
Vulgar Latin is an opaque term spanning centuries and dozens of Romance dialects. Proto-Romance is a protolanguage reconstructed by comparative method. Plausibility of a particular reconstruction such as *oclus in no way implies the plausibility of a protolanguage as a system; e.g. there could easily be many different reconstructions of Proto-Romance, all having *oclus, but differing in some other details. Granted, I'm not that terribly familiar with the issues involving the historical development of Romance languages, but I doubt that the rosy picture that you paint is accurate. Regardless, what is applicable to Proto-Romance, something reconstructed on the abundance of attested evidence and an attested ancestor language (Latin), is not applicable to 99% of other protolanguages. --Ivan Štambuk (talk) 17:54, 24 September 2013 (UTC)
“There must not be any kind of original research (OR), for etymologies or whatever.”
Original research is original research whether it is conclusive and uncontroversial or not. In any case, it doesn’t matter if we define OR as any content users create that’s not based on a source, or just uncertain content, as long as we don’t allow them to add unfounded crap. — Ungoliant (Falai) 18:13, 24 September 2013 (UTC)

If we allowed OR in protolanguages some checks and balances are in order if we don't want to become a mixture of Urban Dictionary and a conlang community:

  1. Only reconstructions based on attested comparanda are allowed. I.e. no "long range" reconstructions based on comparing reconstructions themselves (e.g. no Nostratic) are allowed. No reconstructions involving protolanguages that have not been generally accepted (e.g. Indo-Uralic) are allowed.
  2. Speculations regarding prehistorical borrowings from and to protolanguages are forbidden.
  3. No reconstructions based on the uncertain (i.e. guessed) meanings of words are allowed, unless an uncertain meaning is supported by a reliable source.
  4. OR reconstructions must be spelled in some Wiktionary canonical form, or a form commonly used for the respective protolanguage. Authors must not make innovative assumptions regarding the properties of protolanguages, such as inventing new phonemes, endings or paradigms. I.e. reconstructions must be done within an established protolanguage framework.
  5. Inventing new inflections for protolanguages is forbidden. All inflections must be sourced. If generated by a template and not manually entered, all of the affixes must be sourced.
  6. Every OR reconstruction must have an appendix page created before being referenced in the etymology sections of mainspace entries. Every such appendix page must have a template placed on top (similar to {{reconstructed}}) that says "This entry contains original research". Every such appendix page must describe on its talk page how the author reconstructed it, or (preferably) have a ===Reconstruction=== L3 that describes the process. [This can be facilitated by making separate appendices containing the list of sound changes in chronological order, which could then be referenced by number]. OR reconstructions must not be used outside the etymologies of words that involve them.
  7. Acceptability of doubtful OR reconstructions is to be decided on their talk pages by a voting process. Those that fail will have their entry deleted, and must not be recreated without prior approval on the talk page. --Ivan Štambuk (talk) 16:08, 24 September 2013 (UTC)

I think that these will eliminate 99% of trash. --16:08, 24 September 2013 (UTC)

I agree with all of your points except for 5. I don't see a reason why we must treat inflectional endings specially, when they are reconstructed using the same processes as any other term. If comparative evidence leads unambiguously to a single reconstructed form, it should not matter whether the reconstruction is for a stem, ending or a full form.
Point 6 also needs a bit of discussion I think. I agree with the idea in principle, but I think we should follow an RFV-like process for reconstructions: if in doubt, give a chance to source it first, delete only if nothing is found after that.
As for point 1, I assume that it only applies to "macro" reconstructions, and not the reconstruction of individual morphemes within a language. For example, if many reconstructed words display a clear suffix with a definite meaning and inflection, the suffix should be reconstructable too. —CodeCat 17:22, 24 September 2013 (UTC)


Hello community,
this is to inform you about the (re)start of a discussion in which you might be interested. In short, myself and a few other Wikimedia editors decided to oppose the registration of the community logo as a trademark of the Wikimedia Foundation.

The history of the logo, the intents behind our action and our hopes for the future are described in detail on this page; to keep the discussion in one place, please leave your comments on the talk page. (And if you speak a language other than English, perhaps you can translate the page and bring it to the attention of your local Wikimedia community?) I’m looking forward to hearing from you! odder (talk) 10:00, 21 September 2013 (UTC) P.s.: You can check whether the WMF protects the logo of your project by seeing if it's listed as "registered trademark" on wmf:Wikimedia trademarks.

Make the logo as random as that of Omniglot, IF you could with at least Branah (assemblage point research is optional, regardless)... --Lo Ximiendo (talk) 09:19, 22 September 2013 (UTC)
Speaking of which, right now we have two logos that are trademarks for Wiktionary. Maybe one day we will have a real logo, the same used by everyone. Dakdada (talk) 09:25, 23 September 2013 (UTC)

WorldCat template

I have imported the OCLC template from Wikipedia. This is a useful reference for book cites that are too old to have an ISBN number. I have frequently thought this would be useful while compiling historic cites. I hope others find it useful too. I had to remove the code for the printed version of the template since it called several other templates that do not exist on Wiktionary. I don't know whether printing is a big issue on Wiktionary, but if it is perhaps someone good with code can take a look at it. SpinningSpark 14:35, 21 September 2013 (UTC)

Links in headwords to specific senses

I recently had an edit reverted by bot in which I had placed a section link in the headword phrase pointing to the correct etymology section of the second word of the phrase. I cannot for the life of me see why this is not a helpful thing to do, especially in this case where the pronunciation is variant from the other five etys on the page. If we are not going to link to the correct word on the page it seems pointless to link headwords at all. I took this up with the owner of the bot [10] but they do not agree so I am bringing it here to see what others think. SpinningSpark 17:01, 21 September 2013 (UTC)

The problem with such links, that I can see, is that section links are possibly ambiguous and break very easily. In our conventions, only language name sections are guaranteed to be unique, all others can appear multiple times on a page and are therefore ambiguous. It's even worse with numbered sections like this. What if someone changed the numbering? Then the link would link to the wrong section too. Section links should be treated purely as convenience. They should not, ever, be used to convey information to the user. If you are relying on that, then you are doing something wrong and should probably rethink what you are trying to do. —CodeCat 17:06, 21 September 2013 (UTC)
As I said on your talk page, that problem is easily overcome by using html anchors instead of the heading as a target for the link. If additional sections are added, the anchor will still be attached to the correct section. I'm quite happy to do it that way but you still seemed to find it objectionable. In any case, on the vast majority of pages the number of etymologies is pretty stable so the problem you raise is rather minor. SpinningSpark 17:12, 21 September 2013 (UTC)

[[mazātl]] and [[mazatl]]

Something needs to be done with these two, but I am not sure what that is. Keφr 20:27, 21 September 2013 (UTC)

I'm still confused over this one myself. I seem to think macrons are used in Nahuatl, but not always, because Nahuatl is a relative latecomer to the Latin script. Mglovesfun (talk) 20:32, 21 September 2013 (UTC)
By the way your quadruple bracketing breaks the automatic anchor from the table of contents. Mglovesfun (talk) 20:40, 21 September 2013 (UTC)
Module:languages is currently set to strip macrons from Nahuatl page names. So mazātl can't be linked to with {{l}}. —CodeCat 20:54, 21 September 2013 (UTC)
mazātl is actually Classical Nahuatl. Mglovesfun (talk) 21:06, 21 September 2013 (UTC)
That's what CodeCat means: Module:languages removes macrons from Classical Nahuatl (nci) page names. (By the way, I think it's a mistake that {{l}} forcibly prevents links to page-titles that it has a mapping-rule for. I believe this was discussed for Latin, and then just thoughtlessly applied to all other languages. As a result, we can't even really have entries for the Hebrew vowel symbols.) —RuakhTALK 05:19, 22 September 2013 (UTC)
Can't be linked to by {{term/t}} or {{t}} either then; not necessarily a bad idea just because there are some exceptions. Admittedly in this case you're going to end up typing it manually. Though like I said, I thought that Nahuatl does use macrons just they are not universal. Mglovesfun (talk) 11:47, 22 September 2013 (UTC)
It should be possible to change the rules a bit so that diacritics aren't removed if the page name is only one character. That would allow us to link to the diacritics themselves. —CodeCat 12:01, 22 September 2013 (UTC)
Addendum: Actually, on second thought, you can currently circumvent {{l}} by using numeric character references: for example, {{l|nci|maz&#x101;tl}} currently produces a link to [[mazātl]]. Which is probably what we'd want to do anyway when the entry-title consists entirely of a diacritic, because otherwise the wikitext becomes hard to work with. But I wouldn't be surprised if someone is already hard at work right now, bent on closing this loophole. :-P   —RuakhTALK 21:59, 22 September 2013 (UTC)
Well, what's wrong with simplifying Wiktionary to bring it within the range of the skill set of available resources? This might be a reason to consider whether it's worth fighting Webfonts and ULS. DCDuring TALK 14:19, 24 September 2013 (UTC)
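
The page-name mapping discussed in this thread can be illustrated with a small Python sketch. It is not the actual Module:languages code (which is written in Lua); the function names `strip_macrons`, `make_page_name` and `link_target` are hypothetical, the single-character exception follows CodeCat's suggestion above, and the assumption that entities are decoded only after the mapping runs is a guess that is merely consistent with the loophole Ruakh describes.

```python
import html
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(text: str) -> str:
    """Remove combining macrons and leave every other diacritic intact."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def make_page_name(display: str) -> str:
    """Map display text to a page title (hypothetical rule set).

    Titles that are a single character after NFC normalisation are left
    alone, so that a diacritic (or a letter such as "ā") can still be
    linked to as an entry in its own right.
    """
    if len(unicodedata.normalize("NFC", display)) <= 1:
        return display
    return strip_macrons(display)


def link_target(template_arg: str) -> str:
    """Assumed order of operations: map the raw argument first, then let
    the wiki software decode numeric character references."""
    return html.unescape(make_page_name(template_arg))
```

Under these assumptions, `link_target("mazātl")` is mapped to the macron-less title, while `link_target("maz&#x101;tl")` passes through the mapping untouched and only becomes `mazātl` once the reference is decoded.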


Dictionaries, phrase-books, and year-books write & type like three gears of war, FYI. --Lo Ximiendo (talk) 14:13, 24 September 2013 (UTC)

[ɾ] or [d] in thirty?!

Ad https://en.wiktionary.org/w/index.php?title=thirty&diff=23244022&oldid=23231423 - discussion about [ɾ] or [d] in thirty. I must say that the [ɾ] is really confusing for an average reader (and even for someone like me, who studied phonetics, among other things, at college), because at least the listening impression of the sound is really /d/ (I am a native speaker of Czech, not of English, and I bet it will be the same for a native speaker of German). I ask for a systematic solution to this: at least add a note about it at every relevant place (in a systematic way), or better, add a pronunciation with /d/ in such cases. For example, http://dictionary.cambridge.org/dictionary/british/thirty chooses another solution, which is actually not much better: it writes [t] and [t̬] (an average reader, and even me ;), doesn't know such a sign; it seems it might be a variant of /t/, or perhaps a mistake). However, [ɾ] indicates the pronunciation of an r-sound, which is definitely misleading (at least for me)... Thanks, --Jiří Janíček (talk) 23:25, 24 September 2013 (UTC)

As a native speaker of US English who uses flaps in words like butter, I agree that this seems like a /d/. That's how it seems when I produce it and when I listen to it. --BB12 (talk) 23:45, 24 September 2013 (UTC)
As a native US speaker, I have used all of the pronunciations given, the hard "t" pronunciation for emphasis or when the number is a focal point of the conversation, the "d" pronunciation most commonly (I think), and the minimal-consonant pronunciation when tired or rushed. The last comes naturally and doesn't really need dictionary documentation, but folks like to record that kind of thing anyway, even if it is misleading or confusing. DCDuring TALK 23:48, 24 September 2013 (UTC)
I certainly use [ɾ] (i.e., the same consonant that most Americans, including myself, use in butter), and I suspect that BB12 and DCDuring do as well. I don't think it really makes sense to say that it's /d/; the reason it sounds the same as /d/ (in the mouths of some speakers) is that the distinction between /t/ and /d/ is neutralized in this context (for those speakers), so /d/ is no better than /t/, and the latter has the advantage of history, analogy, and spelling. It's presumably true that when someone who doesn't neutralize the distinction hears the pronunciation of someone who does neutralize it, it'll sound like /d/ to them (due to the voicing); but that's a terrible basis for a transcription. (We could toss out half the vowels if we only care about what something sounds like to people with a different accent.) That said, if we want to capture the neutralization in an otherwise phonemic transcription, I think we could write /D/ (using uppercase because it's an "archiphoneme" subsuming /t/ and /d/ in this context). —RuakhTALK 01:40, 25 September 2013 (UTC)
It probably varies both geographically and over time. Probably also by level of education and social context. bd2412 T 02:45, 25 September 2013 (UTC)
The phoneme, broadly transcribed, is definitely /t/ and not /d/; Merriam-Webster, Dictionary.com, Macmillan (US/UK) and the Oxford Advanced Learner's Dictionary all agree on that. The phoneme is realised as [ɾ] (narrowly transcribed) in this context in American and Australian English because those dialects exhibit "intervocalic alveolar flapping". That, as Ruakh notes, means they reduce /t/ and /d/ to [ɾ] in several circumstances: for example, petal and peddle fall together as [ˈpɛɾl̩] (which is, incidentally, also how some Scots pronounce pearl). - -sche (discuss) 06:03, 25 September 2013 (UTC)
I looked around, and specialists seem to agree that this sound in "thirty" is a flap, but to me, the flap in "butter" seems different from the sound in "thirty." Without further proof, I have to agree with the specialists :) --BB12 (talk) 09:51, 25 September 2013 (UTC)
Maybe the r in thirty has the effect of turning the flap into a slightly more retroflex-like sound? —CodeCat 17:50, 25 September 2013 (UTC)
  • In [audio], I hear something like "/ˈθɜɹdi/", but I am a Czech native speaker. By contrast, the British [audio] produces a clear "t" for me. I also hear "d" in the U.S. pronunciation of "pretty", as if "pridi": [audio]. It's great that Wiktionary users can peruse sound recordings rather than having to rely on their IPA transcriptions. --Dan Polansky (talk) 18:27, 25 September 2013 (UTC)
I found a reference saying it's [d] at w:Intervocalic_alveolar-flapping: "Flapping/tapping does not occur for most speakers in words like carpenter and ninety, which instead surface with [d]." This is a Wikipedia page with a link to a sound file, so it is not written proof. --BB12 (talk) 21:27, 25 September 2013 (UTC)
But the relevant phoneme in thirty is not preceded by an /n/. — Ungoliant (Falai) 21:37, 25 September 2013 (UTC)
Good point; that makes it inconclusive. I have r-ful pronunciation, so the /r/ in "thirty" has a full consonant value for me and it seems to me fairly clear that I pronounce this more as a /d/ than a flap. That is original research, though :) --BB12 (talk) 23:09, 25 September 2013 (UTC)
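
The neutralisation that Ruakh and -sche describe can be written down as a toy rule in Python. This is a deliberately simplified sketch, not a phonological analysis: the vowel inventory, the treatment of a stress mark as its own token, and the condition "after a vowel or ɹ, before an unstressed syllable nucleus" are all assumptions made for illustration, and, as the thread shows, speakers disagree on whether thirty itself has a flap.

```python
VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌ")
SYLLABIC_MARK = "\u0329"  # combining vertical line below, as in syllabic l


def is_nucleus(segment: str) -> bool:
    """True for a vowel or a syllabic consonant (non-empty segment assumed)."""
    return segment[0] in VOWELS or SYLLABIC_MARK in segment


def flap(segments: list[str]) -> list[str]:
    """Replace /t/ and /d/ with [ɾ] after a vowel or ɹ and before an
    unstressed nucleus. Stress marks are separate tokens, so a following
    stressed syllable blocks the rule."""
    out = list(segments)
    for i in range(1, len(segments) - 1):
        if segments[i] not in ("t", "d"):
            continue
        prev, nxt = segments[i - 1], segments[i + 1]
        if (prev[0] in VOWELS or prev == "ɹ") and is_nucleus(nxt):
            out[i] = "ɾ"
    return out
```

With this rule, the token lists for petal and peddle come out identical, which is the merger -sche mentions, while the /t/ of a word like attack is left alone because a stress mark intervenes.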

If a question about pronunciation of a common English word takes this much discussion, then the outcome might belong in square brackets.

Wouldn’t most readers benefit from a broad, phonemic pronunciation like /ˌθərti/, utilizing a simple set of IPA and not labelled with any accent? Every English speaker knows how they pronounce /t/ in that place of the word. Michael Z. 2013-09-26 14:39 z

As someone who has been a native US English speaker for over 30 years and has lived on both sides of the country, I can tell you with relative certainty that /θɜɹdi/ ('thurdee') is the normal pronunciation of the word in the US. Kaldari (talk) 09:27, 5 November 2013 (UTC)
Here's a Youtube video specifically about how to pronounce the word "thirty": http://www.youtube.com/watch?v=1jVjujBjcdY. In the video, the teacher (who sounds very American) explicitly says that the 't' should be pronounced as a 'd'. Kaldari (talk) 09:37, 5 November 2013 (UTC)

Mandarin Pinyin

I think that pinyin entries look strange. Look at ma. It has 3 romanization sections. I can understand that you want to make clear what is standard spelling and what is not, but 3 sections is too much and makes no sense. Why not just have one section and make non-standard spelling a subsection under romanization? The categories have the same kind of problem. The logical categories in my opinion would be standard spelling in category:pinyin and non-standard spelling in a subcategory of pinyin. Pinyin with tone numbers should also have a separate category. Kinamand (talk) 17:22, 25 September 2013 (UTC)

There's no problem with ma#Mandarin. --Anatoli (обсудить/вклад) 04:11, 26 September 2013 (UTC)
Actually the first section looks redundant. Everything in there is already found under the second section. JamesjiaoTC 04:15, 26 September 2013 (UTC)
Good point. I have removed the first section, modified the second a bit. With is a bit confusing, because it can also be used in traditional, so I made two definition lines. --Anatoli (обсудить/вклад) 04:23, 26 September 2013 (UTC)
This shows how it happened: (diff). There were three different headers, all of which ended up as "Romanization". I'm sure the idea was to go back and consolidate the sections on the entries that were changed by bot, but this one, at least, was missed. Chuck Entz (talk) 05:54, 26 September 2013 (UTC)

Heading for adjective forms of nouns? (eg father→paternal, eagle→aquiline)

If you look up "father" on Wikipedia it will tell you in the second sentence that '[t]he adjective "paternal" refers to a father'. So it seems strange that when you look up "father" on Wiktionary, "paternal" is only listed amongst a bunch of "see also" words. Many other nouns, such as eagle, don't even list their associated adjective (aquiline).

Existing headings ("Derived" and "related" terms) don't fit for adjectives which are "collateral adjectives", that is, not related etymologically (e.g. sheep→ovine, weather→meteorological); lumping the adjective forms under the "see also" heading makes them impossible for the reader to find without more research; and expecting readers to find the adjectives in appendices such as Appendix:Animals is not at all friendly.

I don't know grammatical terminology well, so I'm not sure what the best heading would be. Perhaps one of:

  • adjectival forms
  • adjectives
  • denominal adjectives
  • relational adjectives

Or perhaps the adjective could fit in with the noun header, e.g.:

sheep (plural sheep, adj. sheepish, ovine)
  1. A woolly ruminant of the genus Ovis.

Sorry if this has been discussed previously or already has an answer, but I couldn't find it if so. Other related terms that should be tagged as such include "group" words (whale→pod) and names for the "homes" of animals (seal→rookery), so if you're looking at a broader heading or solution, then perhaps those and the other headings of Appendix:Animals should be considered too, but I think a simple "Adjective forms" heading or the like would be simplest.

In short, how can a noun's adjectives be listed so they can be identified as such? --Pengo (talk) 22:23, 25 September 2013 (UTC)

I would just put them under "See also" and look hard at the other items under that heading with an eye toward deletion. There are some that are formed by productive affixation or compounding (eg, eaglish, eagle-like, egalitarian) (Mr. Hall said, 'What you have said today can be construed neither hawkish, or dovish, but eaglish, that gallant and magnificent creature that symbolizes the strength and honor of this great nation'). What about formations like eagle-eyed, spread-eagle, eagleless?
Are they all supposed to be in Derived or Related terms? As most of the adjectives are Derived terms, there would be duplication. Which should take precedence? If this is only for adjectives that do not fit under the existing headings, I think there will be relatively few, which is why See also seems like a reasonable home. DCDuring TALK 23:03, 25 September 2013 (UTC)
I would list them as adjective forms, and leave them out of derived terms, much the same way the plural form is not included as a derived term. The fact many adjectives are derived forms, not collateral adjectives, makes it even more important to separate them out or tag them. For example, try finding the adjective form(s) of turtle amongst its derived terms. How is a reader possibly meant to even know which heading to even look for it? or that they've found all the adjectives listed? or that any word listed is even the adjective form? Can you imagine trying to find adjectives on a mobile device by clicking through every term listed under "see also" and "derived terms" to check? The current system is clearly broken. Pengo (talk) 00:11, 26 September 2013 (UTC)
There are lots of interesting semantic connections among things. I don't think that we want to add headings for each one. See also is a wonderful home for terms that don't fit elsewhere. We don't usually have items appear under multiple headings in a given entry. Usually a term gets only one appearance in the definition or under the etymology, derived terms, synonyms, hyponyms, hypernyms, related terms, or see also headings.
Perhaps some kind of categorizing label would help, both to find the particular reason why an item appears under See also and also to make it possible to find all of them. In fact, that seems like a good idea for all the terms now appearing under See also. I often find it hard to understand why some terms appear there. DCDuring TALK 01:52, 26 September 2013 (UTC)
I think we should abandon both "See also" and "Related terms" and come up with something more obvious than that. —CodeCat 01:57, 26 September 2013 (UTC)
Paternal is not a form of father, so “Adjectival forms,” which means forms of this term, is not right. Most of our headings represent etymological relations, including “Related terms,” not semantic ones like this. But we do have Synonyms and Antonyms, and Coordinate terms, Hypernyms, and Hyponyms. I think See also is a good enough catch-all for such second-order relationships as eagle–aquiline, but I am open to suggestions for better heading names. Michael Z. 2013-09-26 14:13 z


We have a template called zh-tone but it is only used in a few entries. It should be in all entries in the category Mandarin nonstandard forms. Can someone fix it with a script or bot? If you look at feng, the text English transcriptions of Chinese speech .. should be replaced with the template. Kinamand (talk) 08:58, 26 September 2013 (UTC)

Well, it won't be that easy; it's more than I can do. Mglovesfun (talk) 18:04, 29 September 2013 (UTC)
The closeness of wording between the written-out version at feng and {{zh-tone}} makes me suspect that the template has usually been substed in. Is it really so bad to leave it the way it is? —Aɴɢʀ (talk) 18:08, 29 September 2013 (UTC)
I think the answer is probably not. If the content of {{zh-tone}} were radically changed, then it would be a problem. As a rule though, best not to subst: in this situation because then one change to the template filters down to all the entries that use it. Mglovesfun (talk) 20:27, 29 September 2013 (UTC)
Given that the message has varied slightly over time (in 2008, EncycloPetey linkified transcriptions and tonal; in 2012, Hamaryns changed the transcriptions link to point to the lemma; this year, Mglovesfun changed Chinese to Mandarin), it might be better if entries used the template normally, rather than substing it, so the message could be kept up-to-date everywhere.
I've checked the database dump for all occurrences of of Chinese speech often fail, of Mandarin speech often fail, and/or appropriate indication, and found 337 instances of English transcriptions of Chinese speech often fail to distinguish between the critical tonal differences employed in the Chinese language, using words such as this one without the appropriate indication of tone. on a line by itself, plus three instances of this sentence as a list element, and one instance of each of four relatively minor variations on it.
I'd be happy to standardize and/or templanate this, if people want.
RuakhTALK 07:08, 1 October 2013 (UTC)
It is entirely likely that I did that when creating pinyin tone entries. Sorry about that! Please do template the lot, as it is conceivable that there will be minor adjustments to that presentation in the future. Cheers! bd2412 T 15:22, 1 October 2013 (UTC)
O.K. then. Does anyone object to my templanating these? And, does anyone have a preference between the name {{zh-tone}} and the name {{cmn-toneless-note}}? —RuakhTALK 07:57, 3 October 2013 (UTC)
I definitely prefer the latter as long as it's only going to be used for Mandarin. —CodeCat 12:31, 3 October 2013 (UTC)
Same as CodeCat. Mglovesfun (talk) 13:46, 3 October 2013 (UTC)
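
The clean-up Ruakh offers above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the helper names `plain` and `templatize` are hypothetical, the pattern covers only the main sentence and its Chinese/Mandarin variation (not the four minor variants found in the dump), wiki links are flattened before comparison so that the linkified versions also match, and `{{zh-tone}}` is used as the default simply because it is the name that currently exists.

```python
import re

# The written-out note, allowing the Chinese -> Mandarin wording change.
NOTE = re.compile(
    r"English transcriptions of (?:Chinese|Mandarin) speech often fail to distinguish "
    r"between the critical tonal differences employed in the (?:Chinese|Mandarin) "
    r"language, using words such as this one without the appropriate indication of tone\."
)

# [[target]] or [[target|label]]; the label (or target) is kept.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def plain(wikitext: str) -> str:
    """Flatten wiki links so linkified and unlinked versions compare equal."""
    return LINK.sub(r"\1", wikitext)


def templatize(page: str, template: str = "{{zh-tone}}") -> tuple[str, int]:
    """Replace each line (or list item) consisting only of the note with
    the template call; return the new text and the number of replacements."""
    out, count = [], 0
    for line in page.split("\n"):
        prefix, body = re.match(r"([*#:]*\s*)(.*)", line).groups()
        if NOTE.fullmatch(plain(body).strip()):
            out.append(prefix + template)
            count += 1
        else:
            out.append(line)
    return "\n".join(out), count
```

Passing `template="{{cmn-toneless-note}}"` would apply the other name under discussion; lines that merely contain the sentence alongside other text are deliberately left untouched for manual review.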

Automated romanization of Burmese

Currently we use a phonetically based romanization of Burmese which, due to the vagaries of Burmese orthography, cannot be generated automatically but must be added manually. Since our Burmese entries already almost all include IPA transcription, however, a phonetically based romanization is not really necessary, and trying to keep up with all the Burmese redlinks around and bringing their transliteration into line with the guideline at Appendix:Burmese transliteration is like herding cats. Therefore, I propose we switch to a romanization based on Burmese orthography; for full details see Appendix talk:Burmese transliteration#Okell's Recommended Standard. Is this a good idea? Is it feasible? If so, can someone else write the Lua code since I couldn't code my way out of warm Jell-O? And once it's done and tested and ready to be implemented, can someone send a bot out to search and destroy all existing Burmese transliterations on non-Burmese pages so they don't overwrite the automated one? —Aɴɢʀ (talk) 17:57, 29 September 2013 (UTC)

Having automatic transliteration for Burmese would be great. I only don't know why Okell's standard is better. Transliterating Burmese won't be an easy task and I don't know if we have volunteers. The challenge is probably to determine the syllable boundaries and vowels that surround consonants. A few people here are able to do it, I think but they may not have interest. Besides, opposition to particular standards works quite discouragingly and may upset all efforts. No single standard is liked by everybody. See Module_talk:ko-translit/testcases#Confusion_between_transcription_and_transliteration.3F. --Anatoli (обсудить/вклад) 00:59, 2 October 2013 (UTC)
Okell's standard is just a suggestion; we could also use the MLCTS, which has some official recognition and which is in widespread use at Wikipedia. Syllable boundaries aren't too hard to detect in Burmese; , , and always mark the end of syllable, as do and unless they're followed by a tone mark, as do vowel diacritics unless they're followed by (a consonant plus) one of the things that marks the end of a syllable. I don't see why vowel diacritics per se should present a difficulty. As for reaching consensus on which transliteration system to use, there are so few people around here who are interested in Burmese at all it shouldn't be too difficult. —Aɴɢʀ (talk) 11:45, 2 October 2013 (UTC)
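To make the syllable-boundary point concrete, here is a minimal sketch of one heuristic for splitting Burmese text into syllable-sized chunks. It is not the exact rule set described above, and it is written in Python rather than Lua purely for illustration. It assumes only two facts about the script: asat (U+103A) closes a syllable-final consonant, and virama (U+1039) stacks the following consonant under the preceding one. The function name `split_syllables` is hypothetical, and independent vowels, digits and punctuation are deliberately not handled.

```python
import re

# Assumption: a new syllable starts at any base consonant (U+1000..U+1021)
# that is not stacked under a preceding virama (U+1039) and is not itself
# a final consonant, i.e. not followed by asat (U+103A) or virama (U+1039).
_SYLLABLE_START = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103A\u1039])")


def split_syllables(text):
    """Split Burmese text into syllable-sized chunks (rough heuristic).

    Stacked consonants are kept together in one chunk, so a word written
    with a stack comes back as a single piece.
    """
    # Insert a space before every syllable-initial consonant, then split.
    return _SYLLABLE_START.sub(r" \1", text).split()


if __name__ == "__main__":
    # U+1019 U+103C U+1014 U+103A U+1019 U+102C ("Myanmar")
    for chunk in split_syllables("\u1019\u103C\u1014\u103A\u1019\u102C"):
        print(chunk)
```

For the six-codepoint word in the usage example, the sketch returns two chunks: the first four codepoints (initial consonant, medial, final consonant with asat) and the last two (consonant plus vowel sign). A real transliteration module would still need to map each chunk to Latin letters under whichever standard is chosen.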

Green-linking the links to French feminine forms

Hi. I have asked that something be done to {{fr-noun}} so that links to non-existent feminine forms of French nouns be colored green in the main entry, for those who use the "accelerated creation" gadget. As I understand CodeCat's answer, this wasn't done because it would ease the creation of "feminine form of" entries. While I see nothing wrong in easing the creation of such entries, I have offered other options on the discussion page, which I invite you to comment on. Thanks. — Xavier, 20:53, 29 September 2013 (UTC)

As far as I know, CodeCat was the one that removed the acceleration. I 'vote' to reinstate. Mglovesfun (talk) 21:16, 30 September 2013 (UTC)
I think this is the wrong discussion. I'm sure everyone can agree that if we want these nouns to be defined as "feminine form of ____" (or similar), then we should make that accelerable; and I'm sure everyone can agree that if we don't want these nouns to be defined as "feminine form of ____" (or similar), then we shouldn't make that accelerable. I suppose there might be some editors who'd support defining these nouns as "feminine form of ____" (or similar) for the simple reason that they want them to be accelerable (and that this or similar is by far the easiest format to accelerate), but even if so, the right thing to discuss is: are we O.K. with these nouns being defined as "feminine form of ____" (or similar)? Personally, I really don't think they should be defined as exactly "feminine form of ____", but I might be on board with something similarly automatic. —RuakhTALK 01:30, 1 October 2013 (UTC)
Any reason you didn't object at the time? Mglovesfun (talk) 13:49, 3 October 2013 (UTC)
Sorry, object to what? At the time of what? —RuakhTALK 15:13, 3 October 2013 (UTC)
The creation of French feminine nouns as 'feminine form of'. And for how long? Erm I requested it from Conrad.Irwin so we can probably find the specific edit. And until very recently; perhaps from 2010 to 2013. Mglovesfun (talk) 15:17, 3 October 2013 (UTC)
I don't remember. Either I wasn't aware, or I didn't feel the same way that I do now, or I didn't care enough to speak up. (Why do you ask?) —RuakhTALK 17:59, 3 October 2013 (UTC)
Ruakh, I have started this discussion because I have been asked to by CodeCat. I too am not quite satisfied by the direction this discussion takes: if we don't want nouns to be defined as the "feminine form of" another noun, then the f parameter must disappear from this template. But I'm not asking that it be removed or kept. This is quite another discussion indeed. Actually, I am just asking that the creation of the feminine form be accelerated when this f parameter is used, that is when a noun is explicitly defined as the feminine form of another. Can we all agree on this? — Xavier, 23:14, 13 October 2013 (UTC)
Sorry, but that's wrong: "when this f parameter is used" is not equivalent to "when a noun is explicitly defined as the feminine form of another". (That's begging the question.) Just because the headword-line for a masculine noun includes a link to the corresponding feminine one, that doesn't mean the feminine noun should necessarily be defined in terms of the masculine. Note that the template supports both {{fr-noun|m|othergender=chatte}} and {{fr-noun|f|othergender=chat}}; so by your approach, we could happily define chatte as "feminine of chat" and chat as "masculine of chatte", and never actually mention that either word means "cat"! —RuakhTALK 05:36, 14 October 2013 (UTC)
Interestingly, we had a similar discussion on this subject on fr.wikt. To sum up, we don't consider that chatte is the feminine of chat: we treat them as separate words. Still, we add a link to the corresponding form in this form: "chat (equivalent for the other sex: chatte)". Dakdada (talk) 09:04, 14 October 2013 (UTC)
Interesting to me because it was the other way round when I edited there. Very well. Mglovesfun (talk) 09:27, 14 October 2013 (UTC)
BTW, since the beginning I have been referring to the template's documentation, which reads: "f: feminine form". This is perhaps the source of my misunderstanding. As for the looping issue, I don't know if it was intended to be humorous, but the same reasoning could also apply to singular/plural ("chien is the singular of chiens"). Should we also ban the "plural of" definitions? ;-) Personally, I'm very much in favor of simplicity and I believe that an entry that defines a word as being the feminine form of another word is better than no entry at all. After all, this is a wiki, and anyone can write a better definition afterwards. Anyway, thank you all for discussing this. — Xavier, 01:06, 15 October 2013 (UTC)
Wait, when I point out that your argument gives an absurd result, your response is to point out additional absurd results it could give? I gotta be honest, that is not winning me over to your viewpoint. :-P (Though in fairness to your argument, it doesn't actually give the absurd result that you say it would. Plural-form entries don't link to the singulars on the headword line, because we do that on the definition line instead; so they're different from these feminine-noun entries, where we do link to the masculine counterparts on the headword line.) —RuakhTALK 05:58, 15 October 2013 (UTC)
My objection to defining these as "feminine forms" is that it turns the masculine form into some kind of lemma while the feminine is defined in terms of the masculine. For adjectives it can be argued that the masculine and feminine are part of the same lemma and have the same meaning, and it's customary to use the masculine singular as the lemma form. But for nouns it doesn't work that way because the two nouns have different meanings: the distinction is not grammatical like it is in adjectives, but semantic. This means that they are distinct nouns with different definitions and should be defined as separate lemmas. An argument from precedent can also be made: we don't define actress as "feminine of actor" in English, so why should we do this for French? —CodeCat 02:32, 15 October 2013 (UTC)
How is this different from Category:Dutch diminutive nouns? --Ivan Štambuk (talk) 03:13, 15 October 2013 (UTC)
I don't know really. It just "feels" different? I can't really say why. In Dutch at least, pretty much every noun has a diminutive, so they could be considered a part of a noun's paradigm of forms in a way. But I think I just dislike the idea of treating feminine nouns as secondary to masculine ones. —CodeCat 03:19, 15 October 2013 (UTC)
Masculine and feminine forms of a French noun have different definitions only if you want them to. In every French dictionary I know (and believe me I have perused a lot of them), masculine and feminine forms are defined under a single entry which is written in a neutral way.
The fact that you keep taking examples from the English language (actor/actress, prince/princess, etc.) shows that we are not on the same ground. No one is saying that an actor is the masculine form of an actress in reality. Keep in mind that we are dealing with the French language here, a language with its peculiarities, and among them 1. a gender which doesn't exist in English and 2. a masculine form which plays the role of a kind of radical (akin to an infinitive for verbs) upon which we construct the feminine forms. Even if that does not please you, saying that this word is the feminine form of that word is grammatically correct in French. On the other hand, saying that what is actually described by this word is the feminine of what is actually described by that word does not make any sense. In any language, I guess.
So. On one hand, saying that chienne is the feminine form of chien is grammatically correct in French. On the other hand, saying that une chienne is the female of a chien is also correct, but we are then talking about what the word describes, not about the word itself. So we have two possible definitions of a feminine French word, and both are correct. Now, the question is: should we choose one definition over the other?
Since both are correct, theoretically we shouldn't. But the grammatical definition is more practical than the other. You are suggesting that we duplicate every definition of French nouns that have two genders, but for what purpose exactly? Have you considered the extra effort that it would take, and the management of the discrepancies that will inevitably ensue (if you add a sense here, you must add the same there too, etc.)? Obviously you don't quite realize the vastness and the richness of the French language. Duplicating is not practical. Even if we do not have the space constraints that paper dictionaries have, we have the same constraint of a limited workforce. — Xavier, 00:09, 16 October 2013 (UTC)
In all French dictionaries chien and chienne are indeed described in the same entry. However, this is more likely a matter of available space than of grammar. In the online version of Larousse, chienne is in a different entry from chien ([11]). Also, there are usually a lot of senses that are specific to one gender or the other beyond the proper sense (see again chien vs chienne): if the differences are not made clear, then the reader may think that both terms can be used interchangeably for every sense, which is far from true. Dakdada (talk) 14:31, 16 October 2013 (UTC)
Last modified on 14 April 2014, at 19:13