Last modified on 12 March 2015, at 12:21

Wiktionary:Beer parlour



Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives

January 2015

Wiktionary:Translation requests

I think, with 38½ sections per month on average and little interest in archiving, this page is a good candidate for conversion to the monthly pages system. If we still want to keep it, anyway... Keφr 22:33, 1 January 2015 (UTC)

Or it could be purged more often. — Ungoliant (falai) 22:36, 1 January 2015 (UTC)
Yes, but who wants to do it? Keφr 22:48, 1 January 2015 (UTC)
Never mind. I thought it had the same “archiving” system as the feedback page. — Ungoliant (falai) 23:25, 1 January 2015 (UTC)
I prefer the purging method, but the auto-archiving method is easier! Renard Migrant (talk) 13:42, 2 January 2015 (UTC)

Maybe have some new words of the day this year

Does anyone fancy updating the word of the day a bit more? I noticed we had a lot of repeats last year and already this year. I'd be grateful to anyone who put any work into it. Renard Migrant (talk) 13:08, 2 January 2015 (UTC)

I’ve briefly considered it before, but I’ve worried that, if I were to accidentally feature a word that was already Word‐of‐the‐Day elsewhere, it could spell serious trouble for the project. --Romanophile (talk) 13:11, 2 January 2015 (UTC)
To blow out of proportion, verb: To overreact to or overstate; to treat too seriously or be overly concerned with. Keφr 13:27, 2 January 2015 (UTC)
Quite an understatement. I do not think we had any new word of the day at all this year. And I think you can guess at least one reason why if you look up how these are set up compared to FWOTDs. Keφr 13:27, 2 January 2015 (UTC)
From above "if I were to accidentally feature a word that was already Word‐of‐the‐Day elsewhere, it could spell serious trouble for the project." Seriously? Is this a joke? Renard Migrant (talk) 13:28, 2 January 2015 (UTC)
Couldn’t somebody sue the Wikimedia Foundation if it looked like we copied their Word‐of‐the‐Day? See also: ‘Avoid other WOTDs - We want to feature words that haven't been WOTDs for other dictionaries, partly to highlight unique terms that make Wiktionary so special, partly to avoid complaints (by preventing the possibility entirely) that WOTDs were "stolen" from other dictionaries.’ --Romanophile (talk) 13:37, 2 January 2015 (UTC)
I'm not an intellectual property lawyer, but no, I don't think so! If we copied the definitions or copied large strings of words of the day from other websites, then yes. But nobody can copyright the use of a word on its own. @BD2412: would be the person to ask (he actually is an intellectual property lawyer) but I can't imagine I'm wrong on this. Renard Migrant (talk) 13:41, 2 January 2015 (UTC)
You're not wrong on this. Cheers! bd2412 T 14:29, 2 January 2015 (UTC)
Certainly one reason we haven't any new WOTDs yet this year is that this year is only two days old. —Aɴɢʀ (talk) 14:02, 2 January 2015 (UTC)
While not ideal, perhaps it would be better to cycle through our (non-offensive) entries one by one, regardless of what the words are, than to repeat a few hand-picked ones forever. Equinox 14:08, 2 January 2015 (UTC)
How about extrabellum? Cheers again! bd2412 T 14:37, 2 January 2015 (UTC)
I don't like the idea of cycling through all our entries regardless of quality. At WT:FWOTD there is the requirement that a candidate word must have pronunciation information and at least one citation; I think that English WOTDs ought to be held to at least that standard if not a higher one. —Aɴɢʀ (talk) 15:17, 2 January 2015 (UTC)
I was just going to propose something similar. Keφr 15:30, 2 January 2015 (UTC)
How hard can it be to find words that meet the criteria? Or, better yet, to take words that are close to that and bring them up to par? bd2412 T 15:39, 2 January 2015 (UTC)
That depends on the word, but usually relatively easy, methinks. But I guess it has to be blessed as policy somehow. Not necessarily a full vote. Keφr 15:52, 2 January 2015 (UTC)
The problem is not the lack of words to feature (the nominations page has enough interesting words to last for months), it’s the lack of someone to update the templates. — Ungoliant (falai) 17:20, 2 January 2015 (UTC)

Deprecating "Acronym", "Initialism" and "Abbreviation" headers

Previous discussions: Template talk:abbreviation-old#RFD discussion, Wiktionary:Beer parlour/2014/July#Template:abbreviation-old

Some time ago it was proposed that these "part-of-speech" headers be deprecated (in favour of real part-of-speech headers like noun, adjective, interjection, etc.). I support the idea, but to be honest, I cannot recall a wider discussion about it. Meanwhile people are still adding new "initialism" entries, and there is nothing to point them to about it. If we formally deprecate those headers, we should remove them from WT:NEC, deprecate and track {{head|...|abbreviation/acronym/initialism}}, and probably set up an edit filter. Do we do it? Keφr 18:10, 2 January 2015 (UTC)

I get the impression, of the people who've expressed an opinion, that everyone's broadly in favour of it (i.e. deprecating), but implementing it would be horrendously difficult. Because they all need to be sorted by hand, it can't be automated. Users are still adding them in good faith and it's better to have them under an initialism header than not at all. Renard Migrant (talk) 18:13, 2 January 2015 (UTC)
Like I said, we could set up an edit filter which would catch that and show an explanatory message. Keφr 18:16, 2 January 2015 (UTC)
Sorry I missed that bit. Support. Renard Migrant (talk) 19:11, 2 January 2015 (UTC)
Support. — Ungoliant (falai) 19:05, 2 January 2015 (UTC)
I support it in principle, but I've found that there are some cases where it's not easy to define it with any other part of speech. I can't think of any example right now but I know there are some. —CodeCat 19:19, 2 January 2015 (UTC)
@CodeCat: I don't know if this is what you meant, but there are various kinds of abbreviations of phrases for which Phrase doesn't seem like the right header, e.g. YOLO. DCDuring TALK 19:41, 12 January 2015 (UTC)
Yes, that would be a good example. —CodeCat 19:53, 12 January 2015 (UTC)
YOLO seems somewhat like an interjection to me. And the more I think about IANAL, the more it seems like an interjection too. (Compare QED.) Keφr 20:56, 12 January 2015 (UTC)
@Kephir: Just because we have numerous erroneous uses of Interjection as a PoS header doesn't mean that we should err yet again. Consider this definition of interjection from MW Online:
"an ejaculatory utterance usually lacking grammatical connection: as
a : a word or phrase used in exclamation (as Heavens! Dear me!)
b : a cry or inarticulate utterance (as Alas! ouch! phooey! ugh!) expressing an emotion"
Their definition of ejaculatory refers to ejaculation: "something ejaculated; especially a short sudden emotional utterance".
Their other definitions are even less applicable.
YOLO, QED, and IANAL represent full sentences without marked emotional content. DCDuring TALK 21:06, 12 January 2015 (UTC)
What is the term for a phrase that has a finite verb as its head? —CodeCat 21:07, 12 January 2015 (UTC)
@CodeCat: Do you mean an absolute? That also strikes me as too technical for a PoS header, even if it were in WT:ELE (and for any definiens, except possibly some more technical style/grammar/linguistics terms). DCDuring TALK 22:39, 12 January 2015 (UTC)
No, that's not it. It would have to include the terms that the abbreviations above stand for. All I can think of is "sentence", but they're not necessarily always used as stand-alone sentences (they can be, though). —CodeCat 22:53, 12 January 2015 (UTC)
Wikipedia says at w:Finite verb: "A finite verb is a form of a verb that has a subject (expressed or implied) and can function as the root of an independent clause; an independent clause can, in turn, stand alone as a complete sentence." So I think "independent clause" would be most fitting as the part of speech for these and similar terms, whether abbreviated or not. It may not be the easiest to understand for readers, but any alternative would be ambiguous, so we don't have much choice. —CodeCat 22:58, 12 January 2015 (UTC)
Whatever grammatical term we use is likely to be a problem because the term will be technical or it will clash with the basic notion that something spelled without spaces is a word (not a clause or phrase), or both. We have entries for MWEs that have the PoS header "Phrase" that are technically not phrases, but I don't think that the technical requirement that a "real" phrase be a constituent bothers very many ordinary users.
For now I am willing to plow ahead on the entries that have clear-cut PoS headers to replace these and wait for lightning to strike in the form of some conceptual breakthrough. Or for English speakers and their dictionaries to decide that interjections don't have to be emotional outbursts or that phrases don't have to have spaces between the component words. DCDuring TALK 23:19, 12 January 2015 (UTC)
I definitely support it in principle, but it is somewhat tedious to implement.
It would be handy to have a template to automatically provide the pronunciation for initialisms so that the pronunciation information implicit in the initialism header would not be lost due to the additional effort to provide a proper IPA pronunciation. DCDuring TALK 20:14, 2 January 2015 (UTC)
I did make one some time ago, but it was before Lua so it was a bit cumbersome to use, and never saw much use. I don't remember if I deleted it or not. —CodeCat 20:31, 2 January 2015 (UTC)
Tidiness strikes again? Even a poor template could be used to test the concept and discover missing features that might be useful. DCDuring TALK 20:52, 2 January 2015 (UTC)
Would that be {{IPA letters}}? DCDuring TALK 21:02, 2 January 2015 (UTC)
{{IPA letters|I|B|M|lang=en}} → IPA(key): /aɪ biː ɛm/ DCDuring TALK 21:05, 2 January 2015 (UTC)
It would be handy if it worked directly on {{PAGENAME}}, perhaps with {{PAGENAME}} subst:'d. DCDuring TALK 21:23, 2 January 2015 (UTC)
You would have to use Lua for that. Keφr 21:26, 2 January 2015 (UTC)
I hope you mean you in the sense of "one". DCDuring TALK 23:06, 2 January 2015 (UTC)
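DCDuring's idea of running the template directly on {{PAGENAME}} amounts to splitting the page title into letters and looking up each letter's spoken name. A minimal Python sketch of that logic follows; it is not the template's code, and the letter table is deliberately partial and hypothetical (a real one would need all 26 letters plus regional variants such as zed/zee). Stress marks are omitted, matching the {{IPA letters}} output quoted above.

```python
# Partial, illustrative table of English letter names in IPA (assumption, not the template's data).
LETTER_NAMES = {
    "B": "biː",
    "C": "siː",
    "D": "diː",
    "F": "ɛf",
    "I": "aɪ",
    "M": "ɛm",
    "N": "ɛn",
    "S": "ɛs",
    "T": "tiː",
    "U": "juː",
}


def ipa_letters(pagename: str) -> str:
    """Spell out an initialism letter by letter, e.g. 'IBM' -> '/aɪ biː ɛm/'."""
    names = []
    for letter in pagename:
        if not letter.isalpha():
            continue  # skip dots and spaces in titles such as "U.S."
        try:
            names.append(LETTER_NAMES[letter.upper()])
        except KeyError:
            raise ValueError(f"no pronunciation listed for {letter!r}") from None
    return "/" + " ".join(names) + "/"
```

Raising an error for unknown letters, rather than silently skipping them, keeps a missing table entry from producing a plausible but wrong transcription.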

I support it. Some pages already use the right header (e.g. UFO). The example of fr.wikt shows that it's possible to do without these headers. Lmaltier (talk) 17:08, 3 January 2015 (UTC)

Filter created: Special:AbuseFilter/42. Keφr 19:17, 12 January 2015 (UTC)

Thanks. Once we clean up the instances of {{en-acronym}} and {{acronym-old}} and any associated Acronym L3 headers, which usually need Pronunciation sections too, we will have {{initialism-old}} and {{en-initialism}} to work on, adding {{IPA letters}} under Pronunciation. DCDuring TALK 22:50, 12 January 2015 (UTC)
Note the filter has not been enabled yet. Also, NEC has been modified not to offer the headers discussed here. Keφr 18:23, 14 January 2015 (UTC)
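The check such an edit filter performs can be sketched in Python. This is a minimal illustration only, not the rule saved at Special:AbuseFilter/42 (which uses MediaWiki's own filter-rule syntax and is not quoted here); the function name and the exact patterns are assumptions. It looks for the deprecated headers and for {{head}} calls with the deprecated arguments in newly added wikitext.

```python
import re

# Hypothetical patterns for the headers and {{head}} arguments discussed above.
DEPRECATED_HEADER = re.compile(
    r"^={3,5}\s*(Acronym|Initialism|Abbreviation)\s*={3,5}\s*$",
    re.MULTILINE,
)
DEPRECATED_HEAD = re.compile(
    r"\{\{\s*head\s*\|[^|{}]*\|\s*(acronym|initialism|abbreviation)s?\s*[|}]"
)


def deprecated_uses(added_text: str) -> list:
    """Return the deprecated header names and {{head}} arguments found in added wikitext."""
    found = [m.group(1) for m in DEPRECATED_HEADER.finditer(added_text)]
    found += [m.group(1) for m in DEPRECATED_HEAD.finditer(added_text)]
    return found
```

A non-empty result is the point at which a filter would show its explanatory message instead of silently accepting the edit.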

Can we insert proto‐words into translation tables?

If I desired to add Proto‐Germanic *kūz to a translation table for cow, could I get away with it? --Romanophile (talk) 01:26, 3 January 2015 (UTC)

No, as far as I know, the common agreement is that only attested terms can be in translation tables. —CodeCat 01:48, 3 January 2015 (UTC)
I for one do not want to see protolanguages in translation tables. —Aɴɢʀ (talk) 08:58, 3 January 2015 (UTC)
How about no links to unattested forms in translation tables? Renard Migrant (talk) 15:09, 3 January 2015 (UTC)
I wouldn’t mind reconstructed terms in translation sections. I bet there are more people who want to know what’s water in PIE or PG than people who want to know what it is in Minica Huitoto, Northern Emberá or Lijili. — Ungoliant (falai) 16:19, 3 January 2015 (UTC)
The information is of interest, but it doesn't belong in a translation table with attested words. Maybe in a "See also" section? Chuck Entz (talk) 17:28, 3 January 2015 (UTC)
I can't imagine anyone wanting a translation that isn't attested. That is, a translation which according to evidence, may never have been used by anyone, ever. Renard Migrant (talk) 19:52, 3 January 2015 (UTC)
I would not be opposed to things like Appendix:English–Proto-Indo-European glossary, Appendix:English–Proto-Semitic glossary, Appendix:English–Proto-Algonquian glossary and the like, though, just as proto-language words are in Appendix space rather than mainspace. —Aɴɢʀ (talk) 21:07, 3 January 2015 (UTC)
As a translation, no. The point is that someone might want to know about the history of a particular concept within a given language family. Right now, that requires either fishing through translations in descendant languages looking for references in the etymologies, or browsing through the appendix to find the entry. That's why I suggested putting it in "See also": there's no implication beyond there being some connection that might be of interest. Of course, it could easily be overdone, but that just calls for restraint, not a categorical exclusion. Chuck Entz (talk) 21:18, 3 January 2015 (UTC)

Make Wikisaurus a separate wiki

Hopefully this doesn't come across as blasphemous, but it seems to me that Wikisaurus should be made into a separate Wikimedia wiki project instead of existing as a sub-namespace within Wiktionary. As the project currently stands, it is quite problematic to search for thesaurus entries on words, because they're mixed in with the dictionary entries, and "Wikisaurus:" works like a kludgy title prefix on every single thesaurus entry. Also, it's awkward within a Wikimedia project that every single linked thesaurus word on another thesaurus article entry must be wrapped in a "ws" template. Ideally, plain word links within Wikisaurus should link within Wikisaurus, and any links between Wiktionary and Wikisaurus should use proper inter-wiki links. Also, the Wiktionary articles already have static synonym lists in them; wouldn't it be desirable to have these generated dynamically, cross-wiki, from Wikisaurus? Also, wouldn't it be desirable to have an automatic reference link for every word entry between these two projects, similar to how we have template-based automatic links between Wikipedia and Wiktionary? It seems very substandard to keep Wikisaurus as simply a static sub-namespace, and I feel like it needs to be developed into something much more tailored to working like a thesaurus site. —This unsigned comment was added by Wykypydya (talk • contribs) at 11:23, 3 January 2015 (UTC).

Burn the witch!
No, seriously. It would complicate things for the sake of… what exactly? Most links in Wikisaurus do, in fact, point to regular entries. Keφr 12:01, 3 January 2015 (UTC)

No, I think that this belongs in the Wiktionary project. Here, words can be found through the search box, through categories and through thesaurus pages; all three methods should be kept, as each is very useful depending on what the user needs. A thesaurus page can provide thousands of words (not only synonyms), and its contents cannot be moved to normal pages. But I would rename Wikisaurus to Thesaurus, which would be less kludgy... But something should be done about phrasebook entries: they should be grouped into topical pages, just as in all paper phrasebooks, or they could be moved to another project (Wikiversity?) Lmaltier (talk) 17:22, 3 January 2015 (UTC)

Renaming rhyme pages vote

Could you please post your abstains (or other votes, as applicable) to Wiktionary:Votes/2014-09/Renaming rhyme pages so that the vote shows explicitly editor indifference, or maybe even gets a clearer outcome? Thank you. --Dan Polansky (talk) 11:50, 4 January 2015 (UTC)

Category:Terms by their individual characters by language automation in Lua

Happy new year to all. I've just adapted my "rare letters" identifier Lua module here to deploy this automatic categorization like on the French Wiktionary. This example works, but we'd rather invoke this module from the existing templates, like Template:en-noun. JackPotte (talk) 16:14, 4 January 2015 (UTC)

To clarify what was said at Module talk:languages, the idea is to include a list of the letters for a language in our data modules. That way, any entry containing a character not appearing in there can automatically be added to a category. The question, though, is which letters should be considered unusual in a given language. Some languages don't natively use letters like q or x, but still have them in many loanwords. —CodeCat 16:30, 4 January 2015 (UTC)
Also an implementation detail: While it seems sensible at first glance to list the letters only in one case form, case conversion is actually language-specific. A good example is Turkish: it has separate dotted İ i and dotless I ı. Using the "standard" case conversion would give incorrect results for Turkish. —CodeCat 16:33, 4 January 2015 (UTC)
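The Turkish problem is easy to reproduce: Python's default str.lower() applies the language-neutral Unicode mapping, so "I" becomes "i" rather than Turkish "ı", and "İ" becomes "i" plus a combining dot above. A minimal Python sketch of a language-aware lowercase helper follows; the CASE_OVERRIDES table and the lower_for name are hypothetical, and only Turkish is filled in.

```python
# Per-language overrides applied before the default Unicode lowercasing.
# Only Turkish is listed here; other languages would be added as needed (assumption).
CASE_OVERRIDES = {
    "tr": str.maketrans({"I": "ı", "İ": "i"}),
}


def lower_for(lang: str, text: str) -> str:
    """Lowercase text, using language-specific rules where they differ from Unicode defaults."""
    table = CASE_OVERRIDES.get(lang)
    if table is not None:
        text = text.translate(table)
    return text.lower()
```

With this, lower_for("tr", "IŞIK") gives "ışık", whereas "IŞIK".lower() gives "işik", which is a different Turkish spelling.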
For example, I've kept "é" in this letter list for French (Module:Rare letters/data, which accepts either uppercase or lowercase letters) even though it was detected in nearly 150,000 entries. JackPotte (talk) 16:36, 4 January 2015 (UTC)
I don't think that's the right approach, though. Unicode contains thousands of characters. We want to list the characters that are not rare, so that any that are not in the list are categorised. —CodeCat 16:38, 4 January 2015 (UTC)
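The allowlist approach, listing the letters that are normal for a language and flagging everything else, can be sketched as follows. The French letter set is an illustrative assumption, not the contents of Module:Rare letters/data or Module:languages, and the category name format is likewise assumed. Input is NFC-normalised so that precomposed and decomposed accents compare equal; lowercasing uses Python's default rules, which, as noted above for Turkish, would not be correct for every language.

```python
import unicodedata

# Illustrative allowlist: letters treated as normal in French (assumption).
NORMAL_LETTERS = {
    "fr": set("abcdefghijklmnopqrstuvwxyz" "àâæçéèêëîïôœùûüÿ"),
}


def unusual_letters(term: str, lang: str) -> list:
    """Return the letters in term that are not on the language's allowlist."""
    allowed = NORMAL_LETTERS[lang]
    text = unicodedata.normalize("NFC", term).lower()
    return sorted({ch for ch in text if ch.isalpha() and ch not in allowed})


def letter_categories(term: str, lang: str, lang_name: str) -> list:
    # Hypothetical category naming, modelled on "French terms spelled with À".
    return [f"{lang_name} terms spelled with {ch.upper()}" for ch in unusual_letters(term, lang)]
```

Because the list names what is normal, a character nobody anticipated is still caught, which is the advantage over enumerating rare letters out of the thousands Unicode contains.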
As a second step, we could gather into one category the letters that are unknown to the script. JackPotte (talk) 19:25, 4 January 2015 (UTC)
Right now, basically all the French letters (and English; the list for both is identical) with diacritics and all the digraphs are listed in Module:Rare letters/data as "rare". That's not just isolated silliness: it's created redlinks to Category:French terms spelled with À and Category:French terms spelled with Ç from its use in just one entry. If this template were implemented in all French entries with the current data, a substantial part of all the French entries would be in one or more of these categories, and many of the categories would be so huge as to be pretty much useless. Please don't add this to any more entries without a consensus as to which characters should be included in the "rare" lists. Chuck Entz (talk) 04:11, 6 January 2015 (UTC)
Actually, we can now all measure these presumed rare letters objectively and precisely. For example, "é" appears in 341,342 of the 1,366,145 French articles, so 25%! Consequently, if the consensus criterion becomes 1%, we can now remove it from our categories in light of these overall figures. JackPotte (talk) 19:35, 6 January 2015 (UTC)
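The 25% figure checks out, and the same arithmetic gives a simple frequency test for whether a letter counts as rare. A minimal sketch; the function names are hypothetical and the 1% threshold is the one floated in the post, not an agreed criterion:

```python
def letter_share(entries_with_letter: int, total_entries: int) -> float:
    """Fraction of entries that contain a given letter."""
    return entries_with_letter / total_entries


def is_rare(entries_with_letter: int, total_entries: int, threshold: float = 0.01) -> bool:
    """Treat a letter as rare if it appears in fewer than `threshold` of all entries."""
    return letter_share(entries_with_letter, total_entries) < threshold


# "é" in French entries, using the counts quoted above.
share = letter_share(341_342, 1_366_145)
print(f"{share:.1%}")  # 25.0%
print(is_rare(341_342, 1_366_145))  # False
```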
I think we need to discuss why we want such categories: remember that categories aren't for classification, but for navigation. The question you have to ask is "why would someone want a list of entries with a given letter in it?". I would say that relative frequency doesn't make such a list interesting: even the rarest letters in the English alphabet, such as j, q and x, are found in lots of ordinary words; it's only unusual contexts like word-initial x or q not followed by a vowel that are of interest. French letters in the normal French alphabet aren't worthy of categorization: who wants to look through a huge list of French entries with ç in them? It's only letters that aren't part of normal French, and that make a word unusual by their presence, that should be categorized. Chuck Entz (talk) 15:03, 8 January 2015 (UTC)
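Categorising by unusual context rather than by letter could be expressed as a small set of patterns. A minimal sketch, assuming only the two English contexts named in the post (word-initial x, and q not followed by a vowel); treating only a, e, i, o and u as vowels is a simplification, and the labels are hypothetical:

```python
import re

# Each label maps to a pattern describing an unusual spelling context (illustrative only).
UNUSUAL_CONTEXTS = {
    "word-initial x": re.compile(r"^x", re.IGNORECASE),
    "q not followed by a vowel": re.compile(r"q(?![aeiou])", re.IGNORECASE),
}


def unusual_contexts(term: str) -> list:
    """Return the labels of all unusual-context patterns that match term."""
    return [label for label, pattern in UNUSUAL_CONTEXTS.items() if pattern.search(term)]
```

Ordinary words such as box or queen match nothing, so the resulting categories stay small enough to browse.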

Lemma dilemma

My issues are more involved than simply a lemma, but the title sounded so cool.

I've got some experience on Wikipedia, but I'm working on my first Wiktionary entry. It is the Indonesian word pengairan which is not yet complete. A number of things came to my mind as I attempted to make this word part of Wiktionary.

  1. What is a lemma? According to Wiktionary:Lemmas, "When a word has multiple distinct forms, the lemma is the main entry at which the definitions, etymology, inflections and such are placed." Also, "For nouns, the lemma is normally the form that is used as the singular subject of an intransitive verb." Pengairan meets these criteria except for the etymology, since that information belongs with the root air, the prefix peng- and the suffix -an used to form the word. However, if we look at the second Wiktionary definition of lemma, it says, "The canonical form of an inflected word; ie the form usually found in dictionaries." Here is where the conflict arises. Pengairan will certainly be found in Indonesian dictionaries. However, it will be listed under a for air, not under p. This is true of all Indonesian words formed by a base and one or more prefixes and/or suffixes. By way of example, the official dictionary of the language printed by the Indonesian government has a listing for air. It gives two definitions. These are followed by six well-known proverbs in which the word is used with each proverb explained. Next, there are 171 idiomatic phrases that begin with the word air. Finally, there are six words derived from air listed: mengairi, pengairan, berair, perairan, berpengairan and keairan. None of these words can be found in the dictionary under m, p, b or k. Indonesian dictionaries treat words like this in a manner similar to the treatment English dictionaries afford inflections. In that regard, these words aren't lemmas, since the entire concept of a lemma is different. Were a dictionary of the Indonesian language to be written (for printing on paper) using the methodology employed in writing English dictionaries, these words would certainly have separate entries. For that reason, I've placed pengairan into the category for Indonesian lemmas. However, I would like to know what others think. 
Should the lemma category for non English words be for things that are lemmas as determined in the way they are in English, or should the foreign language's concept of a lemma be used?
  2. Other than pronouns, Indonesian nouns are pluralized by doubling them. So, the plural of pengairan is pengairan-pengairan. Alternatively, the word para can be placed in front of the noun to indicate plurality as para pengairan. I created an entry for pengairan-pengairan. However, should we consider something else? Every countable Indonesian noun other than pronouns would have an entry that simply doubles the word. This seems to be a large number of entries that may be better handled another way. It might make sense to create an entry for pengairan-pengairan and have it redirect to pengairan. On the pengairan entry, the plural could be shown as an inflection but not linked. Folks should be mindful that plural forms are only used in Indonesian when they are needed to disambiguate. For instance, the word for dog is anjing, and the word for cat is kucing. People say, "She has two anjing and three kucing," not "two anjing-anjing and three kucing-kucing," since the context makes it obvious that the word is plural. This means the plural inflection has less importance in Indonesian than it does in English, and doesn't appear often.
  3. Aside from plurality, other inflections apply to many Indonesian words. Nouns do not have possessive forms. Instead, a possessor noun simply follows the noun it owns. For example, "anak" means child, and "anjing anak" means the child's dog. However, Indonesian nouns do have possessed forms if they are owned by singular pronouns. These are the same for every Indonesian noun that can be owned. So, "anjingku" means my dog, "anjingmu" means your (singular) dog, and "anjingnya" usually means his, her or its dog. The -nya suffix can also act as a definite article. So, "anjingnya" can also mean the dog. Either way, it adds specificity to the base word. There are no irregular forms of these. They apply to every Indonesian noun. Should separate entries be created for pengairanku, pengairanmu and pengairannya, since each of these is a separate word?
  4. Another inflection issue in Indonesian is the suffix -kah. If pengairan were to appear as the first word of a question, it could be pengairankah. This alerts the reader or listener that a question is coming without changing the meaning of the word. It is only used where the writer or speaker thinks it is necessary. Often, this would be at the start of a lengthy question. Usually, voice inflection alone is sufficient to make a sentence an obvious question to a listener. Question marks are used to end written questions as in English. So, -kah is omitted far more often than it is used. Should a separate entry be created for pengairankah, since it is a separate word? A vast number of Indonesian words can be used to start questions meaning each of these would have an entry for the -kah inflected form.
  5. Finally, there is also an inflected form with a -lah suffix. This is most commonly used to soften a message. For example "duduk" means to sit. It can stand alone as a sentence in the imperative mood. However, by itself, "Duduk," can sound like a command given to a child. English speakers inviting someone into their homes might say, "Please, sit," as they welcome their guests. Indonesians can say "Silakan duduk," which means the same thing or simply "Duduklah," which isn't the same as saying please but is also not a stern command; it is polite. The suffix -lah can also be used to emphasize a word and will usually appear as the first word of the sentence. For example, "Pengairanlah bisa memecahkan masalah kami," is the same as saying, "IRRIGATION can solve our problem." The suffix -lah serves to highlight pengairan. This usage is fairly common and can apply to words across different parts of speech. Does pengairanlah merit its own entry in Wiktionary?
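The suffixes in items 3 to 5 stack in a fixed order: the possessive or definite suffix (-ku, -mu, -nya) comes first, then a particle such as -kah or -lah, as in bukumukah ("your book?"). Because of that, a lookup tool could recover the dictionary word from forms like pengairankah or anjingku without each having its own entry. A minimal Python sketch, assuming a small hypothetical lexicon of known words; checking every candidate base against the lexicon is what stops words that merely end in these letters, such as buku or sekolah, from being split.

```python
from typing import Optional, Tuple

# Hypothetical lexicon of known headwords; a real tool would query the dictionary.
LEXICON = {"air", "pengairan", "anjing", "buku", "duduk", "sekolah"}

PARTICLES = ("kah", "lah")
POSSESSIVES = ("nya", "ku", "mu")

Analysis = Tuple[str, Optional[str], Optional[str]]  # (base, possessive, particle)


def _strip(word: str, suffix: Optional[str]) -> Optional[str]:
    """Remove suffix from word; None as suffix means 'no suffix'. Returns None if absent."""
    if suffix is None:
        return word
    if word.endswith(suffix) and len(word) > len(suffix):
        return word[: -len(suffix)]
    return None


def analyse(word: str, lexicon: set) -> Optional[Analysis]:
    """Split word into base + optional possessive + optional particle.

    The unsuffixed reading is tried first, so known words are never split."""
    for particle in (None,) + PARTICLES:
        rest = _strip(word, particle)
        if rest is None:
            continue
        for possessive in (None,) + POSSESSIVES:
            base = _strip(rest, possessive)
            if base is not None and base in lexicon:
                return base, possessive, particle
    return None
```

For example, analyse("bukumukah", LEXICON) returns ("buku", "mu", "kah"), while analyse("buku", LEXICON) returns the word whole instead of reading it as bu + -ku.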

Of course, these questions are not really all about pengairan. They are about how these peculiarities in the Indonesian language should be handled in Wiktionary. Perhaps similar issues have arisen with other languages (or maybe these issues with Indonesian have been discussed before). I'd be delighted to read through such prior discussions if someone could point me in the right direction. Thank you for your patience, if you made it all the way here, and thank you for any thoughtful reply you provide. Taxman1913 (talk) 20:26, 6 January 2015 (UTC)

  1. Why is pengairan not considered a lemma by other dictionaries? Being morphologically predictable is one thing, but morphology alone is not enough to make a word. For example, in English, paintify is morphologically and semantically transparent (the noun paint + -ify), but that doesn't mean that people actually use it. Considering pengairan a non-lemma form of air would only make sense if any noun has a peng- -an form.
  2. I would say that the Indonesian plural shouldn't be included, as it is fully predictable, has no exceptions, and is written in a form that allows its parts to be recognised easily. This is similar to, for example, the English negative form of a verb, which just consists of a form of do paired with not and the infinitive. There's no objection to including it in an entry without a link, but that would only be useful for absolute beginners. I know no Indonesian but I still know that doubled nouns are plural.
  3. I think the possessive forms of nouns should be included, as unlike with plurals, it's not obvious what parts pengairanku is made up of at first glance. Likewise for the definite form. Irregularity doesn't really matter, compare for example Esperanto where all words are regular, but we still include inflections for it. I think we also have possessive entries for Hungarian, Finnish and Turkish? It's not clear where to draw the line, though.
  4. If -kah can be attached to any word, then it behaves similar to the Latin suffixes -que, -ve and -ne, the English -'s, or the Finnish -kin and -kaan. We don't include entries for words which have these suffixes. I'm not sure why not, exactly, but it could just be a purely practical consideration: if they can be attached to any word, then you'd have to double, triple or quadruple the number of entries (lemma and non-lemma) for that language, without a very clear benefit. See also the deletion discussion for satisne.
  5. This seems more or less the same as above; if it can be attached to any word, it's probably better not to have entries for it.
CodeCat 20:53, 6 January 2015 (UTC)
Note that some suffixes are systematic in English, too (e.g. -like can be attached to any noun, according to my Pocket Oxford Dictionary), but this is not a reason to omit these words provided that they are attested. In my opinion, requiring attestation is sufficient, and this requirement would drastically reduce the number of potential entries. Lmaltier (talk) 21:26, 6 January 2015 (UTC)
The point about Indonesian lemmas is that given the second definition of lemma right here on Wiktionary, pengairan doesn't look like one, because you will not find it under "p" in an Indonesian dictionary. It will be under "a" for air. The Indonesian government's official dictionary defines the word lema (Indonesian for lemma) as (1) Kata atau frasa masukan di kamus di luar definisi atau penjelasan lain yang diberikan di entri; (2) butir masukan; entri. I'll translate that as (1) The word or phrase of a dictionary entry outside the definition or other explanation given in the entry; (2) An entry. Masukan (as used here) and entri are synonyms. The definition of masukan is not helpful as it is frequently used outside linguistic contexts. Entri is defined as (Linguistics) (1) Kata atau frasa di kamus beserta penjelasan maknanya dengan tambahan penjelasan berupa kelas kata, lafal, etimologi, contoh pemakaian, dan sebagainya; (2) lema. I'll translate that as (1) A word or phrase in a dictionary together with an explanation of its meaning, supplemented by information such as part of speech, pronunciation, etymology, usage examples, etc.; (2) lemma. So the first definition of lema could lead to a conclusion that air is a lema while its derived words are not, since they are presented in the explanation and not outside it. However, with lema and entri being used to define one another in secondary definitions, we are left wondering whether lema could also include the entirety of a dictionary entry as entri clearly does. Whether pengairan is deemed a lemma by Indonesian linguists has no bearing on whether it merits its own entry in Indonesian dictionaries. It is a derived term listed under its base word, which makes it look like something other than a lemma according to the definition of a lemma on Wiktionary. Taxman1913 (talk) 21:56, 6 January 2015 (UTC)
CodeCat, your point about pengairan being a lemma unless all Indonesian nouns have peng-*-an forms is an excellent one. The prefix peng- is a form of the prefix pe- which can also take the form of pem- or pen- depending upon the first letter of the base word. Both nouns and verbs can have one or both of these affixes added to them. For example, main means to play, pemain is player, mainan is toy, and pemainan is game. While these affixes are common and can be used with both nouns and verbs, Indonesian people aren't free to just make up all kinds of words by using them. According to the government's dictionary, pengair and airan are not words. Similarly, a word I used above, lema, has no derived forms resulting in pelema, lemaan or pelemaan. So, it is clear that words formed in this manner have unique characteristics that lend much credence to the argument that they are truly lemmas. Taxman1913 (talk) 22:19, 6 January 2015 (UTC)
CodeCat, your reasoning about Indonesian plurals makes sense. Unless someone presents a compelling reason why an entry for each of them is needed, I'm going to delete the one I created. Taxman1913 (talk) 22:22, 6 January 2015 (UTC)
I think attestation of the specific possessed inflections of Indonesian nouns as suggested by Lmaltier would drastically reduce the number of entries. How often will someone use the phrase "my irrigation"? While the construction of pengairanku is obvious to me, I can see CodeCat's point that most users of English Wiktionary not familiar with Indonesian word construction will disagree. So, if attested, the word has a place here. Taxman1913 (talk) 22:32, 6 January 2015 (UTC)
Someone may not normally say "my irrigation", but Wiktionary entries don't always bother to indicate which parts of a word's inflections don't exist, especially if there are many of them. For example, it's clearly possible for one form of a Latin verb, out of the dozens it has, to happen to not ever be used (at least that we can tell). But it wouldn't make much sense to require excluding this in each case individually; it would become a nightmare to manage it. So for the sake of convenience, we sometimes list possible forms, even if they may not actually be used very much, or at all. This is not a hard rule of course, because in some cases we do exclude certain parts of the inflection, like comparatives for adjectives that are not comparable. For Indonesian nouns, I suppose we could take a somewhat similar approach, with "ownable" being like "comparable": there would be ownable and unownable nouns.
But we have to be careful: someone might not normally use a form, but it still exists in case anyone wants to use it. For example, rain and its translations in many languages are often considered to be "impersonal" verbs, having only a third-person singular form (or equivalent), as there is not normally any specific thing that rains. But in figurative meanings or poetry, you might indeed find we rained. So we shouldn't be too quick to say that an Indonesian noun is unownable, because even though it's not "logical", there may be cases where someone has used those forms anyway. —CodeCat 22:43, 6 January 2015 (UTC)
I wouldn't consider rain an impersonal verb unless you promise never to rain on my parade. Taxman1913 (talk) 22:51, 6 January 2015 (UTC)
CodeCat, I agree with you about -kah and -lah forms. They are even more common than the Latin -que and the English -'s, since those are for nouns. These two can be attached to just about anything including prepositions and adverbs. But your mention of -que and -'s makes me think once again about -ku, -mu and -nya. Perhaps these are overkill as well. Even though pengairanku might not look obvious, are good quality entries for both pengairan and -ku sufficient? Even if attested, perhaps they're unnecessary. I could probably find an example of dog's somewhere. But we don't need a definition for it on Wiktionary. Taxman1913 (talk) 22:46, 6 January 2015 (UTC)
I'm going to work on creating an Indonesian noun template and circulate it for commentary. Taxman1913 (talk) 22:53, 6 January 2015 (UTC)
Actually, Latin -que can be attached to anything as well. It's often found attached to conjunctions for example, or indeed any other word that appears first in a clause.
I understand your point about the possessives, and there isn't really a clear line we can draw to say "this is ok to include" and "this is not ok", or at least not without just making up an arbitrary reason. The best I can come up with is this: Possessives only modify the meaning of the noun they attach to, so they are "word-level" suffixes. But -kah and -lah are "clause-level" suffixes; they change the sense of whole sentences at a time. At least, that's how I understand it, please correct me if I'm wrong. So with that distinction, we could decide on a rule that words with clause-level suffixes/enclitics don't get their own entry.
We can also think of practical reasons. For possessives, we know that only nouns have them, and only some nouns at that (going by the "my irrigation" example). But the other suffixes can be attached to anything, and I'm guessing that they can even attach on top of other suffixes like possessives, and perhaps even on top of each other (is -lahkah possible, or something similar?). The number of possible combinations quickly becomes unmanageable. So we could decide to limit it to possessives, just to avoid creating a huge mess of possible combinations. —CodeCat 22:57, 6 January 2015 (UTC)
The Latin I studied for four years in high school is obviously rusty after 30 years. I only recalled the usage of -que as a conjunction for nouns until consulting the definition. Hopefully, Fr. Tighe and Mr. Scott can find it in their hearts to forgive me.
Your word-level/clause-level theory makes sense, and it points me more firmly into the direction of not including an entry for Indonesian possessed forms. You are correct that -kah and -lah change the sentence without changing the meaning of the word to which they attach. The change is subtle; I would say they only change the tone of the sentence, not its basic meaning. I've spent about 85% of the past four years living in Indonesia and have not come across -kah and -lah used in combination, but it is easy to imagine someone wanting to do that. The logical form would be -kahlah as in "Pengairankahlah belum digunakan di sini?" Which translates as "Hasn't IRRIGATION been used here yet?" This accomplishes sending out the question alert with the opening word of the sentence while also highlighting that word. So, -kahlah would be another possibility that could be affixed to nearly all Indonesian words including those to which -ku, -mu or -nya is already attached.
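To put a rough number on the "unmanageable combinations" concern, here is a minimal Python sketch that simply spells out every stem + possessive + particle combination. It is purely orthographic and rests on assumptions: that the possessive (-ku, -mu, -nya) comes before the particle (-kah, -lah, or the hypothetical stacked -kahlah), that no form needs attestation, and that -nya as a definite marker is not counted separately since it is spelled the same. The helper name enclitic_forms is made up for illustration.

```python
from itertools import product

# Possessive enclitics from the discussion; "" stands for no possessive.
POSSESSIVES = ["", "ku", "mu", "nya"]
# Clause-level particles; "kahlah" is the hypothetical stacked form.
PARTICLES = ["", "kah", "lah", "kahlah"]


def enclitic_forms(stem: str) -> list[str]:
    """Spell out every stem + possessive + particle combination (hypothetical helper)."""
    return [stem + p + q for p, q in product(POSSESSIVES, PARTICLES)]


forms = enclitic_forms("pengairan")
print(len(forms))  # 16
print(forms[:4])   # ['pengairan', 'pengairankah', 'pengairanlah', 'pengairankahlah']
```

Even under these conservative assumptions, each stem yields 16 spellings, before counting reduplicated plurals or the same suffixes on adjectives, verbs and prepositions.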
Turning back to the Indonesian possessed forms, the suffixes seem to me to also change the meaning of the sentence rather than the word. The mechanics are closer to the Latin -que than to the English -'s in this regard. Consider:
  • dog's = belonging to the dog; this word is no longer purely a noun.
  • puerque = and the boy; the meaning of puer is unchanged, and -que ends up replacing what could be written as a separate word in the sentence as in et puer.
  • anjingku = my dog; the meaning of anjing is unchanged, and -ku ends up replacing what could be written as a separate word in the sentence as in anjing aku.
  • anjingmu = your (singular) dog; the meaning of anjing is unchanged, and -mu ends up replacing what could be written as a separate word in the sentence as in anjing kamu.
  • anjingnya = his dog; the meaning of anjing is unchanged, and -nya ends up replacing what could be written as a separate word in the sentence as in anjing dia.
Illustrating it in this way leads me to conclude that the Indonesian possessed forms are mechanically identical to Latin words constructed with -que and do not need separate dictionary entries.
This is also true when -nya is used as a definite article.
  • anjingnya = the dog as opposed to dog; the meaning of anjing is unchanged, and -nya has its own meaning that just happens to be attached to the word.
The definite article is only used in Indonesian for disambiguation. Since the construction is the same as the third person singular possessed inflection, it often is impossible to tell what was in the speaker's mind when it was used as in "Gadis takut karena anjingnya." This could be "This girl is afraid of the dog," or "The girl is afraid of his dog," when translating to English. Only context can provide the correct answer. It really doesn't matter much. We can understand that there is a specific dog that makes the girl afraid. If we said, "Gadis takut karena anjing," we would understand that the girl is afraid of all dogs. No matter what, anjing remains dog and doesn't become something else with -nya affixed to it.
I'm in favor of no separate entries for Indonesian words with -kah, -lah, -ku, -mu or -nya suffixes. I also feel that separate entries are not needed for Indonesian plurals that are simply doubled. Hopefully, I can create a template that will present this in a clear way. I'm not sure there are nouns that cannot be owned. For instance, we might think arithmetic cannot be owned until someone says, "Your arithmetic is flawed." Nevertheless, I'll build an option into the template to make nouns unownable. Taxman1913 (talk) 07:46, 7 January 2015 (UTC)
But we do include possessives for Hungarian and Turkish nouns. Compare Hungarian hal or Turkish hal, for example. I don't think the Indonesian possessives work any differently from that. So at least there is precedent for including them. I disagree with you that they work more like Latin -que, because like I said the possessives only modify a single word. "my" clearly belongs to "dog" and can't somehow be applied to the whole sentence. Your example of "puerque" shows that -que can also apply to a single word, but I think that's a wrong way of seeing it. -que joins two words, or two clauses, and therefore applies to more than just a single noun. Furthermore, you can imagine saying "a big dog and a small cat"... which word does -que attach to then? I would guess that it attaches to "small", which clearly shows that it's not about the noun, but about the whole phrase, and that it always attaches to the first word in the phrase. How would this work for Indonesian possessives? —CodeCat 15:12, 7 January 2015 (UTC)
Since Latin word order is free, -que attaches to whichever word is first: canis magnus felisque parvus or magnus canis parvusque felis, which just proves your point. —Aɴɢʀ (talk) 15:52, 7 January 2015 (UTC)
Here are a few Indonesian phrases for us to kick around:
  • a big dog and a small cat = seekor anjing besar dan seekor kucing kecil
  • my big dog and my small cat = anjing besar dan kucing kecilku
  • your big dog and her small cat = anjing besarmu dan kucing kecilnya
  • a big dog and my small cat = seekor anjing besar dan kucing kecilku
  • his big dog and a small cat = anjing besarnya dan seekor kucing kecil
We can see from the above that when an adjective modifies the possessed noun, the possessive suffix usually attaches to the adjective which ordinarily follows the noun. Some adjectives (usually those indicating quantity) normally appear in front of the nouns they modify. Here are more illustrative phrases:
  • many dogs = banyak anjing
  • my many dogs = banyak anjingku
Here the possessive suffix attaches to the noun, because it is the final part of the phrase that is owned. This is a consistent characteristic. Similarly, we consistently apply this in English in the opposite direction. We say, "my many dogs," not "many my dogs" unless we are in a particularly poetic mood.
Indonesian adjectives modifying two nouns joined by a conjunction typically refer to both nouns. This leads to more illustrations:
  • blue pants and blue shirt = celana dan kemeja biru
  • my blue pants and my blue shirt = celana dan kemeja biruku
Here we see -ku attached to an adjective that simultaneously modifies two nouns and indicates that the entire phrase is possessed.
I look forward to reading thoughts on these. Taxman1913 (talk) 18:53, 7 January 2015 (UTC)
These examples have convinced me that the Indonesian possessive suffixes are clitics and operate on a phrase-level rather than on a word-level. The crucial part, to me, is that -ku attaches to the last word, irrespective of what kind of a word it is. The suffix is evidently parsed as including the entire preceding noun phrase. So I don't think we should have entries for possessives. Is it the same for the definite suffix -nya? —CodeCat 19:01, 7 January 2015 (UTC)
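The phrase-level rule described above (the clitic goes on the last word of the possessed phrase, whatever kind of word it is) can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is already exactly the possessed noun phrase; it does not handle the context-dependent cases where a clitic applies only to the second of two conjoined nouns. The function name attach_possessive is hypothetical.

```python
def attach_possessive(noun_phrase: str, clitic: str) -> str:
    """Attach a possessive clitic such as "ku" to the last word of the phrase (hypothetical helper)."""
    words = noun_phrase.split()
    words[-1] += clitic
    return " ".join(words)


print(attach_possessive("kucing kecil", "ku"))            # kucing kecilku
print(attach_possessive("banyak anjing", "ku"))           # banyak anjingku
print(attach_possessive("celana dan kemeja biru", "ku"))  # celana dan kemeja biruku
```

The outputs reproduce the phrases given earlier in the thread: the clitic lands on an adjective, a noun, or an adjective shared by two nouns, depending only on position.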
I'm curious though, how you would write something like "a dog and a small cat" or "dogs and my cats"? You'd have to clarify in this case that the adjective or possessive applies only to the last noun phrase. —CodeCat 19:04, 7 January 2015 (UTC)
The definite article suffix -nya works the same way as the third person singular possessed -nya. It wraps around an entire phrase where applicable.
These three suffixes show up in other places as well. They act in a manner similar to contractions at the end of some transitive verbs and prepositions. Here are some illustrations:
  • bagi = for
  • bagimu = for you, effectively a contraction of bagi kamu
  • melihat = to see
  • melihatku = to see me, effectively a contraction of melihat aku
This also affects a vast number of words. Any thoughts? Taxman1913 (talk) 19:35, 7 January 2015 (UTC)
My first thoughts on your questions were
  • seekor kucing kecil dan seekor anjing
  • kucing-kucingku dan anjing-anjing
Reversing the word order solves the problem. That's what I would do to be sure I was understood. My wife, a native speaker, says that if the context is known, an adjective or a possessive suffix could apply to only the second of two nouns linked with a conjunction. Conversely, she suggests that redundant use of adjectives and/or possessive suffixes may be warranted where the listener might incorrectly assume they only apply to the second of two nouns linked by a conjunction. She offers the following:
  • seekor anjing dan seekor kucing kecil
    • The presence of seekor as an indefinite article is sufficient to tip off the reader/listener that kecil only modifies kucing. Articles are frequently omitted in Indonesian in places where they would certainly be used in English. Indonesian indefinite articles are cumbersome. They vary depending upon the type of noun. The most common indefinite article is sebuah. Others include sebiji, sebatang and sebutir. Seekor is used for animals. Ekor means tail. They are all multisyllabic and force you to take a pause. In my wife's opinion, this is enough to separate the two linked nouns.
For "dogs and my cats," my wife agrees that reversing the word order is the best way to ensure you're understood. You could just go with anjing-anjing dan kucing-kucingku if the context makes it obvious that only the cats are yours. I should point out that if the listener already knows there is more than one dog and more than one cat, the nouns wouldn't be doubled. Taxman1913 (talk) 20:10, 7 January 2015 (UTC)
The same ambiguity issue can arise in English with the adjective at the front. "I went on vacation and spent the week drinking hot tea and beer." Alternatively, "I went on vacation and spent the week drinking cold beer and tea." Both of these are ambiguous. We can make the sentence more clear by using an adjective to modify each direct object. The same can be done in Indonesian to produce more clarity. Taxman1913 (talk) 20:44, 7 January 2015 (UTC)
My two cents:
With agglutinative languages, there are many components that are attached to words. It's not really necessary to create all forms, IMHO, even if they are spelled without a space. Japanese and Korean have a lot of attached forms, e.g. 커피하고 머핀 (keopihago meopin) "coffee and muffin", where 하고 (hago) is attached to 커피 (keopi). Indonesian doesn't seem to have a lot of such forms, and possessive forms are quite predictable. Arabic also has enclitic possessive suffixes (my, your, his, etc.). It's part of the language grammar. E.g. بَيْتِي (baytī) means "my house" = بَيْت (bayt) + enclitic ـِي (ī), but I don't think we need such entries. Arabic doesn't need definite entries like الْبَيْت (al-bayt) "the house", even if the definite article is attached to the beginning of a word.
There is no consistency in how some languages are treated. Scandinavian languages include definite forms of nouns but Bulgarian/Macedonian don't. Albanian definite nouns change their forms, e.g. feminine indefinite -ë changes to -a, so it would probably make sense to include them but redirects or definite forms in the header would suffice.
The case with "pengairan" is trickier. It reminds me of Arabic roots. The best Arabic dictionaries organize information by root consonants, since listing all words separately sometimes makes dictionaries too large, but it is sometimes hard to determine the root consonants, e.g. whether فَتَاة (fatāh) is derived from ف ت ي (f-t-y) or ف ت و (f-t-w). So, of course, it's better to have separate entries.
Wiktionary doesn't boast good coverage of Indonesian. So it's better to focus on lemmas, not predictable forms, IMO.--Anatoli T. (обсудить/вклад) 23:04, 7 January 2015 (UTC)
Two cents? That was worth at least a dollar, Anatoli. Thank you for such a thoughtful and thorough analysis. It seems there are no objections to excluding -ku, -mu, -nya, -kah and -lah forms of Indonesian nouns from having their own entries. That would apply to adjectives as well, since the analysis of the issue would be identical. Further, plurals of Indonesian nouns do not need entries. Would it be useful or advisable to have an infobox for Indonesian nouns and adjectives alerting readers to the possibility that the word may appear with one or more of the aforementioned suffixes attached? If so, is the best place for this with the headword or in the area where declensions would go? Taxman1913 (talk) 19:47, 8 January 2015 (UTC)
I tried to make pengairan-pengairan completely blank assuming that would inspire a bot (or a human) to delete it. The word cannot be attested and, therefore, doesn't meet the criteria for inclusion. The system thought I was vandalizing and wouldn't let me make it completely blank. It insisted I leave a language and part of speech on the page which I did. It suggested I contact an administrator if I think my edit was constructive. One would think something so sophisticated to protect against vandalism would provide a link to let an administrator know what you tried to do. But that isn't the case. Taxman1913 (talk) 20:00, 8 January 2015 (UTC)
Your edit showed up in Special:AbuseLog. Though I think only I look at it semi-regularly (mostly watching out for spam filter false positives). If you had used {{delete}} or {{rfd}}, the filter would not have been triggered. Whether the request would have been granted is another matter of course. Keφr 20:04, 8 January 2015 (UTC)
Thank you, Keφr. Taxman1913 (talk) 20:36, 8 January 2015 (UTC)

Rename tr= parameter to xlit= or similar

There are quite a few entries where tr= has been interpreted as meaning "translation". This is not very surprising. To prevent this kind of misunderstanding and confusion, I'm proposing to rename this parameter to xlit=, or something else that is less liable to be confused. —CodeCat 18:31, 7 January 2015 (UTC)

On the French version, we use "R=" like "Romanization". JackPotte (talk) 19:50, 7 January 2015 (UTC)
It should be easy to detect when someone is using it for translation just from the characters, no? (or with a language that should never be transliterated). DTLHS (talk) 19:54, 7 January 2015 (UTC)
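DTLHS's suggestion that misuse of tr= could be caught from the characters alone can be sketched as a simple heuristic check. This is a minimal Python illustration under stated assumptions, not an existing Wiktionary tool: it flags a tr= value if the term itself is already in Latin script (so no romanization should be needed), or if the tr= value contains common English function words. The word list and the function names is_latin_script and suspicious_tr are hypothetical, and both checks will produce some false positives and misses.

```python
import re
import unicodedata

# Hypothetical sample of English function words; a tr= value containing them
# is more likely an English gloss than a romanization. Short romanized
# syllables such as Japanese "to" are deliberately left out.
ENGLISH_MARKERS = {"the", "of", "and", "for", "my", "your", "with", "which"}


def is_latin_script(text: str) -> bool:
    """True if every letter in text has a Unicode name starting with LATIN."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def suspicious_tr(term: str, tr: str) -> list[str]:
    """Return reasons why a tr= value looks misused (empty list if none)."""
    reasons = []
    # A term already written in Latin script should not need tr= at all.
    if is_latin_script(term):
        reasons.append("term is already in Latin script")
    # English function words suggest a translation rather than a romanization.
    words = re.findall(r"[a-z]+", tr.lower())
    if any(w in ENGLISH_MARKERS for w in words):
        reasons.append("tr= contains English function words")
    return reasons


print(suspicious_tr("собака", "sobaka"))   # []
print(suspicious_tr("собака", "the dog"))  # ['tr= contains English function words']
print(suspicious_tr("anjing", "dog"))      # ['term is already in Latin script']
```

A check like this could only produce a list for human review; a bare gloss such as "dog" for a Cyrillic term would slip through.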
PS: Romanization is a hypernym of both transcription and transliteration. JackPotte (talk) 19:58, 7 January 2015 (UTC)
|r= looks good. It's not misleading, also it's shorter. "Romanization" is also more accurate to describe what we pass to the parameter. --Z 22:36, 7 January 2015 (UTC)
  • Don't rename. Rare confusion is not a sufficient reason for this kind of deviation from the previous practice. --Dan Polansky (talk) 22:39, 7 January 2015 (UTC)
    • And established practice (that is, the inertia of established editors), is not an argument against improvement. —CodeCat 22:50, 7 January 2015 (UTC)
      • I also don't think a change is necessary. It seems like a solution in search of a problem. —Aɴɢʀ (talk) 22:52, 7 January 2015 (UTC)
      • Established practice is an argument against low-value-added change, whether considered an improvement or not. Changes have costs, and these have to be weighed against minor or even hypothetical benefits. I ask the reader to check various historical revisions of our mainspace pages and see how changes made by CodeCat have drastically reduced the legibility of these revisions, where the legibility is reduced by missing templates and by various module errors produced by templates. It is a pitiable state of affairs to look at. --Dan Polansky (talk) 22:55, 7 January 2015 (UTC)
        • Writing a dictionary in plain text was a terrible idea from the start and it still is, which is why we need all the templates to compensate, and why our templates and other technical infrastructure need so many revisions. I can't help it if Wiktionary was badly designed from the start. That's no reason not to try to fix it up some. I will continue to try to improve Wiktionary to the best of my ability, and provide a counterbalance to those who do otherwise, as long as I am active on this project. —CodeCat 22:59, 7 January 2015 (UTC)
          • I could not disagree more. I love wikitext, I love Wiktionary design and I dread the day on which the likes of you are going to remove the wikitext from the editor, and lock it down behind a WYSIWYG interface. --Dan Polansky (talk) 23:06, 7 January 2015 (UTC)
            • Sadly, that day won't come. My hope is to make Wiktionary semantically parseable, ideally something that a script could convert to another format like XML. In other words, to make Wiktionary's code about content only, and not about presentation. —CodeCat 23:10, 7 January 2015 (UTC)
              • "Established practice (that is, the inertia of established editors), is not an argument against improvement". It should be in template-space, CodeCat. Moving, renaming, and adding or removing parameters confuses the shit out of editors. You and Kephir take it far too lightly. Purplebackpack89 23:13, 7 January 2015 (UTC)
            • Love makes you blind, apparently. Wikitext has quite obvious limitations and drawbacks when it comes to creating a dictionary. If it were not for the comprehensive template infrastructure, maintaining category lists would be burdensome, and categories would often fall out of sync with the rest of entry content (just look at pages "tagged but not listed" on RFX). If it were not for standardised headers, distinguishing between definitions, etymologies and synonym lists would be much harder, if not impossible. Typing wikitext manually is error-prone; if it were not for sanity checks in templates and modules, the errors might linger in entries indefinitely. Remembering how every single template should be used is tiresome; even I cannot be bothered to do it, and I would be happy if I could just assume every template works the same. There are tons of boilerplate in each entry — entries being free-form wikitext pretty much forces us to have it; the alternative is complete chaos. Our translation lists would most probably be much smaller if we did not have WT:EDIT. If it were not for edit filters, the flood of spam and vandalism might be unmanageable. And each of these solutions I just mentioned is a brittle workaround, and entries still fall between the cracks anyway. Splitting entries into per-language pages gets proposed every once in a while, and is probably the most desirable feature of all, but if we were actually to do it, it would be a nightmare from both technical and copyright standpoints. Wikitext talk pages are obnoxious for those who know how to use them and confusing for those who do not. Free-form wikitext is an awful foundation on which to build a dictionary. Now waiting for the inevitable liberum veto thought-terminating cliché of "I disagree, just because". Keφr 10:13, 10 January 2015 (UTC)
Unfortunately, anything you propose tends to attract a disproportionate amount of vehemence and bitterness from certain quarters, which is getting very old. Still, on the merits, I'm very leery of changing something that's second nature to anyone who's edited much in non-Latin scripts just because of a small percentage that are confused (there are probably more who mistakenly use the language code "dk" for Danish, but no one is arguing we change that). I think the potential to alienate large numbers of our hardest-to-find type of editors by forcing them to relearn a basic part of their editing routine outweighs solving such a minor problem. Chuck Entz (talk) 03:37, 8 January 2015 (UTC)
Oppose. It’s not that common, in my experience. — Ungoliant (falai) 17:20, 8 January 2015 (UTC)
More pathological need to change things from CodeCat. Could you change other parts of your life and leave us alone? If the need is so strong you must change all parts of your life at all time, have you considered talking to a doctor about it? No this isn't sarcasm. Renard Migrant (talk) 17:28, 8 January 2015 (UTC)
Why don't you just go fuck yourself? That's a change I could agree with. —CodeCat 18:02, 8 January 2015 (UTC)
Could you answer the questions? Renard Migrant (talk) 22:28, 8 January 2015 (UTC)
  • Oppose: Because of what I said about moving stuff confusing people. Purplebackpack89 17:38, 8 January 2015 (UTC)
  • Count me as opposing. I am not convinced that the benefits outweigh the costs. Keφr 20:05, 8 January 2015 (UTC)
  • Oppose Though the recommendation CodeCat proposes would have been a good idea long ago, there is no evidence that this is a significant problem. In fact we have a completely unsubstantiated claim that it is sometimes a problem without a single instance. If, 1., it isn't worth collecting the evidence that it is a problem and, 2., most contributors to this discussion don't think it is a significant problem, then we should not waste more time on this. DCDuring TALK 22:30, 8 January 2015 (UTC)

Project Wiktionary Meets Matica Srpska

We would like to inform you about the project we are starting. The project's aim is to increase support for Open Knowledge / Free Content movement through establishing long term strategic partnership with the venerable cultural institution Matica srpska and to increase the quality, accuracy and volume of Wiktionaries through digitization of two dictionaries, while developing a potential model for future development of cooperation across Wiktionaries through targeted mobilization of the communities.

There are two activities within the project that we need your support for. First, we are preparing a list of lexicographical terms (it would contain approximately 100 terms) that needs to be translated into as many languages as possible, in order to ensure further work on the project and to create the foundations for other lexicographical projects in the future. For this task, we would use a separate application, but all of the terms would be inserted into Wiktionaries, as well (making approximately 10,000 new entries per Wiktionary, counting that the terminology would be translated into 100 languages). That would also serve as the preparation for translating the Serbian Ornithological dictionary, which is the second activity.

The Serbian ornithological dictionary encompasses all local names of birds living on the Serbian speaking territory. All names are specified under the appropriate Latin name in accordance with the contemporary classification system. This creates the opportunity to translate it easily to various languages, since the basic list of terms is in Latin and it is fairly small (370 species of birds).

We would like to try and motivate as many Wiktionary communities as possible to participate in translating these two dictionaries, especially since the benefit for each particular Wiktionary would be great: for example, if we succeed in motivating people from 100 Wiktionaries to participate, the number of primary entries in these 100 Wiktionaries would be 3,700,000 (37,000 per Wiktionary). If we succeed in motivating just 20 Wiktionaries, the number of entries in these 20 Wiktionaries would be 148,000 basic entries. Of course, these entries would be incorporated into the respective Wiktionaries according to the interest and the rules of each community.

With this project, we are opening cooperation with the venerable Serbian cultural institution Matica srpska [2] and we believe that this partnership will have major impact on future cooperation between Wikimedia organizations and similar institutions in Slavic countries. If this cooperation could be relevant to any other partnership you are trying to establish in your country or globally, we would be more than willing to share our knowledge and contacts.

Besides support in translation, we are open for Wikimedia volunteers to participate more substantially and thus build knowledge inside of the community on how to deal with this kind of data. For example, if you are willing to join the core team and help us in communication with the Wiktionary communities of the languages which you are speaking, please contact Milica ( or Milos ( via email. The same goes if you are willing to contribute by coding in Python and/or PHP.

Please join us on the project's discussion page or send us an e-mail. If you are willing to participate but are not sure of your knowledge of English, please check the list of languages the organizational team speaks; there is a chance we can communicate in your native language.

We are very excited about this project and hope that you will be part of it as well!

Looking forward to hearing from you!

--Godzzzilica (talk) 15:55, 8 January 2015 (UTC)

How does our merger of Serbian, Croatian, Bosnian and Montenegrin into Serbo-Croatian affect the project? How does Matica Srpska feel about that? — Ungoliant (falai) 17:23, 8 January 2015 (UTC)
Serbian lexicography usually treats all those languages as one, mentioning when something is a Croatian variant (for example, attorney is "advokat" in Serbian, while "odvjetnik" in Croatian; the main monolingual dictionary of the Serbian language would have the entry "odvjetnik", while mentioning that it's "hrv. for advokat"). This topic is much more problematic in Croatia than in Serbia. --Millosh (talk) 20:32, 8 January 2015 (UTC)
That said, the best idea is not to insist on their formal position. If you don't insist, that would be treated as an editorial choice of the English Wiktionary, not under the jurisdiction of Matica srpska. If you insist, they would likely give a kind of negative comment. --Millosh (talk) 20:32, 8 January 2015 (UTC)
Linguistically speaking, some things could be fixed. For example, одвјетник (Cyrillic spelling of odvjetnik) is a kind of neologism. It's a strictly Croatian word, and contemporary Croatian is written strictly in the Latin alphabet; it could be transcribed for linguistic and similar purposes, but in the Cyrillic alphabet it's translated as адвокат. --Millosh (talk) 20:32, 8 January 2015 (UTC)

Oh, I've now read everything in relation to the decision made a few years ago. The community didn't make a formal decision in relation to that issue. So, brace yourselves! This could become a significant issue on English Wiktionary (again) after this project gets public attention. Note that this is likely an exclusive practice of English (and Serbo-Croatian) Wiktionary. --Millosh (talk) 21:45, 8 January 2015 (UTC)

Topical context labels

diff. Judging by that edit, User:Jamesjiao probably feels that context labels are meant to show limited usage or to disambiguate. Since this definition doesn't need disambiguating and is in general use, he removed the label. But the fact that the label was there in the first place means that someone else thought differently. That other person presumably thought that labels could also be used merely to give extra information about what general semantic field or topic a term pertains to, even if it's not a term specific to it.

These two approaches have been at odds with each other for a while now, but I don't know if there has been a discussion on it recently-ish. If there has been, it was either inconclusive or not widely advertised, so I missed it. So my question is, should context labels be used to indicate topics or other information when they add nothing to the definition that is not already apparent from the definition itself? That is, should labels always be restrictive/defining, or can they also be non-restrictive? —CodeCat 00:02, 9 January 2015 (UTC)

I feel that in some cases people use labels because they are too lazy to add the categories manually. — Ungoliant (falai) 00:33, 9 January 2015 (UTC)
Well, it's true that some other dictionaries do add topical labels. So it's probably inspired by that. That doesn't mean I understand why those other dictionaries do it, though. —CodeCat 00:39, 9 January 2015 (UTC)
I see the logic of both kinds of labels, but the restrictive usage context seems to be an essential function for a reference that purports to be about words. We apply neither type of label consistently, even in a single PoS section, eg, car#Noun. I don't think that we have even well characterized what we mean by a usage context. Is it a group of people with shared vocabulary, eg, soldiers? Is it a situation, eg, military service? Is it a kind of technical expertise, eg, modern firearms design or use? DCDuring TALK 01:27, 9 January 2015 (UTC)
The less said about the arbitrariness of topical categories the better. DCDuring TALK 01:27, 9 January 2015 (UTC)
I personally feel that geology as a science is quite a specialised field that has its own jargon, which this term shouldn't be part of, imho. One reason I removed this label is that I noticed the term is already in Category:nl:Landforms, which better suits non-jargon terms like this one. JamesjiaoTC 01:34, 9 January 2015 (UTC)
One trouble with categories is that they are not visibly connected to a particular sense of a polysemic word. DCDuring TALK 02:32, 9 January 2015 (UTC)
The discussed edit (diff) complies with Wiktionary:Votes/pl-2009-03/Context labels in ELE v2. It also matches our dog entry, which does not use "zoology" label before "A mammal, Canis lupus familiaris, that has been domesticated for thousands of years, of highly variable appearance due to human breeding" definition. --Dan Polansky (talk) 18:13, 9 January 2015 (UTC)

Word-mining at Wikipedia

A development at Wikipedia may be of interest to Wiktionary editors. Over at w:Wikipedia:Typo Team/moss a new project is finding all words occurring within the English-language Wikipedia that are not mentioned in any Wikipedia article title, nor in any Wiktionary entry or Wikispecies entry. One of the aims is to identify and fix errors in Wikipedia articles; another is to identify and fix missing entries at Wiktionary and Wikispecies.

You are invited to review the current list and perhaps be inspired to create missing Wiktionary entries. -- John of Reading (talk) 14:22, 9 January 2015 (UTC)

  • I've added a few. SemperBlotto (talk) 14:59, 9 January 2015 (UTC)
  • The list is useful, but does it distinguish the different languages on Wiktionary? A word may have an entry on Wiktionary, but not in English. —CodeCat 15:08, 9 January 2015 (UTC)

RFV - period for closing nominations

Wiktionary:Requests for verification/Header currently says:

Closing a request: After a discussion has sat for more than a month without being "cited", or after a discussion has been "cited" for more than a week without challenge, the discussion may be closed. ...

The same page says that a discussion can be archived a week after closure.

This double one-week period seems excessive. I propose this change:

Closing a request: After a discussion has sat for more than a month without being "cited", or after a discussion has been "cited", the discussion may be closed. ...

Thus, once an entry is cited, the discussion should be able to be closed immediately, after which there will still be one week for objections before the discussion is archived. Thoughts? --Dan Polansky (talk) 13:48, 10 January 2015 (UTC)
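The difference between the two rules is simple date arithmetic. The following is a hypothetical Python sketch, not anything Wiktionary actually runs; it assumes nobody challenges the citations and, for simplicity, treats "more than a week" as exactly seven days. The function name and the `proposed` flag are illustrative only.

```python
from datetime import date, timedelta

# Simplification: "more than a week" is modelled as exactly 7 days.
WEEK = timedelta(days=7)


def earliest_archive_date(cited_on: date, proposed: bool) -> date:
    """Earliest date an unchallenged, cited RFV discussion could be archived.

    Current header: wait a week after citing before closing, then a week
    after closing before archiving. Proposed change: close as soon as the
    entry is cited, then wait a week before archiving.
    """
    closed_on = cited_on if proposed else cited_on + WEEK
    return closed_on + WEEK


cited = date(2015, 1, 10)
print(earliest_archive_date(cited, proposed=False))  # 2015-01-24
print(earliest_archive_date(cited, proposed=True))   # 2015-01-17
```

Under the current wording a cited nomination stays on the page for at least two weeks after citing; under the proposal, one week, which is still the window for objections before archiving.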

  • If the sole reason for change is "This double one-week period seems excessive", why would we bother? In particular it doesn't seem excessive to me. DCDuring TALK 14:18, 10 January 2015 (UTC)
    • As per the header, when the entry is cited, one first needs to wait one week before closing the nomination, and then one week before archiving the nomination. You rarely close or archive RFV nominations, which may help explain why this does not seem excessive to you. To me, waiting one week after an entry has been cited before removing the nomination from the RFV page (via archiving) seems sufficient, especially when the closing person is different from the person who added the quotations. --Dan Polansky (talk) 14:28, 10 January 2015 (UTC)
Since struck headers can be unstruck, there's no reason not to close immediately if the citations are apparently watertight. It would be bad faith if there are fewer than three certain citations. However, if a mistake is made by striking a header for a term that isn't certainly cited, we can just unstrike it. In fact, we do. Renard Migrant (talk) 15:53, 10 January 2015 (UTC)

acronym, initialism, abbreviation template use

Looking at Category:English acronyms I note many entries that seem unpronounceable as words or, at least, not obviously pronounceable. I hope we agree that, for purposes of our definitions, for English L2 sections at least:

  1. an acronym is an abbreviation formed from the initial elements (or just initials?) of the component words of the term it represents, pronounced as a word (uses {{acronym of}})
  2. an initialism is an abbreviation formed from the initials (or initial elements?) of the component words of the term it represents, pronounced as the component letters of the initialism (uses {{initialism of}})
  3. an abbreviation is anything not certainly or probably either an acronym or initialism (uses {{abbreviation of}}).

If something may be pronounced either as an initialism or as an acronym in the senses above, we should have a pronunciation section and not use {{acronym of}} or {{initialism of}}. We should probably require a pronunciation section for everything claiming to be an acronym.

Is the above a reasonable summary of existing preferences? It certainly is not carried out. I would suggest that, if there is agreement, we should perform some cleanup, especially on the entries in Category:English acronyms. DCDuring TALK 14:12, 10 January 2015 (UTC)
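As a hypothetical illustration of the proposed rules, the choice of definition template can be written as a small decision function. This Python sketch is not an existing Wiktionary tool; the `Pron` enum and both function names are invented for illustration. It also assumes that a term pronounced both ways falls back to {{abbreviation of}}, which point 3 implies but the proposal does not state outright.

```python
from enum import Enum


class Pron(Enum):
    """How an English short form is pronounced (hypothetical categories)."""
    AS_WORD = "as a word"            # e.g. NATO
    AS_LETTERS = "letter by letter"  # e.g. FBI
    BOTH = "either way"
    UNKNOWN = "unknown"


def definition_template(pron: Pron) -> str:
    """Pick the definition template under the proposed conventions."""
    if pron is Pron.AS_WORD:
        return "{{acronym of}}"
    if pron is Pron.AS_LETTERS:
        return "{{initialism of}}"
    # Assumption: anything not certainly one or the other, including terms
    # pronounced both ways, uses the generic template (point 3).
    return "{{abbreviation of}}"


def needs_pronunciation_section(pron: Pron) -> bool:
    """Acronyms, and terms pronounced either way, should show a pronunciation."""
    return pron in (Pron.AS_WORD, Pron.BOTH)
```

Keeping the pronunciation requirement in a separate predicate mirrors the proposal, where the pronunciation section and the template choice are two distinct requirements.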

I was doing a course a few months ago where the tutor referred to the 'acronym' DLA, which, of course, isn't an acronym. The word "acronym" is even used erroneously to mean initialism, so yes, correct it as much as possible. Renard Migrant (talk) 22:53, 11 January 2015 (UTC)
I'd like more people to agree that this is the correct approach so that I don't duplicate the mistake being perpetrated with respect to {{eye dialect of}}. (See WT:TR#gub'mint.) DCDuring TALK 23:15, 11 January 2015 (UTC)
I just looked at an extensive discussion of use of the term initialism on Wikipedia. It was a debate between advocates of "precision" versus advocates of accessibility to a general population of readers. DCDuring TALK 16:28, 12 January 2015 (UTC)
I suppose some of the good news is that {{acronym-old}} remains on only some 200 entries. I don't know how many L2 sections have Acronym as a PoS header. I miss Ullmann's runs that produced lists of non-ELE L3, L4, L5 headers. DCDuring TALK 16:53, 12 January 2015 (UTC)
KassadBot used to keep records of headers. Renard Migrant (talk) 18:00, 12 January 2015 (UTC)
I can see the category populated by {{rfc-header}}, but not something comparable to User:Robert_Ullmann/L3#Invalid_L3_headers. He seems to have been way more diligent than anyone since. DCDuring TALK 19:18, 12 January 2015 (UTC)
Someone — I think it was User:DTLHS — made a WT:TODO list last year of all headers used on Wiktionary, which they and CodeCat and I were using to find and fix misspelt headers like "===Etymologoy===", but which also caught correctly-spelt but nonstandard headers. Perhaps they could generate a new list for you. - -sche (discuss) 23:41, 12 January 2015 (UTC)
You'd be amazed, if you fix, say, 5 per day, how quickly a list of 200 goes down to zero. I seem to think I fixed 700 Swedish entries with a declension table not under a header, doing 10 per day. Renard Migrant (talk) 16:32, 13 January 2015 (UTC)
I've added all the entries I can find with the offending headers to Category:Entries with non-standard headers. Renard Migrant (talk) 00:24, 16 January 2015 (UTC)


User:Wyang and User:Mar vin kaiser seem to be implementing Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese which not only doesn't close for another two weeks, it's on course to fail by a clear margin. What to do, nothing? Blocks? Mass deletions? Renard Migrant (talk) 23:17, 11 January 2015 (UTC)

What are you talking about? The discussion has reached consensus among Chinese-language editors. As I said before on my talk page, there is no point for a vote when all Chinese-language editors support the proposal. The vote is a means for a bunch of utter standers-by to dictate what chores others should do. Wyang (talk) 23:22, 11 January 2015 (UTC)
The objective of the project is a dictionary to be used by readers. What is the proportion of Chinese-language editors among readers? It's negligible. What editors should do should be dictated by what users need, like, use. Lmaltier (talk) 22:04, 14 January 2015 (UTC)
What Wyang said - we have reached an agreement. The vote doesn't have any rationale because it wasn't created by Chinese editors. There are legitimate concerns, though. Eventually and ideally, the simplified entries should display the same information as the traditional ones, but the contents should be stored in one place. Having entries that are out of sync is a big problem. (Correctly formatted) traditional entries use both traditional and simplified forms in synonyms, usage examples, etc., so there is no discrimination. --Anatoli T. (обсудить/вклад) 23:28, 11 January 2015 (UTC)
In any case, how does deleting simplified entries with {{zh-see}} help the project!? --Anatoli T. (обсудить/вклад) 23:30, 11 January 2015 (UTC)
Right you are. Thanks for bringing this to my attention. Renard Migrant (talk) 23:38, 11 January 2015 (UTC)
Then, we should nominate Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese for deletion perhaps? Renard Migrant (talk) 18:04, 12 January 2015 (UTC)
I am confused. What are you trying to say? Keφr 18:05, 12 January 2015 (UTC)
That perhaps we should nominate Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese for deletion. Renard Migrant (talk) 18:13, 12 January 2015 (UTC)
…because? Keφr 18:40, 12 January 2015 (UTC)
A RFDO on the vote is unlikely to pass the 2/3 threshold, and seems pointless. The vote is fine. The voters have plenty of room for posting their rationale directly in the vote, or linking from the vote to locations where they posted their reasoning; they can also post "as per <person>". On another point, "consensus among Chinese-language editors" is not enough, IMHO, in part since the Chinese entries also serve readers, and since multiple of the editors who helped create Chinese entries are no longer around to oppose. On a related note, in Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese, among the supporters I currently see two editors with anything like significant contribution to Chinese entries: Anatoli T., and Wyang. --Dan Polansky (talk) 18:26, 12 January 2015 (UTC)
If the vote fails, which seems likely, then we maintain the status quo, which is to use {{zh-see}} anyway. Renard Migrant (talk) 18:55, 12 January 2015 (UTC)
Are you trying to solve a problem? If so, what is the problem statement and what are the tentative solutions that you offer? --Dan Polansky (talk) 19:06, 12 January 2015 (UTC)
No. Renard Migrant (talk) 19:11, 12 January 2015 (UTC)
There is no point in keeping the vote open now that it is being ignored; this is a perfectly reasonable observation. Kaixinguo (talk) 19:14, 12 January 2015 (UTC)
The vote is not being ignored: people keep posting to it. Via the vote, we are getting increasingly better evidence and information about the actual scope of support and opposition; we would not have that without the vote. Furthermore, your defeatist stance is dangerous, since it encourages certain types of editors to think like this: "go ahead, ignore any vote or opposition, and the opposers will just give up". For those who support, posting "support per <person>" costs close to nothing, and likewise for those who oppose, so I see little point in trying to make prophecies about the outcome. A better course of action is to figure out the right stance and make it known in the vote. --Dan Polansky (talk) 19:39, 12 January 2015 (UTC)

Merging Finno-Volgaic, Finno-Samic, Finno-Permic and Finno-Ugric into Uralic

The internal subgrouping of the Uralic family is not settled, and many competing ideas still exist, some with greater or lesser support. The first two families in the header (or three, if Wikipedia is to be believed) are still in dispute. Finno-Ugric has much more support, but as a reconstructed language it's virtually identical to Uralic. In fact, even Finno-Permic does not differ significantly from Uralic as a whole, at least phonologically. The tradition among many Uralic linguists is to label reconstructions based on the branches they're attested in. So, for example, an etymology would label a reconstruction Finno-Ugric if it's not found in the Samoyedic languages.

At Wiktionary talk:About Proto-Uralic, User:Tropylium (who I definitely trust on Uralic linguistics) suggested that we should eliminate the various subgroups and their associated proto-languages as separate languages on Wiktionary, and merge them into Uralic. Treating each language or branch separately is not practical for Wiktionary, where we would end up with a long string of identical words in etymologies, like the example Tropylium gives for Finnish kala. It gets even worse if we could potentially create three or more separate entries for the various proto-languages in the chain, all of which would have identical forms for almost any word. It's much more practical, as in the current Finnish entry, to jump straight from Finnic to Uralic.

So the proposal is to treat Proto-Finno-Ugric, and perhaps also Proto-Finno-Permic, as simply dialects of Proto-Uralic (which they were in reality, in all likelihood). Entries would be moved and etymologies adjusted accordingly, so that the categories for terms derived from these languages would end up either empty, or subsumed under Uralic just like we do with Category:Terms derived from Anglo-Norman or Category:Terms derived from Late Latin. References to Finno-Volgaic and Finno-Samic would be removed altogether.

I just want to note that this is explicitly not a statement that these are not valid subgroups or languages. This is just a matter of practicality, just like when we choose to merge any other group of languages. —CodeCat 22:14, 13 January 2015 (UTC)

I support merging Finno-Volgaic and Finno-Ugric into Uralic. Ever since Tropylium pointed out the issue at Wiktionary talk:Families#Removing.2Fadding_families (in 2012), Finno-Ugric has been on a sticky note in the back of my mind as one of the places where our classification of linguistic families needed to be updated; kudos to you for pushing for action on the matter. Those unfamiliar with this language family can take a look at the first two sentences of w:Finno-Ugric languages and the first three short paragraphs of w:Finno-Volgaic languages, which sum up how out-of-date the linguistics behind those groupings is. As for Finno-Samic (and for that matter Finno-Volgaic): we never gave it a code in the first place, did we? So there's nothing to merge. - -sche (discuss) 04:22, 14 January 2015 (UTC)
I am obviously enough in support of this procedure. Though as noted at Wiktionary talk:About Proto-Uralic, it might be a decent idea to not completely merge "Finno-Ugric" and "Finno-Permic", but to relabel them dialects of Proto-Uralic instead.
User:Liliana-60 also mentioned back at Wiktionary talk:Families that aside from etymological appendix work, families are useful for users to find languages. If a user wants to find information about the other Finno-Ugric languages, can we leave redirects so that they will end up at the right address? --Tropylium (talk) 22:34, 14 January 2015 (UTC)

Merge direction

A subtopic: in what direction should a merger be done? The term "Finno-Ugric" has a long history of being used not only for Uralic minus Samoyedic, but also less formally for the family as a whole. It is also much more commonly used than "Uralic": the term "Finno-Ugric languages" gets some 800k Ghits, "Uralic languages" only 80k. So it might be worth considering to abolish the label "Uralic" instead. This would however not allow a dialect status for Finno-Ugric. --Tropylium (talk) 22:34, 14 January 2015 (UTC)

If we use Uralic, then there is no ambiguity over what we mean. With Finno-Ugric that doesn't apply. So we should use Uralic I think. —CodeCat 01:11, 15 January 2015 (UTC)
Yeah, use Uralic and list "Finno-Ugric" as an alternate name. (As an aside, maybe we should revive the idea of splitting the names= field into canonical and alternate name fields.) - -sche (discuss) 03:34, 16 January 2015 (UTC)

If there are no objections or comments, I'll start merging soon. —CodeCat 22:38, 26 January 2015 (UTC)

I've now deleted the "fiu-pro" code from the main language modules, and moved it to the etymology language module. This means that it's no longer valid to link to a Proto-Finno-Ugric entry, but you can still specify it as a source language in an etymology (like "Late Latin" and similar). There will probably be a lot of module errors from entries that still link to Proto-Finno-Ugric terms; I'll be running a bot regularly to change these to link to Proto-Uralic. If you find one and can't stand to leave it for the bot, you can fix it yourself by replacing {{m|fiu-pro|...}} with {{m|urj-pro|...}}. —CodeCat 15:29, 28 January 2015 (UTC)
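The manual fix described above amounts to a single text substitution. The following Python sketch is a hypothetical illustration, not CodeCat's actual bot. It assumes that only the linking templates {{m}} and {{l}} need rewriting, and it leaves etymology templates such as {{etyl|fiu-pro|fi}} untouched, since fiu-pro remains valid as an etymology source language. The regular expression and the function name are invented for illustration.

```python
import re

# Opening of {{m|fiu-pro|...}} or {{l|fiu-pro|...}}, tolerating stray spaces.
# Etymology templates such as {{etyl|fiu-pro|fi}} do not match, because the
# template name must be exactly "m" or "l".
LINK_FIU = re.compile(r"\{\{\s*(m|l)\s*\|\s*fiu-pro\s*\|")


def relink_to_uralic(wikitext: str) -> tuple[str, int]:
    """Point Proto-Finno-Ugric links at Proto-Uralic.

    Returns the new wikitext and the number of links changed. Whitespace
    inside a matched template opening is normalised away.
    """
    return LINK_FIU.subn(r"{{\1|urj-pro|", wikitext)


text = "From {{etyl|fiu-pro|fi}} {{m|fiu-pro|*kala}}."
print(relink_to_uralic(text))
# ('From {{etyl|fiu-pro|fi}} {{m|urj-pro|*kala}}.', 1)
```

Returning the substitution count makes it easy for a bot to skip saving pages where nothing changed.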

Shall we also merge Category:Terms derived from Finno-Ugric languages and Category:Terms derived from Finno-Permic languages (and related target-lang-specific ones) with Category:Terms derived from Uralic languages? --Tropylium (talk) 23:18, 28 January 2015 (UTC)
I kept the categories separate for now because I wasn't sure if we still wanted to allow these languages to be mentioned in etymologies, even if we treat them as Proto-Uralic dialects. —CodeCat 23:36, 28 January 2015 (UTC)
Aren't "etymology language" exceptions encoded separately from the language family tree (which still seems to feature fiu)? --Tropylium (talk) 05:44, 31 January 2015 (UTC)
Yes, but I think you're confusing two things. There are both the language fiu-pro and the family fiu. The language has been merged completely so it's now a dialect of urj-pro, but the family still exists. —CodeCat 14:19, 31 January 2015 (UTC)
That's exactly what I was saying: shouldn't we also merge the families? Or do you think this would take further discussion yet? --Tropylium (talk) 17:17, 31 January 2015 (UTC)
I have been, but it takes longer for the database to catch up because all the etymology categories need to be updated too. —CodeCat 18:38, 31 January 2015 (UTC)

Admin userboxes

I know that userboxes are usually not permitted, but why hasn't anyone made an admin userbox yet for Wiktionary? I think it'd be useful to be able to look at someone's userpage and know straight off the bat, without the trouble of looking through the large list, whether someone is an administrator, crat, rollbacker, sysop, or whatever. (With all due respect, I refer to all three, admins, crats, and sysops, as admins, because it's too complicated to differentiate.) If this isn't a good idea, shouldn't we at least have a category for admin users that shows up immediately at the bottom of every admin's user page? Has there been a discussion on this yet, or a resolution? NativeCat drop by and say Hi! 00:14, 9 January 2015 (UTC)

I think it’s a good idea. — Ungoliant (falai) 00:16, 9 January 2015 (UTC)
I might not particularly mind them, but I worry it may encourage what on TOW is called hat collecting. Admin userboxes can be put on pages fraudulently (i.e. without the user being an admin), though this is usually not a problem: the userbox may contain a link to a page which verifies that an account indeed has admin rights. On the flip side, however, (as I expect, at least) no admin will be forced to put those userboxes on their pages, so it may fail to achieve the result you have in mind. Keφr 10:39, 9 January 2015 (UTC)

omg, like I suspected, people do not know about user scripts' existence :/

@NativeCat: insert this line in your common.js


--Dixtosa (talk) 13:31, 15 February 2015 (UTC)
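Independently of any user script or userbox, group membership can be checked against the MediaWiki API's `list=users` query with `usprop=groups`, which returns each user's groups as a list of strings. The Python sketch below only parses such a response; fetching it is left out. It assumes the standard MediaWiki group names `sysop` and `bureaucrat`, and the function name is hypothetical. A nonexistent account comes back with a `missing` key and no `groups`, which the sketch treats as having no rights.

```python
import json

# Standard MediaWiki group names; other wiki-specific groups are not covered.
ADMIN_GROUPS = {"sysop", "bureaucrat"}


def admin_groups(api_response: str) -> dict[str, set[str]]:
    """Map each user in a list=users&usprop=groups JSON response to their admin groups."""
    data = json.loads(api_response)
    result = {}
    for user in data["query"]["users"]:
        # Missing users carry a "missing" key and no "groups" list.
        groups = set(user.get("groups", []))
        result[user["name"]] = groups & ADMIN_GROUPS
    return result


sample = (
    '{"batchcomplete":"","query":{"users":['
    '{"userid":1,"name":"Example","groups":["*","user","autoconfirmed","sysop"]},'
    '{"name":"Nobody","missing":""}]}}'
)
print(admin_groups(sample))
```

Unlike a self-added userbox, an answer from the API cannot be faked by editing one's own user page.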

I will be completely unavailable for the next week.

Please have the dictionary finished by the time I get back. This includes all the foreign languages. Cheers! bd2412 T 05:53, 14 January 2015 (UTC)

Haha, sorry, your wish cannot be fulfilled. NativeCat drop by and say Hi! 13:26, 14 January 2015 (UTC)
Well, not with that kind of attitude. —Justin (koavf)TCM 17:00, 14 January 2015 (UTC)
Haha, a dictionary can never be completely completed. Who disagrees? NativeCat drop by and say Hi! 21:27, 14 January 2015 (UTC)
(That's the joke.) --Catsidhe (verba, facta) 21:40, 14 January 2015 (UTC)
Jokes aside, targets for languages could and should be set for each language. How about 20,000 lemma entries for most frequent words for a language? Inflection (if necessary for a particular language), pronunciation and etymology being desired but not mandatory. If this can be achieved, then a language could be considered to have a good coverage in our dictionary. How many foreign languages would fit these arbitrary criteria at Wiktionary? Twenty to thirty? --Anatoli T. (обсудить/вклад) 21:41, 14 January 2015 (UTC)
Of the languages I search, I find only our English and Latin content good enough to be my primary source. Spanish and German are close. Italian has a lot of entries but most of them are so vague, it is easier to try to make sense of monolingual dictionaries. I’m sure Finnish is on the same level as English and Latin because there are a lot of entries and all the ones I run across are good. — Ungoliant (falai) 22:12, 14 January 2015 (UTC)
Don't you wish Danish coverage could be as good as the languages Ungoliant listed? NativeCat drop by and say Hi! 23:29, 14 January 2015 (UTC)
I would do Danish, but Norwegian is taking up all of my available time (in between burying our dead cat today). Donnanz (talk) 23:44, 14 January 2015 (UTC)
Apart from the above, I would say the coverage of Russian, Chinese, Japanese, French, Portuguese, Serbo-Croatian and Dutch is also quite good, including definitions, and probably Polish and Hungarian. The number of Korean lemmas is close to 20,000. --Anatoli T. (обсудить/вклад) 00:26, 15 January 2015 (UTC)
  • @Ungoliant MMDCCLXIV: Latin isn't as good as you think. Our entries are great, although they tend not to be as good as other online resources, but Latin is lacking in so many translation tables that one can't use en.wikt for English to Latin. That's a long-term project for me to attack when I've more time (which may never happen). —Μετάknowledgediscuss/deeds 23:35, 14 January 2015 (UTC)
    English often disappoints me. We have good breadth of coverage but quality of definitions for many common words is poor or, at best, uneven. DCDuring TALK 23:39, 14 January 2015 (UTC)
    My main contact with Latin is through etymology, and since I began editing there have been only a handful of cases where a Latin etymon is entryless. But yeah, the English-to-Latin translation coverage is horrible. If you’re interested, I can generate one of these for Latin. — Ungoliant (falai) 23:46, 14 January 2015 (UTC)
I don't think any Wiktionary has a good enough coverage of Danish as it should. I'm currently working a lot at the Danish Wiktionary, and even it doesn't have nearly as many Danish words (or {insert language here} words) as it could, which is a little disappointing. Really this problem goes on for all Wiktionaries, and English Wiktionary is not the only Wiktionary that has this problem. However, I feel we'll never be complete. That's the point of a wiki, it is never complete, there is always new content to add/edit. NativeCat drop by and say Hi! 00:33, 15 January 2015 (UTC)
"Never complete" is a strong statement. What we need is for the dictionary to be useful and used. If a Wiktionary covers a large number of words by frequency in a good manner, then all the fancy, archaic, or otherwise interesting words can be added later. Many editors focus on entries they like, not the words that are really necessary by dictionary standards. Russian entries, for example (and those of a couple of other Slavic languages), have inflections for most words, which are missing in most published dictionaries. This is a clear advantage, since inflections are not straightforward and can't easily be worked out from grammar references. Chinese, apart from Mandarin, now has thousands of transliterations/pronunciations for other topolects, which are also hard to find. (Separate topolects, including Mandarin, may still not be able to compete with other dictionaries, but our entries have not only pinyin but also IPA and zhuyin; other dictionaries don't have all three.) The electronic dictionaries available mostly provide only Mandarin and Cantonese, with less coverage for Cantonese. There's, of course, room for a lot of improvement. Finnish is probably the best online dictionary available. --Anatoli T. (обсудить/вклад) 00:50, 15 January 2015 (UTC)

Spanish adjective forms

Over the past two weeks, Wonderfool has created no fewer than thirteen sockpuppets (one of which has been used four times), acting as bots to create missing Spanish participle and adjective forms. As he shows no sign of giving up, and the blocks only serve to slow him down, it seems to me that we should try some sort of alternative action. I don't really want to get involved in edit warring, or, frankly, any discussion outside of my field of expertise, so I'm merely going to make a couple of suggestions for solving or mitigating this problem:

  1. Create an abuse filter. I'd do this, but (a) I don't know if there's policy involved, (b) I don't know if I'm allowed, and (c) I don't know how to write one (I could figure it out, but someone would still have to check over it.)
  2. Request and create a bot to add these entries, thereby removing the need for Wonderfool to do so. I don't especially want to take that much time to do this, but if this is deemed a good solution and nobody else will, I can. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 19:19, 14 January 2015 (UTC)
Special:AbuseFilter/22 already exists for that purpose. Remember not to make the filter too broad. Keφr 19:28, 14 January 2015 (UTC)
Ah! I'd missed it, thanks. I'd assumed such a filter would be closer to the end of the list, though... has he done this before? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 20:09, 14 January 2015 (UTC)
Okay, I've made some modifications. Please do check over the filter if you can. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 15:19, 15 January 2015 (UTC)
  • Hi. It would be super if someone ran a bot to create these forms, which should free up some of my time for more useful stuff around here. I did find a handful of errors in this category and this one, which I've corrected, so ideally the bot should be run by someone with Spanish knowledge. User:Adrian F4/Bot code has some forms which can be used by a bot account, all of which have been checked. Regards - WF. --Adrian F4 (talk) 21:12, 15 January 2015 (UTC)
    Until then, I'll take charge of adding any forms -WF. --Adrian F4 (talk) 21:14, 15 January 2015 (UTC)
    BTW, the filter you made doesn't really work. I can still create the pages. Perhaps you could make it stricter, or just forget about it. -WF --Adrian F4 (talk) 21:18, 15 January 2015 (UTC)
    Another suggestion is giving me a flood flag, like DCDuring kindly did, and I can add loads of good entries without bothering the RC patrol. Up to you, really. -WF --Adrian F4 (talk) 21:19, 15 January 2015 (UTC)
    Yes, but it's a terrible suggestion. Renard Migrant (talk) 22:58, 15 January 2015 (UTC)

New code for Norman

Today I discovered that there is a new ISO 639-3 code, valid for only a few days, nrf, for Jèrriais roa-jer plus Guernésiais roa-grn. Would it be straightforward and uncontroversial for us to get a bot to merge those two ad hoc codes into the new, official one? The bot should also make liberal use of {{label}}s so that we can still categorize J and G separately, much as we categorize the B, C, M, and S lects of Serbo-Croatian.

Also, a few months ago, Chuck Entz brought up here the issue of Sercquiais, the langue d'oïl variety spoken on Sark in the Channel Islands. Ungoliant and I supported the idea of merging all the Channel Islands varieties into Norman roa-nor, but no one else commented and the discussion fizzled out.

So I'm raising the issue again, but this time for the new code nrf. Can we decide to apply nrf to all Norman varieties, rather than just to Jèrriais+Guernésiais, as ISO has it? This would have not only the obvious advantage of collecting several dialects of what really is a single language into a single code, but would also allow us to use "Norman" as the canonical name for nrf instead of having to come up with something like "Jèrriais–Guernésiais" or "Channel Islands Norman".

Is this something that needs to be voted on?

Pinging NativeCat, Chuck Entz, Ungoliant, and Embryomystic. —Aɴɢʀ (talk) 22:08, 15 January 2015 (UTC)

Support merging Jèrriais, Guernésiais and Norman, but it probably should have a vote. — Ungoliant (falai) 22:39, 15 January 2015 (UTC)
I'll admit to being a little attached to the idea of the different Norman varieties as separate things, but considering that they already have a shared Wikipedia (under nrm), it certainly does make sense. And between Jèrriais and Guernésiais, especially, there is a lot of duplication that could be trimmed down. I've not been able to find as much info on Continental Norman or Sercquiais, but I'm fairly certain there's duplication with the former, perhaps not so much the latter (with its quite distinct orthography). embryomystic (talk) 22:43, 15 January 2015 (UTC)
If we merge them, then we need to change the mapping to nrm in Module:wikimedia languages. —CodeCat 22:49, 15 January 2015 (UTC)
Better yet, Wikimedia needs to move its nrm projects to nrf. —Aɴɢʀ (talk) 23:05, 15 January 2015 (UTC)
I support this. It makes it less confusing. If it's just a mere dialect, not even considered a language in itself, we should not have it as a header, IMO. We should do like we do with Serbo-Croatian and add words pertaining only to a specific dialect using Template:context. NativeCat drop by and say Hi! 23:08, 15 January 2015 (UTC)
Support merging the varieties of Norman into nrf. The fact that there's one unified Norman Wikipedia is telling. - -sche (discuss) 08:40, 17 January 2015 (UTC)
As someone with Norman ancestry, I support merging the varieties of Norman; they are dialects and not separate languages. The dialects often have spelling differences. But we don't consider American English a separate language just because it uses color rather than colour. Taxman1913 (talk) 17:10, 21 January 2015 (UTC)
  • So can we consider this to have consensus, or do I have to start a vote before it will actually happen? —Aɴɢʀ (talk) 22:11, 24 January 2015 (UTC)
    Since there's only support, no opposition, and since the dialects in question were never recognized by even the generous/lax folks at the ISO as languages, anyway (we had to give them exceptional codes), I say you could just start merging them. FWIW, my standard is to set up votes only for cases that discussion (either on-wiki or in the real world) has shown to be contentious either linguistically or politically (like Moldovan→Romanian). And this doesn't appear to be contentious. Let me know if you need help merging them. - -sche (discuss) 22:57, 24 January 2015 (UTC)
    As a first step, I've added nrf as "Norman" (with all the dialects as alt names). - -sche (discuss) 23:11, 24 January 2015 (UTC)
    Short of going through all of Category:Guernésiais language, Category:Jèrriais language, and Category:Norman language manually, I don't know what to do to merge them. I'm not a bot operator, nor would I even know how to begin programming a bot. —Aɴɢʀ (talk) 12:49, 25 January 2015 (UTC)
I'm not sure a bot would be a good idea, since we have multiple language sections that need to be merged, with the potential for differences in the content between them. I'm sure a great deal of the content is going to substantially overlap, but where there are differences, we need to think about how to represent them. We also need to preserve the dialectal information (have we added "Guernésiais" and "Jèrriais" to Module:labels/data so we can use context labels?). Chuck Entz (talk) 16:11, 25 January 2015 (UTC)
Module:labels/data already includes "Guernsey" and "Jersey", so we can use those. It will generate the names "Guernsey Norman" and "Jersey Norman", which are probably easier to understand than Guernésiais and Jèrriais anyway. Without a bot, it will take for freakin' ever, since Jèrriais alone has over 9000 entries. —Aɴɢʀ (talk) 18:41, 25 January 2015 (UTC)

Proposal for bot edits to French

Two proposals:

  1. Remove gender templates from adjective forms which have gender indication in the definition. For example, {{masculine plural of|word|lang=fr}}.
  2. Change m-p and f-p to m and f inside {{head|fr|plural}}. So {{head|fr|plural|g=m-p}} becomes {{head|fr|plural|g=m}}. Rationale: 'plural of' is already in the definition.

Objections? Renard Migrant (talk) 18:17, 16 January 2015 (UTC)

I don't object. However, the proposal would be easier to assess if you gave us two example diffs or if you gave links to two entries that would be affected, so that we can check the current situation. Not as means of objection but rather from curiosity: is this a fun activity for you? Or do you think there will be tangible benefits for the user of the dictionary, more tangible than your adding new entries? --Dan Polansky (talk) 09:40, 17 January 2015 (UTC)
Regroupements for a noun example. Renard Migrant (talk) 17:43, 17 January 2015 (UTC)
Blanches adjective example. Renard Migrant (talk) 17:47, 17 January 2015 (UTC)
Support. I would also support removing the gender from non-lemma form headwords altogether, as it reduces duplication and makes maintenance easier. Of course, genders aren't modified that often, but they still can be, and nobody is going to remember to change all the form-of entries as well, especially if there are many of them. —CodeCat 18:01, 17 January 2015 (UTC)
I would strongly oppose that for inflected forms that have inherent gender, like maisons (French) which is inherently feminine. I think it's misleading to remove the gender, because it might look like French plurals don't have gender; furthermore it's user-unfriendly because the user has to click on the singular to get the gender. I feel like you're proposing to replace a good system with an inferior system, so I oppose it. Renard Migrant (talk) 19:12, 17 January 2015 (UTC)

Codes the ISO deleted or added in 2014

In 2014, the ISO deleted some codes and added others. Here are the changes they made and my thoughts on whether we should follow suit. If you have any comments of your own, please comment.

- -sche (discuss) 08:59, 17 January 2015 (UTC)

There are some I wanted to request the deletion of, but I can't find a 'contact us' page. Renard Migrant (talk) 17:39, 17 January 2015 (UTC)
According to the SIL page on submitting change requests, you can fill out this form and e-mail it to the address listed here. The form will become part of the public record. - -sche (discuss) 22:04, 17 January 2015 (UTC)
Data point: the entire known corpus of Yurats amounts to < 150 isolated words. I feel that extinct languages that have only been attested at this level of detail would, in general, be better recorded as single pages in the Appendix namespace than as entries in the Main namespace. Do we have any general policy on the CFIness of extinct and poorly recorded languages? --Tropylium (talk) 01:52, 18 January 2015 (UTC)
The most relevant point would be WT:LDL. But we do include quite a few languages that are very poorly attested, such as Mycenaean Greek, Proto-Norse, Crimean Gothic and Oscan. —CodeCat 01:56, 18 January 2015 (UTC)
In general, the criteria for inclusion actually make it easier to have extinct and poorly-documented languages in the main namespace than to have well-attested living languages there, by requiring more citations of the latter, heh. That's because (we wanted to make it easier to include extinct and poorly-documented languages, and) this site is geared towards covering attested natural languages in the main namespace. I think that makes some sense, too — have all such languages in one namespace. (We include a couple languages of which only a single word is attested.) But for languages with few words, we could certainly also have an Appendix: or Index: for it.
In this particular case, however, because it's not at all clear that Yurats is actually a distinct language, I recommended we not give it a code, but we could make an appendix of it. - -sche (discuss) 03:01, 18 January 2015 (UTC)
Update: I've deleted all of the codes the ISO deleted except aue, and I've added all of the codes they added except cbq, gku and the code noted above as being excluded (rts); aue, cbq and gku I simply haven't gotten around to dealing with yet (aue and gku since they're slightly messier than I anticipated, and cbq because I'd like to figure out which of its names is most common, but none of them seem to be attested at all). - -sche (discuss) 22:12, 30 January 2015 (UTC)

Literary Kajkavian Serbo-Croatian

A few days ago, the ISO added a code for literary Kajkavian Serbo-Croatian: kjv, defined as denoting specifically the 16th to 19th century literary language and not modern Kajkavian dialects. What should we do? We could decline to follow suit, and continue to handle Kajkavian as sh; or we could add kjv to Module:languages and start having ==Kajkavian== entries; or we could add it to Module:etymology language so it could be cited in etymologies but wouldn't get its own L2. Or we could repurpose the code to refer to modern Kajkavian (the same way we repurposed ltc from "Late Middle Chinese" to "Middle Chinese") and then do one of the last two things. Or...
Pinging Serbo-Croatian speakers User:Ivan Štambuk, User:Biblbroks, User:Dijan.
- -sche (discuss) 09:41, 17 January 2015 (UTC)

There is no difference between "Kajkavian literary language" and modern Kajkavian dialects (spoken and written), apart from orthographic and typographic conventions used. By the 16th century the development of all SC dialects was pretty much over (there are few changes in case desinence usages, but that's it). In any case, this should be discussed if and when the number of Kajkavian entries and their categorization and periodization becomes an issue, and not before. --Ivan Štambuk (talk) 10:03, 17 January 2015 (UTC)
The goal of ISO 639 is apparently not one language, one code, but just to have codes for whatever is useful, even if a language gets represented twice, three times or whatever. Renard Migrant (talk) 19:15, 17 January 2015 (UTC)
I read your comment as "don't add kjv (at this time)". That's fine by me. (As for categorization, we do have Category:Kajkavian Serbo-Croatian.) To ask a more specific question, is Kajkavian mentioned often enough in etymologies that it would be useful to have an etymology code for it, the way we have frc for "Cajun French" (Category:English terms derived from Cajun French)? - -sche (discuss) 21:54, 17 January 2015 (UTC)
For my point, very good example. For ISO 639, the fact that Cajun French has a code does not imply that it's not the same as French. It has a code because such a code could prove useful. Renard Migrant (talk) 21:57, 17 January 2015 (UTC)
Kajkavian, Chakavian and Shtokavian are the three main varieties making up Serbo-Croatian, where Shtokavian is the one used as the base for writing. Linguistics topics refer to the other two varieties often enough, but I don't know if there are any specific phonological isoglosses that would separate them from the larger whole. In fact, the distinction is based on a vocabulary isogloss, the word for "what": kaj, ča, što. —CodeCat 22:02, 17 January 2015 (UTC)
@-sche: Your powers of observation serve you well - the point was indeed that we shouldn't put the cart before the horse and make lengthy discussions and arrangements for activity that might as well never materialize. Regarding the Kajkavian borrowings into the literary form - we're dealing with perhaps at best a couple of hundred terms in the entire language, mostly in regional usage, and I would be hard-pressed to find attestations for them outside Kajkavian works. Kajkavianisms and other words from subliterary dialects were mercilessly weeded out from the standard language (along with LWs from Turkish etc.) by Vukovians during the standardization in the 19th century. (It's a Balkans thing - keep your language as pure as your society through occasional cullings of the unfit). If anything, there is a need for deriving the other way around - literary SC borrowings into Kajkavian (which would be kind of nonsensical to name - Kajkavian Serbo-Croatian terms derived from Serbo-Croatian). Most of the kaj words were added by User:Fejstkajkafski who is a dialectologist (I think), but they don't seem to edit anymore. When it becomes a necessity we can create codes for Kajkavian as a whole and/or its subdialects (which are many, they differ a lot, are not always mutually intelligible and are today for the most part written in scholarly transcription by linguists who study it, since native speakers always utilize the standard language in writing). --Ivan Štambuk (talk) 22:27, 17 January 2015 (UTC)
This reminds me of the language ISO is pleased to call "Hiberno-Scottish Gaelic", a term they apparently made up, which has been given the code ghc. Basically, it's a cover term for Early Modern Irish and Early Modern Scottish Gaelic and is hardly more different from the modern varieties (abstracting away from the Irish spelling reform) than Early Modern English is from modern English. We do not recognize ghc as a separate language, but simply treat 13th- to 17th-century Irish as ga and 13th- to 18th-century Scottish Gaelic as gd. I'm inclined to believe Ivan when he says the differences between "Literary Kajkavian" and modern dialectal Kajkavian are more orthographic than linguistic, and to oppose recognition of the kjv code, at least for the time being. —Aɴɢʀ (talk) 10:33, 19 January 2015 (UTC)
OK, I have updated WT:LANGTREAT to note that we treat kjv under sh’s code and header. - -sche (discuss) 20:44, 21 January 2015 (UTC)

Redundant Austronesian entities

I took a look at our Austronesian categorization system, and there seem to be some messy things going on. Some initial observations:

--Tropylium (talk) 02:38, 18 January 2015 (UTC)

If you're surprised that Proto-Eastern Polynesian exists and has a code, just know that until recently it existed twice and had two codes. (See this RFM.)
Anyway, pinging User:Amir Hamzah 2008 and User:Metaknowledge, who are the main editors of our Polynesian proto-language entries.
- -sche (discuss) 03:19, 18 January 2015 (UTC)
I agree with you on deleting Western Malayo-Polynesian and merging Central Malayo-Polynesian to Central-Eastern. - -sche (discuss) 03:19, 18 January 2015 (UTC)
The non-Polynesian mess is due to the easy availability of Blust's Austronesian Comparative Dictionary online. I refer to it a lot because it has a great deal of useful data, but you have to be aware of the author's biases, and you also have to be aware that he regularizes the orthography of the reflexes to make it easier to compare between languages. That source doesn't provide proto-forms for Polynesian, but they're available at the Combined Hawaiian Dictionary and Pollex Online. There are actually enough differences between PPN and PEP for the latter to be worth maintaining, even if we don't have much of anything in it as yet. I haven't really looked at the differences between Tongic and Nuclear Polynesian enough (or at least recently enough to remember anything), so I can't say much about it. If anything, I would add Proto-Central Eastern Polynesian, rather than taking away PEP. Chuck Entz (talk) 05:59, 18 January 2015 (UTC)
"There are actually enough differences between PPN and PEP for the latter to be worth maintaining". Yes, I'm sure there are differences — but we don't usually create separate proto-stages just to highlight the phonetic evolution of a word (for example, early PIE *h₂érh₃trom 'plough' is considered sufficient, and we do not create a "Late PIE" entry *árə₃trom, or a "Mid PIE" entry *h₂árh₃ətrom). If you wanted to be systematic about this type of thing, it would lead to tons of unmaintainable duplication, where approximately every Proto-Pol. entry has a corresponding Proto-EP entry — and then easily a dozen other nodes downward towards Proto-Austronesian. Yet it seems to me that the differences are not quite enough to make the two proto-stages more than two dialects of the same language, so we ought to be able to cover this type of situation well enough by noting things like "evolution s > h is Eastern Polynesian", or labeling Eastern lexical innovations as dialectal Proto-Pol.
Core point being that etymological appendices do not exist for the purpose of documenting various reconstructible proto-language stages; they exist for demonstrating the relationships of attested languages. When chronology is known in great detail, this seems to necessarily lead to having to treat closely successive stages on single pages. --Tropylium (talk) 20:17, 18 January 2015 (UTC)
These are not different stages of PIE but different notation systems - underlying (aka (morpho)phonological or etymological) vs. surface (aka phonetic, i.e. the one you get with the comparative method). The established notation is however inconsistent, though some recent works tend to use slashes and square brackets to distinguish the two. E.g. our entry *bʰréh₂tēr is technically nonsensical; it should be either */bʰréh₂ters/ or *[bʰráh₂tēr] (or *[brā́ter] if one doesn't subscribe to the theory that the Balto-Slavic acute is of laryngeal origin). No idea about Polynesian stages though, but if the separate clades are generally accepted then intermediary steps are OK to have, especially if there are restrictions in lexicon to various subbranches (e.g. a West-Germanic-only word should be reconstructed as Proto-West-Germanic and not Proto-Germanic). --Ivan Štambuk (talk) 00:21, 25 January 2015 (UTC)
What? Of course those are different stages. Laryngeal theory is precisely about (among other similar developments) the idea that (some cases of) Late PIE *[a-] historically developed from an earlier *[Ha-], which historically developed from earlier *[h₂a-], which historically developed from Early PIE *[h₂e-]. One can claim that the sound changes continued to hold an existence as phonological processes (i.e. that *[Ha-] would have still been phonologically */h₂e-/), but don't let this distract you. All phonological processes have a diachronic origin.
…Anyway, I'm switching to your West Germanic example, as something less prone to getting the discussion stuck in details. Of course we should reconstruct any exclusively West Germanic words as West Germanic and not as Proto-Germanic proper. The question is how we should notate this. In principle, we could create a separate Proto-West Germanic appendix. But I continue to think that this is a poor approach, since an entire separate appendix for an only marginally different proto-language stage entails massive duplication of work. We do not create pages like "Appendix:Proto-West-Germanic/balluz" to go with Appendix:Proto-Germanic/balluz, even though a word's existence in West Germanic languages and Proto-Germanic implies its existence in Proto-West Germanic. What we can do instead, what we already do, and what I am proposing should be done in similar cases as well (like here, Proto-Polynesian versus Proto-East Polynesian etc.), is that we can set up Proto-West Germanic as a dialect of Proto-Germanic, and file words only attested in West Germanic under Category:Regional Proto-Germanic.
Note again also the absence of entries just to mark phonetic development! People working on this area have been content to have just e.g. Appendix:Proto-Germanic/ēnu, not creating duplicate entries like "Appendix:Proto-Germanic/ānu" to demonstrate the Northwest Germanic development *ē > *ā. It's even entirely possible to be explicit about a thing like this regardless, the easiest way being to include a line such as
* Northwest Germanic: *ānu
in the list of descendants, and then indent all NW Germanic entries to depth-2. --Tropylium (talk) 21:52, 25 January 2015 (UTC)
Nope, they're synchronic really: */bʰréh₂ters/ → *[bʰráh₂tēr], not */bʰréh₂ters/ > *[bʰráh₂tēr]. It's the same thing just in two different notations. *a is just an allophone of *e colored by an adjacent *h₂. We just use the former notation to distinguish such *a from the "real" *a that is sometimes postulated. Nobody knows the ancestral form in Early and Mid PIE because there is nothing to compare them against, and our theories based on internal reconstruction are too speculative because the evidence is very scarce.
It's phonemically */bʰréh₂tēr/. Resolving -ēr into -ers is not phonemic, because there could in theory be another word for which the underlying phonemes are also */bʰréh₂tēr/, but which can't be morphologically resolved into -ers. The proper analysis is that the phonemic distinction between -ēr and -ers is neutralised. —CodeCat 23:45, 25 January 2015 (UTC)
If the distinction is neutralized it in principle doesn't matter which form you use, except in this case *ers# → *ēr# by Szemerényi, so we know that the "real" underlying phonological form for this specific word is */bʰréh₂ters/. It doesn't matter if there are other reconstructions with the same form at the surface level. Their existence or absence does not invalidate the underlying form of other words. --Ivan Štambuk (talk) 00:27, 26 January 2015 (UTC)
The differences between PWGmc and PGmc seem substantial though (shouldn't it be PWGmc *ballu really?), and when we have protolangs separated by centuries we can't really speak of dialects anymore. I suspect that there are many such "small" protolangs that would better be fitted into some larger grouping, but we don't really have space constraints, and if there is a body of scholarship supporting them (not just the reconstructions, but the entire protolanguage, with inflections etc.) I see no point in forbidding them. They're the problem of those who add them. Not everything has to be perfectly structured; it's all a perpetual work in progress here. --Ivan Štambuk (talk) 23:15, 25 January 2015 (UTC)
Space constraints are a red herring. My concerns are internal consistency, and editor attention constraints on what can be reliably kept up to date. Or, more specifically: as our treatment of things like Northwest Germanic demonstrates, "having an intermediate proto-stage" and "having a separate appendix / separate language status for an intermediate proto-stage" are two different things. No one wants to stop recognizing Proto-Eastern Polynesian; I am only proposing covering it in the same appendix entries as corresponding Proto-Polynesian forms.
(This might also be the first time I hear "it is a work in progress" used as an argument against improving organization.)
Also I guess a more general discussion on preferable ways to organize etymology appendices might be worthwhile. I think I'm going to start that below. --Tropylium (talk) 00:11, 26 January 2015 (UTC)
Actually, Proto-West-Germanic never existed in all likelihood, given the current thoughts in the field. West Germanic doesn't appear to be a clade. While it certainly has some innovations that are shared among the group, it seems that West Germanic was still in a continuum with North Germanic when those changes occurred. This is similar to the West and South Slavic language groups, for which a single proto-language stage is not reconstructable either. —CodeCat 22:07, 25 January 2015 (UTC)
The Ringe & Taylor 2014 book has two chapters on it. The situation with the Slavic division is different - we know for certain that they are geographical groupings and as speech communities they never existed, because the earlier form of the language is already attested in Old Church Slavonic, which already demonstrates dialectal variation. --Ivan Štambuk (talk) 23:15, 25 January 2015 (UTC)
  • I'm certainly responsible for some of the mess; I haven't time to respond in full, but suffice it to say that entries in PNP or PEP are not that we may show every stage in the phonetic development of a proto-word but rather that we may have entries for words that by editorial laziness or simple lack of cognates we cannot place in PPn in good faith and yet are demonstrably valid up to a certain level in the Polynesian tree. —Μετάknowledgediscuss/deeds 21:17, 22 January 2015 (UTC)
  • Returning to the subject at hand... Western Malayo-Polynesian and Central Malayo-Polynesian are negatively-defined areal rather than valid genetic groups, so proto-languages for them seem nonsensical. Is there any logical opposition to deleting the first one and merging the second to Central-Eastern? - -sche (discuss) 00:27, 26 January 2015 (UTC)
    • No argument from me. They're just the kind of artifacts you would expect from Blust's dependence on lexicostatistical cladistic analysis, so it's not surprising that he would provide lots of them, but that doesn't make them valid. Chuck Entz (talk) 01:45, 26 January 2015 (UTC)

Header levels

To the best of my knowledge, there is no 'official' list of header levels, WT:ELE mentions the headers, but not the levels. A particular one to start with, should Descendants be L3 when the descendants are not from a particular part of speech? I think the most official list we have is User:AutoFormat/Headers. Renard Migrant (talk) 11:55, 18 January 2015 (UTC)

What about DTLHS's list? —CodeCat 20:14, 20 January 2015 (UTC)
What makes something "official"? Shall we have a vote? My list also doesn't take into account the relative positions of headers. DTLHS (talk) 20:20, 20 January 2015 (UTC)
I'm not advocating officialness just asking the question. Renard Migrant (talk) 20:41, 20 January 2015 (UTC)


Hi, could someone do me a favor and create a language module for Proto-Arawak?

m["arw-pro"] = {
    names = {"Proto-Arawak", "Proto-Arawakan", "Proto-Maipuran"},
    type = "regular",
    scripts = {"Latn"},
    family = "awd",
}

Thanks for your help! --Victar (talk) 17:49, 20 January 2015 (UTC)

I've created it, but as awd-pro, since the usual naming scheme for proto-languages is to add "-pro" to the family code. I also used "Proto-Arawakan" as the canonical name because "Arawakan" was the name of the family, but given that Proto-Arawakan and Proto-Arawak have been about equally common over the last few decades, and Arawak has always been more common than Arawakan, feel free to start a WT:RFM or ask me to if you think the family and protolanguage should be renamed. - -sche (discuss) 17:46, 21 January 2015 (UTC)
Thanks a ton -sche! --Victar (talk) 19:05, 21 January 2015 (UTC)

Taíno vs Taino

Hi, the proper way to spell Taíno in English is Taíno, not Taino (sans i-acute). Can we change its canonical name to reflect this? --Victar (talk) 18:34, 20 January 2015 (UTC)

It's not one person's decision what the 'proper' way to spell a word is. Sure, we could change it if we collectively wanted to, but do we? Renard Migrant (talk) 20:42, 20 January 2015 (UTC)
Appendix:Taíno/kasike‎ may need moving and correcting. Renard Migrant (talk) 20:46, 20 January 2015 (UTC)
By proper, I mean the spelling used in the vast majority of academic papers and on Wikipedia. So if an admin could change it before I start adding more entries, I would appreciate it. --Victar (talk) 20:53, 20 January 2015 (UTC)
All academic papers? All of them, in the history of the English language, that's what you're saying isn't it? Renard Migrant (talk) 22:21, 20 January 2015 (UTC)
Yes, all in the history of man. ;-) But seriously, if it's the form used in academic papers and on Wikipedia, it should be the form used on Wiktionary. Do you have the capability to change it? --Victar (talk) 22:27, 20 January 2015 (UTC)
Wrong! I found one that uses Taino. Hard lines. Renard Migrant (talk) 22:33, 20 January 2015 (UTC)
OK, sure, but I can tell you that it is not the common spelling in academic papers, and not the spelling used on Wikipedia. --Victar (talk) 22:38, 20 January 2015 (UTC)
Check google ngrams. DTLHS (talk) 22:43, 20 January 2015 (UTC)
I don't know if that's the best method, since OCR software frequently misses /í/ and people often casually omit diacritical marks. Taino is also a Japanese surname. But if anything, the spelling should match the Wikipedia articles, w:Taíno, w:Taíno language. --Victar (talk) 22:58, 20 January 2015 (UTC)
Why? We are not Wikipedia’s vassals. — Ungoliant (falai) 23:04, 20 January 2015 (UTC)
That's true, but consistency is a good thing and if Wikipedia thought Taíno the preferred spelling, that says something in and of itself.. --Victar (talk) 23:13, 20 January 2015 (UTC)
I'll propose the renaming on Wikipedia then. Renard Migrant (talk) 12:30, 21 January 2015 (UTC)
... and the Ethnologue and Wiktionary thought Taino the preferred spelling. If we do change the name, it should be because the Wiktionary community found it better to use the new name, not because we have to kneel before Wikipedia. — Ungoliant (falai) 23:25, 20 January 2015 (UTC)
I don't know about Ethnologue and whether they make full use of diacritical marks, but Wiktionary had no Taíno entries until I just added three, so that's why it's being brought up now. Also, are we not the Wiktionary community having a discussion on this right at this very moment? --Victar (talk) 23:40, 20 January 2015 (UTC)
  • While I grant that we intend (generally) to be a descriptive dictionary in how we create and edit our term entries (and as such, we should probably have entries for both spellings), in my subjective experience, we also seem to hew somewhat to academic norms when it comes to the terminology used for categories and other infrastructure, such as language names and codes. Avoiding ambiguity and all that. ‑‑ Eiríkr Útlendi │ Tala við mig 23:16, 20 January 2015 (UTC)
For reference, Languages of the Pre-Columbian Antilles is probably the main published work on Taíno terminology and will be often cited for entries. It too uses the spelling Taíno. --Victar (talk) 23:25, 20 January 2015 (UTC)
As is documented in wt:Languages, "Whenever possible, common English names of languages are used, and diacritics are avoided." Judging by ngrams and by the comments about academic literature, this may be a case similar to that of Maori (Māori). I'd stick with the diacritic-less name since it's common (well-attested) and easier to type. - -sche (discuss) 22:00, 22 January 2015 (UTC)
Well, like I mentioned above, ngrams isn't a good tool in this case so you have to really look at the current published academic literature to make an assessment. My impression is that Taíno is the far preferred spelling in papers on the topic. --Victar (talk) 23:20, 22 January 2015 (UTC)
I see your point though with the example of Māori being the more "correct" spelling, but for the sake of ease in typing, Maori is preferable. I think the major difference, though, is that Taíno is a reconstructed language, with a finite number of terms taken from academic papers, whereas Māori is a living language with tens of thousands of terms. --Victar (talk) 02:46, 23 January 2015 (UTC)


Discussion moved from WT:RFDO#JAnDbot.

Please unblock user:JAnDbot. I am working on maintaining interwiki links on (language) categories and I need to use it on enwikt too (for now I must use my own account). Please note that the bot has been blocked for more than 6 years and the reason is now obsolete. JAn Dudík (talk) 11:00, 21 January 2015 (UTC)

It's an illegal bot because it hasn't passed a bot vote. That's the reason. Renard Migrant (talk) 12:35, 21 January 2015 (UTC)
The bot did not pass a vote for the bot flag, but there is no reason to leave it blocked for many years. I don't want to work on interwiki in articles, only on categories. And I want to use my bot's account because of SUL (otherwise I could make a new account, but that is problematic). There are many categories which have no interwiki even though they exist in some other languages. JAn Dudík (talk) 14:05, 21 January 2015 (UTC)
No, the requirement for a vote has nothing to do with the bot flag, but has everything to do with whom we trust enough to allow the capability to perform high-volume, unsupervised edits. It doesn't matter what account you use- if you operate a bot without permission, you're subject to being blocked. Chuck Entz (talk) 14:28, 21 January 2015 (UTC)
The user is now operating his own account as a bot... - -sche (discuss) 22:24, 26 January 2015 (UTC)
I've blocked the user for a day, and think the bot account should stay blocked because of this. —CodeCat 22:27, 26 January 2015 (UTC)

Please, can you tell me how I can work on category interwiki? When I want to use the bot, you say 'no, it should stay blocked'. When I use my own account, you say 'he is using his account as a bot, block him'. English Wiktionary is the biggest one, so even if I edit here only in must-edit cases (incorrect interwiki, no interwiki, no other language with this interwiki, missing 5 or more links), there are many edits. There are 170 other Wiktionaries where I work, and there are no problems with these edits. But English Wiktionary is so self-centred and selfish that people here say 'Problems with interwiki on other Wiktionaries are not our problem. We don't want to have correct interwiki here.' And some interwiki conflict? WTF? JAn Dudík (talk) 07:33, 28 January 2015 (UTC)

I would suggest starting a new bot vote and saying: These are the kind of interwiki links I want to add. Hopefully there will be more participation in the bot vote, and people will be able to look at your recent contributions and see if they're correct or not. I'm sorry we're such sticklers about bot policy... - -sche (discuss) 19:59, 28 January 2015 (UTC)
I've drafted Wiktionary:Votes/bt-2015-02/User:JAnDbot for bot status for you. - -sche (discuss) 22:33, 7 February 2015 (UTC)
@: Articles are made autonomously, but work on categories means a lot of manual work (if no interwiki is present, I must manually find and type the correct category on some other project). JAn Dudík (talk) 21:17, 18 February 2015 (UTC)
The vote has ended, passes. JAnDbot is unblocked and flagged as a bot. —Stephen (Talk) 03:17, 26 February 2015 (UTC)

Voting to always supersede policy

I think we should seriously consider a vote that says voting always supersedes 'policy'. That is, individual pages (no matter what the namespace) and groups of pages will not be subject to any written policies if there is a vote on that individual entry. The prime example is Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep: voting comes before policy in deletion matters. The only reason I can think of not to vote on this issue is that it would be equivalent to a vote on whether water is wet; it's already what we do. Renard Migrant (talk) 12:43, 21 January 2015 (UTC)

Consensuality and attestation

I have just and rather belatedly realised that the work I had last done yesterday evening had been undone on two grounds: lack of consensus and multiple quoting.

If consensus is not required for undoing but is required for doing plain work, and if it is not a general principle of Wikipedia, though it is elsewhere in dictionaries, that quotation is primarily for attestation while examples are primarily examples, then I'm giving Wikipedia away, not without very great chagrin.—ReidAA (talk) 21:31, 24 January 2015 (UTC)

Mari templates

Our templates for Mari language seem to be inconsistent with ISO 639-3 and even between themselves.

External links section in the entry Mari links this to Mari the macrolanguage, which is consistent with ISO, but {{etyl|chm}} links to the Wikipedia article on Eastern Mari.
External links section in the entry Mari links this to Eastern Mari, which is consistent with ISO (although the preferred name for the language seems to be Meadow Mari), but {{etyl|mhr}} returns an error message.
External links section in the entry Mari links this to Western Mari, which is consistent with ISO (although the preferred name for the language seems to be Hill Mari), and {{etyl|mrj}} links to the Wikipedia article on the Mari language.

Could somebody familiar with the workings of our language templates fix this? I will write Wiktionary entries for Meadow Mari and Hill Mari. --Hekaheka (talk) 11:09, 25 January 2015 (UTC)

We had a discussion in WT:RFM a while ago. @-sche: Could you help dig it up, please?
I don't find anything on Mari in WT:RFM, not in the active discussion nor in the archives. --Hekaheka (talk) 12:59, 25 January 2015 (UTC)
I'm sure it was in RFM. --Anatoli T. (обсудить/вклад) 13:58, 25 January 2015 (UTC)
"chm" is the code for Mari and Eastern Mari, which have been merged to Eastern Mari, since Eastern Mari (Meadow Mari) is the standard language of the Mari people. "mrj" is reserved for Western Mari (Hill Mari), which is considered a dialect, and "mhr" is not used at Wiktionary. If "Mari" is used on its own, Eastern Mari is implied; there is no other (macro) Mari language. We decided to only use the terms "Eastern Mari" (="Mari") and "Western Mari" here, not "Meadow Mari", "Hill Mari" or just "Mari". --Anatoli T. (обсудить/вклад) 11:33, 25 January 2015 (UTC)
"chm" is the code for Mari and Eastern Mari, which have been merged to Eastern Mari -- I can see that we have done it. I just pointed out that we have it differently from ISO 639-3. Has this merger been done somewhere else outside the Wiktionary space or is this our own invention? Anyway, I maintain that I'm right in saying that our current practice is confusing. Perhaps a clarifying usage note under the entry "Mari" would do the trick. My Russian is elementary, but I understand from ru-Wikipedia that Hill Mari (Горномарийский язык) has some sort of official status in Mari El. Anatoli - can you elaborate a little on this point? --Hekaheka (talk) 12:59, 25 January 2015 (UTC)
Eastern Mari, Russian and Western Mari are three official languages in Mari El but Eastern Mari has 10 times more speakers than Western, is used wider and is often just called Mari (in whatever language). Despite separate code, there's no separate Mari language. Let me know if you want to know more. --Anatoli T. (обсудить/вклад) 13:58, 25 January 2015 (UTC)
There have been two discussions of Mari, wt:Beer_parlour/2013/September#Merging_Mari_and_Buryat_varieties and wt:Language treatment/Discussions#Merging_Buryat_dialects.3B_also.2C_merging_Mari_dialects, which led to the status recorded on wt:LANGTREAT: "Both the macrolanguage and its subdivision mrj are treated as languages, but the macrolanguage code is used in place of the code (chm) which the ISO gave the standard variety of the language." I thought the use of the macrolanguage's code (chm) for Eastern Mari (properly mhr) was because most sources used chm for Eastern Mari text (à la what Anatoli is saying about how it "is often just called Mari"); if that's not the case, we could always retire chm and add mhr. But note that using a macrolanguage's code for the standard variety is something we've done before, e.g. we use lv for standard Latvian even though technically lv includes Latgalian, and standard Latvian alone would be lvs; likewise we use et and not ekk for Estonian. (And the decision to recognize either the dialects (under whatever codes) or the macrolanguage, but not both, is, well, to avoid redundancy.) - -sche (discuss) 17:15, 25 January 2015 (UTC)

All right, I think I now understand the way of thinking followed here. In the end the exact definitions of languages are often a matter of convention, so let's follow the one that has been reached. I have written a usage note to the entry for Mari and also otherwise edited it. As the terms "Hill Mari" and "Meadow Mari" are widely used, I think we should somehow explain how they relate to the whole. I made an attempt for that end, too. Please check what you think of the entry as it is now. --Hekaheka (talk) 19:06, 25 January 2015 (UTC)

Surely Wiktionary:About Eastern Mari and Wiktionary:About Western Mari (both currently nonexistent!) should be the primary places to record policy on the representation of Mari on WT? Although I observe that about pages are neglected for a lot of smaller languages. --Tropylium (talk) 22:03, 25 January 2015 (UTC)
Feel free to set up such pages. FWIW, though, WT:LANGTREAT is the central place to record mergers of dialects, splits of languages, and information like that Western Mari is mrj while Eastern Mari is chm. The about pages often concern themselves with orthography, transliteration, entry formatting, etc. - -sche (discuss) 22:06, 25 January 2015 (UTC)

General principles of protolang appendices

It seems that WT:PROTO and WT:ETYM do not actually go much into this topic at all. A new section in the former, or a separate think tank at Wiktionary:Proto-languages might be in order.

So, some guidelines I would like to have stated explicitly — or, in case it turns out other editors disagree, discussed explicitly:

  1. Appendix pages are not dictionary entries, and are hence not automatically subject to WT:ELE.
    • This means e.g. that we are not obliged to create different entries for different, but related proto-words. In principle an etymological proto-root appendix may be only a single stem, and the different morphological variants indicated by the different descendants discussed in prose instead.
    • Obviously many etymology appendices regardless are effectively about a single reconstructed word. I agree with the current statement at WT:PROTO that in this case,

      (…) the layout of the entries generally conforms to WT:ELE, although some compromises may be made for the sake of usability.

  2. Documenting a proto-language is not the main motivation for having etymology appendices in the first place: they instead exist primarily to highlight the etymological relationships of attested words.
    • All proto-language roots should have at least one existing Wiktionary entry listed as a descendant.
    • Synonymous proto-forms differing only in e.g. declension class should preferably be discussed on a single page, not split across several "appendix entries".
    • Likewise, even if a subclade with a slightly different proto-language can be established, it should by default be treated together with its parent root. If no parent root of a particular item is known, it should be filed under "Regional Proto-Foo".
      • This does not mean a blanket ban on establishing closely-spaced protolang levels, of course, if there were nonetheless reasons to create some.
  3. Roots in intermediate proto-languages which have a well-known parent — such as Proto-Germanic, Proto-Indo-Iranian, Proto-Finnic, Proto-Oceanic — should be provided with either an inherited or loan etymology, marked as words of unknown origin, or tagged with a request for etymology.
    • Not all bottom-level proto-languages have been well-reconstructed, though. E.g. no consensus exists on what Proto-Afro-Asiatic looked like, and so entries in e.g. Proto-Berber or Proto-Semitic would probably not benefit from mass-tagging with requests for etymology just because a reference to cognates elsewhere in AA has not been found.
    • We currently also have a single entry in Category:Proto-Indo-European terms with unknown etymologies for some reason.
    • Similar to WT:WDL, establishing a list of well-reconstructed proto-languages might be useful to have for reference. Not for enforcing new standards on the languages on it — but as a warning sign to anyone setting out to create appendix pages dealing with proto-languages that don't have any "standard" reconstruction.

--Tropylium (talk) 01:50, 26 January 2015 (UTC)

As for Category:Proto-Indo-European terms with unknown etymologies, I just reverted the edit responsible. The etymology said "Origin unknown", and someone couldn't resist the temptation to replace that with {{unk.}}, which is kind of silly for a top-level proto-language. Chuck Entz (talk) 02:23, 26 January 2015 (UTC)

Is documenting all Unicode characters within the scope of Wiktionary?

We have a lot of information about a wide variety of Unicode characters. In many cases, there is a Translingual section just to give a "definition" for some obscure symbol. I have my doubts about whether this falls within the scope of Wiktionary. We're a database on languages, words and their meanings, but arbitrary symbols aren't necessarily used in any of those. So I think we should reconsider this, and set some specific criteria on what symbols to include. Either that or we should explicitly state that Wiktionary is a Unicode database. —CodeCat 20:15, 26 January 2015 (UTC)

I do not think any special criteria are necessary for symbols: attestation in natural-language texts should suffice. I would go further, actually, and establish a principle of describing characters rather than code points. For one, I do not think that fullwidth forms of Latin letters deserve separate entries. Keφr 20:34, 26 January 2015 (UTC)
At least we managed to reach agreement to use regular Latin letters rather than fullwidth letters in the names of entries like CD and CD机.
I could get behind redirecting the individual fullwidth letters to their regular-width counterparts.
- -sche (discuss) 21:37, 26 January 2015 (UTC)
I'm inclined to be more restrictive than that. Symbols should only be allowed if they can be used as entities representing words, not as entities representing themselves. That is, they are used as representing something other than the symbol. That would mean "" or "" would not be allowed for English, because I imagine the only occasion where you could find them in running text is as mentions, in the sense that the symbol stands for itself as an entity. It's not used as a word to stand for something else. —CodeCat 22:09, 26 January 2015 (UTC)
Our usual attestation criterion requires "conveying meaning", so I took this as implied. Keφr 22:39, 26 January 2015 (UTC)
Yes, but the outcome of a recent vote is that consensus can make CFI mean whatever we want it to mean. —CodeCat 22:42, 26 January 2015 (UTC)
Some people may find translingual sections useful but I don't. If character information is to be kept - stroke orders, input methods, references, such as links to Unihan database, they are better kept outside Chinese sections, so this is a pro-argument.
Chinese (single) characters often convey basic meanings, without any connection to a part of speech. That's why translingual sections for Chinese characters lacked PoS. The actual PoS is determined by the usage in a phrase but there could be multiple interpretations and no consistency. There is no inherent PoS in a Chinese word. Published Chinese dictionaries sometimes use PoS info, sometimes don't and they have a lot of mismatches. We should allow ===Definitions=== header, at least for single characters. Definitions from Translingual should be moved to appropriate language sections - Chinese, Japanese, Korean, Vietnamese where appropriate. --Anatoli T. (обсудить/вклад) 23:11, 26 January 2015 (UTC)
This discussion is more about characters like U+1F6A6 VERTICAL TRAFFIC LIGHT than 漢字. Keφr 23:28, 26 January 2015 (UTC)
Well, Chinese characters were mentioned and the presence of a definition line, that's why I commented. The definitions in the Translingual sections may be useful until the moment they are moved (checking required) into language sections. --Anatoli T. (обсудить/вклад) 23:43, 26 January 2015 (UTC)
This is a completely different issue. This is about whether arbitrary Unicode characters warrant entries. 漢字 were only used as an example of characters that may be mentioned, instead of used for their meaning. Keφr 10:08, 27 January 2015 (UTC)
Yes. Chinese characters convey meaning in the languages where they are used, and don't merely stand for themselves. So they can be included for those languages. —CodeCat 23:34, 26 January 2015 (UTC)

Let's keep the meaningless Unicode to appendices, please. We have them for whoever's looking, but we need not have entries for them. bd2412 T 00:39, 27 January 2015 (UTC)

I propose to make entries of characters whose only "definition" is their Unicode character name speedy-deletable. Does this warrant a full vote? Keφr 10:08, 27 January 2015 (UTC)

Yes, because I'd oppose. I think entries of characters whose only definition is their Unicode character name should be hard redirects to whatever Appendix lists them, rather than being redlinks. —Aɴɢʀ (talk) 13:03, 27 January 2015 (UTC)
In the spirit of compromise, I propose to speedily redirect entries of characters whose only "definition" is their Unicode character name to whatever Appendix lists them. bd2412 T 13:52, 27 January 2015 (UTC)
Yes, I think that's the best solution. If we delete, someone may be tempted to recreate the entries. Chuck Entz (talk) 15:10, 27 January 2015 (UTC)
On the other hand, redirecting might discourage creation of entries with legitimate definitions. For one, [[]] currently has no attested definition, but it does not mean that it never will. [[]] currently does not exist, but I have seen the symbol used as a logical AND operator over arbitrary sets: I have a really hard time finding good attestation for it, though. Redlinks make it explicit that there is no entry for a given character; redirecting gives the impression that further content is not needed, even superfluous — even though a determined editor can create an entry over the redirect. Secondly, readers may take the Unicode character name as a definition — at which it often does poorly, as I noted in the RFD to be archived at Talk:⦰. And lastly, I just dislike cross-namespace redirections. Keφr 21:53, 29 January 2015 (UTC)
You, Aɴɢʀ and bd2412 specified "characters whose only definition is their Unicode character name", though? This seems to primarily imply "decorative" Unicode blocks like "miscellaneous symbols", "box drawings", "arrows". If there is reason to suspect that there is a more specific definition in existence — as is the case for all orthographic and mathematical characters, for starters — no redirect should be created, IMO. --Tropylium (talk) 22:14, 29 January 2015 (UTC)
Nothing in this proposal, as written, implies it. There are plenty of mathematical and other technical characters with exactly this kind of non-definition (say, [[]] or [[]]), but this is not yet an indication that this is the only possible one. Keφr 13:33, 30 January 2015 (UTC)
Sorry, poor L2 use of "implies" here, I suppose. If an alleged mathematical symbol fails to be attested in any use, I agree that this proposal would allow redirecting that into the appendix namespace as well. --Tropylium (talk) 04:38, 31 January 2015 (UTC)
Support DCDuring TALK 15:52, 27 January 2015 (UTC)
Support Equinox 19:29, 28 January 2015 (UTC)
Support (but see above). --Tropylium (talk) 22:14, 29 January 2015 (UTC)
  • Documenting all Unicode characters could be within scope of Wiktionary if we so decide, and I tend to support this for the sake of user convenience. Having the information in the appendix while having a link to the appendix from the mainspace seems to do the information service for the user, although I am not sure what the benefit is over having this information in the mainspace. As I have pointed out, we include letter entries as letters despite the fact that the letters have no meaning as letters; the objection was that letters at least form meaning-carrying larger objects, which I accept as an interesting one.

    As for "consensus can make CFI mean whatever we want it to mean": no, that is not the outcome of the recent vote. The CFI means what it means and nothing else, and the vote did not suggest otherwise. The opposers in the vote did not suggest that we should be lying about what the CFI says, merely that we should not consider CFI 100% binding. --Dan Polansky (talk) 19:45, 30 January 2015 (UTC)

Any final thoughts from CodeCat and Kephir on redirecting to appendices? bd2412 T 16:52, 6 February 2015 (UTC)
I would support that too. —CodeCat 17:14, 6 February 2015 (UTC)
Above I have given reasons for not using Unicode character names as definitions; since redirecting would essentially give these names the appearance of being definitions, I oppose. Keφr 22:27, 8 February 2015 (UTC)
Since the support seems to overwhelm the opposition, I have gone ahead and redirected to Appendix:Unicode/Miscellaneous Mathematical Symbols-B. It would be nice, however, if it was possible to make the redirect point directly to the row containing this entry, if anyone can tell me how to do that. Cheers! bd2412 T 21:25, 16 February 2015 (UTC)
@Kephir: That would require some changes to Module:character list, so that it adds an anchor to each table row. —CodeCat 22:03, 16 February 2015 (UTC)

Goodbye (with regrets and thanks)

As an editor who has made over ten thousand individual edits over the last year or two, I was dismayed when another editor began undoing all my edits of the past few days without any warning, much less running his reasons past me.

When this had just started I put an entry into the beer parlour justifying my edits in general terms. This has received no comments from the community.

I am aghast at this sort of behaviour by another editor, and dismayed that it doesn't seem to worry the community. My dismay is because I saw a future for Wikipedia in which the online environment, having enormous data capacity and the ability to link within and beyond Wikipedia, had the potential to eventually greatly outdo my favourite reference book, The Oxford English Dictionary.

However, this intracommunity brutality has quashed my expectations and has made me realise that I have become addicted to trying to improve the Wiktionary. The only way to cure an addiction is to give it up completely. So, and sadly, this is goodbye with thanks to the brute.—ReidAA (talk) 00:38, 27 January 2015 (UTC)

The reason it has no comments from the community is likely that this user is well known for this kind of harassment and point-making, so it's kind of the same old for most of us. Furthermore, stepping up against it would just cause a repeat of the kind of drama that the user is known for causing. So I think people kept quiet to avoid trouble. I know that was my reason for not responding, in any case. —CodeCat 01:07, 27 January 2015 (UTC)
I seem to recall that ReidAA was previously cautioned about using the same fairly lengthy passage over and over again, when the quote at issue appeared to be pressing a political viewpoint (even if the intention was not to press that viewpoint). It is quite frankly very rarely the case that the same quote will be the best showcase for each word used in it. I tend to think that the reasons for the contested reversions were pretty clear. bd2412 T 01:15, 27 January 2015 (UTC)
I don't see what the point of all the reversions was. It looks to me as if someone was just ticked off at him for reasons beyond understanding. Instead of having mediocre-to-average citations for uncited senses we have none. How is that an improvement? DCDuring TALK 01:54, 27 January 2015 (UTC)
It seems to me that the wrong party left. DCDuring TALK 01:56, 27 January 2015 (UTC)
Re: "the wrong party left". Agree here. --Anatoli T. (обсудить/вклад) 04:27, 27 January 2015 (UTC)
I, too, agree. And Reid isn't the first highly-productive user Dan has driven away from the project; remember Speednat? Unfortunately, the community seems to be very reluctant to ban people, even when they are clearly hurting the project. - -sche (discuss) 19:16, 27 January 2015 (UTC)
Let the reader please check User talk:Speednat. The editor seemed to be copying definitions word-for-word from a copyrighted source; shortly after I asked him about that (User_talk:Speednat/2012#Webster.27s_Third_definitions), he stopped adding definitions and went on to enter attesting quotations which I suspected were from a copyrighted dictionary as well, which led to User_talk:Speednat#Source_of_quotations; shortly after that, they left. To see that kind of editing from that editor as "highly-productive" is an error. The reader may have a look at User talk:Speednat, and check whether the complaints that I have raised on that page were justified. --Dan Polansky (talk) 20:14, 27 January 2015 (UTC)
I may have jumped the gun with my assessment, but that was the experience I recall. bd2412 T 03:52, 27 January 2015 (UTC)
I draw your attention to User talk:ReidAA items 11 et seq. I think one person single-handedly managed to drive a promising, detail-oriented contributor out of the project. I wish I had been able to pay attention to this over the last 2-3 weeks. DCDuring TALK 04:22, 27 January 2015 (UTC)
While most of the back-and-forth there does involve a single editor, there are other conversations where this editor seems to have driven others to exasperation. Of course, we probably all do that from time to time. bd2412 T 15:11, 27 January 2015 (UTC)
I think the project is worse off for having lost him as a contributor, especially as he was beefing up our English language content, particularly the more common words, which could well use the attention. Many of his typographic concerns are legitimate and have been neglected and could have been addressed by attention from our technical contributors together with his willingness to attend to the details. DCDuring TALK 15:47, 27 January 2015 (UTC)
I've been asked to draw attention to the recent contributions of Reid and of Dan, where the latter are just more reverts of the former. If we can come to consensus about which version of the entries in question (put away, etc) is best, perhaps we can entice Reid to return. - -sche (discuss) 19:16, 27 January 2015 (UTC)
I request that, above all, user ReidAA is prevented from (a) removing spaces from after #, and (b) performing non-consensual switching of context and label templates. If, by contrast, user ReidAA is not prevented from switching context to lb, then I do not see how I can be prevented from switching lb to context, given the best evidence about consensus or its lack available at Wiktionary:Votes/2014-08/Templates context and label. As for (c) having a single quotation used more than 10 times, I oppose that and I want to see community consensus for this before this continues; this could even be a copyright violation, since the fair use rationale gets weaker with this: we need quotations for attestation, but we do not need to reuse a single quotation e.g. 20 times. As for (d) put away, I emphasize that the grouped senses were not grouped by hyponymy but by being from baseball. User ReidAA has been making these kinds of willy-nilly groupings as he saw fit without any regard to lexicographical propriety, which I surmise is inferior and should not be continued. I ask the reader to check OneLook dictionaries to see how they do things, and check whether any of the dictionaries is making sense groupings in ReidAA's arbitrary vein. --Dan Polansky (talk) 19:33, 27 January 2015 (UTC)
Later: There would be no copyright violation with even high-multiplicity repetition of quotations from out-of-copyright works. But I find such highly repeated use highly inferior nonetheless, worse than nothing. I surmise that even attesting quotations should be good examples of use, which highly repeated use makes unlikely. --Dan Polansky (talk) 19:41, 27 January 2015 (UTC)
Edits that don't change the appearance of the page, such as whether there's a space after # and whether {{context}} or {{lb}} is used, should never be edit-warred over, or worried about in any way. If someone switches between one way and the other, let them. It has no bearing on the dictionary at all. As for copyvio, what does BD2412 say? It seems to me to be less of a copyvio to use the same quote over and over than to use different sentences from the same work to exemplify different words, because that way we're using less of the total work. And copyrighted or not, I see no problem at all in using the same sentence to illustrate multiple words. I've done it myself to illustrate words of Irish. —Aɴɢʀ (talk) 19:46, 27 January 2015 (UTC)
Re: "If someone switches between one way and the other, let them." I vehemently disagree. I especially disagree that editors should be allowed to remove spaces from after #, since it massively hinders usability of the wikitext from my standpoint. Furthermore, this non-consensual switching generally leads to back-and-forth even without an appearance of an edit war: one editor switches to no space after #, and, say, after a month or more, another editor switches back to their preferred form. We've seen this with "<" vs. "from" in etymologies. This non-consensual back-and-forth is unprofessional and counterproductive; it suggests immaturity on the part of the editor pushing their preferred style. It is one of the reasons why Wikipedia has rules about U.K. vs. U.S. spelling; they do not say "if someone switches U.K. spelling to U.S. spelling, let them". Moreover, I point the uninvolved reader to User talk:ReidAA to see for how long I have shown restraint as for label vs. context, and that I talked about this multiple times to the user to no avail. I will emphasize one more time: non-consensual switching leads to back-and-forth and should be avoided. --Dan Polansky (talk) 20:00, 27 January 2015 (UTC)
The difference between US and UK spelling is visible on the displayed page, as is the difference between "<" and "from". The difference between presence and absence of a space after # is not. And there is no way the absence of a space after # "massively hinders usability of the wikitext"; it's a purely aesthetic preference on your part. "Non-consensual switching" only leads to back-and-forth if a second editor actually goes to the trouble of switching an edit back to how it was before, which in the cases under discussion is superfluous and frankly silly, since edits of this kind have no effect on the actual content. Of the diffs provided in this thread, I have not yet seen one worth getting upset about, much less one worth reverting. —Aɴɢʀ (talk) 20:12, 27 January 2015 (UTC)
You are entitled to your indifference about wikitext, and I am entitled to my care about wikitext and its legibility. Obviously, user ReidAA cares as much as I do, albeit in the other direction; if they did not care, they could have left the spaces alone. You cannot allow ReidAA to care, and disallow the very same type of care (albeit in the other direction) to me; that is unfair and unjustifiable. --Dan Polansky (talk) 20:18, 27 January 2015 (UTC)
If he were the one reverting your edits my reaction would be the same. —Aɴɢʀ (talk) 20:25, 27 January 2015 (UTC)
I think you're just using consensuality as a pretext for forcing your own preferences. Essentially, what you're implying is that any difference from the norm whatsoever needs to be discussed at length, and people are not allowed to edit as they wish. This, in turn, means that if anyone makes a change you don't like, you can just revert it and claim they have to ask everyone nicely for permission first. But the truth is, the only permission anyone ever needs is yours, as you seem to have declared yourself the arbiter of right and wrong on Wiktionary. And if anyone speaks up about it, the result is a stream of wikilawyering, tu quoqueing and other forms of communal shaming on your part. And I don't see any restraint at all; you basically just messaged a user and told them they should be blocked for not doing things the way you want. That's not just completely unproductive, that falls squarely in the "intimidating behaviour/harassment" ban reason. I would have blocked you for it already if I could be sure your sympathisers would not just unblock you again. —CodeCat 20:18, 27 January 2015 (UTC)
If anyone should feel entirely free to start removing spaces from after # regardless of previous overwhelming practice, why should not another person be entirely free to be adding spaces after # as they see fit? This is as plain as day. You seem to want to disallow me to be adding spaces after # as I see fit, and to be converting lb to context as I see fit, while allowing the conversion in the other direction to another person; how much more unfair can this get? --Dan Polansky (talk) 20:28, 27 January 2015 (UTC)
You can't compare those two things. The space after # is more or less universal across Wiktionary, and so is support for continuing to keep it that way. However, you well know that the issue for context and label is much less clearly established and no consensus exists. So if you want to be fair, you shouldn't just be reverting this one user for that, but also the many other regular contributors that use {{cx}}, {{lb}} or {{label}}. The fact that you don't revert all editors, but do revert this one user, seems to me like you're just picking an easy target to bully, who you know won't have much support in the community or will to stand up against you. And now they left. You're a real hero, Dan, congrats. —CodeCat 20:32, 27 January 2015 (UTC)
I have absolutely no wish to perform any non-consensual switching in any new entry created by an editor with any of the {{label}}, {{context}}, {{lb}}, and {{cx}}, and I see nothing unfair about it. I see it as perfectly legitimate to use any of the four templates in newly created entries.
As for "The space after # is more or less universal", so was the use of the likes of {{slang}} without {{context}} before an illegitimate run of your bot. --Dan Polansky (talk) 20:45, 27 January 2015 (UTC)

With respect to the copyright question raised above, there is little to no impact on a fair use rationale from using the same quote multiple times. However, there is no need for us to expose ourselves unnecessarily to any risk of copyright infringement. I would import the practice that Wikipedia uses with images: if it is in the public domain, use it however you want; if it is not, then use it only if there is no comparable public domain alternative. Of course, new words (or words coined since 1923, in any case) will primarily be found in works that are under copyright, as will examples showing that old words remain in current use. I admit, when I add cites to an entry, I always try to find both the oldest use I can find, and the most recent. This also speaks to the quality of sentences as showcases for the words defined. We have a much better fair use argument for reproducing a sentence that does an excellent job of showcasing the word (where that word is the "star" of the sentence) than we do for a lengthy passage with the word incidentally buried in it, particularly if it can easily be shown that better example sentences are available. bd2412 T 20:32, 27 January 2015 (UTC)

Are function words ever the star of a passage (at least to anyone besides a grammarian)? —This unsigned comment was added by DCDuring (talk • contribs) at 21:31, 27 January 2015 (UTC).
Probably only in very rare and odd circumstances, but I would guess those words would be the easiest to find in public domain sources. Don't forget that the public domain covers not only pre-1923 stuff, but all U.S. government-produced documents (including the limitless world of federal court opinions), and anything that any private author has deliberately released into the public domain. bd2412 T 21:37, 27 January 2015 (UTC)

Um, guys --

The very next thread here, though by another user, mimics the lead-in to this thread so closely that I surmise it is probably the same editor as ReidAA. Notably, this new editor also signs off as WF, making me wonder if we aren't barking up the wrong tree by censuring anyone for ReidAA's seeming departure... ‑‑ Eiríkr Útlendi │ Tala við mig 21:28, 27 January 2015 (UTC)
It could just as easily be WF mocking ReidAA; or another person entirely mocking them both. bd2412 T 21:35, 27 January 2015 (UTC)
Yes, Wonderfool is mimicking Reid's post for comedic purposes; he previously mimicked Semper's post in which he announced he would stop patrolling Recent changes. - -sche (discuss) 21:40, 27 January 2015 (UTC)
  • Does anyone here have access to tools for seeing the IPs from which ReidAA and Regret and Reward have connected? Otherwise we're left with conjecture. ‑‑ Eiríkr Útlendi │ Tala við mig 00:20, 28 January 2015 (UTC)
    D'oh. DCDuring TALK 00:34, 28 January 2015 (UTC)
  • Does it matter? FWIW I think it's most likely to be WF making fun of this thread and that ReidAA has nothing to do with it, but really it just doesn't matter either way. —Aɴɢʀ (talk) 00:38, 28 January 2015 (UTC)
  • If the two are the same, then this current thread about why ReidAA left is basically entirely mooted. ‑‑ Eiríkr Útlendi │ Tala við mig 00:46, 28 January 2015 (UTC)
  • Only a checkuser can do that, and they wouldn't do it just for a fishing expedition like this. As for whether the two are the same: the only thing they share is their manner of using quotes. In editing style, level of competence and basic approach, they're quite different. Chuck Entz (talk) 03:08, 28 January 2015 (UTC)

Hello (with relief and disapproval)

As an editor who has made over one hundred thousand individual edits over the last ten years or so, I was amused when another editor began deleting all my edits of the past few days without any warning, much less running his reasons past me.

When this had just started I was creating lots of Spanish plurals, by the way.

I am not surprised by this sort of behaviour by another editor, and relieved that it doesn't seem to worry the community. My amusement is because I saw Wiktionary as a thing in the past, having enormous data capacity and the ability to link within and beyond Wikipedia.

I too am addicted to improving Wiktionary, and figure the only way to take my addiction to the next level is by staying put. So hello to all users old and new, it will be a pleasure working side by side with you all. WF --Regret and reward (talk) 16:07, 27 January 2015 (UTC)

Abuse filters for TV show content?

I don't know why — perhaps it's a rogue bot — but we very often get people creating TV episode lists on Wiktionary. This has been going on for months. I suggest new abuse filters that prevent entries with a title starting "List of", or entries that contain {{Infobox television. Equinox 00:26, 29 January 2015 (UTC)

I would say this is a relatively rare occurrence. If this were a bot, we would get that much more regularly. I think these are just idiots who cannot tell us from Wikipedia and click buttons at random. I rewrote Special:AbuseFilter/12 to catch this and some more. Keφr 13:06, 29 January 2015 (UTC)
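For illustration only, the two conditions Equinox proposes (a title starting with "List of", or text containing {{Infobox television) can be sketched as a Python predicate. This is not AbuseFilter rule syntax and does not reproduce the actual rule in Special:AbuseFilter/12; the function name looks_like_tv_listing is hypothetical.

```python
def looks_like_tv_listing(title: str, wikitext: str) -> bool:
    """Return True if a new page matches either proposed heuristic.

    Hypothetical illustration; not the real AbuseFilter 12 rule.
    The title check is case-sensitive, as a simple prefix match.
    """
    # Heuristic 1: Wikipedia-style list articles, e.g. "List of X episodes".
    if title.startswith("List of"):
        return True
    # Heuristic 2: the page transcludes a Wikipedia TV infobox.
    return "{{Infobox television" in wikitext


if __name__ == "__main__":
    print(looks_like_tv_listing("List of Friends episodes", ""))  # True
    print(looks_like_tv_listing("dog", "{{Infobox television\n| name = Foo\n}}"))  # True
    print(looks_like_tv_listing("dog", "==English==\n===Noun===\n{{en-noun}}"))  # False
```

A real filter would probably also want a case-insensitive title check, at the cost of more false positives on legitimate entries.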


I have blocked this user (User:Type56op9) with no direct evidence. It is up to you guys to judge if this block is correct. --kc_kennylau (talk) 14:46, 29 January 2015 (UTC)

  • I have always assumed he was WF. SemperBlotto (talk) 14:48, 29 January 2015 (UTC)
    • @SemperBlotto: He has admitted it in his talk page, case closed. By the way, what does WF mean? --kc_kennylau (talk) 14:51, 29 January 2015 (UTC)
      • Oh, I talked too early... --kc_kennylau (talk) 14:54, 29 January 2015 (UTC)
        • I can’t help but like WF. He was always very sharp, but also immature for his age. Most people continue to mature in judgment, understanding, and outlook through the age of 30, and I assume WF is slowly growing up as well. I would not support him for admin status, but I think he’s an asset and his occasional bouts of wigging out didn’t result in any damage that was hard to find and fix. —Stephen (Talk) 21:45, 29 January 2015 (UTC)
          • I've come around to the position that indeffing WF was wrong and that he should be unblocked. Purplebackpack89 22:59, 29 January 2015 (UTC)
User:Type56op9 is unquestionably Wonderfool (aka WF), though only Wonderfool knows who Wonderfool himself is. As Stephen said, he's mostly an experienced, hard-working asset to Wiktionary, but is prone to irresponsibility and occasional bad behavior. He also tends to focus more than he should on quantity rather than quality. Many years ago, he did a bunch of massively disruptive stuff and got permanently blocked for it. For some years after that, he would constantly reappear under new accounts until someone (usually SemperBlotto) figured out it was him and blocked those accounts, too (he's gone through literally hundreds). Finally he got tired of doing that all the time, and proposed a compromise, which resulted in an agreement: as long as he stuck to one account at a time and behaved himself, he wouldn't get blocked. I would liken this to parole or work release: he's free to do what he wants, but we watch him carefully, and any admin is free to block him at any time for even minor misbehavior. This has worked for the most part, but he does still end up getting blocked from time to time, and he should never be taken at his word, and never be completely trusted. As Stephen said, he's gotten more responsible over the years and acts up less. The main problem lately: given his track record with bots (we still occasionally find bad entries from years ago, especially from his Polish period), it's hard to trust him with one, and his fixation on high edit-volume makes him want to do things with bots and accelerated entry-creation all the time. Chuck Entz (talk) 02:26, 30 January 2015 (UTC)
User:Chuck Entz's pretty smart, you know. The best new user in the last few years, IMHO, thanks to his diplomacy, good humour, sharp observations and conscientious editing. He's hit the nail on the head with me, too. As for the Polish, yeah, I admit punching way above my weight in that one - I'd been learning Polish, but certainly not enough for quality entries here. --Type56op9 (talk) 07:49, 30 January 2015 (UTC)
BTW, it's nice to be talked about every now and then, too. Thanks! Perhaps we all should talk about each other more (nicely, if we can) and this place will be more harmonious. Of course, the opposite could certainly be true too... --Type56op9 (talk) 07:52, 30 January 2015 (UTC)
The only thing worse than being talked about is not being talked about. —Aɴɢʀ (talk) 09:38, 30 January 2015 (UTC)
Yes, but keeping your head below the parapet for a while, not causing controversy, and not being talked about can also be nice. Even though I'm not admin and have no desire to be, I would say no bots for this user. Donnanz (talk) 10:09, 30 January 2015 (UTC)

Cosmetics vs. makeup

Category:Fashion contains both Category:Cosmetics and Category:Makeup. What, if any, is the difference between them? —Aɴɢʀ (talk) 17:40, 30 January 2015 (UTC)

  • There's a very fine line of distinction, I think. Can they be merged under one category - e.g. "Cosmetics and make-up"? Donnanz (talk) 17:57, 30 January 2015 (UTC)
There's not many entries in either category. It wouldn't take long to merge them. Donnanz (talk) 18:01, 30 January 2015 (UTC)

Distinguishing loans vs. inheritance

Etymologies on Wiktionary currently use the expression "(derives) from" in no fewer than three mutually exclusive ways.

  1. A word has been inherited from an ancestor language, and its older form has been recorded/reconstructed as (*)foo.
  2. A word is a loan from another language, where it appears as föö.
  3. A word is a derivative of another word in the same language, and can be analyzed as fo+o.

The third is generally categorized separately (Category:Words by suffix by language, etc.), but the first two are currently completely mixed together under Category:Terms derived from other languages.

There seems to be widespread consensus that a word's etymology should be traced as far as possible along any one of these directions: e.g. name needs to be not only in Category:English terms derived from Middle English, but also Category:English terms derived from Old English, Category:English terms derived from Proto-Germanic, and ultimately Category:English terms derived from Proto-Indo-European. Similarly, alkali needs to be at least in Category:English terms derived from French, and Category:English terms derived from Arabic. (Whether we need to note other intermediate languages of transmission is less clear, but this is not the current point.)

Things get messy once we start mixing these together. Obviously we will not categorize terms according to their morphological structure in their source language, whether it is an ancestor or a loangiver. But plenty of words like bazaar are currently filed under "English derivations from PIE", despite deriving via a long Wanderwort chain and not via inheritance. (This particular word is a pretty bad case, in that there isn't even a PIE cognate to the word itself, only to the roots from which it was assembled in Old Persian.)

To get some order around here, I believe introducing some finer-grained variants of {{etyl}} would help, perhaps by some flags. This would allow us to distinguish the various ways in which a word can derive from a different language. I could imagine eventually distinguishing all sorts of things, e.g.

  • {{etyl|la|fr|via=desc}}: French words inherited from Latin
  • {{etyl|la|fr|via=loan}}: French words loaned from (e.g. Modern) Latin
  • {{etyl|la|fr|via=loan-desc}}: French words loaned from a descendant of a Latin word in another Romance language
  • {{etyl|la|fr|via=desc-loan}}: French words inherited from Old French but loaned in there from (e.g. Medieval) Latin
  • {{etyl|la|fr|via=und}}: French words deriving in some undetermined fashion from Latin
  • {{etyl|la|fr|via=deriv-desc}}: French words derived from another French word, which has been inherited from Latin
  • {{etyl|la|fr|via=deriv-loan}}: French words derived from another French word, which has been loaned from Latin
  • {{etyl|la|fr|via=loan-deriv}}: French words loaned from a Latin word, which is a derivative of another Latin word
  • {{etyl|la|fr|via=desc-deriv}}: French words inherited from a Latin word, which is a derivative of another Latin word
    • These last four could be useful if the intermediate derivative is archaic or unattested.

But just a loaned/inherited/neither distinction should be a good start. --Tropylium (talk) 20:22, 31 January 2015 (UTC)

We already have {{borrowing}}, so I propose we create another template to match it, for inherited terms. —CodeCat 20:50, 31 January 2015 (UTC)
Strong support. This would be a huge improvement on the meaningfulness of our etymological categories. — Ungoliant (falai) 20:58, 31 January 2015 (UTC)
I recognize that it will entail a lot of work, but I'll go along with making a distinction between loanwords and inherited words. However, I caution that the finer the level and greater the number of the distinctions beyond that which one tries to make, the less likely it is people will make the distinctions correctly. The last four categories you list, and especially the last two, are of questionable utility, in my opinion.
Re {{borrowing}}: it is useful, but not flexible — it can't handle steps beyond the immediate one, hence umlaut uses {{borrowing}} to cover the immediate step of "English borrowed this from German", but still has {{etyl|goh|en}}. That's where something like the via= parameters proposed above would be useful. - -sche (discuss) 01:13, 1 February 2015 (UTC)
This doesn't apply to a potential inheritance template though. Terms are almost always inherited from one language at most, and that language is always fixed. The source language would only need to be specified in the rare case of pidgins, which have multiple parents. —CodeCat 01:37, 1 February 2015 (UTC)
I think this is a good idea if kept limited to just a few types. It would indeed be useful to distinguish uninterrupted inheritance from everything else: Arabic كحول, for example, is a borrowing from English alcohol, but traces back eventually to Arabic again, so it would be nice to indicate that it's not ultimately of Old French origin. Direct borrowings are also good, as are derivations via an undetermined route. On the other hand, there are just too many different combinations of borrowing and inheritance to try to distinguish between the middle cases: various Germanic and Romance languages as well as Medieval Latin remained in contact for a long time, and terms were borrowed back and forth in various directions, with stages of inheritance in between. For instance, Old Norse terms were borrowed directly into Old English, indirectly into Norman French and then into Middle English, and inherited by modern Scandinavian languages, then borrowed into modern English, so you could have borrow-inherit-inherit vs. borrow-borrow-inherit vs. inherit-inherit-borrow. Latin, on the other hand, could be borrowed just about anywhere at any point in history from the Classical period on, and then inherited and/or borrowed repeatedly, so it could get really, really complicated. Lastly, "borrowing" within a language could be interpreted in so many ways, especially with complications such as analogical levelling and the like, that such categorization would be pretty arbitrary, at best. Chuck Entz (talk) 04:09, 1 February 2015 (UTC)
True enough, I don't expect to see most of those "combo" derivation types in widespread use anytime soon. Just noting some possibilities. An explicit "derived from, but neither by direct inheritance nor by a chain of loaning" flag (to put things in a "Category:X terms indirectly derived from Y") would probably be sufficient for almost all residual needs.
And yes, this would be a lot of work, but parts of this could surely be automated — e.g. if we have an English term whose etymology section starts "from Middle English", that can be automatically marked as "inherited", and if it starts "from X" where X is not an ancestor of English, that can be automatically marked as "loaned". It's the etymologies with longer chains that require more care, in general. But at least the plain "X terms derived from Y" categories would then work as laundry lists of etymologies that still need fine-tuning (or research).
(By the way, I would support not recognizing ultimate Wanderwort origins with a different meaning as plain "loaned". E.g. alcohol may well derive its shape from an Arabic word, but the important step where the meaning 'alcohol' was attached may have come later.) --Tropylium (talk) 05:02, 1 February 2015 (UTC)
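The automatic first pass Tropylium describes (etymologies starting "from Middle English" marked as inherited, those starting "from X" with X not an ancestor of English marked as loaned) might be sketched like this. It assumes the etymology is available as plain text and that a hand-maintained language list exists; both language sets and the function classify_first_step are hypothetical, and a real bot would more likely read {{etyl}} codes than prose.

```python
import re

# Hypothetical, deliberately incomplete language lists for illustration only.
ENGLISH_ANCESTORS = {"Middle English", "Old English", "Proto-Germanic", "Proto-Indo-European"}
OTHER_LANGUAGES = {"French", "Old French", "Anglo-Norman", "Latin", "Dutch", "Old Norse", "Arabic"}


def classify_first_step(etymology: str) -> str:
    """Label the first step of an English etymology: 'inherited', 'loaned' or 'unknown'.

    Only a leading "From <Language>" phrase is examined; longer chains,
    and anything not naming a known language, are left for human review.
    """
    text = etymology.strip()
    # Try longer names first, so a name that extends another is matched first.
    for name in sorted(ENGLISH_ANCESTORS | OTHER_LANGUAGES, key=len, reverse=True):
        if re.match(r"from " + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            return "inherited" if name in ENGLISH_ANCESTORS else "loaned"
    return "unknown"


if __name__ == "__main__":
    print(classify_first_step("From Middle English name, from Old English nama"))  # inherited
    print(classify_first_step("From French alcali, from Arabic"))  # loaned
    print(classify_first_step("From pedestrian + -ise"))  # unknown
```

Returning "unknown" rather than "loaned" for unrecognized text is a deliberate choice here: "From pedestrian + -ise" also starts with "from", but it is an internal derivation, not a loan.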
  • I oppose this huge overcomplication, adding a barrier to entry of etymologies to Wiktionary. The existence and use of {{borrowing}} is bad enough. I don't see OneLook dictionaries caring to distinguish borrowing vs. inheritance, and I don't see why they should. I saw people replace "from" with {{borrowing}}, and I do not know where they source this information from. And again, I do not see why our users would care. --Dan Polansky (talk) 10:13, 1 February 2015 (UTC)
The idea is not to mandate descent type parameters, but to just have the possibility. This would moreover not be about changing the etymologies themselves as listed in a given article, as much as subdivide the currently often overloaded "X terms derived from Y" categories. E.g. users who are interested in finding what words in Modern English are inherited from Proto-Germanic are unable to find this information anywhere, since Category:English terms derived from Proto-Germanic is also brimming with words loaned at some point from Norse, from Dutch, from Frankish via French, etc. --Tropylium (talk) 23:24, 1 February 2015 (UTC)
The solution to your problem is using {{etyl|gem-pro|-}} instead of {{etyl|gem-pro|en}}. Even better would be cutting off the etymology chain at the point of first borrowing and continuing it on the page of the etymon. There are currently etymologies of modern Japanese borrowings from English tracing the origin up to Proto-Indo-European. This is ridiculous and leads to duplication and desynchronization. We should also remove cognates from the etymology section if the proto-page exists and the cognates are listed there. --Vahag (talk) 11:16, 2 February 2015 (UTC)
Yes, to a point. Of course we should avoid lengthy etymologies for recent borrowings, but we don't have entries for many of the terms in the etymologies, and most people adding etymologies aren't competent in all the languages referenced. Do we really want to create an added incentive to make bad entries? You probably don't work much with many of the languages in English etymologies, but I've seen more than my share of unaccented Greek reverse-transliterated from some dictionary's etymology, complete with epsilon instead of eta, nu-gamma instead of gamma-gamma, etc. Chuck Entz (talk) 14:25, 2 February 2015 (UTC)
The chain should be cut and moved to the etymon's entry only after the latter is created. In the meantime the {{etyl|xx|-}} format can be used. This is what I do for Armenian entries. --Vahag (talk) 14:55, 2 February 2015 (UTC)
This is a possible alternate approach, but let's consider it a bit further:
  • Switching en masse from something like {{etyl|fr|en}} to {{etyl|fr|-}}, if a word is as much as e.g. a loan French → Dutch → English would eliminate entirely, or at least vastly diminish, several "X terms derived from Y" categories. Say, Category:English terms derived from Ancient Greek. I am skeptical that this would be an improvement.
  • Leaving all but the immediate loangiving language entirely unmentioned I would definitely oppose. Any "Wanderwört of unambiguous origin" should note the language where the word's popularity arises, e.g. à la and variations in the appropriate subcategory of Category:Terms derived from French, even for languages that got it indirectly via English or Russian or whatever.
  • Least radically: I would be mostly OK with leaving at least terms that derive via some combination of descent and loaning (and possibly derivation) without language-specific categorization. On the other hand, this solution would have to be elevated to the level of policy to be even remotely workable. And we'd have interesting border cases to contemplate with chronolects like Middle English. Greco-Latinate educated vocab, for one, has often been attested in literary languages from a relatively early date, but it does not sound reasonable that therefore any Modern English (or Modern German, or Modern Hungarian…) word should aim to only note its descent from some 500-700 years back, and not loan origins beyond that? ---Tropylium (talk) 17:01, 2 February 2015 (UTC)
I don't know, it is hard to formulate a mathematical rule on how far to take the chain, but to me it is intuitively clear in most cases. For example, I would take Japanese アーク, up to English only, フラン up to French and up to PIE. Is that reasonable? I am probably using the notion of what you called a "Wanderwört of unambiguous origin". --Vahag (talk) 17:35, 3 February 2015 (UTC)

Re: We should also remove cognates from the etymology section if the proto-page exists and the cognates are listed there. This is not directly related to the discussion above, but sounds reasonable… to an extent. If a language group has any generally accepted "key" languages, cross-linking cognates in those seems like a good idea. (Isn't e.g. mentioning any possible Sanskrit and Latin cognates of Ancient Greek words already policy?) --Tropylium (talk) 17:01, 2 February 2015 (UTC)

AFAIK there is no written policy. I remember long time ago that me, User:Ivan Štambuk and few others decided to include one cognate from each independent branch of PIE. But now that we have many beautiful proto-pages, cross-linking to even "key" cognates is redundant. Only perverts like you and me are interested in such deep etymologies and they can make one extra click and go to the proto-page. --Vahag (talk) 17:35, 3 February 2015 (UTC)
The idea of dropping non-immediate steps from etymology sections, and making users click through the chain of etyma one link at a time, has been proposed before, and is interesting. I'm of two minds about it. On the one hand, Vahag is correct that duplicating the full chain of an English loanword's descent back to PIE in all the languages that borrowed it constitutes a hideous amount of content duplication and leads to those chains becoming out of sync when people correct errors in only some entries and not others. On the other hand, Chuck is correct that (a) people may feel comfortable noting that e.g. a certain English/French/etc word derives via Persian from Ancient Greek, but may not feel that they know enough Persian to create the Persian entry, which would mean the Ancient Greek connection would have to go unrecorded, or (b) people may feel erroneously confident that they can add the Persian and Ancient Greek entries, and make errors in adding them. Also, I do think it's useful to allow users to see which words from e.g. Aleut made it into German, even if they had to pass through English to get there. The solution is something sort-of like Module:etymtree, IMO. (But apparently not Module:etymtree itself, since it seems designed only for use in appendices.) Users should be able to say "this Japanese word was borrowed from English foo", and then have a module call up all the ancestry of English foo from the same central repository that the English entry for foo, the Dutch entry for foo, and every other descendant-by-borrowing of foo call upon. - -sche (discuss) 17:29, 2 February 2015 (UTC)
Etymtree is usable in the main namespace as well. — Ungoliant (falai) 17:36, 2 February 2015 (UTC)

-ise/-ize words

Hi, is there any reason why some -ise/-ize pairs (e.g. pedestrianise/pedestrianize or satirise/satirize) have full duplicated (if not always synced) definitions, while others (e.g. specialise/specialize or polarise/polarize) merely have "alternative spelling of"-type entries for one spelling? 23:50, 31 January 2015 (UTC)

This is a point of much contention. Some (like myself) argue for having the content in a single place, and a stub-like "alternative form of X" entry at other places; this leads to arguments about people's personal favourite spellings being marginalised (ized!). There is also the fact that in some cases (I think honor/honour is one), certain obsolete senses can only be attested under one of the spellings. The pair that usually comes up in these arguments is color/colour. Equinox 23:52, 31 January 2015 (UTC)
I think it would be better to have only one entry, e.g. "pedestrianise or pedestrianize" or "honor or honour", or however it was decided to present it, to which both lookups redirected. Then the precedence would at least only be in the order of presentation. Cases of only one spelling being attested could be fairly easily handled with, for example, a "honor only" or "honour only" note against an individual definition. 00:05, 1 February 2015 (UTC)
I believe someone's working on a way to do this. (Who was it? Stand up!) There are probably some technical hurdles to overcome. Equinox 00:17, 1 February 2015 (UTC)
Maybe you're thinking of the Grease pit thread "Revisiting the issue of English UK/US spellings and entry synchroni(s|z)ation". I am interested in this subject, of course. Donnanz (talk) 00:26, 1 February 2015 (UTC)
Yeah it's here: [1]. Thanks. Equinox 00:35, 1 February 2015 (UTC)
Any idea anyone what the technical issues are? I know that Wikipedia can somehow get italic words into article titles (see e.g. here). If the same can be done in Wiktionary, then the heading could be, for example:
honor or honour
Then I suppose some fiddling would be needed with the "en-noun" and "en-verb" templates (or dual-spelling alternatives), but this doesn't seem an insurmountable hurdle. Templates are all under control of Wiktionarians and can be made to do whatever we want, right? As a minimum solution, the "en-noun" or "en-verb" line could just be repeated for the alternative spelling. 15:02, 1 February 2015 (UTC)
It's a Lua module: w:Module:Italic title. I don't know much about Lua, but I think it should be easy enough to install on Wiktionary (although it would need some rewriting to make it only italicise "or"). Smurrayinchester (talk) 10:01, 3 February 2015 (UTC)
  • tangent regarding spelling-specific senses:
    Can anyone point to a concrete example of a word with a certain meaning attested only in one national standard and not in another? I'm just curious, because I can't. I can point to examples where different non-US/UK-specific spellings have different senses. I've spent the past couple of days very comprehensively citing the Norse-gaumr-derived word gaum/gorm/goam. I'm still looking for more citations, but it's looking a lot like:
    • the sense "understand; comprehend; consider" is attested 3+ times only in the spelling gaum,
    • the sense "gawk; to stare or gape" is attested 3+ times only in the spelling gorm, and
    • the sense "see, to recognize, to take notice of" is attested 3+ times only in the spelling goam,
    even though all three senses are found in dictionaries, and in use <3 times, in both of the first two spellings, and they constitute, on a spoken level, the same word.
    Point being that US/UK differences are not the only (and perhaps not even the main) cause of words' meanings being split across different entries.
    (I agree with 109.153 that spelling-specific senses "could be fairly easily handled with, for example, a "honor only" or "honour only" note against an individual definition".)
    - -sche (discuss) 19:53, 4 February 2015 (UTC)
    I'm not entirely sure what you're looking for. Something along the lines of a sense of favorite that favourite does not have? If so, there's a meaning of program that programme doesn't have, namely a computer program (which is spelled program in en-GB as well as en-US). —Aɴɢʀ (talk) 21:29, 4 February 2015 (UTC)
    My opinion: entries should be centralised and all senses should be attested on one of the spellings. Some reservations, e.g. is disk also used for intervertebral disc, as disc? --Anatoli T. (обсудить/вклад) 22:06, 4 February 2015 (UTC)
  • @Aɴɢʀ: When the subject of centralizing -ize/-ise and -or/-our and other US/UK pairs of spellings' content onto single pages comes up, people sometimes say "what if a given sense is only attested in one spelling?" My points are (1) senses being split by spelling isn't just a US/UK thing, (2) splits can be handled in the way the IP suggests, and (3) senses being split specifically by US/UK spelling seems to be not only too vanishingly rare to represent an obstacle, but perhaps entirely nonexistent. An example of a US/UK spelling split would be if "realize" were used to mean "say foobar" in America, always with the -ize spelling, while it was never used to mean that in Britain or with the -ise spelling. (In such a case, the centralized page would have to contain a note that the "say foobar" sense was US-only.) In the case of program/programme, I note that google books:"computer programme" is actually very well attested. Even if it weren't, the fact that both national varieties of English used the spelling program would be a different kind of issue. (Specifically, it would mean there were two things we needed entries for: one thing spelt "programme" in the UK and "program" in the US, and another thing spelt "program" in both countries. In the "realize" example, in contrast, we'd be talking about a thing that was attested only in one country and in one spelling, and that wasn't used in the other country in any spelling.) That turned into a ramble, I apologize/apologise... - -sche (discuss) 22:30, 4 February 2015 (UTC)
    So something more along the lines of British arsed (as in I can't be arsed to do it), which isn't *assed in American English, because that word isn't used with that meaning in American English? I know that's not a perfect example because arse and ass differ in more than just spelling, but if spelling were the only difference between them, would that be what you're looking for? —Aɴɢʀ (talk) 23:17, 4 February 2015 (UTC)
    A {{qualifier|only "arse" spelling}} could be used then or something similar. --Anatoli T. (обсудить/вклад) 23:20, 4 February 2015 (UTC)
    Ooh, yes, if it's true that "assed" isn't used that way, then that's the sort of thing I was talking about. :] And as Anatoli says, it's straightforward to handle with labels (and/or usage notes). - -sche (discuss) 03:01, 5 February 2015 (UTC)

February 2015

Wiktionary:Criteria for inclusion

I ask that Kephir edit to Wiktionary:Criteria for inclusion (diff) is undone. I would do it myself but I cannot. Vote Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep did not pass, and therefore cannot lead to any edit to WT:CFI; no edit to that page was proposed. Furthermore, the opposers of the vote did not express the wish to ignore CFI, merely to override it in a relatively small number of cases. I am fairly certain that the edit is not based on consensus. --Dan Polansky (talk) 12:42, 1 February 2015 (UTC)

Yeah, it probably should be undone by a sysop. At present, we have no mechanism for deletion of articles. See my comment below. Purplebackpack89 16:21, 1 February 2015 (UTC)

Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep: What does it mean?

The clock ran out on Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep, and it was closed as not enacted. So what does this mean? Recently, Wiktionary:CFI was changed to be an obsolete policy. Much as I do not agree with CFI as currently written, it's clear that that was not the correct approach, and creates a host of problems stemming from the lack of a mechanism to delete any article. No, what I interpret the discussion to be is that many participants are unhappy with CFI as written and they believe things other than CFI should be considered in RfD discussions. The upshot of this is not demoting CFI (or at least the part of it that is RfD's purview) to obsolescence, but demoting it to a guideline. Some of you have asked "what's the difference between a policy and a guideline?" A policy is overarching, supported by a wide supermajority of participants, and should be followed all or almost all the time. A guideline can be ignored if there's a consensus to, need not cover everything, and can be enacted with less of a supermajority. The practical implications of this are that articles can still be nominated for deletion under the auspices of CFI, but they won't necessarily be deleted solely on CFI. I see this as being in line with the vote. Note that much of this doesn't apply to the parts of CFI that have to do with RfV. I have seen no evidence of people being upset with the verifiability sections of CFI, which should probably be spun off into a different page that remains policy. Purplebackpack89 16:16, 1 February 2015 (UTC)

Vote Trimming CFI for Wiktionary is not an encyclopedia

FYI, Wiktionary:Votes/pl-2015-02/Trimming CFI for Wiktionary is not an encyclopedia. Let us postpone the vote as long as discussion requires. --Dan Polansky (talk) 16:24, 1 February 2015 (UTC)

Just some food for thought about bots

In the words of C.G.P.Grey, "They don’t need to be perfect, they just need to be better than us [humans]."

How should we deal with bots that make mistakes? --kc_kennylau (talk) 16:42, 1 February 2015 (UTC)

  • The traditional thing is to tell their human owner. That's what people have done with mine (far too often). SemperBlotto (talk) 16:44, 1 February 2015 (UTC)
    • @SemperBlotto: Well, telling the human owner does not directly solve the problem. Sometimes fixing an error is much harder and more time-consuming than creating it, especially if the error is hidden amongst a group of, let's say, 5000 pages. Personally, I spend 1% of the time writing the programme and 99% of the time debugging. --kc_kennylau (talk) 17:29, 1 February 2015 (UTC)

Request to Add New Subcategory "LWT" Within LDL

This is a request for the Wiktionary Community to consider adding a new subcategory, Languages without a Written Tradition (LWT), under LDL (Less Documented Languages).

What is an "LWT"?

An LWT is a language that has an oral tradition, but has no tradition of writing and no written publications authored by native speakers. LWTs are a subset of LDLs. (Note that documents authored in other languages by outsiders and merely translated by native speakers, such as the Bible and government documents, are not suitable as sources for documenting a language.)

Why not Simply Call LWTs "Unwritten Languages"?

The term "unwritten" can be misleading, because the boundary between languages that are "unwritten" and languages that are "written" is actually quite fuzzy. Presumably we can all agree that a language community that has no writing system, no notion of literacy, and has never had its speech transcribed by outsiders can be considered an "unwritten language."
But when that community is visited by linguists who develop an orthography, and (perhaps imperfectly) transcribe some words and phrases from the spoken language into written form, perhaps publishing the results, what then? Is this language "written," even if no one in the language community is literate, and the published "results" contain the errors of a non-native speaker? Some of you might call such a language "written," and others just as reasonably might say it is "unwritten."
Let us now consider a third example. What about a small indigenous language community in Brazil that is completely unfamiliar with writing, and yet, through a process of increasing contact with the national society, develops an orthography and village schools, where children are taught to read and write in both their indigenous language and Portuguese? Obviously, when nearly every child can write words in their own language, the language cannot be considered an "unwritten language." Yet is it a "written" language? Does it have great literature? Yes, in oral form. Poetry? Absolutely, in oral form. Historical narratives, sacred texts, genealogies, song lyrics, compendiums of botanical and zoological knowledge? Yes, all in oral form. What, then, is written in this language? Aside from basic word lists and literacy primers modeled on Portuguese examples, virtually nothing — yet. Today's young adults are the first literate generation.
This is the case with Wauja, an Arawak language spoken by 400 indigenous people in lowland Amazonia. Although Wauja was "unwritten" a generation ago, today it is "written," in the sense that children are taught basic literacy in their village schools. However, as yet (and this will doubtless change) there is no written tradition in this language, no body of publications authored by native speakers. All their literature is still in oral form.
For the purposes of Wiktionary, the key issue is not whether a missionary or professional linguist has phonetically transcribed snippets from the language, but whether there exists a body of work authored by native speakers that is large enough to provide references for every word in the language. For languages like English, Chinese, and all "major" languages, the answer is yes. These languages have extensive written traditions. For thousands of small and endangered languages, the answer is no. These are languages with rich intellectual and literary traditions, in oral form. Their speakers may have some (recently acquired) knowledge of writing, but the languages have no tradition of writing. This presence or lack of native-speaker-authored published references is the distinction that matters for the Wiktionary community, at least with regard to inclusion criteria.

Why is the Subcategory LWT Needed?

LWTs, by definition, lack a body of published sources authored by native speakers. As a result, it is not possible to use published sources to attest to Wiktionary entries for LWTs. Nevertheless, LWTs are important members of the family of human languages, with rich literary and intellectual traditions, and they deserve to be included in Wiktionary. In fact, these LWTs are typically endangered languages spoken by language communities that are most in need of the permanent, globally accessible, open source, cultural commons platform that only Wiktionary can provide. Therefore, it is proposed that the Wiktionary community define this limited category of languages (LWTs) and agree upon attestation criteria that are sensible and appropriate for such languages.

Can LWTs Meet Current Attestation Standards?

Current Wiktionary attestation standards (see Criteria for inclusion) call for verification either through widespread use (hard to verify for a language without publications) or "use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages)." For spoken languages that are living [but not well documented on the Internet], only one use or mention is adequate, subject to the following requirements:
  • the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the LDL template).
Assuming that the first bulleted requirement above refers to a list of materials that are permanently available online, most LWTs probably cannot meet this requirement. For example, in the case of the Wauja language, spoken as a first language by 400 people in the Amazonian rainforest, there are hundreds of audio recordings and several dozen carefully transcribed traditional stories, but none of them are currently available online. (They could, however, be made available to Wiktionary admins upon request.)
Before these stories are posted online, the community must agree that they are correctly transcribed. That's because they were first recorded and transcribed several decades ago by an anthropologist (myself, in this case), at a time before any Wauja were able to read and write. Today, there is a cadre of young university-educated Wauja bilingual schoolteachers who are deeply committed to standardizing their orthography and documenting their language. However, this process takes time, because it is not decided by fiat. Instead, the Wauja, like many communities that speak LWTs, take time to reach decisions through building consensus. It's a chicken-and-egg situation. Without a standard orthography, it's hard to build a dictionary, but without a dictionary, it's hard to standardize the orthography.

Proposed Attestation Standards for LWTs

To allow responsible documentation to proceed within Wiktionary while members of LWT communities increasingly move toward standard orthography, publications by native speakers, and full compliance with Wiktionary LDL attestation standards, the following interim attestation standards for LWTs are proposed:
  • The community of editors for that language should maintain a list of materials deemed appropriate as the only currently existing sources for entries.
  • These sources may include audio or video recordings of native speakers, and transcripts of such recordings.
  • Sources also may include direct quotes from letters and written messages produced by literate native speakers, provided that the quoted material is archived online and annotated as described below.
  • All sources must include mention of the date of the recording or transcription, names of the native speakers recorded, the location of the recording, the name of the person making the recording, and location where the source is archived, if not online.
  • Once the transcript has been authorized by the language community as a faithful transcription, the names of community members involved in verifying the transcription also must be noted, and a copy must be posted to a permanent online location, such as Wikisource.
  • If Wiktionary admins find any reason to doubt the authenticity of the sources cited, they shall be allowed to examine the source material.
The overall goal of attestation standards for LWTs is to ensure responsible and reliable attestation for LWT entries, while making Wiktionary the best platform for documenting the world's many LWTs.
Clarification re: Sources for Attestation (Text vs. Audio and Video)
Based on comments below, it appears that the "Proposed Attestation Standards" listed above need clarification. My intention was to propose uploading all TEXT (written) sources to a permanent online location. This could be Wikisource or another location, such as an endangered language digital archive. However, I cannot propose uploading the actual AUDIO and VIDEO recordings to a Creative Commons site, because some language communities might not want the actual voice and video recordings of their elders in the public domain. For instance, in the case of the Wauja community (an indigenous people of Central Brazil), it would be offensive to publicly play recordings of elders after they have died, particularly since the community would have no say in how often or under what circumstances the recordings would be played. No such restriction is attached to mere text transcriptions, however, which Wauja elders consider to lack the spiritual power of the human voice. Most types of written texts could be posted in a freely accessible permanent online location.
Fortunately, however, the recordings could still be used for attestation. The Criteria for inclusion states:
"Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived." (emphasis added)
If audio and video recordings used for attestation are deposited at a digital archive for endangered languages (for example: ELAR, the Smithsonian Institution, the Library of Congress), then "someone referring to Wiktionary years from now is likely to be able to find the original source" and, at the same time, the wishes of the language community will be honored regarding the respectful and appropriate use of their recorded material.
In summary, I propose that TEXT source materials (such as PDFs of transcriptions and translations) be posted on Wikisource or another location, such as an endangered language digital archive, but that AUDIO and VIDEO recordings of actual human beings be archived in a publicly-accessible digital archive that is equipped to honor specific intellectual property rights and privacy concerns of the endangered-language community in question. (Examples of suitable archives: ELAR, the Smithsonian Institution, the Library of Congress, and so on.) Emi-Ireland (talk) 22:10, 2 February 2015 (UTC)

Honoring the "No Original Research" Principle

For a language with a written tradition, it is appropriate to refer to published sources written in that language. However, for a language that has an exclusively oral tradition, it is appropriate to refer to authoritative oral sources that have been recorded and transcribed. To ensure that the "no original research" principle is honored, transcriptions of traditional stories, historical narratives, public oratory, and sacred incantations performed by elders before an audience can be given priority as sources, since these linguistic sources are particularly authoritative and reliable for LWTs.
Clarification: Faithful transcriptions from audio or video sources are NOT considered original research on Wikipedia
I searched for a Wiktionary policy statement on No Original Research, but have not found it yet. In the meantime, here is the Wikipedia policy statement on transcriptions from audio and video sources:
"Translations and transcriptions: Faithfully translating sourced material into English, or transcribing spoken words from audio or video sources, is not considered original research. For information on how to handle sources that require translation, see Wikipedia:Verifiability#Non-English sources." Emi-Ireland (talk) 19:05, 3 February 2015 (UTC)

Proposed Standard for Transitioning from LWTs to LDLs

When a language has a sufficient body of publications (authored by native speakers) so that every word in the language can be referenced to a published work authored by native speakers, that language is no longer an LWT.
In practical terms, there is no hard and fast cut-off point, but perhaps we can say that once an LWT community has achieved a minimum threshold of 3,000 entries in Wiktionary, the community will have become aware of the importance of lexicography and its methods, and it will have benefited greatly from using Wiktionary to document, analyze, and teach literacy in their language. The language community will have had an opportunity to standardize their orthography, properly review transcriptions of older recordings of traditional oral literature, have native speakers produce new publications based on new recordings, and permanently archive online all such transcripts and publications. As a result, this language community will be considered capable of meeting LDL attestation standards going forward.

Emi-Ireland (talk) 19:35, 1 February 2015 (UTC)

Broadly support. This whole idea needs much more detail yet, but it seems clear that attestation standards for languages that have an existing literature, just one that is not well documented on the Internet, will have to be different from those for languages that have never had a written tradition.
It is also unclear to me whether any considerable community of editors with an LWT as their heritage language (whether as a mother tongue, passive knowledge, or something in between) even exists on the English Wiktionary yet. Even several larger minority languages out there, with relatively long-running historical traditions, have hardly any editors with more than elementary skills (e.g. Nahuatl, Navajo, Northern Sami, Xhosa). The situation might be different on other Wiktionaries, though, and I would not be surprised if a hypothetical Wauja wiktionarian community ended up preferring the Portuguese Wiktionary.
Also, as far as I know, "materials deemed appropriate as the only sources for entries based on a single mention" is not a priori limited to material permanently available online. This could well include sources such as linguistic publications, depending on the language in question.
Some other issues to consider:
  • If no consensus orthography exists yet, how are we to title any word entries? In terms of a pronunciation?
  • Would the entries qualify for the main namespace at all, or should a new dedicated namespace such as Unwritten: be established?
  • What about unwritten extinct languages? (I have been preparing a proposal with respect to some extinct languages, but for now I think I will instead watch this discussion unfold.)
--Tropylium (talk) 00:16, 2 February 2015 (UTC)
Re Tropylium's question: "If no consensus orthography exists yet, how are we to title any word entries?"
Tropylium, this is an excellent question. Perhaps we can consider the case of the Wauja, as an example. Currently, the Wauja themselves agree on the spellings for many words, but there is a vowel that missionary linguists spell one way, using a character not found on standard keyboards, and some young Wauja schoolteachers want to spell it another way, using the standard Latin alphabet. The community will have to sort that out, and it may not be decided overnight. (Certainly English spelling was not standardized overnight). In the meantime, the spelling of Wauja words in Wiktionary may occasionally need to be corrected.
A thornier issue is where to place the breaks between words. This is where various Wauja authors most often differ from one another. Wauja is an agglutinating language; for example, verbs can take multiple suffixes simultaneously. Some authors write them all as one word, and others might break off the last suffix or two and write them as separate words. It is possible that decisions to break up long Wauja words result from notions that a word looks "too long" when compared to Portuguese words. My view is that both approaches are entirely valid, and that the community will have to decide which one to use as the standard.
In the meantime, this wonderful language is endangered, and so it is essential to continue with the process of documenting it. Documentation is valuable not only for its own sake, but because it sends a strong message to young Wauja that the outside world values their language. In fact, the community is very excited that this summer they will be trained in how to participate in building a digital lexicon on the Wauja-Portuguese site. It is entirely possible, Tropylium, that you are correct, and that the Wauja will see the Wauja-Portuguese site as "their" dictionary. However, the Wauja also see themselves as global citizens, and are delighted and proud that a dictionary is being created that translates their language into English. Currently, some young Wauja learn snatches of English from popular song lyrics they encounter online. A Wauja-English Wiktionary will be welcomed not only by scholars and the general public, but by the Wauja themselves. Emi-Ireland (talk) 01:19, 2 February 2015 (UTC)
  • Oppose. No mechanism of independent verification proposed. The attesting recordings are proposed to be uploaded directly onto Wikimedia servers as per "Proposed Attestation Standards for LWTs" above. The section 'Honoring the "No Original Research" Principle' above seems to be contradictory; this does look like original research, especially in that the attesting material itself is original research (we do original research in that we are figuring out definitions from attesting quotations, but that's a different game, I think). Thus, this seems like something for Wikiversity. --Dan Polansky (talk) 18:48, 2 February 2015 (UTC)
Re: No Original Research rule
Please note that, per Wikipedia Policy, transcribing spoken words from audio or video sources is not considered original research. If Wiktionary has a policy on transcriptions and the No Original Research rule that contradicts this, I have not been able to find it. See: Emi-Ireland (talk) 19:22, 3 February 2015 (UTC)
Re: Importance of including all human languages
Thank you for your thoughtful comments. Given that you do not support the proposed attestation standards, I earnestly invite you to contribute your own suggestions for attestation standards that you could support. If we put our heads together, surely we can devise attestation standards that do not automatically exclude a large number of human languages, simply because they have an oral tradition, and not a written one.
The thing we must not lose sight of is that languages without a written tradition should not automatically be excluded from Wiktionary. That would be grossly unfair to the speakers of those languages, and it would be a sad day for Wiktionary, as well. We must find a way to include all human languages in Wiktionary, while taking every reasonable measure to ensure that the work is done as it should be.
I am a newcomer to Wiktionary, and so I assume your knowledge of suitable attestation standards is greater than mine. Can we work together to come up with a standard that does not exclude LWTs (languages that do not happen to have a written tradition)? Surely there are many ways we can address this problem. The important thing is to refrain from treating LWTs as if they don't belong here.
This is what inspired me to contribute to Wiktionary:
"Wiktionary ... aims to describe all words of all languages using definitions and descriptions in English."
We should live up to that, as well as to our attestation standards. Let's find a way to do both.
Emi-Ireland (talk) 20:10, 2 February 2015 (UTC)

Add category for terms with IPA pronunciation

I propose to add a category for terms with IPA pronunciation, by language. The edit is here; I reverted it one second after making it, in order to demonstrate the method before asking for consensus. --kc_kennylau (talk) 12:34, 3 February 2015 (UTC)

To what end? I think it would be more helpful to have a category for terms without IPA pronunciation, so we know what needs to be added. —Aɴɢʀ (talk) 20:37, 3 February 2015 (UTC)
We already have Category:English terms with audio links though. —CodeCat 20:42, 3 February 2015 (UTC)
@Angr: Well, it is virtually impossible to have a category for terms without IPA pronunciation, and using category scanning tools can actually identify those terms without IPA pronunciation if there is a category for terms with IPA pronunciation. --kc_kennylau (talk) 11:10, 4 February 2015 (UTC)
Sure, why not? We are blessed to be equipped with categories, and they should be exploited. --Type56op9 (talk) 11:15, 4 February 2015 (UTC)
Done. --kc_kennylau (talk) 12:42, 6 February 2015 (UTC)
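The category-scanning idea above (using a "with IPA" category to find the terms that lack IPA) amounts to a set difference. A minimal sketch, assuming both member lists have already been fetched by some scanning tool; the function name and sample data are hypothetical, not an existing Wiktionary tool:

```python
def terms_without_ipa(all_terms, terms_with_ipa):
    """Return the terms from a language's full term list that are absent
    from the 'terms with IPA pronunciation' category (hypothetical inputs)."""
    with_ipa = set(terms_with_ipa)
    # Set difference: everything in the full list not tagged as having IPA.
    return sorted(t for t in set(all_terms) if t not in with_ipa)


if __name__ == "__main__":
    lemmas = ["cat", "dog", "house", "tree"]   # illustrative sample only
    ipa = ["cat", "tree"]                      # illustrative sample only
    print(terms_without_ipa(lemmas, ipa))      # ['dog', 'house']
```

This is why a positive "with IPA" category is enough: the "without IPA" list never needs its own category, since it can be derived on demand.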

Languages - are they proper nouns or not?

I have had a long discussion with User:-sche (User talk:-sche#Maori) (can't get it right) about whether languages are proper nouns. In my opinion they are mass nouns instead. It was suggested by -sche that a discussion be started here on the subject. Donnanz (talk) 22:52, 3 February 2015 (UTC)

IFYPFY - -sche (discuss)
Thanks for that! Donnanz (talk) 23:38, 3 February 2015 (UTC)
I see someone already made the point about pluralisation ("various Englishes"); however, our proper-noun template doesn't preclude the possibility of a plural. Some people seem to like to add plurals for given names and surnames. I will admit that I find the common/proper distinction very confusing. Equinox 23:44, 3 February 2015 (UTC)
We should make languages common nouns, also demonyms (nationalities, ethnicities), e.g. German (person), even if they are capitalised in English and some other languages. Nominalised adjectives, like English, Chinese, etc. shouldn't have plural forms in standard English, it's easy to address. --Anatoli T. (обсудить/вклад) 23:47, 3 February 2015 (UTC)
@Atitarev: WTF? Why? This is English Wiktionary. Why in the world should we subjugate our syntactic tradition to those of other languages? That there are lots of secondary uses of proper nouns as common nouns or mass nouns is immaterial.
"Nominalised adjectives, like English, Chinese, etc. shouldn't have plural forms in standard English, it's easy to address." Are you saying that you don't like Englishes and that you disapprove of those who use the word? You should take it up with the authors in these Google Books hits for Englishes. DCDuring TALK 00:27, 4 February 2015 (UTC)
You misunderstand me but I haven't expressed myself well. My point is "English" (noun) and (proper noun) sections should be merged into common noun and a note about "Englishes" should be added, as it is normally uncountable and is pluralised only for some senses. --Anatoli T. (обсудить/вклад) 00:40, 4 February 2015 (UTC)
Funnily enough, the top match (World Englishes: A Resource Book for Students) is a course text at my university. Equinox 00:41, 4 February 2015 (UTC)
Perhaps English is not the best example but "Chineses" or "Vietnameses" sounds pejorative. Anyway, no need to be picky about what I said, let's focus on PoS discussion - common nouns vs proper nouns. --Anatoli T. (обсудить/вклад) 00:48, 4 February 2015 (UTC)
OK. Please provide some reason why you think the merger would be a good idea for Wiktionary users. DCDuring TALK 00:59, 4 February 2015 (UTC)
There was a similar discussion, I think started by User:CodeCat, about eliminating proper nouns; there was some reasoning there. Not sure where that discussion is now. While I think it's a good idea, the first step is perhaps deciding which candidates should be reduced to common nouns first. This would reduce the number of entries, remove duplication in translations, and mean less maintenance. Days of the week (e.g. Saturday) and month names (e.g. November) are also capitalised, but they are common nouns. I think language names and demonyms are also common nouns, but there are various opinions on this. Let's see what other people think. --Anatoli T. (обсудить/вклад) 01:18, 4 February 2015 (UTC)
The discussion I meant - Wiktionary:Beer_parlour/2014/October#On_proper_nouns --Anatoli T. (обсудить/вклад) 01:24, 4 February 2015 (UTC)
There seems to be a general consensus in that discussion that languages are not proper nouns. However I wouldn't go as far as recommending implementing CodeCat's suggestion that the categories for proper nouns and (common) nouns be merged. Names like English and French are also surnames, so even if languages are treated as common nouns there would still be a need for a proper noun in those cases. Donnanz (talk) 10:36, 4 February 2015 (UTC)
What consensus? Donnanz and Atitarev, non-native speakers of English? It would be like Equinox and I agreeing that Russian entries should not be in Cyrillic. DCDuring TALK 11:16, 4 February 2015 (UTC)
Perhaps you'd like to clarify that, I am a native speaker of English by the way. No decision was reached in that discussion, but reading that thread (even between the lines) I got the impression that there was a general consensus for treatment of languages as common nouns. Donnanz (talk) 11:32, 4 February 2015 (UTC)
In Spanish and French they are treated as common nouns (and are uncountable). --Type56op9 (talk) 11:13, 4 February 2015 (UTC)
In French, the plural may be used in some cases (e.g. Français parlés et français enseignés, a book by Juliette Delahaie, 2010). It's the same as English, except that they are not capitalized. Lmaltier (talk) 21:19, 4 February 2015 (UTC)
In English we dispense with diacritical marks. So let's clean them out of Spanish and French entries. DCDuring TALK 11:20, 4 February 2015 (UTC)
The Scandinavian languages also treat languages as common nouns, and no capital letter is used. Donnanz (talk) 11:32, 4 February 2015 (UTC)
My inclination is to continue treating language names as proper nouns. I'm having a hard time finding authoritative advice on the matter, however. Most of the reference works I've found (via Google Books), both those from a hundred years ago and those from last year, conflate properness-vs-commonness with capitalization. One outright says "Capitalize proper nouns and words derived from them; do not capitalize common nouns", which is obviously inaccurate: tell it to the Marines, the Americans and the Englishmen.
Alfred Marshall Hitchcock's 1910 Junior English Book says North, South, East and West are proper nouns; spring, summer, autumn, fall and winter are common nouns; arithmetic, science, geography and other branches of study are common nouns; and English, French, German, Latin and other names of languages are proper nouns. However, it then goes on to discuss how people don't capitalize the names of familiar animals, but are sometimes tempted to capitalize the names of unfamiliar animals, which makes me question whether it, too, is equating properness-vs-commonness with capitalization.
Perhaps most promisingly, International English Usage (2005, ISBN 1134964714) discusses not only "proper nouns (and the names of languages are proper nouns)" and common nouns (its examples are fruit and spider) but also concrete (coin) vs abstract (jealousy) nouns. Maybe someone can find better references — but I'm told CGEL is silent on the matter. - -sche (discuss) 09:04, 5 February 2015 (UTC)
I finally found a source that is explicit on the subject: The Oxford Guide to Practical Lexicography, Atkins and Rundell (2008). In the course of discussing the groups of proper names that a dictionary might include depending on how important the class is to the target market, they have some lists: 'place name', 'personal names', and 'other names'. Under other names are the following subclasses: 'festivals, ceremonies', 'organizations', 'languages', 'trademarks', 'beliefs and religions', and 'miscellaneous'.
To this explicit characterization should be added that all language names are capitalized in English and that they refer to unique things, though those things may be subdivided, especially in technical discussion. The non-existence of a plural of a capitalized noun might be a sufficient condition to indicate that the noun is a proper noun, but the existence of a plural is not sufficient to indicate that it is a common noun. DCDuring TALK 04:59, 8 February 2015 (UTC)
Note that what you explain applies to language names in English. I think no reader will object to finding language names in French described as common nouns, which better reflects how they are regarded by French speakers. Lmaltier (talk) 18:47, 8 February 2015 (UTC)
They are common nouns in English too, except here of course. Donnanz (talk) 19:15, 8 February 2015 (UTC)
In the absence of a meaningful way to define the alleged distinction between common nouns and proper nouns (and I mean anywhere in the world, not just on Wiktionary), the question is moot. —Aɴɢʀ (talk) 20:38, 8 February 2015 (UTC)
And will remain moot, because there is a general idea of the definition (and this is more or less the same meaning in all languages), but details about how it's applied are ruled by tradition only, and this tradition depends on languages (and is not always clear). Lmaltier (talk) 20:50, 8 February 2015 (UTC)
@Donnanz: Could you produce some evidence from references or something that asserts that language names are common nouns in English? DCDuring TALK 21:47, 8 February 2015 (UTC)
Look for "mass noun" in orange (not the best colour), otherwise you may miss it. and : Donnanz (talk) 22:10, 8 February 2015 (UTC)
You assume that a mass noun must necessarily be a common noun. But you have acknowledged that trademarks are proper nouns. Providing specific counterexamples to the assumption is left as an exercise to the reader. DCDuring TALK 23:44, 8 February 2015 (UTC)
Trademarks are a different kettle of fish from languages. They start off as proper nouns, but can gravitate into common nouns and can even become verbs (e.g. google and hoover). See also Marmite, Mercedes and Bentley; I think the Bentley car should be a proper noun, not a common noun. Editors can quite easily get their knickers in a twist over trademarks; Oxford lists Marmite as both a mass noun and a trademark. But there shouldn't be any confusion with languages. Donnanz (talk) 09:57, 9 February 2015 (UTC)
There is no confusion. Languages are proper nouns because they are names of singular entities. That they can be used as plurals, used as mass nouns, and used attributively is barely interesting, as other proper nouns can be too, though the relative frequency might differ by type of noun. Perhaps it would be easier to swallow if you viewed it as metonymy: "There were two IBMs in a refrigerated room."; "I own too much IBM."; "It's an IBM computer." DCDuring TALK 15:28, 9 February 2015 (UTC)

Anagrams - do they serve a purpose?

While I'm at it, I may as well ask whether the inclusion of anagrams in Wiktionary actually serves a purpose - are they useful, or just a fun thing? I'm not sure whether this has been discussed before. Donnanz (talk) 22:59, 3 February 2015 (UTC)

Yeah, someone will come along and object to the anagrams fairly regularly. Points brought up in the past include (i) they are useful for word games such as Scrabble; (ii) they are a genuine provable "function" of a word, whereas e.g. spelling-bee trivia is not. There also tends to be general interest in words with unusual properties, such as palindromes, very long words, and words with unusual combinations of letters (such as our Q-without-U category); anagrams are that sort of thing. Equinox 23:08, 3 February 2015 (UTC)
I see. I have actually changed one or two where the anagram happened to be a synonym or variant spelling, but this doesn't happen very often. Donnanz (talk) 23:21, 3 February 2015 (UTC)
What did you change? From the point of view of the Scrabble player, SPECTER and SPECTRE might as well be totally different words: the point is which one of them is better strategy (e.g. perhaps you don't want the E on a square that gives more possibilities to your opponent). Anagrams are anagrams; I don't think we should edit them based on semantics. Equinox 23:46, 3 February 2015 (UTC)
I'm afraid I can't remember now, but it was only two at the most. I'll bear that in mind in future. Donnanz (talk) 00:03, 4 February 2015 (UTC)

I give zero phucks about games. They do not add any lexicographical content, so we should not include them. Even paronyms and folk etymologies (I have read somewhere that we do not include them) are more interesting than anagrams. --Dixtosa (talk) 15:07, 5 February 2015 (UTC)

I agree, anagrams take space and are useless. Let's remove them. --Vahag (talk) 15:15, 5 February 2015 (UTC)
Out of interest, do Dixtosa and Vahag also favour removal of the palindrome and Q-without-U categories? Equinox 16:51, 5 February 2015 (UTC)
No, because categories do not take up space on the page. I am concerned about the layout of our entries. The fewer sections we have, the better. --Vahag (talk) 16:54, 5 February 2015 (UTC)
Agreed. BTW, I can argue that QwoU category can have lexicographical value (as I see it, it is gonna contain exceptional words, because every occasion when Q is not followed by U is an exception).--Dixtosa (talk) 19:10, 6 February 2015 (UTC)

I personally find anagrams (and also rhymes) pretty useful for solving and compiling cryptic crosswords. The sections are automatically maintained by bots, so I don't see any real reason to object to them. Smurrayinchester (talk) 15:28, 5 February 2015 (UTC)

  • I don't use the anagrams sections for anything myself, but they do no harm and don't duplicate information available (or even potentially available) in any other Wikimedia project, so I'd be opposed to trashing them. —Aɴɢʀ (talk) 15:41, 5 February 2015 (UTC)
  • Ah, we're getting some mixed reactions. Keep 'em coming. Donnanz (talk) 16:02, 5 February 2015 (UTC)
  • At least anagrams can be automatically generated (e.g. this tool for fr). Rhymes can be generated automatically as well, although both depend on how exhaustive the underlying information is. Dakdada (talk) 16:08, 5 February 2015 (UTC)
  • A simple way to reduce the space taken by anagrams is to list them horizontally instead of vertically. DCDuring TALK 18:12, 5 February 2015 (UTC)
Not a bad idea, if there's quite a few of them. Donnanz (talk) 18:56, 6 February 2015 (UTC)
It is not exactly about how many lines they take, but rather the fact that nonlexicographical content does not deserve a place on entry pages. Besides, I am sure no one is able to prove that listing rearrangements is more plausible than listing, for example, subsets. --Dixtosa (talk) 19:09, 6 February 2015 (UTC)
If we can accommodate such content at low cost and low intrusiveness, it might serve to get a few more contributors. I would expect that word-puzzle and word-game fans constitute a significant share of users and contributors. Anything that makes contributing fun is worth consideration. DCDuring TALK 20:23, 6 February 2015 (UTC)
Looking at earnt, putting anagrams on one line has already been happening. Donnanz (talk) 23:52, 6 February 2015 (UTC)
I've been doing it for a while, but not systematically. DCDuring TALK 02:25, 7 February 2015 (UTC)
Apparently,, which also inserted anagrams, used to do it. DCDuring TALK 02:29, 7 February 2015 (UTC)

Why state that they do not add any lexicographical content? It is true that they are usually not included in dictionary entries, but it is possible to include them, and since we do include them, that makes them lexicographical content. Some dictionaries are dedicated to anagrams (see w:Anagram dictionary). The important question is their usefulness, and I think they are useful. Lmaltier (talk) 18:55, 8 February 2015 (UTC)

Anagrams are mathematical. They reduce a word to its mathematical properties, ignoring meaning, pronunciation and etymology: literally everything apart from which letters the word uses and whether any other words use the same letters. 11:56, 7 March 2015 (UTC)
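The point that an anagram depends only on which letters a word uses can be illustrated with a minimal Python sketch, assuming the common approach of sorting each word's letters into a key: words that share a key are anagrams of one another. The function names (anagram_key, find_anagrams) and the sample word list are hypothetical, not the code of any Wiktionary bot or of the French tool mentioned earlier; a real run would draw its word list from the dictionary's own entries.

```python
from collections import defaultdict


def anagram_key(word: str) -> str:
    """Return the word's letters in sorted order, ignoring case and non-letters."""
    return "".join(sorted(ch for ch in word.lower() if ch.isalpha()))


def find_anagrams(words):
    """Group words by their sorted-letter key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for w in words:
        groups[anagram_key(w)].append(w)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Hypothetical sample list, not drawn from Wiktionary data.
sample = ["specter", "spectre", "respect", "stop", "pots", "tops", "earnt"]
print(find_anagrams(sample))
# {'ceeprst': ['respect', 'specter', 'spectre'], 'opst': ['pots', 'stop', 'tops']}
```

Because the key ignores meaning entirely, variant spellings such as specter and spectre land in the same group, which fits the view earlier in the thread that anagram lists should not be edited on semantic grounds.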

Admin vote

This is to inform you that I've decided to nominate myself for adminship again. The reason is that I want to work with lots of templates, in order to make lots of cleanup pages. This way I won't have to keep bugging other admins to make changes. BTW, the page is at Wiktionary:Votes/sy-2015-02/User:Type56op9 for admin. It would be fun to hear your opinions. --Type56op9 (talk) 11:33, 4 February 2015 (UTC)

I oughta nominate myself for the mop as well. If this guy's gonna get it, and Kephir still has it, why can't I? Purplebackpack89 15:08, 4 February 2015 (UTC)
Hey, I gotta great idea. Instead of blocking vandals, why don't we make them sysops instead? SemperBlotto (talk) 21:07, 4 February 2015 (UTC)
Yeah! And instead of applying CFI and policy, why not ignore them instead? Oh wait we've done that one. Equinox 13:27, 5 February 2015 (UTC)
We've already made vandals sysops too, and even allowed them to remain sysops after their vandalism has come to light. —Aɴɢʀ (talk) 15:43, 5 February 2015 (UTC)
If I knew what a sysop is, maybe I could have a laugh (?). Donnanz (talk) 15:50, 5 February 2015 (UTC)
Did you consider consulting an online dictionary - hint sysop SemperBlotto (talk) 15:58, 5 February 2015 (UTC)
Er, no, I thought it was Wiktionary jargon. Thanks. Donnanz (talk) 16:04, 5 February 2015 (UTC)

Category for double modals

Would it be useful to have a category for double modals (might can), like we have one for double contractions? We don't even have a category for single modals at the moment. - -sche (discuss) 20:58, 6 February 2015 (UTC)

As an English-specific category, maybe. —CodeCat 21:26, 6 February 2015 (UTC)
Certainly. (For those reading this thread who don't speak other languages: German and many other languages are also able to stack modals, but they're not defective so they're not remarkable.) Do you think it would be useful to also have categories for modal verbs in general? I notice we do already have Category:German modal verbs, but it's empty. - -sche (discuss) 22:15, 6 February 2015 (UTC)
I'm surprised that we don't have something as fundamental as modals categorized. If we had that, would we need one for double modals? That is, wouldn't it be clear by inspection of the base category which were double modals? DCDuring TALK 15:21, 8 February 2015 (UTC)
We do have Category:English auxiliary verbs, which does not seem to be complete. Even English modals may be too sparse to be a good category. Perhaps an Appendix? DCDuring TALK 17:37, 8 February 2015 (UTC)

Is Nostratic allowed in etymologies?

When I removed Nostratic material from a PIE etymology section, Ivan Štambuk reverted me; evidently he has some belief in it, but scholarly opinion is strongly opposed to it. I think we ought to avoid having something so far from the linguistic mainstream treated as credible in PIE entries, and if necessary I would create a vote about it. Does the community support its inclusion? —Μετάknowledgediscuss/deeds 07:53, 7 February 2015 (UTC)

What does "Nostratic material" stand for, exactly? A sourced claim to the effect "*wódr̥ may be akin to *wete" would be sensible enough — there are several "Nostratic" comparisons of this sort that are both credible and well-established, and the main dispute is whether they involve inheritance or some kind of loaning. But anything along the lines of "from Proto-Nostratic *wede" (privileging a disputed explanation; not to mention that no two Nostraticists agree on a reconstruction) or "also compare XX in Southern Oromo, YY in Old Kannada, ZZ in Evenki" (or other utterly dubious comparisons) should obviously be nuked on sight.
Looking up the appendix in question (*h₁er-), it seems we have some from column A, some from column B here. At least the Semitic root is well-reconstructed and should be OK to mention. I don't know if there's much point in discussing the alleged cognates in other Afrasian branches and Dravidian, if we don't even have the relevant Proto-Chadic or Proto-Dravidian etymology pages up yet. (The former could well be mentioned in the Proto-Semitic entry, of course.) --Tropylium (talk) 09:22, 7 February 2015 (UTC)
(Also, mutatis mutandis, I would suggest the same for claims of Altaic cognates, in case there's any work going on with that.)
A separate appendix of "Nostratic roots" for documenting the various proposals out there would be OK for me, but it should not be cross-linked from the main etymological appendices. --Tropylium (talk) 09:01, 7 February 2015 (UTC)
Why shouldn't it be linked? How else are people going to find out the connections? --Ivan Štambuk (talk) 20:55, 8 February 2015 (UTC)
What I'm against is creating any Nostratic appendices in the same mold as established protolangs, i.e. each page is an entry for a single proto-root, which lists its descendants, and each descendant is linked back specifically as "from Proto-Nostratic *ʔer-". The research just isn't far enough for that to be a sensible approach. There is no coherent consensus reconstruction of "Proto-Nostratic" that could be treated as a language according to Wiktionary's standards.
It seems doable to instead have pages that catalogue the different overlapping proposals in a particular semanto-phonetic area. Let's say we've a PIE root that has been compared to three different Semitic roots by Bomhard, Dolgopolsky and Illich-Svitych respectively; we can neither pool all of those into a single root, nor should we try to enforce an executive decree on which proposal is the closest to being correct. Instead a new kind of an article layout entirely seems to be required.
Moreover, note that this same problem also comes up within several established language families. There is no standard reconstruction of Proto-Afro-Asiatic, Proto-Niger-Congo, Proto-Sino-Tibetan, etc. So if we'd need some less mechanical way of formatting etymology appendices dealing with these families anyway, it stands to reason that the same approach, and not the proto-language approach, should be applied to Nostratic as well.--Tropylium (talk) 15:29, 13 February 2015 (UTC)
What do you mean by "research is not far enough" ? What standards does Wiktionary have when we allow original research in etymologies? I'm all for creating and establishing standards, but they should be applied consistently.
That type of layout should also be used for all protolangs, since they vary widely depending on the author/school, even established ones. The problem is the software which requires one spelling to be the main entry, and others redirecting to it. It's often a political question as well. But it's not that much of a priority IMHO - the priority is to collect information, and the formatting/presentation issue can always be solved later.
I think that there should definitely be a way in a PIE or PS reconstruction to indicate "there is a Nostratic root that has been connected with this reconstruction, and you can find more information about it here". It's absurd to have Nostratic roots listed somewhere without linking back to them. Perhaps some kind of a floating box would suffice? --Ivan Štambuk (talk) 01:12, 26 February 2015 (UTC)
The research is not far enough in that there is no such thing as accepted Nostratic soundlaws, or an accepted perimeter of Nostratic, that could possibly guide our work. Within any relatively young and well-studied group (on the order of Germanic, Slavic, Finnic, etc.) it is usually simple enough to check whether a particular proto-form, even if not explicitly sourced, is what the alleged descendants suggest. Admittedly I have not paid attention to what kind of OR we might have around exactly though; if editors are establishing etymological connections or devising new soundlaws all on their own, and they have some kind of a policy support for this, I'd argue that that's rather worrisome, yes. But my understanding has been that Wiktionary doesn't "allow OR" as much as "has been lenient in tolerating OR".
Your comment that some type of less formulaic entry layout should be used for better-established protolanguages as well is intriguing. It would indeed probably work for many roots in bottom-level languages like PIE and PU on which there remain many open questions. On the other hand, aside from notational fine-tuning, there is also widespread agreement on the reconstruction of words/roots like *gʷṓws or *kala, and chucking the regular entry layout entirely doesn't seem necessary (even if individual sub-headings may require different treatment). And again, closer to historically recorded languages, proto-words like this are probably the majority. --Tropylium (talk) 20:49, 27 February 2015 (UTC)
My opinion on Nostratic is the same as my opinion on Altaic. Someone who digs all the way down to the Appendix page of a PIE root is probably into etymologies, and might be interested in theories that go even deeper, like Nostratic — the key is that they need to be clearly labelled so no-one is misled into thinking either that the theories are reliable or that Wiktionary believes them. "As part of the controversial Nostratic hypothesis, Smith connects this word to foo." And like Altaic, Nostratic should be limited to appendices (linked to from other places using {{etyl}}, the same way we link to any proto-language appendices).
If the wording were strengthened just a bit ("Within the controversial Nostratic framework" — note the added word and added wikilink to more info), the text at Appendix:Proto-Indo-European/h₁er- would be fine, IMO, though like Tropylium I would find it preferable if someone created a page for the Proto-Dravidian root and moved the individual proposed cognates there. I wouldn't require someone to create pages for roots before mentioning cognates, for pretty much the same reasons as I outline in my comment here that begins "people may feel comfortable noting..."
- -sche (discuss) 23:05, 7 February 2015 (UTC)
You're presuming that there is a Proto-Dravidian root, but no such thing has been established here. Nostraticists are not a reliable source on whether a given word in a language is inherited. It's entirely possible that Dravidian specialists instead etymologize those Telugu and Kannada words by some kind of derivation, loaning, semantic shift, etc. E.g. if we can trust the StarLing people to have correctly encoded Burrow & Emeneau's Dravidian Etymological Dictionary, Kannada ere 'black soil' is not connected to the Telugu words, and indeed completely isolated within Dravidian. I for one would first ask if an etymology from the homonymous ere 'dark color' is possible, before reaching all the way to Nostratic. --Tropylium (talk) 15:57, 13 February 2015 (UTC)
Nostraticists don't make up the reconstructions for protolanguages that they compare. If you look at e.g. the last edition of Bomhard's dictionary, it has thousands of citations throughout, and the list of references alone is 300 pages. Burrow's dictionary has been available at the DSAL website for a decade now, so it's easy to double-check that: [2] - it's indeed connected. --Ivan Štambuk (talk) 00:57, 26 February 2015 (UTC)
OK, fair enough with the Dravidian words then. Although we may note that the original DED provides no reconstruction, and explicitly states it is an arrangement of data for later etymological study, not a Proto-Dravidian rootlist. I am also somewhat skeptical about using these kinds of positivist resources, but in the absence of any clear arguments against the comparison, e.g. if there are no well-vetted reconstructions of PD out there yet, I'll accept it.
And no, I am not saying that Nostraticists mostly pull their reconstructions from up their sleeve! But sometimes they do, typically when attempting to project isolated words backwards (a la taking a word previously reconstructed only for Proto-Indo-Iranian and asserting a PIE root behind it), and we should source our lower-level proto-term reconstructions from "local" specialists in the first place. Whether preferentially or exclusively is a different debate though. --Tropylium (talk) 20:49, 27 February 2015 (UTC)
Your opinion and acceptance don't really matter in the grand scheme of things. The fact of the matter is that we have a credible authority in the field making the connection, and that's enough. Opinions of editors are irrelevant, other than in assessing the credibility of the sources themselves. We are just minions collecting knowledge, and our personal prejudices or affinities shouldn't get in the way of that. The relevant questions are: 1) Is the information worth adding in terms of relevance? 2) Is the source credible?
It's funny that you mention that projecting backwards thing - it's very common for "established" protolangs. Since all of the interesting stuff was done a century ago, researchers today are stuck with positing fanciful theories of protolang prehistory and making reconstructions on the flimsiest evidence. If you look at LIV and EIEC every other reconstruction has a question mark. Methodologically it's of course wrong, but there is always a possibility that a genuine PIE root was preserved only in one branch.
It would be nice to have a PII form instead, but unfortunately PII has not yet been adequately reconstructed in two centuries of scholarship. IEists seem to like to take shortcuts instead, skipping the middle step. Why blame Nostraticists for doing the same thing?
Note that I don't necessarily disagree with criticism, but if you apply the same scrutiny the entry *h₁er- itself should be deleted. It's at least as far-fetched as the Nostratic etymology thereof. --Ivan Štambuk (talk) 00:24, 28 February 2015 (UTC)
If we're going for the "just follow the credible sources" angle, I should hope for you to remember distinguishing between "a number of Dravidian words have been considered probably related to each other" and "the mentioned words come from a unique Proto-Dravidian root".
I would agree, yes, that *h₁er- is not exactly the most convincing proto-root out there; but it does have one major selling point over any given Nostratic root, namely being reconstructed for a proto-language that we at least know to have existed. --Tropylium (talk) 01:59, 28 February 2015 (UTC)
I agree with including Nostratic on -sche's terms. --Vahag (talk) 09:41, 8 February 2015 (UTC)
  • Nostratic, Altaic and other long-range etymologies have vibrant scholarly communities who publish books and peer-reviewed papers on them. There are even journals exclusively dedicated to them. The opposition usually comes from linguists who oppose it in principle, and are not against any long-range theory per se. In any case, it's not up to us to decide whether it's worthy of inclusion on the basis of whether the majority of linguists believe those theories to be true - the only thing that matters is the notability of the theories themselves. It seems to me that you're rather worried that poor readers would be mistakenly led into believing that Nostratic is on the same level of credibility as PIE. Which is on one hand kind of ironic, because the PIE reconstruction *h₁er- is itself dubious, just like two thirds of the entire PIE lexicon. In any case, should Nostratic be marked with some kind of extra-safe version of {{reconstructed}}, it would be fair that the same kind of scrutiny be applied to original-research reconstructions done by CodeCat & co. --Ivan Štambuk (talk) 20:52, 8 February 2015 (UTC)

"It is important to pronounce it with a long á, otherwise it will sound like…"

This kind of a disclaimer seems to have been added to several Hungarian words that are vowel-length minimal pairs, mostly by User:Panda10. E.g. kén, kérés, kint, mély, méz, vágy, vét. Some cases also warn readers about consonant length, e.g. arra, száll.

Is there a point to this? On one hand, I guess this is semi-useful for English speakers prone to ignoring diacritics; on the other, it seems arbitrary to mention just these kinds of minimal pairs, and not pairs involving e.g. s/sz. We are not a language-teaching resource, and so this does not seem to generalize into any kind of a useful policy. --Tropylium (talk) 09:38, 7 February 2015 (UTC)

There were other editors who complained about this before, so apparently it is not useful. Feel free to delete them when you see them. I will do the same. --Panda10 (talk) 12:31, 7 February 2015 (UTC)

Trivia sections in entries

WT:ELE says "Other sections with other trivia and observations may be added, either under the heading “Trivia” or some other suitably explanatory heading. Because of the unlimited range of possibilities, no formatting details can be provided." However, in practice, we haven't accepted ad-hoc section headings or random encyclopaedic factoids in years, and we've done away with ==Trivia== sections, too. (There were 22, out of our 3 950 000 entries, in the last dump, containing stuff like this.) I suggest removing the clause. - -sche (discuss) 06:07, 8 February 2015 (UTC)

On the one hand, spelling bee trivia is pretty clearly outside what belongs in a dictionary, as is stuff like this and this (which was falsely marked as a "Usage note"). On the other hand, all the other trivia seems to be along the lines of assessees#Trivia, 鬱#Trivia and scrootched#Trivia. If not exactly dictionary material, it is still at least information that pertains to the word itself. 死ぬ#Trivia in particular contains useful and interesting information, and I think we should certainly note somewhere that 死ぬ is the only ぬ verb in modern Japanese. What I think we need is just something a bit more structured than "You can add anything you like under any title you like in any format that you like". Smurrayinchester (talk) 12:07, 8 February 2015 (UTC)
I think they'd be more palatable if they weren't called "Trivia". We already have a "Usage notes" section; what about making an "Orthographic notes" section for information such as is provided in the sections linked to above? —Aɴɢʀ (talk) 13:06, 8 February 2015 (UTC)
That sounds a bit too formal and academic for what is essentially word games. Equinox 14:18, 8 February 2015 (UTC)
In my experience, things like the note at 死ぬ generally get shoehorned into the Usage notes section. And that's fine, in my opinion. I'd be hesitant to create an "Orthographic notes" section for just a tiny handful of entries. OTOH, perhaps we could move anagrams under that header, enclosed in a template similar to {{homophones}} (which would reduce how much space they take up), and then the section wouldn't be so useless/little-used? Alternatively, perhaps we could just have a ====Notes==== section, perhaps even replacing ====Usage notes====? But my preferred solution is to shoehorn the dozen-or-so useful Trivia sections under Usage notes. I mean, it's not wrong to call a note that 死ぬ is the only ぬ verb a "usage" note... - -sche (discuss) 18:42, 8 February 2015 (UTC)
  • I support a ===Notes=== header in ELE to replace both this trivia business and ===Usage notes===. —Μετάknowledgediscuss/deeds 20:50, 8 February 2015 (UTC)
    This is the fr.wikt current practice (a Notes header). Lmaltier (talk) 20:53, 8 February 2015 (UTC)
I don't like "Notes" — too vague. It's like having an "Information" section. The whole entry is notes, or information, of various kinds. Equinox 21:01, 8 February 2015 (UTC)
I don't mind "Trivia", but it shows condescension. MW Online and some other dictionaries accommodate word games in their entries. Even in taxonomic names folks play word games (e.g. Iouea, Aa, Zyzzyzus).
How about "Miscellany", or a right-floating box placed so that it does not rise far above the ruled lines at the bottom of entries? DCDuring TALK 23:06, 8 February 2015 (UTC)

On inflections of extinct languages

Wiktionary has an interesting policy of only including Old Irish verb forms that are actually attested. Why is this? Sometimes I've wondered if a similar policy is appropriate for Ancient Greek as well, which has never seemed to have well-defined conjugations. Thoughts? ObsequiousNewt (ἔβαζα|ἐτλέλεσα)

I wouldn't call it a policy so much as my personal decision which no one has objected to. I made that choice for Old Irish because Old Irish verb forms are notoriously unpredictable. It is very hard, often impossible, to say what a given form of a given Old Irish verb will be unless it's attested. (Students of Old Irish are often left with the impression that all verbs in that language are irregular; that's an exaggeration, but only a small one.) For this reason, I thought it best if we don't even try to predict them, but merely list the attested forms. Ancient Greek, on the other hand, has comparatively well-behaved verbs: if you know the stem and the ending, you can glue them together to make the verb form. Even if that form is unattested, you can be quite certain that the predicted form is correct. Also, the Ancient Greek corpus is orders of magnitude larger than the Old Irish corpus, making it much more difficult to find out what is and isn't attested. —Aɴɢʀ (talk) 16:41, 10 February 2015 (UTC)

Wiktionary Culture

Some time ago I was told by email that my farewell message in the Beer Parlour had been greatly annotated and that there was a lot of support for what I was doing as a Wikipedia editor. At first I was inclined to ignore it but the lure of the Wiktionary's potential was such that I did look the Parlour item over. The experience was not encouraging as my position seems not to have been understood.
  My resolve to discontinue editing Wikipedia was very definitely not because of what was done to my contributions. Rather it was because of how it was done, that is, because of the despicable rudeness of not even telling me what was being done but leaving me to discover it.
  As a child of the Great Depression I was brought up to abhor rudeness: to avoid doing it and to shun people who do it. This was taught in the home and at both Sunday school and public school, and rudeness was severely punished in the latter, by strapping in the case of boys. Unfortunately the modern educational dogma of building self-esteem seems rather to build selfishness and thus encourage rudeness in many (see here).
  So it's not surprising to me that I should be subjected to rudeness in the Wikipedia quasicommunity, but the lack of surprise doesn't make it less abhorrent to me. In a way it makes it more abhorrent because it is accompanied by a great sadness.
  I know that modern culture is not my culture, but culture makes the person and is not easy to change, especially when you don't want to change it. When as a Wikipedia editor I would often come across modern sexual senses, with informal or transient attestation if any, I realised that this meant that the dominant culture of the Wikipedia editors was modern. Though I disagreed with the inclusion of such senses I left them alone. To undo them would have been rude, and with the predominant editorial culture I felt that raising such an issue in, say, the Beer Parlour would be a waste of time.
  Incidentally, I'm not WF, as was suggested in the Beer Parlour. But I must confess that I got my early Wiktionary coding skills from being an editor under another name. However I left the Wiktionary alone for quite some time after being very rudely lambasted and only returned when a growing enthusiasm for its potential overcame my distaste for the editorial culture.
  This message does not signify a return beyond a couple of items I will put below to convey my personal hopes for the Wiktionary. I will not log back in again and I will not read any emails coming from the Wiktionary editorial community.—ReidAA (talk) 03:45, 10 February 2015 (UTC)

Right, we'll stop posting "sexual" senses that you dislike, and await our strappings. Oh wait, my mistake, I meant to say "good riddance". There's nothing viler than somebody polluting a space with a big bitching rant topped off with the rotten cherry of "I won't be coming back though". If you're gone, don't post. Equinox 00:23, 12 February 2015 (UTC)
I can think of a great many viler things, even if I limit myself to behavior on online forums, and it's unclear to me if the above exaggeration is supposed to accomplish anything other than emphasize how deep your disdain for this ex-editor runs. --Tropylium (talk) 14:37, 13 February 2015 (UTC)
I found ReidAA's editing style pretty rude: he generally did not respond to user talk interactions in action, and went ahead as he saw fit regardless of disagreement. For an instance not involving me as the main actor, when Widsith asked him to stop switching or get consensus in October 2014, he continued regardless of the conversation. From this and my interactions with this user, and from seeing his long-term pattern of behavior, I learned that he will need to be dealt with directly in the mainspace. And there is no riddance: User:Smuconlaw; last contribution: 15 February 2015. I find it pretty insolent to whine about how one is leaving the project and then go on editing under another user. I think I saw one more user used by the same person, but I cannot find it now. --Dan Polansky (talk) 20:31, 15 February 2015 (UTC) Let me strike out what is an inappropriate speculation, based on insufficient evidence; there is even some evidence to the contrary. --Dan Polansky (talk) 20:42, 15 February 2015 (UTC)
Er ... not sure what this is all about but I am not ReidAA. Smuconlaw (talk) 22:56, 15 February 2015 (UTC)

Suggestions for Examples

The following suggestions describe an approach to adding examples to the Wiktionary. The motivation for this approach is to exploit the possibilities for an online dictionary to use practically unlimited storage.
  The potential for examples lies in their benefit for learners of English, either children or non-native speakers of English.
  For such users of the Wiktionary, most of whom one would expect to not be logged on as an editor, any quotations would only be accessible by clicking on the individual quotation tags provided with each sense when quotes are available. Probably the option of seeing all quotes should only be offered to users who are logged on, though the ability to get at quotes for individual senses must be available to the learner whose curiosity about the word/sense background must be catered for, which is why quotations should be linked to sources and to online text where the context of the quote can be found (more on this).
  Ideally (I would hope eventually) every sense would have a few examples. Being primarily for learners, the examples should be short phrases or sentences, each with a distinct context within a sense.
  The user should be able to click on any example to hear it spoken. To be able to do this would be a tremendous help for learners. Maybe there should be options for learners to choose between male and female speech when a choice is available, and even for regional accents.
  If the Wiktionary becomes popular for learning to speak English, the learner could maybe choose to have their pronunciation checked and corrected by it.
  Another good option would be to have sign language used to back up the example. This might not be of all that much use to deaf people, but there is a school of thought that sign language should be taught to all students in their early education, both for the mental benefits of being bilingual in this fashion and because it is a distinct channel of communication to be used where speech is ineffectual, for example in noisy rooms or across long distances.
  In traditional dictionaries, like 1913 Webster's, very brief quotes have been used as examples. This is because of the limitations of the printed page and bound volumes. There is some benefit and interest in using such quotes as examples, particularly for obsolete/archaic senses where an archaic pronunciation would be appropriate, but such examples should be stripped of all their context, except for the date, and an improved version of the example provided as a quotation.—ReidAA (talk) 04:05, 10 February 2015 (UTC)

Suggestions for Quotations

The following suggestions describe an approach to adding quotations to the Wiktionary. The motivation for this approach is to exploit the possibilities for an online dictionary to use practically unlimited storage and to link to a rich and rapidly increasing source of related online data.
  Quotations (hereinafter "quotes") are in general of particular interest to two kinds of users.
  Primarily they are of interest to experienced readers and writers wishing to discover more about a word or a phrase, or one of its senses, in particular about its early use and its more recent use.
  Secondarily they are of interest to avid readers for whom the quote may spark an interest in a quote's author or source or context.
  For neither of these kinds of Wiktionary users would an abundance of quotes be appropriate for any sense upon initial presentation. Rather, a maximum of three or four should be presented directly, and these should be spread over a variety of dates and contexts. If more are available, all should be stored separately and linked to as a store for that sense's quotes.
  For neither of these kinds of Wiktionary users would very brief quotes be appropriate for any sense; that's what examples are for.
  For both of these kinds of Wiktionary users at least three links should be provided as well as a date: to the author(s), to the work, and to an online source of the work. The Wikipedia will often provide the first two links and Wikisource or Gutenberg or Google books the third.
  It's very important that the works quoted from should be formally published and that the links used should be reliably persistent.
 The following suggestions describe an approach to adding quotes, at least while they remain relatively scarce. Although the quote will most often be for a word, it should be remembered that the Wiktionary also contains phrases, though not very thoroughly, and quotes for these should be added where there is a quote gap.
  1. Choose a book to read in hard copy whose text is available online and preferably whose author is not yet quoted in the Wiktionary.
  2. In reading the book make a note of any interesting word and its location, and check (then or later) that it is consistent with the online version. The online version will sometimes need editing, or might be of a different edition to the one you are reading.
  3. Prepare an RQ template (with documentation) for the book and enter it into the (incomplete) Wikipedia table of quoted works.
  4. Prepare a skeleton of the code to be used for adding the quote for the first word you have noted for use in the Wiktionary, using a text editor so that you can easily copy and paste the entry into the Wiktionary. The skeleton will be in two parts, the first using the RQ template filled out for the word of your choice, the second holding the chosen word and the surrounding text, say forty or fifty words, which can be copied easily from the online text. Do not highlight the occurrence(s) of your chosen word.
  5. Paste a copy of your code quoting the chosen word into the appropriate sense in the Wiktionary and highlight the word wherever it occurs.
  6. Now go through the quote word by word and check whether there is a quote gap for each word's sense. There are very very many such gaps, even for very common word senses. If you find a gap, fill it in the same manner that your chosen word was used.
  7. The skeleton code can be modified for each of your chosen words so that the procedure above can be repeated for them.
  Note that the benefit of using an RQ template for a book is not just to simplify the adding of multiple quotes from a single source, but also to allow all quotes from a single source to be upgraded, for instance when a better online source becomes available, simply by upgrading the template.
  Another avenue for quote improvement in the Wiktionary is to focus on one of the sources used as a combination of example and quote in the original 1913 Webster's Dictionary. Often these are simply given with only a short author or source name which is explained here.
  One way to improve one of these is on the one hand to simplify it as an example, with no source and only giving a date if the sense is archaic or obsolete; and on the other hand to expand and link it. Many of the sources for such quotes are already supplied with an RQ template (see [3]). — ReidAA (talk) 04:06, 10 February 2015 (UTC)
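Steps 4 and 5 above (copy forty or fifty words around the chosen word, then highlight it) can be sketched in Python. This is only an illustration: the function name quote_window and its width parameter are hypothetical, it assumes the usual '''bold''' wikitext highlighting for the quoted term, and it merges the "skeleton" and "highlight" steps into one call. With width=20 it returns up to 41 words.

```python
import re


def quote_window(text, word, width=20):
    """Return up to `width` words either side of the first occurrence of `word`,
    with every whole-word occurrence inside that window bolded with ''' markup.
    (Hypothetical helper, for illustration only.)"""
    words = text.split()
    # Find the first token that matches the chosen word once punctuation is stripped
    target = next(
        (i for i, w in enumerate(words) if re.sub(r"\W", "", w).lower() == word.lower()),
        None,
    )
    if target is None:
        raise ValueError(f"{word!r} not found in text")
    window = " ".join(words[max(0, target - width): target + width + 1])
    # Highlight each occurrence of the word within the window, case-insensitively
    return re.sub(rf"\b({re.escape(word)})\b", r"'''\1'''", window, flags=re.IGNORECASE)


print(quote_window("The cat sat on the mat with another cat.", "cat", width=2))
# The '''cat''' sat on
```

In practice the text would be pasted from the online edition (Wikisource, Gutenberg, Google Books), and the result would follow the filled-out RQ template.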

A great tool to have for quotations is some kind of app, where at the click of a button, one can add quotations to a corresponding WT entry, coming in from Wikisource, Google Books, or another compatible medium. I'd pay good money for that! --Type56op9 (talk) 15:40, 10 February 2015 (UTC)
Ask, and ye shall receive. Smurrayinchester (talk) 15:53, 10 February 2015 (UTC)
Wow, that is an awesome gadget. It should be linked from Wiktionary:Quotations! --Type56op9 (talk) 11:46, 12 February 2015 (UTC)

Cool gadgets

After hearing recently about WT:QQ, a cool quotations gadget, I found myself wondering if we had some other cool gadgets that maybe some users don't know about. So, in hope of some civility, I think it would be appreciated if some other users mentioned here some cool Wiktionary gadgets that may not be known to all the communities, as a way of helping each other out. --Type56op9 (talk) 11:08, 15 February 2015 (UTC)

  • I'll start: WT:ACCEL is a nice gadget to quickly and semi-automatically create forms of words (plurals, conjugations, feminine forms etc.) in various languages. --Type56op9 (talk) 11:10, 15 February 2015 (UTC)
I am already working on that: here. --Dixtosa (talk) 11:18, 15 February 2015 (UTC)

Let's categorize semantic loans


I think it is interesting.--Dixtosa (talk) 15:31, 15 February 2015 (UTC)

I have seen "semantic calque" being used synonymously with "semantic loan". We can assume semantic loans are a subtype of calques and include them in Category:Calques by language. --Vahag (talk) 17:22, 15 February 2015 (UTC)
There are also "phono-semantic" loanwords (sometimes adding new, funny senses), such as 馬殺雞/马杀鸡. --Anatoli T. (обсудить/вклад) 00:45, 5 March 2015 (UTC)

WT:WE length

It was my impression based on this discussion that we were supposed to be slowly shortening the list on WT:WE. But given this diff, that is becoming very difficult. I like helping out with it, but it is slowly filling up with words I'm not able to add or that are so obscure that I can't find anything about them. May I request some aid shortening the list or removing unattestable entries? JohnC5 22:44, 18 February 2015 (UTC)

I've moved some apparent Translinguals out of there and -sche removed some blue links. You could move some of the non-English items that you don't know to the various WT:RE:lang pages, eg WT:RE:he. DCDuring TALK 03:33, 19 February 2015 (UTC)

Pitjantjatjara case marking

In Pitjantjatjara, the ergative case is indicated by the ending -ngku: watingku yuu palyaṉu / man-ERG windbreak make-PAST / The man made a windbreak. Straightforward enough; indeed, I created an experimental entry at watingku. However, case marking in this language strikes me as rather odd: the case ending only attaches to the last word of a noun phrase: wati ninti tjuṯangku yuu palyaṉu / man wise many-ERG windbreak make-PAST / The wise men made a windbreak.

Could -ngku be considered a clitic? Is it appropriate to create entries of the type watingku, given this situation? Given that a lot of nouns (wati being a prime example) are found in their inflected form about as often as not, I worry that we would be doing our readers a disservice by not creating entries for these forms. This, that and the other (talk) 12:32, 20 February 2015 (UTC)

New constructed languages

Why can't we include all constructed languages in Wiktionary, including Idiom Neutral?

Or my own constructed language, Sintelsk, at least somewhere in Wiktionary's appendix? My constructed language, which I'm writing about at my own Wiktionary that I created and run for fun: . It is a constructed language, mainly based on Danish, that has a precise pronunciation system, and looks more straightforward than most Germanic languages.

If you're interested in considering my advice about my constructed language at least, you may find these categories interesting: English lexicon, Danish lexicon, Spanish lexicon, French lexicon, Lexicon for Sintelsk itself, which includes definitions in Sintelsk and translations of its words into other languages, just like all other Wiktionaries. I built up my Wiktionary to make categories by using lots of templates. Also, my longest page on the wiki is on, which is a Sintelsk word meaning one, a, or an.

Also, let's consider definitely including Idiom Neutral into Wiktionary. NativeCat drop by and say Hi! 06:04, 21 February 2015 (UTC)

Minor constructed languages can be included in the appendix namespace (for example Appendix:Sindarin). — Ungoliant (falai) 16:27, 21 February 2015 (UTC)
...within certain parameters. Most importantly, if the language is copyrighted (which many constructed languages are), we can't include too much of it or we're violating copyright; see Wiktionary:Beer parlour/2014/July#Inclusion_of_Dothraki. Secondly, if the language has no community of users, it's doubtful whether or not it should be included; many people have opposed including minor constructed languages that have no users. Lastly, if you made the language up yourself, it's a bunch of protologisms not suitable for inclusion except in your userspace. (As long as you're also making useful edits to Wiktionary, and as long as no copyright issues arise, people shouldn't complain about it if you put it in your userspace. Note that if you made the language up yourself, and copyrighted it, but you then publish it on Wiktionary, then per the disclaimer at the bottom of every edit window, "you irrevocably agree to release your contribution under the CC-BY-SA 3.0 License and the GFDL", which you might or might not want to do.) Disclaimer: I am not a lawyer, but we have lawyers here, and they weighed in on the discussion I linked to above. - -sche (discuss) 17:55, 21 February 2015 (UTC)

Manual adding of audio files

Hello. User:DerbethBot/February 2015 contains a list of audio files (and matching Wiktionary entries) that my bot was unable to add automatically - in most cases due to multiple etymologies (human needs to decide where an audio file belongs). Currently there are 675 audio files that can be immediately used to enrich entries in 30 languages. If you want to help, please check the page and remove entries that are done. --Derbeth talk 18:24, 21 February 2015 (UTC)

Subcategory for 'nyms?

Would it be acceptable to set up a class of subcategory of "Category:<language> names" for ethnonyms (endonyms, etc.)? The names category has the boilerplate description "<language> terms that are used to refer to specific individuals or groups," but it seems this category generally just has 2 subcats, for given names and surnames - no problem there but wondering if the intent of the names categories was to accommodate wider purpose. Or whether an ethnonyms category would go elsewhere. Or if this topic has previously been considered and ruled out. Aside from thinking this would be of general cross-language interest, I'm personally interested in possibly adding various Chinese ethnonyms, using a category to facilitate access to them as a sub-lexicon. I'm no expert on Chinese, but have noted with interest variant forms in hanzi for African ethnic groups. TIA for any feedback.--A12n (talk) 16:39, 23 February 2015 (UTC)

Simplification of topic categories adding

As the creator of {{zh-cat}}, I propose to generalize this template to {{cat}} and use it to add the topic categories. The syntax would be {{cat|en|CATEGORY_ONE|CATEGORY_TWO}}. I do not believe that there would be technical problems in creating the template, so I am only putting this here for consensus and discussion. --kc_kennylau (talk) 09:12, 24 February 2015 (UTC)

See {{catlangcode}}. Chuck Entz (talk) 13:25, 24 February 2015 (UTC)
@Chuck Entz: Oh, then I propose the automation of it and the name changing :) --kc_kennylau (talk) 16:28, 24 February 2015 (UTC)
Re automation (if I understand correctly that you mean a bot to go through and change existing cats to the new template): What would happen then if one wanted to keep specific categories with associated etymologies or word senses within a section for a particular language?--A12n (talk) 18:02, 26 February 2015 (UTC)
Support the simplification. I can also see benefits for languages requiring a non-default sorting order. Module:zh-cat does it already for Chinese - sorting by radicals. Ideally, Japanese would use a similar approach to sort by hiragana. That way, entries won't require code like this: [[Category:ja:Mammals|しし]] or [[Category:ja:Mammals|ひいばあ']] for kanji or katakana entries, e.g. in 獅子 or ビーバー. --Anatoli T. (обсудить/вклад) 00:41, 5 March 2015 (UTC)
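To make the proposal concrete, here is a rough Python sketch of what a call like {{cat|ja|Mammals}} could expand to, including the optional sort key Anatoli mentions. It is illustrative only: the real template would be wikitext or a Lua module (like Module:zh-cat), and the function name expand_cat and its sort_key argument are hypothetical. The "lang:Topic" category form follows the [[Category:ja:Mammals|しし]] example above.

```python
def expand_cat(lang, *topics, sort_key=None):
    """Expand a hypothetical {{cat|lang|topic1|topic2|...}} call into category links.

    If sort_key is given (e.g. a hiragana reading for a kanji entry),
    it is appended after a pipe so the entry sorts under that key.
    """
    suffix = f"|{sort_key}" if sort_key else ""
    return "".join(f"[[Category:{lang}:{topic}{suffix}]]" for topic in topics)


# The kanji entry 獅子, sorted under its reading しし
print(expand_cat("ja", "Mammals", sort_key="しし"))
# [[Category:ja:Mammals|しし]]
```

A language-aware version could compute sort_key automatically (by radical for Chinese, by kana reading for Japanese) instead of requiring it by hand.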

Category:Historical terms by language vs. Category:Terms with historical senses by language

There seems to be a category scheme renaming in progress here that has not been quite completed. Apparently the latter is where most things go these days. Is there any particular reason the former is still kept around as well, just for three Chinese, three English, two Spanish, one French and one Latvian term? Or is it just waiting for deletion once the articles have been edited to be in the latter category branch instead?

The only previous discussion I can find on this is a brief exchange from June 2011: English terms with obsolete senses, etc. --Tropylium (talk) 19:39, 27 February 2015 (UTC)

See also Wiktionary:Requests for moves, mergers and splits/Unresolved requests/2012#Category:English_terms_with_obsolete_senses. - -sche (discuss) 03:43, 28 February 2015 (UTC)

March 2015

Templatizing topical categories in the mainspace

FYI: Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace.

Let us postpone the vote for as long as the discussion requires.

This thread seems related: Wiktionary:Beer_parlour/2015/February#Simplification of topic categories adding. --Dan Polansky (talk) 21:32, 1 March 2015 (UTC)

How is this even close to being ready for a vote?

[Global proposal] (all) Edit pages


Hi, this message is to let you know that, on domains like, unregistered users cannot edit. At the Wikimedia Forum, where global configuration changes are normally discussed, a few dozen users propose to restore normal editing permissions on all mobile sites. Please read and comment!

Thanks and sorry for writing in English, Nemo 22:32, 1 March 2015 (UTC)

Thanks for the news. We forgive you for speaking in English. --Type56op9 (talk) 14:44, 5 March 2015 (UTC)

Sports logos in images

Happened to notice both woman and American have sponsorship logos clearly visible in the image thumbnails. If we need to illustrate these concepts, can we find images which aren't as corporatish? Pengo (talk) 07:16, 2 March 2015 (UTC)

We should also extirpate all national flags, political slogans, references to NGOs, religions, etc. not essential to the ostensive definitions the images provide. DCDuring TALK 12:32, 2 March 2015 (UTC)
Logos I'll grant that getting rid of a corporate logo for a generic concept like "woman" is probably a good idea but an American flag behind an American on the entry for "American" doesn't seem like a problem to me. In this case, the image contains the word "Toyota", which is the problem, not American symbols. —Justin (koavf)TCM 14:12, 2 March 2015 (UTC)
I agree that the American flag in [[American]] is OK. I've switched the entry's image to one which is similar in every way except that it lacks the Toyota logo. - -sche (discuss) 17:34, 2 March 2015 (UTC)


The documentation for {{l-self}} claims it does not support tr=, but a simple test reveals this is not the case. The question is then: should it? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 14:29, 3 March 2015 (UTC)

In principle there's no reason why it couldn't. —CodeCat 19:55, 4 March 2015 (UTC)
But are there any languages that use transliteration within inflection tables? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 20:06, 4 March 2015 (UTC)
Yes. And this template isn't used only in inflection tables. It's used for any template that includes links to the same language. And the underlying logic which omits links to the current page is also used by {{head}} for the inflections: {{head|en|noun|plural|fish}} on fish will not generate a link for the form. —CodeCat 20:19, 4 March 2015 (UTC)
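The self-link logic CodeCat describes can be pictured with a small Python sketch: a form is linked unless it equals the current page name, and a transliteration could be appended the way tr= might. This is an assumption-laden illustration, not the actual code of {{l-self}} or Module:headword; the name render_form and the plain [[form]] link format are hypothetical simplifications.

```python
def render_form(form, current_page, tr=None):
    """Render an inflected form, omitting the link when it points to the current page."""
    # A form equal to the page being viewed is shown as plain text, not a self-link
    text = form if form == current_page else f"[[{form}]]"
    # An optional transliteration is appended in parentheses, as tr= might do
    if tr:
        text += f" ({tr})"
    return text


print(render_form("fish", "fish"))    # fish
print(render_form("fishes", "fish"))  # [[fishes]]
```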

Parameter for Template:head to indicate that a form is missing

Several templates across a variety of languages have custom-written code to show a message like "missing" or "please provide" if one of the forms in the headword line is lacking. For missing genders, we already have a standard approach that {{head}} understands, which is to use "?" as the gender. I'd like to do the same for headword-line forms, so that the following will automatically generate a message and categorise the entry appropriately: {{head|en|noun|plural|?}}. Of course, templates written to use Module:headword or {{head}} can then use this themselves.

Of course, the downside is that you can't link to the entry ? in the headword line anymore, which is probably not normally going to be a problem, but there may be a few edge cases where it turns up. So an alternative way would be to include an extra parameter to indicate that a request should be included in case of a missing form. Something like this: {{head|en|noun|plural|f1request=1}} or perhaps the shorter {{head|en|noun|plural|f1req=1}}. This would then fit into the same fN... format that many of {{head}}'s parameters already use.

I don't expect there will be much opposition to this, but I'd like to ask anyway just in case. If you have a preference for one of the two proposed approaches, please indicate this. —CodeCat 19:54, 4 March 2015 (UTC)
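A minimal Python sketch of the first option: a form given as "?" is replaced by a request message and the entry is placed in a request category. The function name headword_forms, the "please provide" text and the category name are hypothetical placeholders, not what Module:headword actually emits.

```python
def headword_forms(lang_name, pos, pairs):
    """Build headword-line inflections from (label, form) pairs.

    A form of "?" is treated as missing: a request message is shown
    instead of a link, and a request category is collected once.
    """
    parts, categories = [], []
    for label, form in pairs:
        if form == "?":
            parts.append(f"{label} please provide")
            # Hypothetical category name for entries with missing forms
            category = f"{lang_name} {pos}s with missing forms"
            if category not in categories:
                categories.append(category)
        else:
            parts.append(f"{label} [[{form}]]")
    return ", ".join(parts), categories


# Roughly what {{head|en|noun|plural|?}} would produce under this proposal
line, cats = headword_forms("English", "noun", [("plural", "?")])
```

As noted above, the trade-off is that "?" can no longer be used as a real linked form; the f1req=1 alternative avoids that at the cost of an extra parameter.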

The first one looks much better. Is there (or will there be) any edge case to begin with? I don't think there would be any. --Z 08:13, 18 March 2015 (UTC)

Min Nan loanwords

How should Min Nan loanwords from Japanese be written when they don't have any kanji/Chinese characters? Min Nan is usually written in Chinese characters or in POJ. Should they be written in Pe̍h-ōe-jī, or should they be written in hiragana/katakana? For example, the Taiwanese Min Nan word for ice cream is "ai55 sirh3 khu33 lin51 mu11" according to 臺灣閩南語常用詞辭典. Currently, I have written it as アイスクリーム (ai55 sirh3 khu33 lin51 mu11) in the translation box under ice cream. The problem with loanwords is that they don't follow tone sandhi and may not even have one of the 7 tones of Min Nan, which is problematic for POJ. Any ideas for this situation? Justinrleung (talk) 03:56, 5 March 2015 (UTC)

Min Nan terms should be written as they would be by Min Nan speakers. Unless Min Nan speakers use katakana to write the terms, we shouldn't. If we know the Japanese terms that are borrowed, those should be linked to in the etymologies for the Min Nan entries, but not be used in the names of those entries. Beyond that, I would refrain from meddling with a language I don't know. Chuck Entz (talk) 04:16, 5 March 2015 (UTC)
We should probably apply the same attestability requirement to translations as to entries. I doubt "アイスクリーム" (Japanese for ice cream) can be attested to be Min Nan or any Chinese topolect; besides, it's a borrowing (ultimately) from English, so "ai55 sirh3 khu33 lin51 mu11" is a Min Nan pronunciation of "ice cream". Min Nan (Hokkien) is mostly a spoken dialect. If a written form is missing, then it shouldn't be added. As an example, Armenians use a lot of Russian words in speech but those terms lack a written form (ask User:Vahagn_Petrosyan). There are many other cases with diglossia or when a language/dialect lacks a well-developed written tradition.
The other issue is non-standard transliteration, as in tempura, see Min Nan translations 天麩羅 (thian35 pu55 lah3). As Justinrleung explained, it's not a standard tone sandhi but the source is only one online dictionary. --Anatoli T. (обсудить/вклад) 04:24, 5 March 2015 (UTC)
Are there any Min Nan speakers who can give any suggestions to this problem? Justinrleung (talk) 20:14, 7 March 2015 (UTC)
We currently have no native Min Nan speakers. The term may be derived from Japanese but katakana is not used to write Min Nan. It would be hard to attest both the Japanese spelling "アイスクリーム" and the "ai55 sirh3 khu33 lin51 mu11" since Min Nan, as I said, is mostly a spoken dialect. If it's written down, it's written in Chinese characters or Pe̍h-ōe-jī. The source above doesn't suggest the term is written in katakana in Min Nan. Here's what the dictionary says with English translations in brackets:
  • 詞目 ai55 sirh3 khu33 lin51 mu11 (dictionary item)
  • 日語假名 アイスクリ-ム (Japanese kana)
  • 日語羅馬拼音 aisukuriimu (Japanese rōmaji)
  • 釋義 冰淇淋(附錄-外來詞表) (meaning "ice cream" (appendix - table of loanwords))
From "ai55 sirh3 khu33 lin51 mu11" one can't really say that it's definitely from Japanese, not from English. I have recently added all translations of "ice cream" into Min Nan I could find in dictionaries and made アイスクリーム to be verified. Eventually, it should be deleted, since it's not verifiable as a Min Nan term. --Anatoli T. (обсудить/вклад) 21:57, 9 March 2015 (UTC)
  • Sorry for any confusion -- I wasn't making a case for アイスクリーム#Min_Nan. I agree with you that katakana, AFAIK, are only used to write Japanese. Instead, I just intended to ask if the etymology of the Min Nan term was EN > NAN, or EN > JA > NAN. ‑‑ Eiríkr Útlendi │ Tala við mig 23:47, 9 March 2015 (UTC)
  • I understood your question. It may be of Japanese origin, if it's a word in Min Nan. According to the dictionary it is. What I meant is that non-standard romanisation "ai55 sirh3 khu33 lin51 mu11" doesn't really indicate that it may be Japanese (except for "khu33"), it's very similar to how Mandarin words are transliterated using Chinese characters and phonology, note that (sī) and (mǔ) are some of the Chinese characters used in romanising loanwords with non-syllabic "s" and "m". Yes, Japanese words are or were well known in Taiwan and there are loanwords in colloquial Taiwanese Mandarin and Min Nan but this particular word may only have been used colloquially and may never have had a written form. Most words have Chinese character spellings or at least POJ. --Anatoli T. (обсудить/вклад) 00:04, 10 March 2015 (UTC)

Inspire Campaign: Improving diversity, improving content

This March, we’re organizing an Inspire Campaign to encourage and support new ideas for improving gender diversity on Wikimedia projects. Less than 20% of Wikimedia contributors are women, and many important topics are still missing in our content. We invite all Wikimedians to participate. If you have an idea that could help address this problem, please get involved today! The campaign runs until March 31.

All proposals are welcome - research projects, technical solutions, community organizing and outreach initiatives, or something completely new! Funding is available from the Wikimedia Foundation for projects that need financial support. Constructive, positive feedback on ideas is appreciated, and collaboration is encouraged - your skills and experience may help bring someone else’s project to life. Join us at the Inspire Campaign and help this project better represent the world’s knowledge! MediaWiki message delivery (talk) 19:22, 5 March 2015 (UTC)

What 20%? We don't have women on Wiktionary. Hos are not good at lexicography. --Vahag (talk) 20:31, 5 March 2015 (UTC)
Not many but we do have them. What about active ones like Hekaheka, CodeCat, Panda10, Fumiko Take (not 100% about the gender of others)? --Anatoli T. (обсудить/вклад) 22:18, 5 March 2015 (UTC)
@Vahag, despite your generalization I'll assume good faith* and direct you to read ho. Modern and Old Armenian, Russian, German, and English aren't enough to familiarize— well, even some native speakers of American English— with just how
b£00d¥ ɟ∪ɔkᵻɳɢ INSULTING  that word is. It has no place whatsoever in any Wikimedia project except to be discussed, never used. --Thnidu (talk) 00:43, 6 March 2015 (UTC)
* Whoops! I just fixed this link. --Thnidu (talk) 04:41, 7 March 2015 (UTC)
I think good faith can only be assumed in combination with an assumption of mind-boggling ignorance and/or stupidity. Either way: not acceptable. --Catsidhe (verba, facta) 00:51, 6 March 2015 (UTC)
Unanimi sumus, Catsidhe. Nonne clare videtur ira mea? (We are of one mind, Catsidhe. Is my anger not plain to see?) --Thnidu (talk) 05:35, 6 March 2015 (UTC)
Not good at all. I think Vahag was just being silly. I didn't get what "hos" meant at first. --Anatoli T. (обсудить/вклад) 05:53, 6 March 2015 (UTC)
@Anatoli T. (I've switched our four-colon replies to maintain chrono order.) "Being silly" does not stretch that far. (... Боже мой (my God), I envy your polyglottism!) Perhaps one has to live in the US or be in very close touch with its cultures to appreciate that word. Calling that "being silly" is like excusing groping a stranger's crotch as "just like a tap on the shoulder". Uh-uh. And look at the sexist remark the word is embedded in. --Thnidu (talk) 06:15, 6 March 2015 (UTC)
I've known Vahag for a long time, not personally though. He trolls from time to time and gets into trouble for that but he is not really a racist, sexist, homophobe and anti-Semite as he sometimes pretends to be with his silly jokes and comments. I think he just wants attention or to create a stir. Not sure. Re: polyglottism - thanks for the praise but I am not as good with languages as you may think but I spend a lot of time on them. --Anatoli T. (обсудить/вклад) 06:35, 6 March 2015 (UTC)

North American English vs Canadian and American English

Some entries are labelled {{lb|en|North America}} and some are labelled {{lb|en|US|Canada}}, and these are categorized differently. This seems unhelpful — users have to check two categories to find all Canadian (or American) entries. Should we (a) make {{lb|en|North America}} an alias of {{lb|en|US|Canada}}, or (b) try to periodically change instances of {{lb|en|US|Canada}} to {{lb|en|North America}}?
The first option is obviously more practical, as the second would require the sort of vigilance and recurring effort that we don't always manage to muster. One might say that it's useful to have a category for words common to both the US and Canada, but the same could be said of "ambitransitive" verbs, yet we've made that label an alias of "transitive, intransitive".
- -sche (discuss) 00:00, 6 March 2015 (UTC)

I like having "North America" be an alias for the separate categories. It would be useful to periodically review definitions that were in {{lb|en|US}} and not {{lb|en|Canada}} and vice versa, but, as we have no practice of marking items as having been passed such a review, it seems to mean a lot of repeated coverage of the same issue. DCDuring TALK 03:19, 6 March 2015 (UTC)
OK, I've made the "North American" label an alias for "Canada, US". Wiktionary:Todo/North American is a list of entries which are labelled as either Canadian or American but not both. We could go through the list, removing entries as we checked them. Once all the entries were removed, we could restore the list to its original state, periodically compile new versions of the list, and compare them to that version to find out which entries were new and thus needed checking. That would hopefully avoid too much re-examination of the same entries. - -sche (discuss) 05:13, 7 March 2015 (UTC)
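The review procedure -sche outlines boils down to two set operations: entries labelled US or Canada but not both (a symmetric difference), and, on each later run, only the entries that are not in the saved baseline list. A Python sketch, assuming the label data has already been extracted (from a dump or category listing, not shown); the page names are purely illustrative and the function names are hypothetical.

```python
def north_american_todo(us_entries, canada_entries):
    """Entries labelled either US or Canada, but not both."""
    return set(us_entries) ^ set(canada_entries)


def new_since(baseline, current):
    """Entries in a freshly compiled list that were not in the saved baseline."""
    return set(current) - set(baseline)


# Illustrative label data only
us = {"apple", "bangs", "faucet"}
canada = {"apple", "toque"}
baseline = north_american_todo(us, canada)  # {'bangs', 'faucet', 'toque'}

# Later: one more US-only entry has appeared since the baseline was saved
latest = north_american_todo(us | {"gas"}, canada)
print(new_since(baseline, latest))  # {'gas'}
```

Keeping the baseline fixed and diffing each new compilation against it is what avoids re-examining entries that have already been checked.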

anchors for links from other Wikimedia projects

  • On occult, I've added a null-length HTML span with ID to the medical sense of the adjective, as a target for a link from Wikipedia:Occult (disambiguation)#medicine, there being no single appropriate WP page; see the Talk page there.
  • I've done similarly on several other definitions here before, generally noting the reason for the anchor. But this time it occurs to me to ask if there's any problem with my doing this.

Please message me to reply. --Thnidu (talk) 00:09, 6 March 2015 (UTC)

Ungoliant MMDCCLXIV has helpfully answered me on my talk page:
Nothing wrong with it, just use the template {{senseid}} instead of adding the html code manually.

--Thnidu (talk) 02:28, 6 March 2015 (UTC)

We generally discourage HTML, especially in the principal namespace. In this case {{senseid}} is available and could be useful as a target for in-Wiktionary linking too. DCDuring TALK
Thanks, DCDuring. I'll try to go back over my contribs and templatize any HTML anchors. --Thnidu (talk) 05:41, 6 March 2015 (UTC)

Etymology: root or stem?

How should the words root and stem be used in an etymology? Are they interchangeable? E.g. "From a Proto-Ugric root *xyz-" or "from an imitative root with -asb suffix"? Google search returns more hits for "imitative root" than for "imitative stem" and 9 hits for "Proto-Ugric stem" (mostly from our Wiktionary), 7 hits for "Proto-Ugric root". It would be helpful to have a list of recommended usage. --Panda10 (talk) 18:23, 7 March 2015 (UTC)

Looking at the Lexicon of Linguistics and other references at root at OneLook Dictionary Search and stem at OneLook Dictionary Search, they probably should not be used interchangeably in a dictionary with our pretensions to technical precision. As I understand it a stem is the invariant, common part of a set of inflected forms of a word. I think it should only be used within a given language. I think root can be used to refer to something more basic than a stem within a language as well as in comparisons across language (I'm hand-waving here.). DCDuring TALK 18:58, 7 March 2015 (UTC)
I don't know about Proto-Ugric, but there is a clear distinction between root and stem in Proto-Indo-European. The root is the most basic lexical part, which has a canonical shape (one or two consonants followed by a vowel [almost always e] followed optionally by a sonorant consonant followed optionally by an obstruent consonant). A stem is in many cases a root (appearing in one of its "grades", full grade, o-grade, or zero-grade) followed by a suffix; the stem is what the endings are added to. A single root may form multiple stems, especially in verbs, which may have a present stem, perfect stem, aorist stem, etc., all formed from the same root but using different "grades" and different suffixes (or no suffix at all—some stems are identical to the roots they're formed from) and maybe other modifications like reduplication. See for example *gʷem-, a root, which forms the present stem *gʷm̥sḱé-, the aorist stem *gʷém- (which happens to be identical to the root in this case), and the perfect stem *gʷegʷóm-. —Aɴɢʀ (talk) 19:31, 7 March 2015 (UTC)
The Uralic languages (to which Ugric belongs) also have a distinction between roots and stems. There are two basic root types: (C)VCV and (C)VCCV, where the second vowel must be a, ä or e (i is also equivalent to e in non-initial syllables). So anything that does not ultimately have this structure is not a root in Uralic. The difference with PIE is that roots can be (and often are) words on their own, so we don't put a hyphen after them. If the root is a verb, we do add a hyphen. As for Ugric, I would be very cautious making reconstructions for it as there isn't actually agreement on whether Ugric even exists as a linguistic group with a definite ancestor (other than Proto-Uralic). User:Tropylium can tell you more. —CodeCat 00:32, 8 March 2015 (UTC)
The technical definition is indeed as Angr says: a root is an inanalyzable content morpheme, a stem is a root plus any possible (productive or fossilized) derivational suffixes. Some definitions may include epenthetic vowels or other morphophonological alternations as a part of a stem, but not as a part of a root; e.g. it would be possible to say that Hungarian hal (fish) has the root √hal, but in some inflected forms the stem hala-.
(The a/ä/e thing is probably not a useful criterion for Ugric, since original unstressed vowels are not distinguished in Hungarian.)
Within etymology, I'd suggest not calling proto-language items "stems", unless one is talking about proto-language morphology specifically. --Tropylium (talk) 01:11, 8 March 2015 (UTC)

Thank you all for the helpful information. I have already started removing the words root and stem, using simply "From Proto-Ugric *xyz-" or "From Proto-Finno-Ugric *xyz". For the proto-language items, I am using two reliable references: one is Uralonet, an online Uralic etymological database of the Research Institute for Linguistics, Hungarian Academy of Sciences (take a look at kerül and its Uralonet entry); the other is a printed etymological dictionary. The challenge is to provide an accurate translation of the Hungarian text. --Panda10 (talk) 14:15, 8 March 2015 (UTC)

Women honoured in scientific names / Inspire Campaign

Estimates of the percentage of Wikipedia editors who are female range from 9% to 23% (source). I imagine the stats on Wiktionary are similar. WMF are searching for ways to address the gender gap with their Inspire Campaign. I have little idea how to address that issue in any really useful way.

But if anyone's interested in making entries for women naturalists/biologists, etc. who have been honoured in scientific names, like, for example, kingsleyae, I can put together a candidate list of potentially eponymous specific epithets (e.g. the most common epithets ending in -ae which have no other declensions). Then it will be a matter of picking out the names of humans from the list (which will also include places and parasite hosts) and making entries for them. Perhaps some notable scientists who are missing Wikipedia entries could be uncovered, and so feed into the efforts of Wikipedians looking for such entries to create. I might try making a test list, and if anyone's interested in adding their name to a proposal, I might write up something for IdeaLab. —Pengo (talk) 16:56, 8 March 2015 (UTC)
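The filtering step described above (keep common epithets ending in -ae that have no other declensions, then sort by usage) could be sketched roughly as below. This is a hypothetical Python illustration, not a script anyone here actually used: the function name, the choice of -us/-a/-um as the "other declensions" to check for, and the sample counts are all assumptions. The output would still need checking by hand, since places, ships and host species pass the filter just as personal names do.

```python
from typing import Dict, List, Tuple

# Hypothetical choice: if the same stem is also attested with one of these
# endings, the -ae form is probably an ordinary Latin adjective (alba, albus,
# album, albae) rather than a genitive honouring a person.
ADJECTIVE_ENDINGS = ("us", "a", "um")


def eponym_candidates(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return -ae epithets with no adjectival sibling forms, most used first."""
    candidates = []
    for epithet, count in counts.items():
        if not epithet.endswith("ae"):
            continue
        stem = epithet[:-2]
        # Skip e.g. "albae" when "alba", "albus" or "album" is also in the data.
        if any(stem + ending in counts for ending in ADJECTIVE_ENDINGS):
            continue
        candidates.append((epithet, count))
    # Most frequent first; ties broken alphabetically so the order is stable.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


if __name__ == "__main__":
    # Made-up usage counts, for illustration only.
    sample = {"kingsleyae": 12, "annae": 30, "alba": 5, "albus": 3,
              "albae": 2, "sibogae": 7, "rosae": 4}
    print(eponym_candidates(sample))
```

With the made-up sample, "albae" is dropped because "alba" and "albus" are present, while "sibogae" (a ship) and "rosae" (a host plant) survive alongside the personal names, which is why a manual pass is still needed.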

I take it that entries like idae#Translingual are not what you have in mind. DCDuring TALK 18:29, 8 March 2015 (UTC)
Looking through the "A"s (through "An") in my Dictionary of Scientific Bird Names, there are a fair number of women's names. Unfortunately, the yield of those who were not wives, daughters, innamoratae, patrons, mythological or historical figures, or unknown is not high, to wit, two: angelae and annae. I looked at every eponymous epithet in the range. I'm not really willing to go through the whole book with such a modest yield. DCDuring TALK 19:17, 8 March 2015 (UTC)
My impression from looking at hundreds of insect names is that people named tend to be: 1) The people who found and/or provided the type specimens, 2) colleagues (especially authors of invalid names superseded by the names published) 3) benefactors 4) friends and/or family 5) celebrities and/or historical figures 6) targets of disguised insults or other hidden messages. The earlier custom was to draw as much as possible from classical antiquity, which deteriorated into picking random names out of dictionaries as the number of new taxa outstripped the supply of meaningful figures to allude to. The sheer volume of taxa and the restriction on identical generic names or binomials has led to more and more frivolity such as puns, names from pop culture, etc.
Of the categories above, there are some really interesting people in the first category, including a surprising number of women. There are also a few surprises in the second category with some notable female scientists from a century or more ago. Chuck Entz (talk) 22:31, 8 March 2015 (UTC)
@DCDuring, Chuck Entz: kingsleyae was actually my first find of a missing -ae named for a human, which gave me some hope. Most of the fish named for her seem to have been first discovered by her too. "idae" is kind of borderline; I guess that, at a minimum, finding out whom an entry is eponymous for is important (I'm guessing idae usually refers to an Ida of Greek mythology, though I didn't find anything definite in my cursory search). Scientists were my initial focus, but there's nothing wrong with increasing the number of female historical figures, patrons, celebrities, and mythological figures too, and it's quite possible that family members and innamoratae were also involved in research. —Pengo (talk) 00:15, 9 March 2015 (UTC)
@DCDuring: I don't suppose it would be any less tedious if you had The Eponym Dictionary of Birds? —Pengo (talk) 03:41, 9 March 2015 (UTC)
A favorite example is the whitefly genus Bemisia, described in 1914 in honor of Florence Eugenie Bemis, who was herself an expert on whiteflies. In 1904 she published a monograph on whiteflies of California in which she described 15 species new to science. I wish I could create a Wikipedia article on her, but I haven't been able to find biographical information, let alone citable references. Chuck Entz (talk) 04:08, 9 March 2015 (UTC)
What about focussing on species which were discovered by women? - -sche (discuss) 21:58, 8 March 2015 (UTC)
@-sche: Species discovered by women would be great, but I have no idea how to find or make such a list. Though it might be easier for plants. The International Plant Names Index has a "forename" field in their "authors" database, so it could be possible to pick out the feminine names, e.g. Miriam Cristina Alvarez (who described Ditassa oberdanii Fontella & M.C.Alvarez, a dogbane from Espírito Santo, Brazil). OK, so maybe I do have an idea for how to make such a list. Some of the authors in the database seem to be authors of research papers but don't appear to have any species associated with them, e.g. I.Blok (Ida Blok), which tripped me up a bit. I'm not sure where to find an international list of male/female names. I could try extracting them from Wiktionary and/or try to guess based on suffix. Maybe I should write up a grant proposal. —Pengo (talk) 00:15, 9 March 2015 (UTC)
  • Let's say we do this. Are we doing it so that we can show that we care? If so, how will anyone know what we've done? Do we need a set of women's categories to advertise what we've done? DCDuring TALK 23:07, 8 March 2015 (UTC)
  • @DCDuring: "Are we doing it so that we can show that we care?" Yep. (Also there's a tiny chance it might even encourage new editors, as these entries are fairly straightforward to create.) "If so, how will anyone know what we've done?" Write up some sort of summary on an IdeaLab item I guess. I'll have a go at creating the start of one soon. A category could help. We really ought to have one for eponymous specific epithets named for non-mythological humans or the like already. No idea if a category should be split by gender, but it's easy enough to pick the -ae's from the -i's anyway. —Pengo (talk) 01:13, 9 March 2015 (UTC)

First attempt: Here's a bunch of epithets ending in -ae, sorted by usage in books. Not sure how useful it is. —Pengo (talk) 00:15, 9 March 2015 (UTC)

We have nearly 200 items in Category:Translingual taxonomic eponyms and I don't always remember to categorize the items there, so there could easily be fifty or a hundred more. DCDuring TALK 03:37, 9 March 2015 (UTC)
I got the total up to 660 without creating any new pages, though I only found 14 -ae pages to add (which include a ship: sibogae). —Pengo (talk) 10:53, 9 March 2015 (UTC)

Here's the IdeaLab page, which I have created in my quixotic quest to gather more participants and interest. Please add your name in support of it if you're even vaguely interested. Pengo (talk) 23:03, 11 March 2015 (UTC)

Show/hide broken

Some days the show/hide (inflections, conjugations, translations) functionality is gone and I can't view translations except by clicking "edit". What's going on? This started to happen one or two weeks ago, perhaps at the same time that "§" characters started to appear next to headings. I'm running Firefox on Linux. --LA2 (talk) 14:02, 9 March 2015 (UTC)

Even when it is broken, the content should always be viewable, so it's a double bug. It should not have anything to do with § though, since § is a new Mediawiki feature, and the "NavBars" (hide/show boxes) are created with MediaWiki:Gadget-legacy.js. If it happens again, could you check the log (Tools > Web development > Web console) to see if there is a javascript error? — Dakdada 15:02, 12 March 2015 (UTC)
Now I removed all cookies from my Firefox browser pertaining to en.wiktionary, and that solved the problem! Can you imagine that a cookie could cause this?! LA2 (talk) 19:05, 15 March 2015 (UTC)
Happened to me too, hours ago. I also cleared the cache, which did not solve the problem. Then I checked the "Delete cookies and other site data" checkbox, which solved the problem.
@Dakdada, I did look into the console log and I remember there was an error caused by Gadget-legacy.js
It seemed to me that the problem started when I clicked some buttons under the "Visibility" toolbox. --Dixtosa (talk) 19:16, 15 March 2015 (UTC)

Bad italics in comparative/superlative entries

Could someone please modify Template:en-comparative of and Template:en-superlative of so that they don't put the literal word in italics at the end? E.g. at civilest, it should say "most civil", not "most civil" with "civil" in italics. Equinox 19:16, 9 March 2015 (UTC)

Done. —CodeCat 19:21, 9 March 2015 (UTC)

Codifying sarcastic/ironic and some other rhetorical use as ineligible under CFI

Vote created at Wiktionary:Votes/pl-2015-03/Excluding most sarcastic usage from CFI

Every so often, a definition like "big: (sarcastic) small" finds its way to RFD. Sarcasm and irony are productive in the English language (and all other spoken languages, as far as I know) and there are effectively no restrictions on what can be twisted sarcastically. Standard practice has been to delete obvious sarcastic and rhetorical use (see e.g. talk:touché, talk:James Bond, talk:thanks a lot), but this isn't actually mentioned anywhere. Therefore, I would suggest adding something like the text quoted below to CFI.

As far as I can tell, this would only result in merging/deleting senses on two pages: great and pray tell, possibly also no kidding, thanks a bunch (which did survive RFD) and eon. Thoughts or improvements welcome. Smurrayinchester (talk) 16:57, 11 March 2015 (UTC)

What exactly is this referring to in "this can be explained in a usage note"? DCDuring TALK 21:12, 11 March 2015 (UTC)
I've tried to make that sentence a bit shorter and clearer. Smurrayinchester (talk) 21:22, 11 March 2015 (UTC)
I'd have guessed that, but it wasn't clear. Thanks.
I agree that it would be useful to be able to point to a policy something like what you've offered. Your draft would be good enough for me, but perhaps it can be further improved. DCDuring TALK 21:45, 11 March 2015 (UTC)
Sounds good to me, though I wonder whether there are cases where a word is now almost exclusively used in a sarcastic way, and rarely or never with its original meaning. If so, those might need special treatment. Equinox 15:51, 12 March 2015 (UTC)
  • I don't want to see this sort of long wording in CFI. I think the problem of sarcastic meanings is marginal anyway. Furthermore, each sarcastic meaning has to be scrutinized for how characteristic it is, and therefore, to what extent it has become lexicalized and thereby inclusion-worthy. The regulatory part (as opposed to explanatory) of the above seems to be largely captured in this: "The straightforward use of sarcasm, irony, understatement and hyperbole does not usually qualify for inclusion." The use of "usually" makes room for reasonable exceptions. If metaphor is intended to be on the list, it needs to be explicitly there; it is now conspicuously absent. Of course, inclusion of metaphor in the list would make this rather open to abuse. --Dan Polansky (talk) 19:09, 12 March 2015 (UTC)
Metaphor is a tricky case, as you say. Since it's a much more irregular process than the rhetorical devices listed above (or, perhaps more accurately, sarcasm, irony, understatement and hyperbole are subtypes of metaphor), and since it's one of the main drivers of linguistic evolution, it would be daft to have a blanket exclusion. While it's a bit wordy, I think some explanatory verbiage is needed. CFI changes that just add a rule without giving any context to its application just seem to cause endless squabbling (look at the arguments WT:COALMINE caused). I've put a more pruned version below, which still (I hope) provides enough of the background to the rule to allow it to guide RFD debates effectively. Smurrayinchester (talk) 09:56, 13 March 2015 (UTC)
  • Oppose: Too blanket. There are some sarcastic/ironic definitions that we should have. Furthermore, some words/phrases are used sarcastically frequently, while most are used hardly at all. Purplebackpack89 20:21, 12 March 2015 (UTC)
Can you give an example of a term which would fail CFI under these rules, that should nevertheless be included? The cases that you mention are already covered by the sentence "Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition." Indeed, usage notes specifically exist to explain the nuances of usage that a definition cannot provide. Smurrayinchester (talk) 09:56, 13 March 2015 (UTC)

Rhetorical devices

The meaning of a statement always depends on context, and there are various rhetorical devices that speakers and writers use in order to convey a particular message without meaning what they literally say. These include sarcasm, irony, understatement and hyperbole. In speech, the use of these devices is often highlighted by a particular intonation, and in writing, this may be mimicked by the use of italics, quotation marks or exclamation points. Because the set of words and phrases which can be used rhetorically is almost limitless, and because separating ironic use from literal use is often difficult, the straightforward use of common rhetorical devices does not usually qualify for inclusion.

This means, for example, that big should not be defined as "(sarcastic) small", "(understatement) gigantic" or "(hyperbole) moderately large"; the fact that an English speaker might use the word this way is obvious and not especially noteworthy. Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition.

Figures of speech that are not obvious from their parts – for example, a euphemism which successfully disguises its true meaning, or a sarcastic turn of phrase which is more than a simple inversion of meaning – or which are never used literally are not covered by this rule, and can be included on their own merits.

Alternative wording

The straightforward use of sarcasm, irony, understatement and hyperbole does not usually qualify for inclusion: these are standard rhetorical devices which affect the meaning of a statement as a whole, but do not change the meaning of the words themselves.

This means, for example, that big should not be defined as "(sarcastic) small", "(understatement) gigantic" or "(hyperbole) moderately large"; the fact that an English speaker might use the word in these ways is obvious and not especially noteworthy. Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition. Figures of speech that are not obvious from their parts or which are never used literally are not covered by this rule, and can be included on their own merits.

Phonetic transcriptions (narrowness, number)

I have been informed that phonetic transcriptions on this site are only to be done on a certain level of depth. As I am personally interested in the variant pronunciations of languages, non-phonemic ones included, I would like to ask whether there are really any great arguments against giving a medium number of regional narrower pronunciations under a broad heading, like in the examples here and here. Korn (talk) 10:39, 12 March 2015 (UTC)

  • I feel like such fine phonetic detail doesn't belong in a dictionary because it's not a lexical property of the word in question. The fact that /ʁ/ is realized as [r] in Bavarian is a fact about the phonology of Bavarian, not a fact about robben. I also wonder how verifiable a lot of these pronunciations are. Who says that it's [ˈʁɔ.m̩], with a highly unusual and almost unpronounceable sequence of vowel plus syllabic consonant in northern and central German? I live in Berlin, and while I've certainly heard [ˈʁɔbm̩] (which isn't even listed), I don't think I've ever heard [ˈʁɔ.m̩]. I don't think I can even produce [ˈʁɔ.m̩] in a way that is reliably distinct from [ˈʁɔm]. And who says that the standard German pronunciation of Madrid is [ˈmadʁɪtʰ] with an aspirated [t] at the end of a syllable? I've never read a phonological description of standard German that permits aspirated consonants at the end of a syllable. I'm also curious about what inflected and derived forms of Madrid are attested to verify the claim that the final consonant is underlyingly /t/, i.e. that the word works in German as if it were spelled Madrit. —Aɴɢʀ (talk) 10:31, 14 March 2015 (UTC)
  • I lived in Berlin (north east) for five years and my impression is that [m̩] is by far the dominant Berlin and German pronunciation. -ben is certainly not pronounced with a fully released plosive as in Bad, and preventing [b̚m̩] from becoming [m̩] requires some care in speech. When speaking carefully, though, I think people normally end up with some form ending in [n] again. Concerning Madrid: the evidence is the adjective 'madrider'. Hearing it pronounced with [d] would make me assume the speaker was from an area with intervocalic consonant voicing, e.g. Schwaben, Sachsen, the north et cetera. Its pronunciation with /t/ is based on the devoicing in the noun.
  • As for the lexical property, it could just as well be stated that the fact that /r/ is realised as [ʁ̞] in Western, Central and parts of Northern Germany is a fact about the phonology of Central, Western and parts of Northern Germany and not about the word in question. But at the end of the day, both pronunciations are permissible, widespread variants of the standard language, not features of a non-standard dialect. Hence, if either deserves a place in the list, so does the other, and a note about where each is used seems a reasonable convenience. Actively excluding them would mean blotting out a considerable portion of German speakers and creating a North-Central-centric bias in this dictionary. Compare the English entries, which always differentiate between two or more variants (English, American, Australian, Canadian and American dialects), or Indonesian entries, which list both /o/ and /ʊ/ (sarung#Malay) and /e/ - /ɪ/: there certainly is some precedent for at least a greater level of detail than a phonemic description of one single accent, even when that accent is the one considered to be the educated regiolect of the cities where most of Germany's TV, radio and cinema is produced.
  • Lastly, as for the aspirated /t/, English Wikipedia cites the Duden Aussprachewörterbuch (which I don't have around to check) as a source for consonants having the same level of aspiration in all positions. It also mentions that initial-only aspiration is a distinctive feature of northern northern Germany, which is reasonable, as the same was said by Low German grammarians over a century before. Korn (talk) 13:59, 14 March 2015 (UTC)

Coupla new votes

Thanks to their recent vandal-fighting, I've started a couple of votes for adminhood to be bestowed upon Mr Granger and ISMETA --Type56op9 (talk) 12:43, 12 March 2015 (UTC)

SUL finalization update

Hi all, please read this page for important information and an update involving SUL finalization, scheduled to take place in one month. Thanks. Keegan (WMF) (talk) 19:45, 13 March 2015 (UTC)

Striking a Blow Against a Spammer

I just deleted an entry for the name of a business/its website domain name where the definition was a verbatim quote of a slogan from their website (I'm not going to mention the details, to avoid giving them the search-engine-ranking boost they were aiming for; I've given enough information here so you can easily find them).

After deleting the entry and blocking the IP for 6 months as a spammer, I took it a step further: I noticed an entry for their business, so I signed up there with an account under my own name and zip code and posted a negative review, citing only facts verifiable in the deletion log and noting the lack of direct evidence. Now, whenever anyone searches for the website, this review will come up. Unless I'm missing something, this tactic has the potential to remove some of the incentive/reward for search-engine spam in cases where a negative review would make a difference (this is an advertising/marketing business in Texas).

What does everyone else think about this? Chuck Entz (talk) 00:17, 14 March 2015 (UTC)

This could be an effective approach. There's always the possibility Person A would create an entry for rival Person B's business, knowing we'd delete it and smack Person B, but our historical experience suggests most spammers aren't that smart or else they would have realized by now we delete spam pages and they don't gain any SEO. - -sche (discuss) 02:53, 14 March 2015 (UTC)
I doubt it will make much difference, since spammers are such single-minded meatheads, but it can't actually hurt. If you feel you've got time to mess about filling various online forms then go for it. Equinox 02:59, 14 March 2015 (UTC)


Do we still need {{lang}}? Is there anything that {{lang|it|Nel mezzo del cammin di nostra vita}} does that {{l|it||Nel mezzo del cammin di nostra vita}} (note the two vertical bars after it) doesn't? If I want to put a link inside {{lang}}, e.g. {{lang|it|Nel [[mezzo]] del cammin di nostra vita}}, it doesn't even tell the link to go to the Italian section, while {{l|it|Nel [[mezzo]] del cammin di nostra vita}} does tell the link what language it is. —Aɴɢʀ (talk) 10:48, 14 March 2015 (UTC)

In which situations is {{lang}} used anyway? I’ve only seen it used in quotations, but I think we would benefit from a template specifically for that (one that works like {{usex}}). — Ungoliant (falai) 17:57, 14 March 2015 (UTC)
Besides quotations, I've sometimes used it in inflection-table templates for forms that don't need linking. —Aɴɢʀ (talk) 19:51, 15 March 2015 (UTC)
Looks like replacing it with {{l}} is the way to go. — Ungoliant (falai) 14:28, 16 March 2015 (UTC)
My gut is to keep both. As I've said time and again, merging and moving templates does little other than confuse a lot of editors. Purplebackpack89 14:33, 16 March 2015 (UTC)
The way this process should and used to work is that, if folks agree, the template is deprecated, then its use converted to some other, then deleted.
Deprecation can be preceded by discouraging use. Should we discourage use of this in any of its applications? In all of its applications? The discouragement can be in the form of changing the documentation, gradually converting some or all uses to some other template, as well as any adverse conclusion of discussions such as this. It also might be a good time to determine whether the replacement templates are as good as they could be and to review their documentation. It is a bit more work, but a gradual process should reduce the adverse effects on contributor habits, and extend the utility of edit histories that use older templates. DCDuring TALK 17:05, 16 March 2015 (UTC)
I don't really give a flying fox if we delete it or not; I just want to know if there's any particular reason I should keep using it. —Aɴɢʀ (talk) 21:23, 16 March 2015 (UTC)
Based on {{lang/documentation}}, it's basically a shortcut to <span lang="LANGCODE"></span>, which I think is still needed for browsers that don't work out the script for themselves. As for how useful it is for languages that use the Latin script, I think it only changes the HTML; to a human user, it's no different. Renard Migrant (talk) 20:29, 17 March 2015 (UTC)
  • Usability perspective:
My current understanding is that various accessibility and other tools can make use of linguistic metadata provided by {{lang}} to decide how to handle text. I've been using it for some time to specify that non-link text I am entering is not English.
From what I've been able to test, both {{lang|LANGCODE|$Text}} and {{l|LANGCODE||$Text}} produce identical output in the browser:
<span class="LANGCODE-SCRIPT" lang="LANGCODE" xml:lang="LANGCODE">$Text</span>
This proposed change would thus only 1) affect what templates editors use, and 2) require that someone go through and change all instances of {{lang}} over to use {{l}} instead.
I'm fine with that. I can't think of any other real downsides. ‑‑ Eiríkr Útlendi │ Tala við mig 23:15, 17 March 2015 (UTC)
There are some differences. Compare {{lang|ru|[[тест]]}} and {{l|ru||[[тест]]}}. —CodeCat 00:08, 18 March 2015 (UTC)
  • With the [[]] link brackets, {{lang}} produces:
<span class="Cyrl" lang="ru" xml:lang="ru"><a href="/wiki/%D1%82%D0%B5%D1%81%D1%82" title="">тест</a></span>
Meanwhile, {{l}} produces:
<span class="Cyrl" lang="ru" xml:lang="ru"><a href="/wiki/%D1%82%D0%B5%D1%81%D1%82" title="тест">тест</a></span> (<span lang="" class="tr" xml:lang="">test</span>)
Without the [[]] link brackets, {{lang}} produces:
<span class="Cyrl" lang="ru" xml:lang="ru">тест</span>
{{l}} produces:
<span class="Cyrl" lang="ru" xml:lang="ru">тест</span> (<span lang="" class="tr" xml:lang="">test</span>)
It looks like the key difference is addition of transliteration for those languages for which our infrastructure supports transliteration.
Query: Are there any use cases where users would want to 1) mark text as a specific language, but 2) not have any automatic transliteration? ‑‑ Eiríkr Útlendi │ Tala við mig 18:21, 18 March 2015 (UTC)
Our templates already support tr=- to suppress transliteration. So you only have to search for entries which have that. It's probably used mostly in inflection tables. —CodeCat 19:46, 18 March 2015 (UTC)
I think the idea is something like this, on aduire, where the intention is not to link. Renard Migrant (talk) 17:33, 19 March 2015 (UTC)
But that's where you'd use {{ux}}. —CodeCat 18:36, 19 March 2015 (UTC)
It's a citation, not a usage example. Renard Migrant (talk) 12:45, 21 March 2015 (UTC)


Whyyyy do we have both {{ux}} and {{usex}}??? ‑‑ Eiríkr Útlendi │ Tala við mig 18:38, 19 March 2015 (UTC)

See Wiktionary:Grease pit/2014/February#Template for eg over usex like label over context. —CodeCat 18:59, 19 March 2015 (UTC)
They work the same, one being a redirect to the other.
{{usex}} came first and its name is a bit more intuitive, so some users are accustomed to it, and it might be a little easier for someone new to Wiktionary to figure out what was intended. As evidence of usex being more intuitive, it gets some use on our discussion pages as an abbreviation of usage example, whereas I don't recollect a single instance of such use of "ux". OTOH, {{ux}} is shorter. If there were a big shortage of two-letter codes or a clearly better use for either of the template names, we could revisit the matter. DCDuring TALK 20:54, 19 March 2015 (UTC)

Pronunciation formatting

Should phonetic or phonemic transcription be preferred, by default? WT:PRON appears to be silent on this. Yet this can be a relatively large difference for languages where a word's surface realization involves several phonological processes.

Also, {{IPA}} seems to link every pronunciation to the corresponding [[w:$LANG phonology]] article, even if one does not exist. This seems like a bad idea, given the policy that "[i]deally, every entry should have a pronunciation section". I would suggest instead directing it by default to [[w:$LANG language#Phonology]] (though it seems possible to contemplate defining a set of languages for which it instead links to the separate phonology article). --Tropylium (talk) 14:23, 16 March 2015 (UTC)
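The default-plus-exceptions linking suggested here could work roughly like the sketch below. It is written in Python purely for illustration and is not how the template itself is implemented; the allow-list contents, the function name, and the assumption that each language's main Wikipedia article is titled "<Language> language" are all hypothetical, and that last assumption does not hold for every language.

```python
# Hypothetical allow-list of languages assumed to have a standalone
# "<Language> phonology" article; illustrative only, not an audited list.
HAS_PHONOLOGY_ARTICLE = {"English", "Finnish"}


def phonology_link(language: str) -> str:
    """Return the Wikipedia link target for a language's phonology."""
    if language in HAS_PHONOLOGY_ARTICLE:
        return f"w:{language} phonology"
    # Default: the Phonology section of the main article, assuming that
    # article is titled "<Language> language" (not true for every language).
    return f"w:{language} language#Phonology"


if __name__ == "__main__":
    for lang in ("English", "Tundra Nenets"):
        print(lang, "->", phonology_link(lang))
```

The design choice is that a missing or wrong entry in the allow-list only costs a less specific link (to the main article's section) rather than a red link, which matches the concern that most languages lack a separate phonology article.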

I prefer phonemic transcription because that's what most dictionaries use and because that's what is lexical. That said, the phonemic transcription need not be highly abstract; for example, if the distinction between two phonemes is lost in a certain environment, then the sound that surfaces can be transcribed even if an abstract analysis would regard the other sound as the underlying one. (For example, German Rad can be transcribed /ʁaːt/ rather than /ʁaːd/ since /t/ and /d/ are distinct phonemes in German, even though an abstract analysis would posit /ʁaːd/ as the underlying form.) But that's just my preference; we have plenty of examples of narrow transcription being used, and there's no reason we can't use both. —Aɴɢʀ (talk) 21:06, 16 March 2015 (UTC)
If allophonic differences cause the distinction between two phonemes to collapse, then that collapsed phoneme should really be treated as a new phoneme in itself, rather than as either of the original phonemes. For example, in Eastern Catalan, unstressed /a/ and /e/ fall together as /ə/, and you can't really say which of the two it originally belongs to. It's a new phoneme altogether, albeit one that occurs in complementary distribution to both /a/ and /e/. For final devoicing, the same applies in principle, albeit that the phonetic realisation of the new phoneme coincides with the realisation of one of the two phonemes that it results from. But the distinction is definitely phonemic, and it's only when you go into morphophonemics, comparing related forms of a lemma, that the original /d/ arises. Another way to look at it is to ask: if Rad were the only possible form and had no other forms or related terms to compare it with, how would you know it was /d/ underlyingly? You couldn't, and therefore the phoneme is /t/. —CodeCat 21:33, 16 March 2015 (UTC)
I agree. —Aɴɢʀ (talk) 22:26, 16 March 2015 (UTC)
Phonemic, please! I don't think anyone wants to see a whole raft of vowel variants for Yorkshire, London, Manchester, Essex, Scotland, etc. — and that's just the UK! Equinox 21:17, 16 March 2015 (UTC)
I'll agree with everyone then. Renard Migrant (talk) 20:29, 17 March 2015 (UTC)
I'm not asking due to dialects as much as languages with several surface filters between phonemics and phonetics. For an example: Tundra Nenets леды (skeleton) is phonologically analyzable as IPA(key): /lediă/, phonetically realized as IPA(key): [lɤːðɨː]. Would you mandate transcribing the former? Or would you be OK with using "subphonemic" transcription where e.g. the vowel backing process, universal in all varieties of the language, is transcribed? How about the lenition of /d/, which is almost universal — would you consider the fact that there exist a few dialects that have [d] in this position sufficient grounds to not mark [ð] at all?
(For that matter, suppose I were to indicate an underlying phonemicization /lixt/ or even just /līt/ for light, citing w:The Sound Pattern of English…?)
"Do not put in tons of dialectal pronunciations" is not at all the same as "put everything in purely phonemic transcription". --Tropylium (talk) 00:04, 18 March 2015 (UTC)
That's a difficult case. On the one hand, you don't want to give such a highly abstract representation (like the SPE ones you mentioned) that the word would be unrecognizable to native speakers if pronounced the way it's transcribed. On the other hand, you don't want to overwhelm the user with a bunch of fine phonetic detail whose absence would probably not be noticed by native speakers. One rule of thumb I sometimes try to follow in cases like this is "How narrow a transcription can I get without using any IPA diacritics, superscripts, etc., but only the basic characters?" Obviously that rule can't be applied exceptionlessly in all cases, but if [lɤːðɨː] is unambiguous as it stands, then don't go overboard and transcribe it [l̪ˠɤ̽ːð̺ɨ̠ː] or whatever. —Aɴɢʀ (talk) 20:02, 18 March 2015 (UTC)
Would it not be possible to automatically generate phonetic transcriptions from the phonemic one? After all, it's predictable by definition. —CodeCat 21:11, 17 March 2015 (UTC)
In the past, a few users suggested using super-broad/"diaphonemic" transcriptions. Perhaps one day English entries will have expandable templatized pronunciation sections like Chinese entries, where phonemic and semi-narrow phonetic transcriptions into major dialects are shown by default, while smaller dialects' pronunciations, and super-broad/"diaphonemic" and super-narrow transcriptions, are shown when the template is expanded. (Check out the obscure dialect+chronolect in dirty.) PS I definitely agree that Rad should be transcribed as ending with /t/, not /d/. - -sche (discuss) 01:24, 19 March 2015 (UTC)
For what it's worth, I heavily support pursuing this idea. Not that my word seems to be worth much: earlier I asked how to create such a collapsible template, and just a bit further up this page I asked more or less the same question about narrowness and diaphonemic/dialect IPA policy, and I was widely ignored both times. Korn (talk) 11:45, 19 March 2015 (UTC)
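CodeCat's point that allophony is predictable by definition suggests that phonetic transcriptions could be generated from phonemic ones by applying ordered rewrite rules. Below is a minimal Python sketch of that idea. The rule table and function name are hypothetical, and the single toy rule (aspirating an English word-initial /p t k/ before a vowel) is a deliberate simplification: real aspiration depends on stress and syllable position, and a language like Tundra Nenets would need many more rules, applied in the right order.

```python
import re

# Hypothetical, heavily simplified rule table: (regex pattern, replacement),
# applied in order so that later rules see the output of earlier ones.
ENGLISH_TOY_RULES = [
    # Aspirate a word-initial voiceless stop when a vowel follows it.
    (r"^([ptk])(?=[aeiouæɑɒɔəɛɪʊʌ])", r"\1ʰ"),
]


def phonemic_to_phonetic(phonemic, rules):
    """Derive a phonetic [..] transcription from a phonemic /../ one."""
    form = phonemic.strip("/")
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return "[" + form + "]"


print(phonemic_to_phonetic("/pɪn/", ENGLISH_TOY_RULES))   # [pʰɪn]
print(phonemic_to_phonetic("/spɪn/", ENGLISH_TOY_RULES))  # [spɪn]
```

Keeping the rules in an ordered list means that interactions such as one process feeding another are expressed simply by their position in the list.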

Request for citations (!= RFV) for entries not in other dictionaries

We have a good number of entries (definitions in entries) not in other dictionaries that have no citations. They really should have some citations to confirm our definition and to make us look a little more systematic than Urban Dictionary. The RfV process gives urgency to the process of attestation, but that urgency may be excessive for many of these. Would it make sense to have {{rfcites}} (or something) for entries that were not in {{R:OneLook}}, {{R:Century 1911}}, or any glossary or dictionary in Google Books (template to be written)? I suppose it would be most productive to apply this first to whole entries rather than to individual definitions. Attempting to determine whether a definition is or is not in another dictionary is much harder than determining whether a term is. DCDuring TALK 17:08, 17 March 2015 (UTC)

I like the idea of some sort of collaborative wiki project where you can grab any word/sense without the requisite three citations and go away and cite it, and it is then removed from the list. This would take a lot of organisation, and a bot. Still, it could be done, and I would rather that we generate a separate list, based on our current entries, than change those entries by adding yet more template markup to them. Equinox 19:43, 17 March 2015 (UTC)
There are {{rfquote}} and {{rfquote-sense}}, created on 24 October 2007 and 22 October 2007 respectively. They categorize into Category:English entries needing quotation, which now has 10,750 items. --Dan Polansky (talk) 19:54, 17 March 2015 (UTC)
Thanks. I really had looked at Category:Request templates. There are only about 60 uses of {{rfquote|lang=en}} AFAICT, and somewhat fewer of {{rfquote-sense|lang=en}}. Would it be unreasonable to categorize the English ones into a specific category? DCDuring TALK 01:02, 18 March 2015 (UTC)
I sometimes put {{rfex}} on senses that strike me as dubious but don't seem worth an RFV. I don't think those categorise, but at least it's something editors will see while editing. Equinox 19:55, 17 March 2015 (UTC)
Would it be helpful to have {{rfex}} categorize into the same category as {{rfquote}}, into a different category, or into none at all? If it contained "en" or "lang=en", the new search could find it even without a category, but someone would have to know how to use search and what to look for. DCDuring TALK 01:01, 18 March 2015 (UTC)
Tentative support, but perhaps not on the main page. Also, we need to consider spelling normalisation and rare languages. For example, quoting the Chechen word чӏогӏа (č̣oġa, strong) would be difficult for two reasons: the non-standard spelling "чlогӏа" is more common (because of problems with the palochka, especially the lower-case "ӏ"), and few Chechen books have been digitised. --Anatoli T. (обсудить/вклад) 01:24, 18 March 2015 (UTC)
{{newrfquote}} could be made less conspicuous, like {{rfelite}}, and placed at the bottom of the L2 section. {{rfquote-sense}} is relatively inconspicuous and could be made a bit less conspicuous.
As to the other problems, of course, I'm thinking mainly of English. Judgment needs to be applied for each language, indeed for each individual use. DCDuring TALK 01:47, 18 March 2015 (UTC)
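Equinox's idea of a bot-generated list of senses that lack the requisite three citations could start from something like the following Python sketch. It assumes the usual wikitext layout, where a definition line begins with "# " and each quotation begins with "#*" (with any "#*:" continuation line holding the quoted text, so it is not counted separately). The function name is hypothetical, and subsenses ("##") are simply ignored.

```python
def senses_needing_citations(wikitext, required=3):
    """Return (definition, quotation_count) for senses with fewer than `required` quotations."""
    results = []
    current = None  # text of the definition currently being counted
    count = 0
    for line in wikitext.splitlines():
        if line.startswith("#*") and not line.startswith("#*:"):
            # A new quotation under the current definition.
            if current is not None:
                count += 1
        elif line.startswith("# "):
            # A new top-level definition: close out the previous one first.
            if current is not None and count < required:
                results.append((current, count))
            current = line[2:].strip()
            count = 0
    if current is not None and count < required:
        results.append((current, count))
    return results


sample = "# a fruit\n#* 2001, X\n#*: quote one\n#* 2002, Y\n#* 2003, Z\n# a colour\n#* 1999, W\n"
print(senses_needing_citations(sample))  # [('a colour', 1)]
```

Generating such a list from a dump, rather than tagging entries, fits Equinox's preference for not adding more template markup to the entries themselves.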

Walser German and Swabian

It has been brought to my attention that we have Category:Walser German language and Category:Swabian language.

In my opinion, we shouldn't treat these two as separate languages. They are part of the Swiss German dialect continuum, which is covered by Category:Alemannic German language. There is no reason at all to keep Swabian. As for Walser German, it is the least intelligible variety in the dialect continuum, virtually incomprehensible even to other Swiss German speakers, but linguistic tradition has always treated it as just a variety of Swiss German, and there is no reason why we shouldn't follow suit. We can always use dialect labels to distinguish the different lects, and there are many, many more varieties of Swiss German (like Alsatian) that are not covered. -- Liliana 22:18, 17 March 2015 (UTC)

Yes, merge them into gsw (Category:Alemannic German language). There are lexical distinctions and phonological and hence orthographic distinctions that can be drawn between the lects, but none of them are so great that it would be sensible to treat the lects as separate languages. (And there are many equally distinct varieties of the Alemannic dialect continuum which have not been granted codes, as you've noted.) A cynic might wonder if the reason Ethnologue et al are so much quicker to grant codes to the dialects of other languages than to the dialects of English is that they all speak English well enough to recognize how silly it would be to consider da yooge boid ate da olykoek /də judʒ bɜjd eɪt də ˈ(oʊ~oə).lɪ.kʊk/ and the huge bird ate the doughnut /ðə hjudʒ bɝd eɪt ðə ˈdoʊ.nʌt/ different languages. - -sche (discuss) 20:01, 20 March 2015 (UTC)


Any thoughts on Wordset? They open sourced their code and data recently and emphasize a structured data approach (in contrast to Wiktionary). Their claim that Wiktionary is "unstructured" is not really correct; there are a number of tools which can successfully parse the content (I contribute code to one of them). At best I would call Wiktionary "semi-structured". What I do agree with, however, is that it is time to try out new ways to build a collaborative platform at scale. For instance, there is a voting system built into Wordset which is used to reach consensus on proposed changes. The big problem is that Wiktionary (and MediaWiki) can be quite intimidating to potential new contributors; the templating system is powerful but also complex, and it was obviously never designed to create a dictionary. On the other hand, Wordset's data model is quite limited at the moment (for a project that aims to be more structured), and they only focus on English headwords, at least initially. Jberkel (talk) 18:08, 18 March 2015 (UTC)

  • I'm not impressed. SemperBlotto (talk) 08:03, 19 March 2015 (UTC)
  • It is quite easy to have a data structure when only focusing on one language and only looking for a definition. But try to do that with all languages (described in several languages), with much more diverse information to store and organize (pronunciations, etymologies, inflections, synonyms...), and it becomes very difficult. The semi-structured Wiktionaries allow all of this, but at the cost of a real structure (also, parsers only work to some extent, and usually only for one Wiktionary language), which indeed makes it difficult to reuse the data. Wikidata may be able to improve this, but it is going to be very difficult. Nonetheless, this Wordset site is open-source, including the definitions, and has a philosophy close to that of the Wikimedia projects, so we should not see it as an adversary. — Dakdada 09:16, 19 March 2015 (UTC)
    @BD2412: If they don't import content from dictionaries like we do, it will take them a long time to achieve coverage. Some of their content is apparently from WordNet and is available under what looks to me like a non-standard license. Their content is under the "Creative Commons Attribution-ShareAlike 4.0 International License". Can they simply import our content given that license? Can we use their content provided we include them as a reference? DCDuring TALK 15:22, 19 March 2015 (UTC)
    Yes, and yes. They claim to use "the same CC license for the content as Wikipedia uses, CC-BY-SA", and we can hold them to that. Like everyone else in the world, they are free to copy and reuse our content so long as they credit us for it, and we are free to do the same as to theirs. I would not hold my breath on their providing anything that we can actually use, however. bd2412 T 19:07, 19 March 2015 (UTC)
    Thanks. They might have some particularly well-worded definitions and usexes from time to time. BTW, can we copy WordNet with acknowledgement or is their license a little different?
I've been thinking that it would be handy to have a custom editing interface for definition writers that automatically generated links to the entries for the headword being edited in various copyright-free and appropriately licensed dictionaries. Other links might be to various corpora and gateways. Standard boilerplate to credit the sources that needed crediting could be part of it too. At a very basic level, templates like {{taxlook}} and {{REEHelp}} do a little of this, but a complete editing interface would be much better. DCDuring TALK 20:42, 19 March 2015 (UTC)
@BD2412: Indeed. In contrast Wordset needs a simple acknowledgement and link to their site. Or is our way of tracking changes not sufficient? DCDuring TALK 21:04, 19 March 2015 (UTC)
At first glance, it doesn't look like their licence is compatible with ours, in particular the CC ShareAlike clause. ShareAlike means copyleft; they can't "add" restrictions on content that are not present in its original form on Wiktionary. If licensees have to include their copyright notice, that violates it, because no such requirement exists here. So Wordset cannot use Wiktionary content. —CodeCat 21:10, 19 March 2015 (UTC)
But our license is also CC ShareAlike. —Aɴɢʀ (talk) 21:39, 19 March 2015 (UTC)
I was referring to ours. Theirs is not, as far as I can tell. So it's probably incompatible. —CodeCat 21:47, 19 March 2015 (UTC)
It is the same, per their own words: "Specifically, we’re going to be choosing the same CC license for the content as Wikipedia uses, CC-BY-SA." [4]Dakdada 16:31, 20 March 2015 (UTC)
Yeah, it's confusing because we're simultaneously talking about Wordnet and Wordset in this thread. Wordset has the same license we do, but Wordnet doesn't. —Aɴɢʀ (talk) 16:41, 20 March 2015 (UTC)
Anyone else notice how they function on "yae" [sic] votes? Heh. Equinox 14:25, 19 March 2015 (UTC)

Interlanguage (interwiki) links

Does anybody know what the plan is for interlanguage links? Wikidata (as used in Wikipedia) is not yet used in Wiktionary. New articles lack interlanguage links (one example is lägel, created by me on February 21, which also exists on sv.wiktionary, but isn't linked) and existing articles here lack interlanguage links to newly created articles in other languages of Wiktionary, apparently because the interwiki bots have stopped. Should the bots be restarted? Or will Wikidata support come soon? --LA2 (talk) 21:58, 23 March 2015 (UTC)

@LA2: See d:Wikidata:Wiktionary. The fact that interwiki links aren't handled by Wikidata is pretty ridiculous, really. In (e.g.) Wikipedia, there won't be a direct one-to-one equivalent of every idea in every language edition and figuring out where all of them should point can be really tricky. In Wiktionary, it's irrelevant: the entry at wikt:en:foot and wikt:es:foot should link together no matter what (as long as neither of them is a redlink). This could all be accomplished painlessly in an afternoon. —Justin (koavf)TCM 01:52, 24 March 2015 (UTC)
"the entry at wikt:en:foot and wikt:es:foot should link together no matter what [...] this could all be accomplished painlessly in an afternoon": Indeed. And no shortage of Wiktionarians have pointed that out to the folks at Wikidata. They, in turn, have made it clear they are not going to do it. - -sche (discuss) 03:57, 24 March 2015 (UTC)
@-sche: Do you have links or diffs? I can't imagine that the Wikidata community refuse to make interwiki links on Wiktionary. —Justin (koavf)TCM 04:35, 24 March 2015 (UTC)
I think they would like to do all of Wiktionary at once, not just interlanguage links. And since defining a structure for Wiktionary linguistic data is really hard (and much discussed), they will probably not attack the problem until the other projects are converted to Wikidata.
Also, there are some small exceptions that we need to take care of for interlanguage links: see the table I made in d:Wikidata_talk:Wiktionary#First_and_second_phases, in particular the "apostrophe", "capital" and "other" interwikis. Those are due to the different communities' typographic rules (and some errors). — Dakdada 10:24, 24 March 2015 (UTC)
This sounds like a deadlock situation. How sad! In the meanwhile, it couldn't hurt to restart interwiki bots, could it? I still have bot status (LA2-bot) on some languages of Wiktionary, so should I just go for it? LA2 (talk) 15:29, 24 March 2015 (UTC)
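Koavf's point that Wiktionary interwikis only depend on identical page titles can be illustrated with a short Python sketch. It assumes a hypothetical mapping from language-edition codes to sets of page titles (which a bot might build from each edition's title dump) and emits ordinary [[code:title]] interlanguage links. It deliberately does not handle the "apostrophe", "capital" and "other" exceptions from Dakdada's table, which would need extra normalisation rules.

```python
def interwiki_links(title, titles_by_wiki, home="en"):
    """Return [[code:title]] links to every other edition that has this exact title."""
    return [
        f"[[{code}:{title}]]"
        for code in sorted(titles_by_wiki)
        if code != home and title in titles_by_wiki[code]
    ]


# Hypothetical title sets for three editions.
titles = {
    "en": {"lägel", "foot"},
    "es": {"foot"},
    "sv": {"lägel", "foot"},
}
print(interwiki_links("foot", titles))   # ['[[es:foot]]', '[[sv:foot]]']
print(interwiki_links("lägel", titles))  # ['[[sv:lägel]]']
```

Because matching is by exact title, the same function serves every edition by changing the home code; no judgment about equivalent concepts is needed, unlike on Wikipedia.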


Hi. I'm not gonna be using this username anymore. Time for a change. See you soon with a new name. --Type56op9 (talk) 12:49, 24 March 2015 (UTC)

OK thanks for letting us know ♥ Soap (talk) 15:07, 26 March 2015 (UTC)
It wasn't one of your better names, really. Equinox 02:06, 27 March 2015 (UTC)

Entries from the GCIDE labeled "Webster 1913 Suppl."

So I've found entries in the GCIDE which are missing from Wiktionary, such as "Pimola":

 <hw>Pim*o"la</hw> <pr>(?)</pr>, <pos>n.</pos> <def>An olive stuffed with a kind of sweet red pepper, or pimiento.  </def><br/
 [<source>Webster 1913 Suppl.</source>]</p>

Apparently these are from the "Webster 1913 Suppl."

My question is this: Should these be copied into Wiktionary? Is it OK for me to copy this definition into Wiktionary, or are there license restrictions for the "Webster 1913 Suppl." ?

Oh, interesting. There are also other words missing which are labeled, simply, "1913 Webster", such as Pinxit:

 \'d8<hw>Pinx"it</hw> <pr>(?)</pr>. <ety>[L., perfect indicative 3d sing. of <ets>pingere</ets> to paint.]</ety> 
 <def>A word appended to the artist's name or initials on a painting, or engraved copy of a painting; <as>as, <ex>Rubens pinxit</ex>, Rubens painted (this)</as>.</def><br/
 [<source>1913 Webster</source>]</p>

Should these be copied in? "Pinxit" seems like a pretty useful word. Are there issues with using the GCIDE definitions?

It's out of copyright due to its age, so you can do what you like with it. Please add them! Equinox 13:33, 28 March 2015 (UTC)
The worst that could happen is that some of them won't meet our attestation standards. Add 'em and we'll sort that out eventually. DCDuring TALK 13:41, 28 March 2015 (UTC)
Oh, yeah. You should register. It makes it easier for us to communicate with you in a friendly way. DCDuring TALK 13:44, 28 March 2015 (UTC)
Thanks, I'm registered now. User:Pnelsonmusic But obviously a newbie. I've been reviewing the GCIDE for a search engine linguistic processing project and have noticed these differences between it and Wiktionary. Maybe I'll write a program to identify all missing items. TALK 13:52, 28 March 2015 (UTC)
We look forward to your contributions. Equinox has done a lot of work on getting entries from Webster 1913. DCDuring TALK 15:43, 28 March 2015 (UTC)
Not all of us really approve of copying definitions from other dictionaries, even when they are out of copyright. Definitions are supposed to be our own work. But I have no objection to obtaining lists of words from ANY dictionary or similar source; I do that myself. SemperBlotto (talk) 09:15, 29 March 2015 (UTC)
But we see farther if we stand on the shoulders of the giants who preceded us. DCDuring TALK 09:43, 29 March 2015 (UTC)
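Pnelsonmusic's idea of a program to find GCIDE headwords missing from Wiktionary could start from a Python sketch like this one. Based only on the two samples quoted above, it assumes that headwords sit inside <hw> and </hw> tags, with "*" marking syllable breaks and '"' marking stress. The function names and the set of existing titles are hypothetical, and the lower-case comparison is a rough heuristic, since Wiktionary titles are case-sensitive.

```python
import re

# Headwords are assumed to sit in <hw>...</hw>, as in the samples above.
HW_RE = re.compile(r"<hw>(.*?)</hw>")


def gcide_headwords(text):
    """Return lower-cased headwords with '*' (syllable) and '"' (stress) marks removed."""
    words = set()
    for raw in HW_RE.findall(text):
        word = raw.replace("*", "").replace('"', "").strip()
        if word:
            words.add(word.lower())
    return words


def missing_headwords(gcide_text, existing_titles):
    """List GCIDE headwords with no matching title (compared case-insensitively)."""
    existing = {t.lower() for t in existing_titles}
    return sorted(gcide_headwords(gcide_text) - existing)


sample = '<hw>Pim*o"la</hw> ... <hw>Pinx"it</hw> ... <hw>Ap"ple</hw>'
print(missing_headwords(sample, {"apple", "olive"}))  # ['pimola', 'pinxit']
```

Output like this gives a list of candidate words to check, which fits SemperBlotto's preference for taking word lists, rather than definitions, from other dictionaries.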

How can I add a "thank" note?

How can I add it to edits in entry histories, next to "undo"? Or is it visible to other users and not to myself? I'm more used to the "undo" function being used, or just tacit approval of edits. Donnanz (talk) 17:18, 28 March 2015 (UTC)

The person receiving thanks gets a notification. Others have to read the log. — Ungoliant (falai) 17:37, 28 March 2015 (UTC)
I understand that, I've done that myself and have also received thanks. But I'm afraid that doesn't really answer the question. It doesn't show in the entry history for edits I do. Donnanz (talk) 18:55, 28 March 2015 (UTC)
Maybe your page histories look different from mine, but when I look at a page history, the "thank" button is there for all diffs except my own and those made by anons. —Aɴɢʀ (talk) 10:16, 29 March 2015 (UTC)
Ah, I was beginning to suspect / think that. Thanks, I guess that solves that. Donnanz (talk) 10:20, 29 March 2015 (UTC)