Wiktionary:Beer parlour/2012/March

March 2012

including gender (definite articles) and verb principal parts

My question here is about Dutch words on English Wiktionary, but probably pertains to many or even most languages: Would it not be far better to routinely include definite articles (to show gender) on noun pages, or at least signify gender in some fashion? I realize that Dutch is a tricky case; do you use m, f, n (for masculine, feminine, neuter) or appropriate Dutch equivalents, OR do you use c, n (common gender, neuter)? I would say that ideally, to maximise information content, m, f, n would be used, but that c (common gender, incorporating both masculine and feminine) is perfectly good as an intermediate step or where the editor does not know the original gender (i.e. masculine or feminine).

My second point concerns verbs. I think it is absolutely necessary to include the so-called "principal parts".

From Wikipedia:"In language learning, the principal parts of a verb are those forms that a student must memorize in order to be able to conjugate the verb through all its forms." "The principal parts of an English verb are the bare infinitive, past tense and past participle. For example the verb 'to take' has the principal parts take–took–taken. The verb 'to do' has do–did–done and the verb 'to say' has say–said–said."

If anyone can think of another place where these questions would be appropriately placed, please leave them here AND place them there. It is FAR too easy for proposals and questions to get lost in the shuffle. Thank you. Heavenlyblue (talk) 22:58, 1 March 2012 (UTC)

We use m,f,n for Dutch. The gender is found between the definition and the ===Noun=== header. For example in the entry for the word mens, it's labeled as neuter:
mens n (plural mensen, diminutive mensje)
As for verbs, we include the entire conjugation under the ====Conjugation==== header. Ungoliant MMDCCLXIV 23:23, 1 March 2012 (UTC)
As Ungoliant points out, for Dutch, we usually use m, f, n, or mf. Using c seems okay to me as long as it really is the case, and not merely that the editor does not know which gender is correct. We use c for the Scandinavian languages all the time. There are a few Russian words that could be c, but I think we always put mf on those. Same with French, Spanish, and Portuguese: m, f, or mf.
I don’t think that using definite articles would be to any advantage, and might even be confusing. There are some words and phrases that have to have the article, such as The Hague. In Semitic languages, the article does not indicate gender or number.
For principal parts, it depends on the language. For English verbs, we do include the principal parts that you named. For Dutch, Russian, Finnish, Latin, French, Spanish, etc., we include the complete conjugation. If you find a Dutch verb without a conjugation table, please add one. —Stephen (Talk) 23:27, 1 March 2012 (UTC)

from the Grease Pit:

We do signify gender in some way: we do include "m" or "f" or "n" or "c" (as appropriate to the language and the word) after nouns, such as "woordenboek n". If you see a noun that's missing such a gender tag, please add it. We also do routinely include the principal parts (and even entire conjugations) of verbs (such as lopen). Again, if there are specific entries that lack such information, tag them with {{attention|nl}} or add the information yourself if you can get the hang of our admittedly complex conjugation templates. - -sche (discuss) 23:18, 1 March 2012 (UTC)

Thank you for your responses! I have only recently begun to look up Dutch words on Wiktionary in any number, and I can see now that I must simply have come across a string of "stubs" with no genders and conjugations. I'd like to remedy some of those as I come across them. How complicated is it to add conjugation tables? Is there a simplified table one can begin with? (This is why I thought of having "principal parts" immediately visible - it is quick to set up, quick to look over, and imparts all the basic information necessary in a single glance.) On my browser, anyway, the conjugations I now see seem to have to be opened; what about having the "principal parts" visible up front (e.g. lopen - loopt, liep, (hebben/zijn) gelopen (I believe 3rd person singular is the form typically given))? This would make the whole browsing experience more user-friendly. Heavenlyblue (talk) 00:19, 2 March 2012 (UTC)
I've been considering adding principal parts to the {{nl-verb}} template, but it would mean all the existing entries would have to be fixed, because currently the template shows the conjugation type instead of principal parts. For tables there are three possibilities: {{nl-conj-wk}} for weak verbs, {{nl-conj-st}} for strong verbs, and {{nl-conj-irr}} for irregular verbs. Gender in Dutch is confusing but it's not helped by the fact that many dictionaries list words as 'v, m', meaning both feminine and masculine. These words are in fact feminine, but they are used with masculine pronouns in those dialects that don't distinguish the two genders clearly. It seems strange to call them masculine for that reason, when the dialects in which those words are supposedly masculine don't even have a masculine gender but only a common gender! In any case, I try to add the proper gender whenever I can and otherwise I look it up. —CodeCat 00:29, 2 March 2012 (UTC)
Yes, very complicated! Hard to see a single solution for the gender problem! As for the principal parts, it would be lovely to have them visible up front. I'm using Wiktionary as an aid in learning to read Dutch, and having to separately open tables (where they even exist), then scan them to parse out basic information quickly becomes tedious. (And, as a side note, editing those tables looks next to impossible without significant study and experimentation!) Heavenlyblue (talk) 00:53, 2 March 2012 (UTC)

Uploading files

Hi, I'm a Czech native speaker and I want to upload some audio files to the Czech words. But it says it has to finish with -.ogg. Is there a free programme what is able to make recordings ending with -.ogg? I could not find any. Sorry my broken English and thanks for the response, --Istafe (talk) 19:59, 2 March 2012 (UTC)

This may help you: w:Creation_and_usage_of_media_files#Audio. Ungoliant MMDCCLXIV 21:18, 2 March 2012 (UTC)
That link doesn't work... and I looked on Commons and Meta for that page title, and didn't find it there, either. :/ - -sche (discuss) 21:44, 2 March 2012 (UTC)
It's w:Wikipedia:Creation_and_usage_of_media_files#Audio. —CodeCat 22:04, 2 March 2012 (UTC)

Yes, but please send me a link to a free programme what helps to make -.OGG files, and not for example Audacity (to which w:Wikipedia:Creation_and_usage_of_media_files#Audio links). I don't know how to convert it to -.ogg in Audacity, because I'm an IT lamer. Why Wikipedia (Wiktionary) don't accept files in the -.mp3 format?? It would be great, because then many users will be able to upload files more easily. Thanks, --Istafe (talk) 17:16, 3 March 2012 (UTC)

The page explains why mp3 isn't accepted; it's not a free format so it's not safe to use on a free wiki. I think Audacity is probably the best program to use though. You need to open your file in Audacity, and then save it as 'Ogg Vorbis'. That's really all you need to do in Audacity. You could record in any program you like and then save it in a format such as wav that Audacity can open. —CodeCat 18:06, 3 March 2012 (UTC)

And where is in Audacity the button "save it as Ogg Vorbis"? I can't see any. --Istafe (talk) 14:02, 4 March 2012 (UTC)

File > Export —CodeCat 14:13, 4 March 2012 (UTC)

Yes, and what should I do then? --Istafe (talk) 15:08, 4 March 2012 (UTC)

It should be uploaded to Commons (commons:) as your own work, and the file name should start with cs (for Czech), NOT cz. Mglovesfun (talk) 14:42, 5 March 2012 (UTC)

Hey guys, I want to contribute Wiktionary. If I go to File and then to Export, there is only a template and I don't know what I should do with it. And it is impossible to upload it to Commons. I don't understand it, this is too difficult for me. --Istafe (talk) 15:43, 6 March 2012 (UTC)
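Editorial sketch, not part of the original thread: the conversion step discussed above (record in any format, then save as Ogg Vorbis) could also be scripted. The snippet below is a minimal sketch that assumes the separate free tool ffmpeg is installed and was built with the libvorbis encoder. Nobody in the thread mentions ffmpeg; it is swapped in here purely for illustration. The function names and the file name cs-pes.wav are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `source` as Ogg Vorbis.

    The output file keeps the same name but gets the .ogg suffix.
    """
    target = str(Path(source).with_suffix(".ogg"))
    # "-c:a libvorbis" selects the Vorbis audio encoder;
    # the .ogg suffix makes ffmpeg write an Ogg container.
    return ["ffmpeg", "-i", source, "-c:a", "libvorbis", target]


def convert_to_ogg(source: str) -> str:
    """Run the conversion and return the name of the new .ogg file."""
    if shutil.which("ffmpeg") is None:
        # Assumption: ffmpeg must be installed separately and be on PATH.
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ffmpeg_command(source)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling convert_to_ogg("cs-pes.wav") would, under those assumptions, write cs-pes.ogg next to the original recording; the resulting file would still have to be uploaded to Commons by hand.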

Entries from unreliable sources

In Wiktionary:Requests_for_verification#cha-sat, people noticed that we got lots of incorrect entries in some languages that originate from unreliable, often public-domain sources.

As of right now, we know that this affects Aleut (from [1]) and Kalispel-Pend d'Oreille ([2]). Now, my plan is to delete the incorrect entries, but I wanted to ask if it's okay with everyone else beforehand. -- Liliana 21:21, 3 March 2012 (UTC)

Yeah, delete the ones that can't be verified in newer, more reliable sources. Remember to clear them out of translations sections, too (~10 entries have Kalispel translations, ~70 have Aleut). There are a few etym sections that should be checked, too (like aleúte, which has info Aleut amusingly lacks). - -sche (discuss) 21:47, 3 March 2012 (UTC)
I added that some time ago. The Portuguese WP had a link to a website which gave that etymology ([3]). I've now searched for that word in google books, which gives a book in Swedish ([4]). If google translate is correct, the current etymology may be wrong; could any Swedish speaker translate that?
I will remove that second part of the etymology section for aleúte until this is resolved. Ungoliant MMDCCLXIV 00:52, 8 March 2012 (UTC)
Wow, what the h*ll was that? Plural verb forms (though inconsistently used), using archaic language trying to sound smart and self published (books on-demand)? I would find another source if I were you. Diupwijk (talk) 13:52, 11 March 2012 (UTC)
Really? But what is it saying? Ungoliant MMDCCLXIV 16:42, 11 March 2012 (UTC)
"The word Aleut may derive from the Chukchi [?] word aliat 'island', but they also called themselves that, from the word allíthuh "society", alongside other terms such as Unangax' / Unangan / Unanga 'coastal people, people from the coast'." Diupwijk (talk) 16:54, 11 March 2012 (UTC)
I found at least one source (and put it on Talk:Aleut) that calls the etymology uncertain. - -sche (discuss) 16:58, 11 March 2012 (UTC)

Done with Aleut. However, many more entries which did not come from the above source still need checking. -- Liliana 01:43, 11 March 2012 (UTC)

Done with Kalispel, too. -- Liliana 13:08, 11 March 2012 (UTC)

Survey invitation

The Wikimedia Foundation would like to invite you to take part in a brief survey.

With this survey, the Foundation hopes to figure out which resources Wikimedians want and need (some may require funding), and how to prioritize them. Not all Foundation programs will be on here (core operations are specifically excluded) – just resources that individual contributors or Wikimedia-affiliated organizations such as chapters might ask for.

The goal here is to identify what YOU (or groups, such as chapters or clubs) might be interested in, ranking the options by preference. We have not included on this list things like “keep the servers running”, because they’re not a responsibility of individual contributors or volunteer organizations. This survey is intended to tell us what funding priorities contributors agree and disagree on.

To read more about the survey, and to take part, please visit the survey page. You may select the language in which to take the survey with the pull-down menu at the top.

This invitation is being sent only to those projects where the survey has been translated in full or in majority into your language. It is, however, open to any contributor from any project. Please feel free to share the link with other Wikimedians and to invite their participation.

If you have any questions for me, please address them to my talk page, since I won’t be able to keep an eye at every point where I place the notice.

Thank you! Slaporte (WMF) (talk) 22:17, 5 March 2012 (UTC)

Aching to bake a cake

Heard at work today: "I need some labels printing this afternoon." A similar example from the Web: "Anytime you want a cake baking you know you just need to ask." The -ing form seems sort of orphaned: it's not "a cake's baking" (the baking of a cake), and presumably label and cake are not printing and baking themselves. What is going on here, grammatically, and what is it called? Equinox 12:53, 6 March 2012 (UTC)

Naive question: if there was no 's' at the end of labels, how would that be different from "horse riding" or "house cleaning"? — Xavier, 13:46, 6 March 2012 (UTC)
That would be a totally different construction. "Printing of labels" = "label printing" but not "labels printing". The structure of my sentences is more akin to "I need these people found". Equinox 13:55, 6 March 2012 (UTC)
Or maybe even 'I like my coffee strong'. A noun followed by an adjective. —CodeCat 14:07, 6 March 2012 (UTC)
I believe that the two examples are just short for "being printed" and "being baked". SemperBlotto (talk) 14:11, 6 March 2012 (UTC)
But they are "printing" and "baking". After CodeCat's comment I suppose they could be an elision of "some labels (to be) printing (at some particular point during) this afternoon". Equinox 14:13, 6 March 2012 (UTC)
@SemperBlotto: Are you saying that "I need some labels being printed this afternoon" and "Anytime you want a cake being baked" sound O.K. to you? Because to me they sound just as bad as the original examples: my idiolect doesn't allow need or want to take an object plus a gerund-participle, regardless of whether the object's relationship to the gerund-participle is that of a subject or that of an object. I would have to use a past participle or an infinitive, as in "I need some labels printed" or "you want a cake to be baked." (Of course, the "being printed"/"being baked" versions are unambiguously grammatical with a different parse: I'd read them as meaning "I need some labels that are being printed" and "you want a cake that is being baked.") RuakhTALK 19:41, 11 March 2012 (UTC)
By the way, if y'all do find "I need some labels being printed this afternoon" to be grammatical, then I bet that "I need some labels printing this afternoon" is related to the passival. Older forms of English used be Xing (plain progressive) where we would use be being Xed (passive progressive); for example, I recently came across the clause "The clock struck ten while the trunks were carrying down" when reading Northanger Abbey. (It's fairly common in older books, but IMHO it doesn't really stand out; before learning of the passival I never noticed it, and since then I've noticed it several times.) Equinox's examples are slightly different in that they're not after be — and in that they're not two hundred years old ;-) — but even so, I bet they're related. —RuakhTALK 15:16, 12 March 2012 (UTC)
  • Interesting. I would expect "I need some labels printed this afternoon". "Anytime you want a cake baked you know you just need to ask." The expressions in question emphasize the process and work involved rather than the result. What seems the most grammatically questionable to me is the use of "a" (countable) rather than say "some" (uncountable): "Anytime you want some cake baking (done) you know you just need to ask." It seems awkward without the "done" and a bit awkward with it, but not wrong. DCDuring TALK 14:44, 6 March 2012 (UTC)
I don't think that's how it's intended, because I can also find "if you want anything doing" and "They ... don't want it putting away" (referring to children who want their lunch left out). "Some" wouldn't work there. Equinox 14:52, 6 March 2012 (UTC)
Is there any chance that this is Irish English? I seem to recollect that the English progressive is used to make constructions that resemble some kind of progressive in Gaelic. DCDuring TALK 15:10, 6 March 2012 (UTC)
That would be with 'after', like 'I'm just after cleaning the house and there's mud all over already!' —CodeCat 15:15, 6 March 2012 (UTC)
Yes, thanks. I just found that the discussion of "be after" is what I couldn't properly remember. In any event, there seems to be a progressive aspect to the construction(s) in question. I gained the impression that there may be other ways that Gaelic progressive aspects wants expressing in English and I'm after finding out how. DCDuring TALK 15:30, 6 March 2012 (UTC)
This work on Gaelic is suggestive, but too technical for me, especially since I know ε about Gaelic, where ε approaches 0. DCDuring TALK 15:50, 6 March 2012 (UTC)


Aloha! A few days back I came over to Wiktionary and happened upon some vandalism (see the last few edits in my contributions, linked with the "C" in my signature), and since then I've found myself checking the recent changes for vandalism whenever I venture to this project. I have to admit, it was quite a blast from the past to have to revert vandalism using only the undo button with my having been a rollbacker since mid 2009 and administrator since last September on the English Wikipedia. I was wondering if y'all would have any objections to giving me rollback rights here in the event that I happen on more vandalism in the future. Given this previous discussion I get the feeling it's not likely I'll get it, but I figured I may as well ask. Thanks in advance, Ks0stm (TC) 20:58, 7 March 2012 (UTC)

I'd say you don't have enough edits on this wiki. But thanks for asking, it can never hurt to ask. Perhaps other contributors won't agree with me anyway, we'll see. Mglovesfun (talk) 22:31, 7 March 2012 (UTC)

Extending Wiktionary:Votes/2008-01/IPA for English r to all English words

Should we extend Wiktionary:Votes/2008-01/IPA for English r to include all English words, not just "words like red, green and orange" (whatever that means)? Some might say that the original intention of the vote was to include all English words, but to me at least, it's worded specifically to say that it doesn't include all English words. Why say "words like red, green and orange" to mean "all English words"? Under what circumstances would those two be considered synonymous? Mglovesfun (talk) 18:01, 8 March 2012 (UTC)

You are misunderstanding the text that you quote. This is the complete text:
Voting on: For the pronunciation of English terms, agreement to use the specific IPA character /ɹ/ instead of /r/ for the r phoneme in words like red, green and orange.
You apparently read "in words like red, green and orange" as modifying "use"; that is, you read the text as meaning roughly this:
Voting on: For the pronunciation of English terms, agreement to use, in entries like [[red]], [[green]] and [[orange]], the specific IPA character /ɹ/ instead of /r/ for the r phoneme.
But in fact, "in words like red, green and orange" modifies "r phoneme"; that is, the text actually means roughly this:
Voting on: For the pronunciation of English terms, for the r phoneme that occurs in words like red, green and orange, agreement to use the specific IPA character /ɹ/ instead of /r/.
The vote already explicitly applies itself to "the pronunciation of English terms", and there's no need to go back and "extend" it to "include all English words", unless there are English words that are not English terms.
RuakhTALK 18:27, 8 March 2012 (UTC)
Surely it only includes the "the r phoneme that occurs in words like red, green and orange", like the text of the vote says. Mglovesfun (talk) 18:44, 8 March 2012 (UTC)
BTW Ruakh, I don't disagree with what you've said, but it is speculation. There's no way to know what was going on in the heads of the people who voted when they read that text and voted. Mglovesfun (talk) 18:45, 8 March 2012 (UTC)
What I mean by that is, you can't possibly know which interpretation the people who voted were using. Furthermore "For the pronunciation of English terms" doesn't have to mean all English terms, in the same way "house and castle are English words" doesn't imply that these are the only English words. Mglovesfun (talk) 18:51, 8 March 2012 (UTC)
So is it your opinion that there is an r phoneme in "all English words"? The text says "the r phoneme in words like red, green and orange" to explain what r phoneme is meant. It's not speculation, it's common sense.
Don't get me wrong — there's definitely room for arguing over the extent of the indicated r phoneme. Does it include linking and intrusive r, for example? But inserting the phrase "all English words" would not help.
RuakhTALK 19:00, 8 March 2012 (UTC)
I think I only need to repeat what I said "you can't possibly know which interpretation the people who voted were using." Mglovesfun (talk) 19:04, 8 March 2012 (UTC)
Well, this is obviously the interpretation of those who voted in favor of it. Since they carried the day, their interpretation is what matters! —RuakhTALK 19:19, 8 March 2012 (UTC)
Why is it obvious? How can something that happened inside someone's head 4 years ago be 'obvious'? Mglovesfun (talk) 19:23, 8 March 2012 (UTC)
It's obvious because it's the only interpretation that makes any sense. If you think otherwise, then please provide an alternative interpretation that makes sense, and then explain why 17 editors voted in favor of it. —RuakhTALK 19:46, 8 March 2012 (UTC)
I agree with everything Ruakh says above. - -sche (discuss) 20:13, 8 March 2012 (UTC)
@Ruakh, you've missed the point, or are ignoring it. I'm saying what makes you qualified to speak for those people who voted? Seems to me you either haven't understood what I've said, or are trying to cover up that you don't have an answer for it by changing the subject. Mglovesfun (talk) 20:49, 8 March 2012 (UTC)
Also I take offense, because I think my interpretation not only makes sense, it's the more literal interpretation. You're using the more 'abstract' interpretation. Mglovesfun (talk) 20:52, 8 March 2012 (UTC)
Again, I asked ""words like red, green and orange" to mean "all English words". Under what circumstances would those two be considered synonymous?" Ruakh, have you actually addressed anything I've said? Because if you just want to go on a monologue, could you start a separate thread in case anyone actually wants to contribute to this one. Mglovesfun (talk) 20:53, 8 March 2012 (UTC)
I'm sorry: I didn't mean to give offense, and I didn't mean to imply that your interpretation doesn't make sense. The thing is — I still don't understand what your interpretation is. Can you present a clear explanation of what the text means to you? As for your question about the synonymous-ness of "words like red, green and orange" and "all English words" — I had assumed that that was a rhetorical question. I should think it would go without saying that "words like red, green and orange" does not mean "all English words". And it's not supposed to. "The r phoneme in words like red, green, and orange" does not refer to an r phoneme that occurs in all words; rather, it refers to an r phoneme that occurs in, well, words like red, green, and orange. So you ask me to justify a position that I don't hold — a position that no one holds — and whose relevance you have not given any justification for; and then, when I do not attempt to justify that apparently-irrelevant position, you accuse me of failing to address what you have said. —RuakhTALK 21:49, 8 March 2012 (UTC)
You didn't answer the bit about how you know what other people were thinking when they voted back in 2008. So now you're saying nobody holds the position "The vote already explicitly applies itself to "the pronunciation of English terms", and there's no need to go back and "extend" it to "include all English words", unless there are English words that are not English terms." Well that's ok then, but I do wish you'd never replied, you've hijacked the whole topic and wasted everybody's time. Thanks for that. So now can we get back on topic? What I'm suggesting is rewording the vote to explicitly include all English words with a rhotic r, as it's not clear what "words like red, green and orange" is supposed to mean. Thoughts? On-topic only please. Mglovesfun (talk) 22:00, 8 March 2012 (UTC)
Wait, what? You've misunderstood me somehow. The vote does explicitly apply itself to "the pronunciation of English terms"; that's a verbatim quotation from the text of the vote. What the vote does not do is use the phrase "words like red, green and orange" to refer to all English words; rather, it uses that phrase to refer to English words that contain the "r" phoneme found in those words. Now do you understand? —RuakhTALK 22:20, 8 March 2012 (UTC)
@Mg: The usual/standard English "r" phoneme, as typified by words like "red" (where it occurs before a vowel) and "orange" (where it occurs after a vowel), is indeed an alveolar approximant, and pursuant to the vote, is to be represented by the approximant symbol [ɹ], rather than the trill symbol [r]. [r] should not appear in the transcription of any English word, unless the transcription is of a dialect that actually uses the trill. Meanwhile, other "r" phonemes exist, such as the "r" in [sɝ], which is already represented by something else, namely [ɝ]; the vote's reference to "red, green and orange" excludes phonemes like [ɝ] (r-coloured vowels), making the vote apply only to the standard, "full" "r", [ɹ] ([r]), and allowing "sir" to continue to be transcribed [sɝ]. It is admittedly confusing. - -sche (discuss) 22:31, 8 March 2012 (UTC)
I always understood, but the text is ambiguous. Though apparently, you both agree with, and dispute that. You say "It's obvious because it's the only interpretation that makes any sense" and then "I didn't mean to imply that your interpretation doesn't make sense". I'll be damned if I know what you did mean. It's rather tempting just to butt out and let you argue against your own position. I've changed my mind, you're not engaging in a monologue, but a dialogue, with yourself! Anyway, at the risk of being on topic, my proposition is the same vote, but without the ambiguous wording. Having said that, based on recent comments, it wouldn't pass anyway. Mglovesfun (talk) 22:37, 8 March 2012 (UTC)
To clarify: So far as I'm aware, mine is the only interpretation that makes sense. If and when you present an alternative interpretation, I'm open to the possibility that it will make sense (though to be frank, I seriously doubt it). But to date, so far as I can tell, you haven't presented any interpretation at all — not one that makes sense, and not one that doesn't. —RuakhTALK 23:27, 8 March 2012 (UTC)
I'm not remotely understanding how anything is ambiguous in that vote. The vote seems clear enough, and the comments in the voting section make it entirely clear.--Prosfilaes (talk) 23:01, 8 March 2012 (UTC)
Discussed at User talk:Dan Polansky#Wiktionary:Votes/2008-01/IPA for English r. If you're having trouble seeing my interpretation of the vote um, read the words and interpret them literally, assuming no prior knowledge or non-literal implication. If that doesn't work... ask a friend. Mglovesfun (talk) 00:16, 9 March 2012 (UTC)
It's simply unproductive to try and insist that every thing be interpretable literally with no prior knowledge. Read in full in context, it has a clear meaning. That link does not offer an interpretation of the vote that's consistent with the fact that it was put up for voting and approved.-- 01:59, 9 March 2012 (UTC)
"Words like red, green, and orange" are English words which have the [ɹ] IPA sound in their close transcriptions. Some English dictionaries, such as Longman Advanced English American Dictionary, use /r/ to represent [ɹ] within broad transcriptions, for the sake of convenience (I guess). Here in EN WT, as a result of the herein above-mentioned vote, we use /ɹ/ to represent [ɹ] for that class of English words (such as "red", "green", and "orange") which contain the [ɹ] sound in their close transcription. —AugPi (t) 03:24, 9 March 2012 (UTC)
@ no it doesn't have a clear meaning. That's why we're having this discussion. If anything, the meaning I see is the most straightforward. It seems to me the vote is deliberately worded not to include all English words. Ruakh and -sche have two other interpretations. And still Ruakh won't tell me how he's able to read other people's minds. Mglovesfun (talk) 11:59, 9 March 2012 (UTC)
It is very deliberately worded to include only this sound. Not all English words include this sound. This vote does not, for example, dictate that cat's pronunciation be given as /ɹɹɹ/. Is that really so hard to understand? (And I "read their minds" by applying common sense. No one would have voted for it if they thought it made no sense; ergo, they thought it made sense; ergo, they interpreted it the same way that I, -sche, AugPi, and Prosfilaes interpret it — the way that everyone else except for you apparently interprets it — because that is the interpretation that makes sense.) —RuakhTALK 12:27, 9 March 2012 (UTC)
The alternative argument does make sense; you've explained it perfectly well above. If I can understand it, surely you can too, I mean, you did write it! Mglovesfun (talk) 15:06, 9 March 2012 (UTC)
Luckily as long as nobody agrees with me, it doesn't matter! It does mean there are a lot of pages that violate this rule, in fact, I think we use /r/ much more than /ɹ/. I was gonna rename the pages for the rhymes, but feared that someone might say that the vote doesn't mandate this, as it only refers to the r-phoneme in certain situations, not at. Mglovesfun (talk) 12:24, 9 March 2012 (UTC)
I think I read the vote the same way as most people here. There is something referred to as "r phoneme", but as the letter "r" does not uniquely identify the phoneme (as "sir" uses a different phoneme), some words that use that phoneme are given as examples. The phoneme could be called "r-as-in-red-phoneme". What the vote says is that "r-as-in-red-phoneme" should be marked using /ɹ/ in all words that use the phoneme. --Dan Polansky (talk) 12:57, 9 March 2012 (UTC)
I really think there's nothing wrong in just saying in the least ambiguous terms possible what one means. It seems to be really rare to find any paragraph of any vote or protected policy page that's been well-enough thought through to be non-ambiguous. Often it's worse than that. The names of specific entities text in WT:CFI before it was removed was so bad that nobody claimed to understand it. I really wish we weren't so bloody amateurish, but getting through reforms when they all need at least 70% community approval is rock hard. So often we end up with rules that nobody wants, or in some cases that nobody understands. Our usual solution is just not to use our own rules, like how most people skip the "physical product" bit in WT:BRAND because it's convenient to do so. Mglovesfun (talk) 15:06, 9 March 2012 (UTC)
The only wording that would be even clearer would be "In English words, the alveolar approximant should always be transcribed /ɹ/; only the alveolar trill should be transcribed /r/." Even that is only clearer to those who know the technical terminology; "red, green, orange" seems to have been an attempt to make the vote intelligible to more people. I'm not opposed to changing the wording to that, I just don't see it as necessary. - -sche (discuss) 21:40, 9 March 2012 (UTC)
Mkay. Anyway, turns out there are over 800 rhymes pages using /r/ not /ɹ/, so I can't rename them by hand (well, it's impractical). Could someone write a script to do it? I'm thinking of Ruakh, as I believe he's capable of it, and I don't know who else is. Mglovesfun (talk) 12:51, 10 March 2012 (UTC)
I don't know very much about the rhymes pages. Does using /ɹ/ in entries necessarily imply that we use it in titles of rhymes pages? (That would make sense, but I want to make sure people are on-board with that.) —RuakhTALK 14:21, 10 March 2012 (UTC)
Actually (perversely), we use /r/ in many Rhymes pages (such as Rhymes:English:-iːtʃə(r)) where / ˞ /, not /r/ nor /ɹ/, is technically correct. - -sche (discuss) 17:54, 10 March 2012 (UTC)
My wish is that we put words onto multiple rhymes pages when there are multiple pronunciations. "ə(r)" doesn't exist: nobody says /ˈbænə(r)/, it's either /ˈbænə/ (e.g. British) or /ˈbænɚ/ (e.g. American). An American wouldn't rhyme banner with banana. DAVilla 22:41, 12 March 2012 (UTC)
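Editorial sketch, not part of the original thread: the renaming script requested above could start from a pure title-mapping function like the one below. It is a minimal sketch under stated assumptions: every rhymes page title begins with "Rhymes:English:-" (as in the one title quoted in the thread), a plain "r" in the rest of the title stands for the phoneme the vote covers, and the optional "(r)" is left alone because the thread has not settled how to treat it. The function name and the second example title are hypothetical, and nothing here talks to the wiki; the actual page moves would need a bot framework.

```python
import re

PREFIX = "Rhymes:English:-"  # assumed common prefix of rhymes page titles


def retitle(title: str) -> str:
    """Return `title` with plain 'r' replaced by 'ɹ' in the IPA part.

    Titles without the expected prefix are returned unchanged, and the
    optional '(r)' is deliberately preserved.
    """
    if not title.startswith(PREFIX):
        return title
    ipa = title[len(PREFIX):]
    # Match '(r)' first so it is kept as-is; any other 'r' becomes 'ɹ'.
    ipa = re.sub(
        r"\(r\)|r",
        lambda m: m.group(0) if m.group(0) == "(r)" else "ɹ",
        ipa,
    )
    return PREFIX + ipa


if __name__ == "__main__":
    for old in ["Rhymes:English:-ɛri", "Rhymes:English:-iːtʃə(r)"]:
        print(old, "->", retitle(old))
```

Keeping the mapping separate from the page moves means the list of old and new titles can be reviewed by hand before anything is renamed.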

Removing text from "WT:ELE#Category links"

I'd like to remove this from WT:ELE#Category links:

The list of entries on a category page will be alphabetized in the strict Unicode order of the titles unless you dictate otherwise. One effect of this is that all English entries beginning with a capital letter will be listed before any that begins with a lower case letter. You can change how an item is sorted with a piped link. By placing [[Category:Drugs|*]] in the entry drug will force that term to be at the top of the list since Unicode lists the asterisk before any letter. Words that define a category name should be “piped” in this way. Similarly, putting [[Category:Drugs|aspirin]] in the entry Aspirin will force it to be alphabetized among words that begin with a lowercase letter.

In most cases the category name should begin with a capital letter. This takes advantage of Unicode sorting to create separate lists for each foreign language that is represented within the broader set of categories. Foreign-language categories can begin with the language code in lower case.


--Daniel 23:30, 9 March 2012 (UTC)

"By placing [[Category:Drugs|*]] in the entry drug will force that term to be at the top of the list since Unicode lists the asterisk before any letter." Is that even grammatically correct? - -sche (discuss) 17:57, 10 March 2012 (UTC)
Almost. "By" should be dropped from the start. Equinox 12:50, 11 March 2012 (UTC)
I agree with -sche and Equinox. Mglovesfun (talk) 11:42, 12 March 2012 (UTC)
Anyway, remove the text. Like Daniel says, it is doubly outdated. - -sche (discuss) 19:58, 12 March 2012 (UTC)

I created Wiktionary:Votes/pl-2012-03/Removing outdated ELE category text. --Daniel 11:47, 22 March 2012 (UTC)
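For anyone unsure what the quoted ELE text was describing, the behaviour can be sketched roughly as follows. This is only a model of the description in the quoted passage (sorting by code point, with a piped sort key overriding the title), not a claim about how the software behaves now; the function name `category_order` and the entries other than drug and Aspirin are made up for the example.

```python
def category_order(entries):
    """Order category members the way the quoted text describes.

    `entries` is a list of (title, sort_key) pairs; sort_key is None when
    the entry has no piped link, in which case the title itself is used.
    Python compares strings by code point, which matches the "strict
    Unicode order" the quoted text talks about.
    """
    def effective_key(entry):
        title, sort_key = entry
        return title if sort_key is None else sort_key

    return [title for title, _ in sorted(entries, key=effective_key)]


if __name__ == "__main__":
    # Without a piped key, capitalised titles come before lowercase ones.
    print(category_order([("codeine", None), ("Aspirin", None),
                          ("Opium", None), ("drug", "*")]))
    # With [[Category:Drugs|aspirin]], Aspirin sorts among the lowercase words.
    print(category_order([("codeine", None), ("Aspirin", "aspirin"),
                          ("Opium", None), ("drug", "*")]))
```

The asterisk sorts first because its code point is lower than that of any letter, which is the effect the quoted text relies on.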

Glosses in descendants sections

I've started following the advice of (I think) EncycloPetey and Widsith by using the gloss {{qualifier|borrowed}} for languages that are not descended from the language in question. For example, English doesn't descend from Latin. Is this just going to confuse people reading the descendants section, or will it provide useful, comprehensible information? I'm actually pretty split over this one. Mglovesfun (talk) 11:41, 12 March 2012 (UTC)

I don't see how anyone can be confused by it, and doing this will certainly provide useful information. By the way, even if the language is a descendant the term may be a borrowing. For example, Portuguese has ancho and amplo (both meaning ample and from Latin amplus, but the latter is a borrowing and the former an "evolved" word). Ungoliant MMDCCLXIV 12:35, 12 March 2012 (UTC)
Fascinating! Why did amplus turn into ancho? That's quite a change, from the consonants "mpl" to "nch". - -sche (discuss) 21:45, 12 March 2012 (UTC)
Certain consonants followed by /l/ often became <ch> (/t͡ʃ/ in Old Portuguese, now /ʃ/). Other examples include chamar from clamare, chama from flamma, chão from planus; but Portuguese also has many borrowings, respectively clamar, flama and plano.
I hope these examples illustrate how useful distinguishing between borrowed and evolved terms would be. Ungoliant MMDCCLXIV 22:44, 12 March 2012 (UTC)
I agree with Ungoliant; glosses are helpful. - -sche (discuss) 21:45, 12 March 2012 (UTC)
I agree that labeling borrowings as such is helpful, but I consider referring to such labels as "glosses" to be very confusing. They aren't glosses. Glosses are minitranslations telling you what a word means. —Angr 10:45, 13 March 2012 (UTC)
Historically, a "gloss" consists of any explanatory comments inserted into the margin around a text by a later author. Today, we more often use footnotes, endnotes, or Cliff's Notes for this purpose, but that's the origin. --EncycloPetey (talk) 19:42, 28 March 2012 (UTC)

Can someone point to an example of how this is used? I don't understand why we want to start duplicating amplo's etymology in amplus's entry. If some information about ample is useful, then more of it will be more useful. And when we're done, the entry for amplus contains the full text of a dozen other entries. Michael Z. 2012-03-28 19:55 z

See amplus. This practice has long been recommended at WT:ALA, based on suggestions by Widsith. --EncycloPetey (talk) 20:11, 28 March 2012 (UTC)
The label loanword might be clearer when it appears in isolation. When I see borrowed, I think “borrowed from where?” and do a double-take on the header. It's awkward in this context, because we mean “borrowed to there.”
Shall we also label descendants appropriately as calqued, compounded, portmanteaud, etc.? Michael Z. 2012-03-28 20:33 z
That question is moot, as we currently do not list those sorts of words in Descendant sections, nor would I care to see such items listed as Descendants. A calque is, in effect, a translation of a word or phrase, rather than a Descendant, and compounds tend to be formed regularly from roots in the same language or from borrowed pieces, rather than wholesale across languages. In any event, it is highly unlikely that a word in one language would originate as a portmanteau of words from another language. --EncycloPetey (talk) 01:25, 29 March 2012 (UTC)
Japanese is great for just that -- portmanteaus created from non-Japanese words. Take pasokon, for instance -- their version of "personal computer". Or konbi māto from "convenience mart". Or sumaho from "smart phone".
(Not making a case one way or the other about labeling; simply providing examples of portmanteaus in one language created from words from another language.) -- Eiríkr ÚtlendiTala við mig 01:56, 29 March 2012 (UTC)
and バックシャン... - -sche (discuss) 02:20, 29 March 2012 (UTC)

U, V

Old, related discussion: Talk:vp. - -sche (discuss) 20:03, 12 March 2012 (UTC)

I'd like us to consider heuenly to be hevenly. Ditto in the Bayeux Tapestry there's the word dvx which I would like us to consider to be dux. My reasoning is the following: The word dvx is actually spelled d-u-x, but the U is written with two non-parallel straight lines, making it look like a V. Ditto for heuenly: the V is written with a single curved line which looks like a U but is in fact a V. For me, it's why we don't consider uſe to be an archaic form of ufe just because they look similar. The counter-argument is that a Wiktionary user won't know that in dvx, the 'v' is actually an obsolete way of writing a 'u' and so will look up dvx not dux (and so on). Though I could say the same for ufe and uſe. To avoid confusion, I hope, I will end with a question. Does anyone think that heuenly and dvx should not be treated as hevenly and dux? If so, why? All relevant comments welcome. Mglovesfun (talk) 15:11, 12 March 2012 (UTC)

I get the idea, but as for "a sea open to all windes, which sometime within, sometime without neuer cease to torment vs: a weary iorney through extreame heates, and coldes, ouer high mountaynes, steepe rockes, and theeuish deserts." (from A Discourse of Life and Death), I recall needing help to figure theeuish out the first time, even knowing about the u/v merger. There's a number of words there that aren't found here in that spelling, but the only one that caused me trouble was the one you don't want to include. (Note also there's iorney, which is cited as Middle English; if we do this, should we cite jorney?)
I would also go for a more pragmatic reasoning. It's not really a u that looks like a v; it is a u that's used in a now-unusual way. One could say that u and v were positional variants of each other, like s and ſ were.--Prosfilaes (talk) 22:46, 12 March 2012 (UTC)
I say we keep heuenly at the u spelling, for the reasons Widsith and I gave at Talk:vp. As Prosfilaes says, it isn't "a u that looks like a v; it is a u that's used in a now-unusual way". And one could say that u and v were variants like s and ſ, but because u and v are contrastive in English and other languages, redirects won't always work. - -sche (discuss) 23:06, 12 March 2012 (UTC)
As a rebuttal, how is vp pronounced? Is it /vp/ or /ʌp/? If it's the second, then that leads me to believe it is a 'u' but looks like a 'v'. 00:45, 13 March 2012 (UTC)
(That was me). Mglovesfun (talk) 00:47, 13 March 2012 (UTC)
On the other hand, the "ctu" in "victual" is a "ctu", not a "t" that looks like a "ctu", even though the pronunciation is /ˈvɪtəl/. - -sche (discuss) 04:15, 13 March 2012 (UTC)
@Prosfilaes: I don't think that's the way it really is. Before the v was invented, there was one letter, which was used in all contexts. If you were chiseling letters into stone, for instance, you might use what we would call v because of the straight lines- for both the vowel and the consonant uses. It was rather arbitrary. It was only after distinct letters were developed for the consonantal vs the vocalic sounds that scholars retroactively applied them to older texts. You really can't talk about "u that's used in a now-unusual way", because the older letter wasn't a u or a v as we know them today, but a single letter used for both. What happened, in effect, was the old phonemically-ambiguous letter being split into two along phonemic lines. Because of this, a word may be found in older texts spelled with either v or u, but we should stick with the standard u/v distinction for the lemma. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)
I think you're missing or abridging over a key part. When u and v were developed, in what Wikipedia says was the late Middle Ages, they were positional variants of each other. v was at the start of words and u was in the middle or end. It wasn't the old phonemically-ambiguous letter being split into two along phonemic lines; it was the old letter being split into two along positional lines then getting reinterpreted phonemically. You will never find "mountaynes" (or mountains, or whatever) spelled with a v. Modern English always had a separate u and v, whether they were assorted by location in the word or by phonemic use.--Prosfilaes (talk) 23:18, 19 March 2012 (UTC)

I don't understand the question. What do you mean “consider it to be” the other word? Shouldn't the definition for heuenly just contain “obsolete form of heavenly?” Michael Z. 2012-03-13 02:18 z

Specifically move heuenly to hevenly and delete the redirect. Mglovesfun (talk) 18:50, 15 March 2012 (UTC)
Move heuenly to hevenly- sure. Delete the redirect- no. Heuenly is a purely arbitrary variant (I would go so far as to say a graphical rather than a spelling one), but it's one people will run into. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)

I think this discussion should be extended to other languages, since such variant uses of letters occur in many languages that were later standardised. Old Norse is a notable example, it's normally cited in a 'normalised' spelling, but that's not actually the spelling used in the original documents. And in many old West Germanic languages (Middle English probably included), uu/vv was used instead of w, so does this mean uu/vv is a kind of w? Personally I don't mind including all spellings in the form that they are attested, as long as the normalised spelling is considered the lemma even if it's not actually attested in that form. So I think heuenly should exist, but be defined as an alternative spelling/form of hevenly, even if the latter is not attested. —CodeCat 19:08, 15 March 2012 (UTC)

I'm not an expert on the matter, but Old French, Middle French and Anglo-Norman also encounter such issues. For example in a paper copy of the 'Roman de Brut', it used trouer in the opening lines to represent trover. Similarly, in more than one Middle French text on the French Wikisource, one can find vn for un. Mglovesfun (talk) 19:37, 15 March 2012 (UTC)
Almost any language written in a Latin script before a certain date will have these issues, though Old English used the runic letter wen for w most of the time (I seem to remember a few cases of vv for w). Many of the edited texts have the distinction added retroactively, but the manuscripts themselves don't. @CodeCat: w is called double-u because it was originally a single-letter representation of vv. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)

Here's some oil for troubled waters, or fuel for the fire, depending on how you look at it: Take the First Folio edition of Shakespeare's Romeo and Ivliet (that's how Juliet's name appears in the title), and note that the name Juliet is spelled in the body of the play as Iuliet, but as Iuliet and Juliet in the page headers, often differently on facing pages. The same play has "As I did ſleepe vnder this young tree here," which exhibits both v and u in the same line of print. What I see from a quick scan of several pages is that v is used at the beginnings of words (vp, vnder, vpon, vnkind, vnnaturall, vault), whereas u is used within words (cup, houre, graue, loue, Heauen), irrespective of modern distinctions between the two orthographies. --EncycloPetey (talk) 19:39, 28 March 2012 (UTC)
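The positional pattern described above (v at the start of a word, u everywhere else, whatever the sound) is simple enough to sketch in code. This is only an illustration of that one rule as stated in this thread; the function name `positional_uv` is made up, it handles single words only, and it ignores other conventions mentioned here such as i/j, ſ, and the capital V in titles like Ivliet.

```python
def positional_uv(word: str) -> str:
    """Rewrite u/v by position: 'v' word-initially, 'u' anywhere else.

    Input is a word in modern u/v spelling; the output follows the
    positional convention described in the thread.
    """
    out = []
    for i, ch in enumerate(word):
        if ch in "uv":
            out.append("v" if i == 0 else "u")
        elif ch in "UV":
            out.append("V" if i == 0 else "U")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for modern in ["up", "under", "upon", "vault", "grave", "love", "heaven"]:
        print(modern, "->", positional_uv(modern))
```

Going the other way (from vp to up, or from loue to love) cannot be done by position alone, because it depends on whether the letter stands for a vowel or a consonant, which is part of why redirects and lemma choice are being argued about here.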

More on the Wayback Machine

The earliest citation I can find for kailan is 1990, but it is on a blog, and another Wiktionarian therefore deleted that citation. This is unfortunate because it brings the oldest citation to a date 12 years later, which does not properly represent the record of the word. (It doubtlessly has an older oral history that unfortunately cannot be documented at all.)

I looked through the archives of this forum, and all I can find on the Wayback Machine is discussion about how the Wayback Machine is not allowed. I don't see anything where a consensus was formed, and in fact, there seem to be people of differing opinions on the subject. In addition to the Wayback Machine [[5]] itself, Biblioteca Alexandrina ([[6]]) backs up the Wayback Machine so that there are two archives of the Internet. I have confirmed my find of kailan both there [[7]] and on the Wayback Machine [[8]].

As I understand it, the two Wiktionary pages of relevance are WT:CFI and WT:CFIEDIT.

Is there a consensus that the Wayback Machine and Biblioteca Alexandrina are not acceptable as durable sources?

(There is an additional issue that while Google claims 1990 as a date, the Wayback Machine and Biblioteca Alexandrina have 1999 as the earliest capture.) BenjaminBarrett12 (talk) 21:23, 12 March 2012 (UTC)

Haven't heard of BA but I consider WM not durably archived because it is run by some Internet companies who might disappear at any time. Geocities seemed immortal once. That's not like ISBN-assigned books and daily newspapers, which all have a copy in the British Library (or equivalent institutions in other countries). Equinox 21:28, 12 March 2012 (UTC)
Similarly, Wiktionary might also disappear at any time. Like Wiktionary, the Wayback Machine ([[9]]) is a not-for-profit (not merely some Internet company). Biblioteca Alexandrina is run by the Egyptian government ([[10]]). It may be noted that as stated on that Wikipedia page, there is criticism that the Egyptian government cannot maintain the BA. As demonstrated by the kailan example, these are invaluable resources for demonstrating the history of words. It occurs to me that in addition to the Wayback Machine and Biblioteca Alexandrina which seem to me to be durable sources, there are two other sources: Google and the webpage itself ([[11]]). Combined, these four sources seem reasonable for attestation and citation. BenjaminBarrett12 (talk) 21:43, 12 March 2012 (UTC)
Well, if Wiktionary disappears then the whole discussion is moot :) And sure, the British Library and all its paper archives might be burned or nuked. Who knows. I would suggest putting your "non-durable" (per consensus) citations on the Citations page, where they are allowed, and they will serve as evidence and — possibly — some day become valid for the entry page. There's a {{seeCites}} tag that can draw attention to their presence. Equinox 21:46, 12 March 2012 (UTC)
A lot of people miss this point, but: non-durably-archived citations can still be in entries (and citations pages) if there's a compelling reason for them to be (as there is here), they simply can't count for attestation. See eg User_talk:-sche#ubersexual. - -sche (discuss) 21:36, 12 March 2012 (UTC)
That seems like a reasonable application here, though I've added additional discussion about whether the WM, BA, Google and the page itself constitute solid sources for attestation. BenjaminBarrett12 (talk) 21:43, 12 March 2012 (UTC)
I totally agree, sche. While there's some real trash on the net, bloggers can sometimes be quite serious writers, and there's nothing wrong with including a good quote, even if it isn't durably archived. I don't even think the reason has to be "compelling".
The Wayback Machine is not durably archived because any website owner can easily have their content removed, retroactively applying to all content for the site. The policy doesn't even bother to delve into the question of copyright. It's no questions asked. DAVilla 22:05, 12 March 2012 (UTC)
That (removal of content) is an interesting point, because Google Groups is willing to do the same with Usenet posts (which we consider durable), if you include an "electronic signature" (uh, type your name) swearing that the posts are yours. Yes, Usenet is a distributed system, not owned or originated by Google, but where else can you find its archives online? Equinox 22:11, 12 March 2012 (UTC)
A "good" blog or even just a Web citation is usually better than our typical made-up usage example. But our tougher standard for attestation seems right for the foreseeable future, ie, this year. The more tolerant treatment of Usenet seems like a practical accommodation to facilitate the attestation of currently popular slang. Blogs and the Web as a whole are more susceptible to protologisms. DCDuring TALK 22:19, 12 March 2012 (UTC)
I don't want to sound anti-Usenet BTW. It's been the only way for me to cite a fair few terms of respectable age (and it goes back to the 1980s, making it significantly older than Google and Weblogs). Just saying. Equinox 22:23, 12 March 2012 (UTC)
The argument that content can be removed from the Wayback Machine by a mere request does not sway me in this case because kailan has been up there since 1990. That alone seems to provide evidence that kailan on the Wayback Machine has become a durable record. (In contrast, Wiktionary has been online about half that time, since 2002 according to Wikipedia.) BenjaminBarrett12 (talk) 22:28, 12 March 2012 (UTC)
Uh huh, but the point is that some malicious person could now contact Google and claim to be the poster of all those 1990 "Kailans" and get them removed, and then our citations would be unevidenced and uninstantiable. Equinox 22:34, 12 March 2012 (UTC)
What if the blog owner/host decides to create a w:robots.txt preventing the WBM from archiving it? Would they delete what was already archived? Ungoliant MMDCCLXIV 22:55, 12 March 2012 (UTC)
Yes, they do. I've seen this with domains previously owned by others that were later cybersquatted. Equinox 22:58, 12 March 2012 (UTC)
The Wayback Machine's FAQ ([[12]]) says: "By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine." Also, that page and the removal policy page ([[13]]) seem to indicate that random malicious people cannot get items removed by a simple request. BenjaminBarrett12 (talk) 23:03, 12 March 2012 (UTC)
Right, so if you post a lot of stuff about kailan on your Weblog, and then you go away for a few years, or die, and the domain ownership expires, and some spam-scum grabs it to fill with advertising, then their exclusory robots.txt will retroactively remove everything you ever wrote from the Wayback Machine archive, even though the new owner had nothing to do with it. Equinox 23:06, 12 March 2012 (UTC)
That seems like a horribly flawed policy... Does the US Library of Congress have any web archiving initiative? Or does any other government friendly to the general public? -- Eiríkr ÚtlendiTala við mig 23:11, 12 March 2012 (UTC)
AFAIK, blog (sub)domains do not expire and cannot be reused on such blogging sites as WordPress and Blogger. BenjaminBarrett12 (talk) 23:14, 12 March 2012 (UTC)
That doesn't help, because that's only the decision of the owner of the domain (WordPress.com, Blogger.com). As soon as that domain expires, the new spammy purchaser can put any robots.txt on any subdomain. The owner of the domain owns all subdomains automatically. Equinox 23:25, 12 March 2012 (UTC)
Yes, but the same problem applies to Wiktionary and Google Books. As soon as wiktionary.org and google.com expire, a malicious person can buy them and do all kinds of bad things. BenjaminBarrett12 (talk) 23:46, 12 March 2012 (UTC)
As I said, if Wiktionary expires, this entire discussion becomes meaningless, just as any decisions we make about our lives become meaningless if we die. That isn't an excuse for us to skimp on attestation! Equinox 23:48, 12 March 2012 (UTC)
I'm using your argument as a demonstration of the durability of those sites. I do not see WordPress pulling the plug on their domain any more than the Wikimedia Foundation abandoning the Wiktionary domain. BenjaminBarrett12 (talk) 23:59, 12 March 2012 (UTC)
Wikimedia has become vastly important (regularly in the international news) and manages to pull in large amounts of donations. This is not the case with most blogging or journalling sites. Remember Diary-X? Or Six Apart flogging LiveJournal to the Russians? Equinox 00:02, 13 March 2012 (UTC)
I disagree. WordPress makes money and Google (i.e., Blogger) is rich beyond imagination. In any case, I appreciate the civil discussion. I think my mind is made up on this issue, so I will wait now to see what others have to say. And in the meantime, I'll add those kailan citations back in as examples, not attestations :) BenjaminBarrett12 (talk) 00:11, 13 March 2012 (UTC)
If the Google Books domain expires, the physical copies of the books it hosts will still exist. Ungoliant MMDCCLXIV 23:52, 12 March 2012 (UTC)
In that case, do all citations referring to Google Books have to be deleted until confirmation with the actual books? BenjaminBarrett12 (talk) 23:59, 12 March 2012 (UTC)
I don't think we need to worry about Google lying about the content of books... --Yair rand (talk) 03:39, 13 March 2012 (UTC)
I fail to see why Usenet is the only online resource that we expect to last forever. Mglovesfun (talk) 11:11, 13 March 2012 (UTC)
It's very decentralised. Ungoliant MMDCCLXIV 13:19, 13 March 2012 (UTC)

The first citation on Citations:kailan is a web page that ends in "4 / 30 / 90", but there were no web pages on April 30, 1990. The web as we know it was invented in 1991 and the Internet Archive was established in 1996. The date 1990 comes from somewhere else. You should check the books mentioned on that page, dating from 1983-1990. Do they actually use the word kailan? --LA2 (talk) 18:21, 14 March 2012 (UTC)

Not sure where to put this, but my two cents: A non-durable cite is a great thing to have in an entry or on a citations page if it shows earliest use or is for any other reason worthwhile having. We don't accept such a cite for RFV purposes, but otherwise it's great.​—msh210 (talk) 20:31, 28 March 2012 (UTC)
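To make the robots.txt point from the FAQ quote above concrete, here is a small sketch using Python's standard `urllib.robotparser`. It only shows how a robots.txt rule is read as excluding a crawler; it says nothing about what the Wayback Machine actually does with already-archived pages. The user-agent name `ia_archiver` and the example.com URL are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a new domain owner might publish.
ROBOTS = """User-agent: ia_archiver
Disallow: /
"""


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/blog/kailan"  # placeholder URL
    print(is_excluded(ROBOTS, "ia_archiver", page))
    print(is_excluded(ROBOTS, "SomeOtherBot", page))
```

Note that the file is served by whoever controls the domain at the time it is read, which is the scenario raised in the thread: the rule applies to the whole site regardless of who wrote the older pages.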


How would Index:Asturian get made? We've got loads of others, like Index:Spanish, and they're very handy to find words of a given language we have. --Cova (talk) 20:01, 13 March 2012 (UTC)

I think they are usually created by a bot. I don't know which bot creates them though. —CodeCat 20:07, 13 March 2012 (UTC)
re: the initial question, you would I think need to supply the Asturian alphabet, so an index can be created. -- Liliana 00:30, 14 March 2012 (UTC)
Oh OK. Would Wiktionary treat a and á as the same letter in an index? --Cova (talk) 20:56, 16 March 2012 (UTC)
We could, and we will if that's how other reference works alphabetise things ('ab', 'ác', 'ad'). Is it? - -sche (discuss) 21:34, 16 March 2012 (UTC)
The Index pages are usually created and updated by Conrad.Bot. --EncycloPetey (talk) 19:23, 28 March 2012 (UTC)

metasyntactic variables

We have foo, bar, baz, foobar, quux, and fnord. Since we don't generally allow computer code, and (for example) a lot of the APL symbol entries got deleted, what is the exemption for these? Equinox 00:25, 14 March 2012 (UTC)

It's not necessarily computer code. For example, in w:Nethack jargon it's common to say foocubus, meaning: either a succubus or an incubus. (This is the only example I can think of). Ungoliant MMDCCLXIV 00:34, 14 March 2012 (UTC)
They also say footrice for a chickatrice or cockatrice. I've not seen that kind of foo outside of rec.games.roguelike.nethack. Equinox 00:46, 14 March 2012 (UTC)
These aren't keywords or function names from a computer language. They are (English) placeholder words, often used as names for arbitrary things in conversation or in generalized statements. (And also as names for computer variables.) Michael Z. 2012-03-14 18:33 z


This is another discussion remotely related to the "unreliable sources" thing above.

Our Category:Old Prussian language does in fact not contain any entries in Old Prussian. What it does contain, is words in a new language supposedly created from Prussian and other sources. This should be treated no different from constructed languages, and thus be deleted per our CFI on constructed languages. -- Liliana 00:39, 14 March 2012 (UTC)

This is also being discussed on the German Wiktionary, here, here and here. Basically, some of the words are "new", while others, such as kaīls/kails and wundan, are in old records. (Kails is even recorded in use in several sentences, whereas wundan is in a glossary.) I'll help appendicise the "new" ones, if the community decides they shouldn't be in the main namespace. — Beobach (talk) 01:34, 14 March 2012 (UTC)
I would think they do, but as it seems, no one is interested (or knowledgeable) enough to give their opinion in this discussion. -- Liliana 13:33, 22 March 2012 (UTC)

IPA brackets

There's been some discussion at WT:Grease pit recently about IPA rendering issues, and that brought something to my attention -- there's some confusion about whether to use the [IPA] square brackets or the /IPA/ slashes around IPA renderings.

This confusion could well be my own, but just in case I wanted to bring it up here. w:IPA#Brackets_and_phonemes notes that:

  • [square brackets] are used for phonetic details of the pronunciation, possibly including details that may not be used for distinguishing words in the language being transcribed, but which the author nonetheless wishes to document.
  • /slashes/ are used to mark off phonemes, all of which are distinctive in the language, without any extraneous detail.

As seen at have#Pronunciation, it looks like the rule above has been interpreted backwards here at Wiktionary. My understanding from the WP page is that square brackets should be used when showing a strict representation of the actual sounds a speaker makes, regardless of their impact on meaning, whereas slashes should be used when indicating more roughly what sounds are important for conveying the meaning of a word or phrase. Do we need to change the slashes at have#Pronunciation to brackets? Or am I missing something?

TIA, -- Eiríkr ÚtlendiTala við mig 17:43, 14 March 2012 (UTC)

I think you are right, and someone has been a bit overenthusiastic in rendering the pronunciation details in have. I would just change the second and third pronunciations to square brackets. Michael Z. 2012-03-14 18:19 z
I don't think including a more phonetic pronunciation is a bad thing, but a phonemic transliteration should always be included as well. So I propose that we make the rule like this: always include a phonemic transliteration, optionally include phonetic if it's different. —CodeCat 01:58, 15 March 2012 (UTC)
Wouldn't phonetic be potentially more useful? Especially for anyone hoping to learn how a language is supposed to be pronounced? And what looks like phonemic notation to a native speaker might be unclear to a non-native speaker. -- Eiríkr ÚtlendiTala við mig 02:06, 15 March 2012 (UTC)
But what about people who already understand the phonetics of the language, and just want to know what phonemes the word is made up of? Not all users of Wiktionary are language learners, and the exact pronunciation can always be derived from the phonemes plus the phonetic rules of that language. The reverse is not true on the other hand. —CodeCat 02:42, 15 March 2012 (UTC)
Okay, point taken, thinking it through further -- and rereading your earlier post, I'm happy with your proposed guideline of including phonetic if it differs from phonemic. -- Cheers, Eiríkr ÚtlendiTala við mig 18:17, 15 March 2012 (UTC)
Also, I think it's done correctly with have#Pronunciation, because /hæv/, /(h)əv/, and /hæf/ are all different phonemic realizations of this word (hence the slashes //). They contain sounds that are phonemically distinct (/v/ vs. /f/ and /æ/ vs. /ə/), and these forms are not used in free variation, as alternate pronunciations, but rather they are allomorphs that occur in different environments (as given before the pronunciation on the page). - Jmolina116 (talk) 02:06, 15 March 2012 (UTC)
Thank you for that explanation, Jmolina, I'd gotten my thinking somewhat in a knot after misunderstanding the examples at w:IPA#Usage. -- Cheers, Eiríkr ÚtlendiTala við mig 18:22, 15 March 2012 (UTC)
I'm not sure if we want to include /hæf/, because it's just voicing assimilation to the next word, and this is essentially a sandhi phenomenon that occurs in almost any language to some degree. And it also happens to any word in English. Just think of is she; are we going to include /ɪʃ/ as a possible pronunciation of is? And similarly for any other word ending in /s/, /z/, /(t)ʃ/ or /(d)ʒ/? —CodeCat 18:28, 15 March 2012 (UTC)
I disagree; "is she", pronounced slowly or with stress on "is" (as when asking a question, "is she?"), is unremarkably /ɪz ʃi(ː)/, but if someone spoke "I have to" slowly or emphasised it as /aɪ hæv tu(ː)/, rather than /aɪ hæf tu(ː)/, I would misinterpret it as "I have two". The two are contrastive. - -sche (discuss) 21:20, 15 March 2012 (UTC)
Slashes // are for phonemic transcription, and they are better for showing the pronunciation of a word to a native speaker of that word's language. Brackets [] are for phonetic transcription. For example: /p/ is a phoneme in English and it has two allophones (or more, but I don't know the others): [p] and [pʰ]. [p] is the main allophone of the phoneme /p/. If someone understands the difference between a phoneme and an allophone, they understand when to use slashes or brackets.
The correct phonemic transcription of the word 'put' in English is /pʊt/, and the phonetic one is [pʰʊt]. A transcription like /pʰʊt/ is incorrect because [pʰ] is an allophone, and not the main allophone of the phoneme.
The better transcription for a non-native speaker is of course the one with brackets. An English speaker knows that the "p" in "put" is pronounced [pʰ]; a non-native speaker might not know that. If a non-native speaker pronounces "put" as [pʊt], it may sound odd to an English speaker, but he will understand it, because changing the allophone doesn't change the meaning of a word.
Some allophones are indistinguishable for a native speaker, but they can be distinguishable for a speaker of another language. See also: w:Allophone#Examples_in_English_vs._other_languages. Maro 22:57, 15 March 2012 (UTC)
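The point made earlier in this thread, that the phonetic form can be derived from the phonemic one plus the rules of the language but not the reverse, can be sketched with the /pʊt/ example. The rule below is deliberately oversimplified: it aspirates a voiceless stop only at the very start of the word, and extending it from /p/ to /t/ and /k/ is an assumption that goes beyond what was said here. The function name `to_phonetic` is made up.

```python
VOICELESS_STOPS = "ptk"  # assumption: treated alike for this toy rule


def to_phonetic(phonemic: str) -> str:
    """Turn a /phonemic/ transcription into a rough [phonetic] one.

    Only one allophonic rule is applied: a word-initial voiceless stop
    is aspirated. Everything else is copied through unchanged.
    """
    body = phonemic.strip("/")
    if body and body[0] in VOICELESS_STOPS:
        body = body[0] + "ʰ" + body[1:]
    return "[" + body + "]"


if __name__ == "__main__":
    for word in ["/pʊt/", "/spɪt/", "/hæv/"]:
        print(word, "->", to_phonetic(word))
```

The rule only adds detail; nothing in the bracketed output tells you which details were distinctive, which is why the slashed form has to be recorded separately.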

literal translations of idioms

There's a discussion here about whether or not to include the literal translations of idioms, when those literal translations are not idiomatic. I'm not necessarily opposed to including literal translations, but doing so directly contravenes WT:ELE#Translations, so I feel it needs to be discussed here. Really, it might be more useful to include the meaning of proverbs in languages that have no idioms with the same meaning, rather than to translate each word. - -sche (discuss) 01:50, 15 March 2012 (UTC)

As I said in the discussion, not all idioms have an exact equivalent but nevertheless, they are translatable. The section in WT:ELE#Translations says to avoid literal translations if they are not idiomatic and don't mean the same thing as the original. Surely, if a literal translation is not known to mean the same thing, or is even misleading, it should be avoided, which is true in many cases. The matter needs more judgement and an understanding of translation, and should probably be discussed case by case, if in doubt. My point is, almost everything is translatable, even if an exact idiom doesn't exist in the target language. Not only words but proverbs are translatable. We reuse foreign expressions translated into our language(s) too. Marking what type of translation it is can always be done - literal or idiomatic, of course. I'm not contradicting WT:ELE#Translations but I think it may need to have more clarification and examples. --Anatoli (обсудить) 03:09, 15 March 2012 (UTC)
To me it seems that a literal translation can often be helpful to someone trying to understand an English idiom or proverb. The underlying metaphor can make sense even when it does not underlie the idiomatic translation, if indeed there is one. Such a literal translation and the idiomatic translation(s) each need to be marked, at least during the long transition to the full provision of literal translations. Could a policy on this be formed and implemented, say, for proverbs or a subclass of idioms before attempting a policy for all idioms and proverbs? It seems to me that many entries marked as "idioms" probably do not really require literal translation for very many users. DCDuring TALK 11:51, 15 March 2012 (UTC)
If you want to know what none of your beeswax means in Japanese, shouldn't you look at ja:none of your beeswax? (Funny, ja has mind your beeswax, which we don't, but they don't have none of your beeswax. Still, the point stands.)--Prosfilaes (talk) 23:16, 15 March 2012 (UTC)

As a dependent side question, asked for clarification:

When adding a literal translation to a translation table, my current understanding is that we should only use the {{t}} template IFF that literal translation has enough currency in the target language to meet WT:CFI, and otherwise, we should add it as straight text at the bare minimum, or ideally with the individual target language terms linked to their respective WT entry pages.
By way of example, "you can lead a horse to water but you can't make him drink" has no clear corresponding Japanese idiom or proverb that I'm aware of, and I don't think this expression in translation has much currency among Japanese speakers. Consequently, when adding a translation to the translation table on that page, #1 below would presumably be incorrect, leaving #2 or #3 as alternates.
  1. Uses {{t}} (creates link to page that fails WT:CFI):
    • Japanese: 馬を水辺に導く事は出来るが馬に水を飲ませる事は出来ない (うまをみずべにみちびくことはできるがうまにみずをのませることはできない, uma o mizube ni michibikukoto wa dekiru ga uma ni mizu o nomaserukoto-wa dekinai) (literal, non-idiomatic)
  2. Basically straight text (simplest, but less useful for learners):
    • Japanese: 馬を水辺に導く事は出来るが馬に水を飲ませる事は出来ない (うまをみずべにみちびくことはできるがうまにみずをのませることはできない, uma o mizube ni michibikukoto wa dekiru ga uma ni mizu o nomaserukoto-wa dekinai) (literal, non-idiomatic)
  3. Links through to individual terms (ideal usability, incredibly ugly wikicode, partly due to Japanese display oddities):
    • Japanese: 馬を水辺に導く事は出来るが馬に水を飲ませる事は出来ない (うまをみずべにみちびくことはできるがうまにみずをのませることはできない, uma o mizube ni michibiku koto wa dekiru ga uma ni mizu o nomaseru koto wa dekinai) (literal, non-idiomatic)
Is my understanding here correct? -- Eiríkr ÚtlendiTala við mig 16:25, 15 March 2012 (UTC)
For situations such as these it may be useful to have a template that links all of its parameters separately instead of just one. Something like {{links|en|you|can|lead|a|horse|to|water|||||}}. —CodeCat 17:33, 15 March 2012 (UTC)
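As a rough illustration of what such a template would have to produce, here is a minimal Python sketch. The function name `link_terms` and the choice to target a language section with `#` are assumptions made for this example only; the `{{links}}` template itself is only a proposal at this point in the discussion, and nothing here describes an existing template.

```python
def link_terms(lang_section, *terms):
    """Wrap each non-empty term in its own wikilink.

    Hypothetical helper: mirrors the proposed {{links|...}} idea by linking
    every parameter separately and skipping empty ones.
    """
    return " ".join(
        f"[[{term}#{lang_section}|{term}]]" for term in terms if term
    )


# Trailing empty parameters are ignored, as in the template call above.
print(link_terms("English", "you", "can", "lead", "a", "horse", "to", "water", "", ""))
```

Each word ends up as its own link, so no link is made to the full phrase, which may not meet WT:CFI in the target language.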
I think I agree with Anatoli, if I understand his position correctly. I read ELE as basically saying that [[none of your beeswax#Translations]] should not list translations for a quantity of waxy bee secretion that does not belong to you: any translation at [[none of your beeswax#Translations]] should have the actual (figurative) meaning of "none of your beeswax", which a quantity of waxy bee secretion that does not belong to you does not. —RuakhTALK 21:32, 15 March 2012 (UTC)
Probably nothing new in my opinions here, but for speakers of the source language, I see no point at all in providing literal translations of idioms that are not idiomatic in the target language. For users of the target language, the most important thing in such cases is to provide a translation of what the idiom actually means, whether that be in the form of a different idiom in the target language, or in a plain descriptive form. However, a literal translation of the idiom can also be provided for interest, provided it is appropriately marked. 18:15, 19 March 2012 (UTC)

Problems with content outside any language section

Our longstanding practice has been to include certain kinds of information outside any language section; that is, before the first language header. The most common of these are {{wikipedia}}, {{also}} and several Unicode character boxes. But there are some problems with this practice both from a semantic and from a usability point of view. Most of these templates, save for {{also}} and maybe a few others, in fact do belong to a particular language. For example, {{wikipedia}} belongs in the English language section when it links to an article in the English Wikipedia about an English term. This practice also makes pages look quite strange when used with tabbed languages, because any content before the first language header will appear above any language, no matter which tab is selected, which is obviously not usually what's wanted. So I'd like to try to work towards some form of policy banning any kind of content, except for a few specific cases, from appearing before the first language header. —CodeCat 15:50, 17 March 2012 (UTC)

I've been moving {{wikipedia}} into the ==English== section whenever I see it outside, and I know I'm not alone. Do you think that a policy would help? —RuakhTALK 16:46, 17 March 2012 (UTC)
I don't know, but it would be nice to have some kind of consensus on the subject...? —CodeCat 17:04, 17 March 2012 (UTC)
I don't think a policy banning things is necessary; just move the content to where it makes logical sense. —Angr 18:41, 17 March 2012 (UTC)
I have a line of code in User:Mglovesfun/vector.js for moving {{wikipedia}} directly under the English header instead of directly above it. Mglovesfun (talk) 20:42, 17 March 2012 (UTC)
There is a list for such problems: Wiktionary:Todo/anomalous section0 content (currently empty). Maybe you could have it updated according to your wishes and use it as a starting point for repair campaigns. --MaEr (talk) 12:13, 18 March 2012 (UTC)

What does rare mean?

I found some discussion about this label here in the parlour, but I do not find a definition of it at Appendix:Glossary#R. I also found something that looks relevant at Wiktionary:Votes/2011-04/Lexical_categories.

The reason I'm asking is because of the rare label on noodlemania. Google Books gives it 29 hits and 58 when spelled with a space. Google everything gives it nearly 8000 without the space and just under 55K with the space. That seems like it would knock it up a notch from rare into regular usage. BenjaminBarrett12 (talk) 03:35, 18 March 2012 (UTC)

I don't think that constitutes rarity, but others probably disagree. DCDuring TALK 12:19, 18 March 2012 (UTC)
It would open a can of subjectiveness-worms, but we could stop redirecting {{uncommon}} to {{rare}}.
"noodle mania" (with space) doesn't seem rare; I'm on the fence about "noodlemania". - -sche (discuss) 17:57, 18 March 2012 (UTC)
Uncommon is not defined at Appendix:Glossary#U, either. Even with a definition to provide guidance, there is certainly some subjectivity in labels like this, but without a standard, discussing whether a word is rare or uncommon is like arguing about the number of angels on the head of a pin. BenjaminBarrett12 (talk) 21:30, 18 March 2012 (UTC)
I could not find guidance on how "rare" is used on the OED site or in my AHD. In Landau's "Dictionaries" (1984, p. 176), he says: "Frequency of use is usually indicated by the label 'rare.' Although frequency is related to currency, the distinction is worth preserving, since a word may be rare and still be current, a principle that the OED consistently recognizes by doubly labeling those words that are both obsolete and rare, such as registery as a form of registry. The inclusion of rare words is confined by and large to unabridged, historical, and technical dictionaries...." BenjaminBarrett12 (talk) 22:03, 18 March 2012 (UTC)
To me, I guess the most important thing is saying that within the universe of that language's words for the concept, this is a rare way of saying this; alternately that this is a word that your audience (as writer) may not recognize and that you may need to define or rephrase. I could go to plutophile and expand that rare into a Usage note pointing out that most usages have been spontaneous recreations and as such it's likely that an audience would understand the word, but it's likely to jump out as an unusual word to some of them. I don't know if any of this maps well to how anyone has used rare on Wiktionary.
An important question is what do we want rare to communicate to our readers? What information that they want or need is being communicated by that tag? (That's not rhetorical.)--Prosfilaes (talk) 23:46, 18 March 2012 (UTC)
When I see the rare label in a dictionary, what I take from it is that I probably shouldn't use that spelling or word. Is there any other useful information the label provides? BenjaminBarrett12 (talk) 01:27, 20 March 2012 (UTC)
We're supposed to be descriptive rather than prescriptive, so you should understand it as "a word not commonly used" rather than "a word that you should not be using". They might be equivalent, because a word not commonly used might not be understood by many people, but that's your call based on your audience and how you want to come across. Equinox 01:45, 20 March 2012 (UTC)
I should have explained more clearly. The reason I understand the rare label to mean I shouldn't use it is because it won't be understood widely (which is part of what you're saying). Is there any other useful information that the rare label conveys? BenjaminBarrett12 (talk) 02:49, 20 March 2012 (UTC)
As far as noodlemania goes (since I created that entry): I tend to use the "rare" gloss if a word is noticeably difficult to cite to WT:CFI standards (i.e. basically from Books and Usenet). Equinox 01:48, 20 March 2012 (UTC)
It may be that "noodlemania" is now more common than when you created the entry; in any case, what does "difficult to cite" mean? Perhaps, for example, that there are only four citations in Google Books/Usenet when three are required? BenjaminBarrett12 (talk) 02:49, 20 March 2012 (UTC)
I thought of an example: plutophile. That's one that I had trouble with and should perhaps get this label. BenjaminBarrett12 (talk) 12:10, 20 March 2012 (UTC)

I'm thinking that it has to be a (relatively) rare synonym of something significantly more common.

Obviously, if only ten scientists know of the squigglefinch, then the term squigglefinch will see very little use. It will have a tiny Google results count and appear in very few publications. But that doesn't mean it's a rare term, only that its referent is little known, or little written-about.

On the other hand, if the house sparrow is also called the squigglefinch, but only in East Spleenworth (pop. 91), then perhaps the term is rare. (On the other hand, this term's limited usage would be better labelled regional or dialectal, or East Spleenworth.) Michael Z. 2012-03-20 01:54 z

"a (relatively) rare synonym of something significantly more common" - this makes sense to me. It tells the reader, "Hey, you can use this word if you like because it is a word, but people typically use a different word, so think about using that other word!" BenjaminBarrett12 (talk) 02:49, 20 March 2012 (UTC)
No, no. “Use this one to look smart!” Michael Z. 2012-03-20 03:34 z

Yeah, people can use the information that way, too, if it suits them :)

One possibility is to define both "rare" and "uncommon," along these lines:

  • rare - a spelling or form that is less common than another spelling or form.
  • uncommon - relating to a word that is found on occasion but without widespread use.

BenjaminBarrett12 (talk) 21:14, 20 March 2012 (UTC)

The key question is: does "rare" mean "this word does not occur in many books (Usenet posts, etc)" or "this is not as common a term for [something] as [some other term]"? I always used it in the first way (for words which simply didn't occur often), but the consensus above seems to be to use it in the second way (for words which are rare synonyms of other words). I suppose a similarly vexing question presents itself (but has already been resolved) with regard to "historical": does that mean the term is historical, or the referent? Can an alicorn be described as historical, given that unicorns never existed? Well, we must add the result of this discussion to our Glossary, and make the tag link to our Glossary. - -sche (discuss) 21:27, 20 March 2012 (UTC)
Information about the referent, if it belongs in the dictionary at all, goes in the definition. A usage label like historical or rare represents information about a term's usage, typically a restricted context in which it is used. (We also throw grammatical labels into our “context labels,” but they are different, qualifying the POS heading.) Michael Z. 2012-04-01 20:34 z

Revisiting WT:COALMINE

I've seen dissatisfaction on RFD with the keeping of many entries only because of COALMINE. I myself am on the fence about COALMINE, but (like User:Bequw in the vote) I do see no problem with the definition of [[coalmine]] being "alternative spelling of [[coal]] [[mine]]", {{&lit}}-style, rather than "of [[coal mine]]". I've also seen users who like COALMINE as-is, but I've seen enough dissatisfaction that I think there should be another VOTE. But what should the vote say? "Unidiomatic multi-word phrases are not granted exemption from our usual CFI, even when they are the more common spellings of single words." ? I don't think a single vote should ban unidiomatic multi-word phrases, as that would hit our Phrasebook — and while many do dislike the phrasebook, it's a separate issue that shouldn't be logrolled into this. (Incidentally, we're missing that sense of logroll/logrolling, or our current senses are too narrow: to logroll two unrelated issues is to have a unified vote on them, in the hope that both will be approved where in separate votes one might fail.) - -sche (discuss) 04:29, 18 March 2012 (UTC)

As the person who initiated the vote, I agree that it more or less creates as many problems as it solves. If annulled, the problem would be (or could be) that the rare form coalmine be accepted, and coal mine be deleted due to the space between the words. I do like the suggestion by Bequw, which I think was separately proposed by msh210 also. Mglovesfun (talk) 11:16, 18 March 2012 (UTC)
I missed the earlier vote somehow and would have opposed it. User:Bequw's suggestion would have been fine with me. The number of lame compositional entries justified by rare solid spellings is not large, but grows steadily. DCDuring TALK 12:26, 18 March 2012 (UTC)
Possibly of interest to you, DCDuring.​—msh210 (talk) 18:18, 19 March 2012 (UTC)
See Talk:hisown. DCDuring TALK 20:21, 19 March 2012 (UTC)
I have created Wiktionary:Votes/pl-2012-03/Overturning COALMINE. Critique or embetter it, please. :) Note my comment on the talk page. - -sche (discuss) 19:40, 19 March 2012 (UTC)

Format of definitions

Some Wiktionary definitions start with a capital letter and end with a full stop, while others don't. This is seemingly at random, and it is not uncommon to see both styles used under the same headword. The layout instructions are very unhelpful, saying that "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop." (my italics). I think there should be a decision one way or the other because currently it looks messy. —This unsigned comment was added at 05:16, 19 March 2012.

I brought this up recently, and there was no consensus on what to do. Am afraid I can't remember what the thread was called, so I can't link to it. Mglovesfun (talk) 09:39, 19 March 2012 (UTC)
Here is the link: Wiktionary:Beer_parlour#Definitions_as_sentences. BenjaminBarrett12 (talk) 09:58, 19 March 2012 (UTC)
Oh, OK, thanks. I think it's a shame that there cannot be agreement on this, because in my view it looks sloppy and unprofessional to have randomly varying styles. At least, when I come across different styles under the same headword, am I allowed to make them all consistent? 11:48, 19 March 2012 (UTC)
Yes, within the English language section (and Translingual?) you have a choice of formats, of which I prefer the begin-with-uppercase-end-with-a-period. Non-English sections are supposed to follow the other format, AFAIK. DCDuring TALK 13:42, 19 March 2012 (UTC)
What other format is this? Mglovesfun (talk) 13:53, 19 March 2012 (UTC)
Begin-with-lower-case-end-without-period. I don't think the other two combinations have any sanction. DCDuring TALK 14:57, 19 March 2012 (UTC)
I use begin-with-uppercase-end-with-a-period for both types. And I think most form-of entries, in all languages, use begin-with-uppercase-end-with-a-period (and if they don't, it was due to admin recklessness rather than to lack of community sanction). —RuakhTALK 20:28, 19 March 2012 (UTC)
@DCDuring, a little to my surprise, no format is sanctioned in any 'official' policy. The two you mention are de facto the most common, but the other combinations (initial cap no period, no initial cap period) are used, but less 'socially acceptable'. Mglovesfun (talk) 21:55, 19 March 2012 (UTC)
@Ruakh. I inferred from relative frequency that we preferred the "no-caps, no period" format for glosses in non-English sections. The logic of non-gloss definitions for non-English sections would push me toward preferring "caps with period" for them too. DCDuring TALK 23:20, 19 March 2012 (UTC)
@MG & Ruakh: I also inferred from Wiktionary:ELE#Variations_for_languages_other_than_English, which seems to recommend one-word glosses where possible for non-English sections, that the no-caps, no-period format was more consistent with that recommendation.
BTW, I don't see that the recommendation of a single-word gloss is a good one without some explanation of how to handle polysemic English definiens and providing examples for which there is no non-rare, non-obsolete, non-archaic single-word English gloss available. DCDuring TALK 23:20, 19 March 2012 (UTC)
Eh? It just says to follow the standard format, and as has been pointed out, on this particular issue, WT:ELE offers no useful advice. Mglovesfun (talk) 23:40, 19 March 2012 (UTC)
Apparently most contributors favor the no caps, no period format with one-word definitions. DCDuring TALK 00:43, 20 March 2012 (UTC)
It may be better to say that most contributors who supply one-word definitions favor the no-caps,-no-period format for them. Many contributors do not. Personally, I think one-word definitions are unacceptable (not as in "they shouldn't be allowed", but as in, "they require further attention and improvement"), and one-word-plus-parenthetic-note definitions are acceptable but not ideal. There often (usually?) is not a perfect correspondence between sense #m of word X in language L and sense #n of word Y in English, so any attempt to give a single English word, even with a parenthetic note to clarify which sense of that English word is meant, will necessarily be incomplete. —RuakhTALK 15:46, 21 March 2012 (UTC)
  • I think the very short defs are often given as a single lower-case word or phrase with no punctuation simply because [[Word]] and [[word]] link differently on WT, and typing out [[word|Word]] is more work.
  • FWIW, there are a number of places in Japanese where one-word defs are really all that is appropriate, such as many (most?) concrete nouns, for instance. Take 酢 (su), which I'm currently expanding in a separate browser tab -- this is just vinegar. There's not really much else to it. The def given in my JA-JA dictionary is a bit long-winded by comparison, but that's because it's explaining what vinegar is.
Now, I'm not saying that that single word alone makes for a complete entry -- there are idioms, related terms, derived terms, etc. that should all be accounted for. But when it comes to definitions, sometimes a single word suffices, where saying more would actually be excessive. -- Cheers, Eiríkr ÚtlendiTala við mig 16:15, 26 March 2012 (UTC)
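The linking point in the first bullet above can be sketched in a few lines of Python. This is only an illustration: the helper name `sentence_case_link` is made up for this example, and it relies on nothing beyond what the bullet states, namely that page titles here are case-sensitive, so a capitalised definition needs a piped link to reach the lower-case entry.

```python
def sentence_case_link(term):
    """Build a wikilink whose display text starts with a capital letter.

    Hypothetical helper: the link target keeps the entry's own casing,
    and a pipe is added only when the displayed form differs from it.
    """
    display = term[:1].upper() + term[1:]
    if display == term:
        return f"[[{term}]]"
    return f"[[{term}|{display}]]"


print(sentence_case_link("vinegar"))
```

The lower-case, no-period style avoids the pipe entirely, which is the extra typing the bullet refers to.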
My problem is that there's a lot of bad examples out there; if you define a noun simply as cat, which definition of the 16 noun definitions do you mean? You could say there's one obvious definition but: (1) most English speakers will guess which one "cat", without disambiguation, probably labels, but I don't know if non-English speakers could tell, and (2) there's actually two subtly different definitions for cat (noun) that cat without disambiguation can label, the domesticated cat and any member of Felidae. mačka says that Slovene just uses mačka for the domestic species (though cat translates "member of Felidae" into Slovene as mačka) but what about the others? ᏪᏌ and ᏪᏏ are both defined as cat; do they both treat cat in the same way as English does?--Prosfilaes (talk) 09:26, 27 March 2012 (UTC)
When I'm working in Latin, I prefer one word translations whenever possible. However, depending upon the word and difficulties in translating it, I may expound in one of several ways: (a) use three or four synonyms separated by commas, when more than one English word closely matches the Latin, (b) include a parenthetical gloss to disambiguate a translation into English with more than one possible sense, (c) include a full sentenciform definition because there isn't a good English translation (or the only English translation is actually a borrowing of the Latin). --EncycloPetey (talk) 19:18, 28 March 2012 (UTC)

Let's make a group on FB

What do you think about an international community of contributors to all Wiktionary languages (not only English), realized as a group on FB? It would be helpful for other struggling Wiktionaries and for newcomers (you know you cannot ask everything here; one may even get shy about asking silly questions or proposing an absurd idea). There we could also discuss inter-Wiktionary matters and establish some unofficial standards. Or we could just have fun :D. Not only MG loves fun :D--Wikstosa (talk) 21:26, 19 March 2012 (UTC)

I hate FB, have never had an account, and probably never will, but this might be a good idea to "raise awareness" and let people know that Wiktionary actually exists. People mostly haven't heard of us, whereas they have all heard of Wikipedia. Definitely don't move any decision-making there though. Equinox 21:34, 19 March 2012 (UTC)
"People mostly haven't heard of us, whereas they have all heard of Wikipedia."
So, let's make sure all Wikipedia pages link to Wiktionary when possible.
e.g., w:Engineer links to engineer through a box at the bottom. --Daniel 12:43, 20 March 2012 (UTC)
I also loathe Facebook. As an occasional minor contributor to Wiktionary, I would be dismayed if any significant part of it was hived off there. 21:38, 19 March 2012 (UTC)
I don't see how its existence would be worse than WT:IRC.​—msh210 (talk) 21:51, 19 March 2012 (UTC)
I think real-time communication is a good thing. Nobody will be obligated to use it. Mglovesfun (talk) 21:56, 19 March 2012 (UTC)
Maybe my English was poor? Was my post actually saying that "any significant part of it" would move there? I think there should be a place to discuss Wiktionary as a whole. Wiktionary, I think, has got to make standards to which all Wiktionaries have to conform (they may exist, I dunno, but if they do, then I have questions about some Wiktionaries in different languages).
I don't usually go into IRC and wait an hour for someone to say something; besides, as far as I know, it doesn't save past talk. Also, as MG said, real-time communication where past comments and posts are kept is much better.--Wikstosa (talk) 22:03, 19 March 2012 (UTC)
I would argue that any forum dedicated to discussions about Wiktionary, especially if these are decision-making or consensus-forming, is, or could very easily become, a "significant part" of the project. 22:45, 19 March 2012 (UTC) BTW, it is very heartening to see how many people here hate Facebook.
I also loathe Facebook, but I wouldn't be too much bothered if it was created. I'm happy enough with IRC (although I haven't seen any serious discussion there yet). Ungoliant MMDCCLXIV 22:07, 19 March 2012 (UTC)
If such a group were created, I would join it as a show of support, but I doubt I would participate. —RuakhTALK 22:18, 19 March 2012 (UTC)
I don't like FB much. But it might be useful, 1., to publicize Wiktionary a bit by posting WOTD, 2., to collect "likes", and, 3., to possibly get some comments. Absolutely no decision making. DCDuring TALK 23:28, 19 March 2012 (UTC)
BTW, 925 have already liked this fairly lame FB page for Wiktionary. DCDuring TALK 23:32, 19 March 2012 (UTC)
I think this is just part of some project to put Wikipedia articles on Facebook, which seems to be endorsed by Wikipedia. See, for example, http://creativecommons.org/weblog/entry/21721. I suppose we should at least be thankful that these articles are not accompanied by the usual rivers of pointless Facebook crap. (No personal offence intended to any Facebook users.) 00:49, 20 March 2012 (UTC)
Not only does Wikipedia have an FB page (http://www.facebook.com/wikipedia), but even some individual Wikipedia pages have their own FB pages (e.g., http://www.facebook.com/pages/Navajo-language/219949658046150). Having a Facebook page does not require anyone’s participation. People who want to go there and look, or ask a question, can. People who want to go there and comment can. There is already a Wiktionary page on FB, but it is from the Wikipedia page about Wiktionary (http://www.facebook.com/pages/Wiktionary/103949032974824). —Stephen (Talk) 21:31, 20 March 2012 (UTC)

It's funny how nobody mentioned Google+. I don't have any objection to a FB page or G+ page. I find it hard to keep up with discussions sometimes because only the most recent edits show up on a watchlist. A social network might actually work better than a wiki page for collaboration. Looks like there's a wall of opposition to such a thing though. How about a twitter hash tag? #wiktionary has a few tweets. Sounds fun. --Haplology (talk) 13:23, 21 March 2012 (UTC)

re: Google+: a social media short course. DCDuring TALK 13:40, 21 March 2012 (UTC)

New Wikimedia Shop feedback/help requested

Hey all,

Some of you may already know that we've opened a shop at http://shop.wikimedia.org to sell Wikimedia Merchandise. We're now entering our "Community Launch", allowing us to hopefully get as much feedback as possible from the community about the store, its products and everything else involved. For those that are interested we've set up an FAQ/information page, feedback page and design page. We also have a 10% discount up for at least the next 2 weeks (CLAUNCH or 'Wikimedia Community Launch' in the discount box at checkout) and a $10 maximum shipping fee world wide for most orders.

However the big thing I wanted to ask you about was Wiktionary gear. Right now everything on there is Wikipedia related but we want to make sure we have merch from all of the projects as well. So far we have a couple things on order:

  • Stickers from all of the projects
  • 1" buttons (or 'badges' ) from all of the projects
  • Lapel pins for all of the projects, both to go independently and as a set, are in the design and digital mockup phase. Right now we're getting mockups to see how they look and to see if we want to go with the pewter look that we have right now for the globe (this new set will have an interlocked v W for the Wikipedia piece) or the full color enamel look like this Strike Command pin.

We want to have more, though, both soon and in the future, and I wanted to know what you thought. One of my thoughts for something early on was a series similar to the I Edit Wikipedia shirts (we have two versions right now) on the shop for each project. If we did something like that should we just use Edit or adjust the verb? I spell? Any other product ideas? Jalexander (talk) 00:31, 20 March 2012 (UTC)

Anything with this logo would be nice (not necessarily in Lithuanian though). Ungoliant MMDCCLXIV 00:50, 20 March 2012 (UTC)
Edit is the verb we use here, yes. (Or contribute to. Occasionally vandalize/vandalise.  ;-) )​—msh210 (talk) 07:07, 20 March 2012 (UTC)

derived from baseball

I think we could have an appendix or category of terms derived from baseball. I know we have Category:en:Baseball, but it could be useful to have a category of idioms derived from the sport too, such as bat for both sides. --Cova (talk) 08:40, 20 March 2012 (UTC)

I'd be surprised if anyone objected to an Appendix titled something like "English [terms|idioms] based on baseball metaphors" (or something more felicitous). Such an Appendix would have a bit of overlap with a similar one for cricket.
I'd prefer an Appendix to a Category for such efforts. I'm not sure what the best ways to link from entries to the Appendix would be: under "See also", in Etymology, on the sense line? DCDuring TALK 14:22, 21 March 2012 (UTC)

Creating Wiktionary:About Navajo

It looks like there's enough activity in creating and editing Navajo entries that it might make sense to create a Wiktionary:About Navajo page. Any objections to starting one? -- Eiríkr ÚtlendiTala við mig 21:09, 21 March 2012 (UTC)

I don't think anyone can object to starting such a page... although those who speak and edit in Navajo might debate what to put on it. - -sche (discuss) 21:13, 21 March 2012 (UTC)
I created the page just as a simple stub. -- Eiríkr ÚtlendiTala við mig 21:55, 21 March 2012 (UTC)

CFI for endangered languages

WT:CFI does not address endangered languages specifically. The first criterion provided is "Clearly widespread use, or..."

Recently, a Ditidaht speaker has talked about adding entries, and there appears to be a movement for Navajo as well. Although Navajo has a vibrant community of speakers, as with Ditidaht, it does not have a large corpus like languages with larger populations to provide extensive citations.

Is "clearly widespread use" something defined by the speakers of that language? In that case, if there is only one Ditidaht speaker active on Wiktionary, for example, then is that person the sole arbiter of what constitutes "clearly widespread use"? BenjaminBarrett12 (talk) 23:47, 21 March 2012 (UTC)

I think "clearly widespread use" has been stated as an explicit alternative to adding cites for apple just because someone wants them. I'd almost say that clearly widespread use is not for any word where we can't wave at Google Books or Usenet and go "look, a metric assload of cites", which includes all the words from most languages.--Prosfilaes (talk) 00:14, 22 March 2012 (UTC)
So if a speaker of a language with only 100 people wants to write down words that are clearly basic but not in a published work, those words are not acceptable? BenjaminBarrett12 (talk) 00:18, 22 March 2012 (UTC)
He'd be eminently welcome to do so at nl.wiktionary and I'll be happy to take care of the Dutch translation. Jcwf (talk) 01:06, 22 March 2012 (UTC)
That is very generous, Jcwf. BenjaminBarrett12 (talk) 01:57, 22 March 2012 (UTC)
Perhaps if the user could point to some reference, the community wouldn't be so sticklerish as to demand citations for everything: but we would like some reference or citation, more than just the word of someone on the internet, lol. - -sche (discuss) 01:13, 22 March 2012 (UTC)
Although it's difficult to quantify [[14]], a lot of the languages of the world are unwritten, which means that the stated purpose of the English Wiktionary "...to describe all words of all languages using definitions and descriptions in English" would not be possible even if representatives of all the languages of the world contributed here. While I understand the concern about letting someone who claims to speak an endangered language loose on Wiktionary, turning such contributors away seems counter to the stated purpose of Wiktionary and its spirit. BenjaminBarrett12 (talk) 01:57, 22 March 2012 (UTC)
I see a different spirit of Wiktionary than you. I see a Wikimedia project, akin to Wikipedia, where the goal is not to publish original studies, but to refine what has been published in such a way that other people can look at our citations and check our work.
I don't see that this is a valuable thing. Professional linguists are working on recording these languages; the joke is that the typical Navaho family contains a father, a mother, children, and an anthropologist. One untrained, unreviewed person is more likely to add junk that no one will ever use to Wiktionary than they are to add stuff that is (a) correct and (b) of interest to anyone. (Seriously; who's going to be looking up 100-people languages here? The linguist audience can't use anonymous non-peer-reviewed material added here.)
And on the flip side, for every Ditidaht speaker, we probably have a dozen people wanting to add works in Prussian or Siberian or some other constructed language masquerading as a natural one.--Prosfilaes (talk) 13:30, 22 March 2012 (UTC)
If the Wiktionary community wants to change "...to describe all words of all languages using definitions and descriptions in English" to "...to describe all words of all well documented languages using definitions and descriptions in English," that would work, too, but there seems to be a clear contradiction here. The Dutch page already has a Ditidaht word: [[15]]. Jcwf put it up. Evidently they do have a different policy that allows for this. BenjaminBarrett12 (talk) 17:46, 22 March 2012 (UTC)
I don't see the contradiction. "all words of all languages" is a high-flying mission statement. We don't mention that we don't include English words spoken only at one elementary school for a short period of time. It doesn't invalidate our basic citation requirements.
In any case, according to w:Ditidaht language, there have been publications in the language. I certainly wouldn't compare it to something like Navaho, which is in the top 20% of the world's languages by size. There are a number of publications in Navaho both anthropological and local.--Prosfilaes (talk) 04:01, 23 March 2012 (UTC)
A speaker of a language with only 100 people is probably on speaking terms with a linguist who's working on publishing the language. It wouldn't amuse everyone, but publishing texts on Usenet would be a step above just writing definitions here.--Prosfilaes (talk) 13:30, 22 March 2012 (UTC)
This seems like a reasonable work-around. BenjaminBarrett12 (talk) 17:46, 22 March 2012 (UTC)
We have a special rule for extinct languages, which for me, misses the point a bit. It shouldn't be about whether a language is extinct or not, but the amount of attestation available in the language. So all poorly attested languages should require only one attestation. The problem is how to legislate for this, nobody knows! That's why the rule for extinct languages is a good one, it's basically the best we can do. Mglovesfun (talk) 18:00, 22 March 2012 (UTC)
I don't see why we can't just edit that line in the CFI about extinct languages to include languages under a certain number of speakers (arbitrarily chosen, of course, but maybe in the 5,000-10,000 range). A listing in a modern anthropological or linguistic work ought to be sufficient. --Μετάknowledgediscuss/deeds 04:12, 23 March 2012 (UTC)
One criterion for endangered languages could be a listing in the UNESCO Interactive Atlas of the World’s Languages in Danger. If this excludes something that should be included, then the definition could be revisited.
And as it turns out, Ditidaht actually falls between the cracks. Although it has its own ISO 639-3 code (dtd) and Wikipedia page (wikipedia:Ditidaht_language), the Ethnologue recognizes it as a dialect of Nootka, and the UNESCO Atlas lists Nootka, not Ditidaht. Wiktionary says there is no consensus on dialects (Wiktionary:Dialects), so perhaps this is acceptable. BenjaminBarrett12 (talk) 05:25, 23 March 2012 (UTC)
We can always edit the CFI by vote, but I don't see it as a non-controversial proposal. I don't want anyone adding material to Wiktionary backed up by "I said so" for any language. Besides the theoretical reasons, stuff like "Siberian" seems much more common than real tiny languages, and orthographies for small languages are frequently controversial, and amateur-created orthographies are often pretty bad.--Prosfilaes (talk) 06:30, 23 March 2012 (UTC)
As I read this thread, the proposal is to allow only one source as attestation, including a Usenet upload, for endangered languages as defined by UNESCO, including dialects even if not specifically mentioned. Is that controversial? BenjaminBarrett12 (talk) 06:59, 23 March 2012 (UTC)
It's a good idea; what do you mean by 'controversial'? Mglovesfun (talk) 10:34, 23 March 2012 (UTC)
You're right; if it is to allow only one source as attestation, it's probably not controversial.--Prosfilaes (talk) 13:46, 23 March 2012 (UTC)

I have created a voting page at Wiktionary:Votes/2012-03/CFI_for_Endangered_Languages. BenjaminBarrett12 (talk) 17:47, 23 March 2012 (UTC)

Removing Interwicket from ELE

User:Interwicket has been inactive since November 2010 (and her work has been done by other bots, over the years; I'm not sure which, honestly - I just know Interwicket is not one of them, nowadays), so the statement "interwiki links are normally entered by User:Interwicket in an automated fashion" from the last line of WT:ELE is inaccurate.

Maybe the generic word "bots" would fit it better, this way: "interwiki links are normally entered by bots in an automated fashion", unless we do want to name specific bots on that page. --Daniel 11:58, 22 March 2012 (UTC)

I support the change from "User:Interwicket" to "bots". (I also support that change's going through if we reach consensus here (that is, with no vote), though I doubt that that (=that it will go through with no vote) will happen.)​—msh210 (talk) 15:08, 22 March 2012 (UTC)
I support making the change, I support doing it without a vote, and I oppose starting a vote over it. —RuakhTALK 17:35, 22 March 2012 (UTC)
It needs a vote, which'll take a month at least. You better get started right away. -- Liliana 18:33, 22 March 2012 (UTC)
I've created a vote that starts tomorrow and lasts 7 days: Wiktionary:Votes/pl-2012-03/Minor ELE fix. Quickly started and quickly completed via a lean process that is at the same time formally clean. --Dan Polansky (talk) 18:40, 22 March 2012 (UTC)
It requires a vote, right, but of no fixed duration. Why not make it 24 hours? Mglovesfun (talk) 18:46, 22 March 2012 (UTC)
There should be some minimum time for people to be able to take notice; 1 day is an extremely short time. 7 days seems okay to me for such a trivial matter. I mean, nothing horrible happens while "Interwicket" stays in ELE, right? --Dan Polansky (talk) 18:52, 22 March 2012 (UTC)
That's kinda my point, if it's something really trivial, make it as short as possible. I'd consider an hour but we might not get enough votes in an hour to have the vote pass. Mglovesfun (talk) 18:55, 22 March 2012 (UTC)
I guess we won't ever have one-hour votes. But, anyway, in that time a minor uncontroversial vote probably would pass as the creator supports it and nobody else opposes it (since there's no reason to oppose it in the first place).
Nonetheless, people sleep and have other things to do; they don't check Wiktionary every 60 minutes. --Daniel 21:31, 22 March 2012 (UTC)
I seem to think I managed to add an interwiki to Wiktionary:Criteria for inclusion without a vote. Mglovesfun (talk) 12:43, 23 March 2012 (UTC)
That was a few years ago. The "every minor change requires a vote" thing used to be much less strict, and has grown steadily stricter over time. (Even today, though, if you make an unobjectionable change, you're unlikely to be reverted. It's only if you try to propose an unobjectionable change that the vote-bureaucracy kicks into place.) —RuakhTALK 13:21, 23 March 2012 (UTC)
Indeed, Atelaes (talkcontribs) has actually made the proposed edit, and not only has nobody undone it, nobody's discussing it either! Mglovesfun (talk) 13:23, 23 March 2012 (UTC)
I and numerous others modified Wiktionary:Criteria_for_inclusion/Brand_names without votes. Incidentally, what is the status of that page? Is it still policy? If so, can we fix it up with a shortcut? Its old one (WT:BRAND) was reassigned. - -sche (discuss) 18:35, 23 March 2012 (UTC)
The brand policy resulting from Wiktionary:Votes/pl-2007-08/Brand names of products 2 is split into two parts, one of which is directly in CFI, and the other one is in Wiktionary:Criteria_for_inclusion/Brand_names. Wiktionary:Criteria_for_inclusion/Brand_names alone does not make up the whole brand policy, and, furthermore, the subpage is essentially dispensable. The subpage is linked from the end of the relevant section in WT:CFI, in "... See examples" of Wiktionary:Criteria_for_inclusion#Brand_names. --Dan Polansky (talk) 11:29, 25 March 2012 (UTC)


How is a category a policy? (It's listed as such in the {{policy}} template.) Or is it shorthand for, "these other pages are also policies", in which case: some of them aren't; at least, some of them aren't full policies, they're merely think tanks, etc. - -sche (discuss) 18:59, 23 March 2012 (UTC)

I think the latter (it's shorthand), and I agree with you that some aren't.​—msh210 (talk) 17:41, 25 March 2012 (UTC)

Status of Low German varieties

There are several new language codes intended for varieties of Low German, such as {{gos}}, {{stl}}, {{drt}}, {{twd}}, {{vel}}, {{wep}}, {{frs}}, and maybe more. This seems like a situation similar to Serbo-Croatian. Should these individual languages be allowed or should they be treated as dialects of Low German? —CodeCat 00:29, 25 March 2012 (UTC)

See also User talk:Liliana-60#Plautdietsch. Plautdietsch and Dutch Low Saxon also have codes, but it might be best to combine them — or, because the different varieties of Low German tend to have different orthographies, and sometimes different words, it might be best to keep them all separate. It's hard to say. - -sche (discuss) 00:49, 25 March 2012 (UTC)
I wouldn't object to lumping the Dutch Low Saxon varieties ({{act}}, {{drt}}, {{gos}}, {{sdz}}, {{stl}}, {{twd}}, {{vel}}) together and the Low German varieties spoken in Germany ({{frs}}, {{nds}}, {{wep}}) together. I would want to keep Plautdietsch different because of its different sociolinguistic status - it's closely associated with a particular religious group, and is hardly (if at all) used in Germany or the Netherlands so it isn't in the same diglossia situation with standard Dutch or German that the other Low German varieties are. —Angr 21:34, 25 March 2012 (UTC)
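For anyone who finds it easier to see the grouping suggested above as a table, here is a rough sketch in Python. It only restates the codes listed in that comment; the group labels and the function name are made up for illustration, and nothing here reflects an actual Wiktionary template or policy.

```python
# Sketch of the merger suggested above. The group labels are hypothetical;
# only the language codes themselves come from the comment.
MERGE_GROUPS = {
    "Dutch Low Saxon": ["act", "drt", "gos", "sdz", "stl", "twd", "vel"],
    "Low German (Germany)": ["frs", "nds", "wep"],
}


def merged_group(code):
    """Return the suggested group for a language code, or None if it stays separate."""
    for group, codes in MERGE_GROUPS.items():
        if code in codes:
            return group
    # Codes outside both lists (for example Plautdietsch) would keep their own entry.
    return None
```

Under this sketch, `merged_group("gos")` falls in the Dutch Low Saxon group and `merged_group("wep")` in the German one, while any code not listed is left alone.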
Indeed, there is some similarity and mutual intelligibility between the Low German Variants, however each Variant has its standards and norms, thus applying the norms of one Variant on another would be greatly noticeable. An overgeneralisation of these Variants would surely not explain the situation of the Low German Variants, and thus it is better to keep them separate. Moreover the Low German Variants have a different history, and they are each of value to Historical Linguistics. An overgeneralisation might erase some parts of this history, and might lead to erroneous assumptions. So why have one Standard Low German Dialect, when there are more Standard Variants? There is nothing to say about the mutual intelligibility nor to deny it, but these Standard Variants would demand a treatment like a Standard Language. —Dyami Millarson DM 15:18, 26 March 2011 (UTC)
We can still use context labels to indicate where the various variants differ, just as we do for the different variants of Serbo-Croatian, or for the different variants of English for that matter. —Angr 14:23, 26 March 2012 (UTC)
What do we want to do with {{nds-nl}}, in this case? -- Liliana 16:21, 26 March 2012 (UTC)
@Angr. Of course we could do it in that way, however I think that people will have less interest in adding information for these variants, moreover the merging does not tackle the differences in writing as well as some grammatical differences. You can see the example in Old East Norse, wherefore nobody really adds information to the Old West Norse section, even though the Old East Norse words might be very crucial for the reconstruction of Proto-Germanic as well as the etymology, since Old East Norse represents in some instances a more archaic variant of Old West Norse. Moreover the differences between Old West Norse and Old East Norse are less than the differences between the Low German Varieties. —Dyami Millarson DM 17:57, 27 March 2011 (UTC)
I must object, Dyami Millarson. There are no standards at all in Low German. (Speaking from a German viewpoint.) There are some house orthographies and books about them, but there is no broad consensus on 'correct' orthography and nobody would ever look up how to write a word. (At least, that's what 'standard' implies to me.) People write like they feel and with what they grew up with. Thus the Dutch use z and ij and Germans s and ei. And that's often the whole difference. I always objected to {{nds-nl}}, which was founded, if I remember right, by Wikipedia because people were too lazy to deal with several writing systems. (i.e. the Dutch and the German; the native Low German writing schools died in the 17th century.) And I also doubt the worth of the whole cluster of sub-standards. What we basically have now is seven or so codes for dialects of cities no more than an hour apart from each other and sometimes heavily interwoven with Dutch/German. There are no vast lexical or grammatical differences throughout all of the dialects. The biggest differences I can think of are in the vein of monophthong/diphthong, rounded/unrounded front vowels (/zeven/ vs. /zöven/), fricative/plosive /b/.
The situation can rightfully be compared to standards of English or Serbo-Croatian, and I would vote to merge them. If necessary, a tag can be added. The current situation has no apparent advantages to me while diminishing the overview of Low German entries. ᚲᛟᚱᚾ (talk) 20:26, 3 April 2012 (UTC)
So you think that the more specific codes should be deprecated and orphaned, if not deleted? —CodeCat 12:22, 7 April 2012 (UTC)
Aye. They're dialect-codes and, as said, the dialects do not differ that much and often do not exist in greater nets. I.e. a certain pronunciation/form might differ greatly from another of a neighbouring area, but would probably not only be found in one connected region but rather in several hot spots quite some distance apart. (E.g. /zœvɛn/, found at the Dutch-German, Polish-German and the Danish-German borderlands but not as often in between). In my opinion we wouldn't need eight Low German entries for 'water', 6 of which would be written and pronounced identically. I am well aware that they are ISO and thus here to stay. But they provide (as far as I can oversee it) often not more distinction than RP/GenAm, and I wouldn't want to have ISO-codes for those either. I must repeat: There are seven Dutch Low Saxon codes for an area which is only (mere guess) maybe a quarter of the rest of the Low German area, which only has 2 codes. Seeing, though, that Dutch Low Saxon is often rather close to Dutch (either by result of lack of education or simply because it is a very free dialectal continuum), it might be a good idea to consolidate them to {{nds-nl}} and add a tag should there really be one or another word standing out.Korn (talk) 13:24, 7 April 2012 (UTC)

frs and stq

Previous discussion: Wiktionary:RFM#Template:frs_-_Template:stq

{{frs}} and {{stq}} are one and the same language; neither is closely related to Low Saxon. (I proposed a merger of the two codes a while ago but as it seems, nothing happened...) -- Liliana 12:43, 25 March 2012 (UTC)

w:East Frisian Low Saxon says otherwise... —CodeCat 12:59, 25 March 2012 (UTC)
Check ethnologue:frs, ethnologue:stq, linguistlist:frs, linguistlist:stq. They don't differ. -- Liliana 13:01, 25 March 2012 (UTC)
The Ethnologue page for frs says it's a Low German dialect too. —CodeCat 13:04, 25 March 2012 (UTC)
It also says "Reportedly used only in Saterland, Eastern Frisia in 1998.", which matches the stq language code. The rest may be either referencing the old, extinct Frisian dialect, or be an editorial mistake. -- Liliana 13:05, 25 March 2012 (UTC)
But [16] does say that it's "Not intelligible with Western Frisian [fry] of the Netherlands or Northern Frisian [frr] (1978 E. Matteson) or Saterfriesisch [stq] (2001 W. Smidt)". Ungoliant MMDCCLXIV 14:40, 25 March 2012 (UTC)
I don't understand that line, especially if you compare it with the "Reportedly used only in Saterland, Eastern Frisia in 1998. " above. They directly contradict each other! Compare also to the Linguist List links I gave. -- Liliana 14:42, 25 March 2012 (UTC)
My interpretation of that is: a headcount done in 1998 showed that, at that time, only Saterland, Eastern Frisia had speakers of this language. Ungoliant MMDCCLXIV 14:46, 25 March 2012 (UTC)
That is true for {{stq}}. East Frisian, as in the Low Saxon dialect, is spoken in a much larger area, and was even in 1998. -- Liliana 15:09, 25 March 2012 (UTC)
Here is the previous, short RFM discussion, which anyone unfamiliar with it should read. Basically, I support Liliana's proposal, but (even) if we don't follow it, we have work to do, because AFAICT we are currently using "frs" to refer to a language other than the one the ISO refers to as "frs". - -sche (discuss) 17:27, 25 March 2012 (UTC)
I'd say we should keep the codes separate and start using {{frs}} correctly, to refer to a dialect of Low Saxon rather than to a Frisian language. —Angr 21:22, 25 March 2012 (UTC)
There's no proof though that {{frs}} refers to the Low Saxon dialect, except for an unclear Ethnologue statement and Ethnologue has been known to be wrong on various occasions. (If you know Germanic languages and want a good joke, read ethnologue:vmf) -- Liliana 05:26, 27 March 2012 (UTC)
Yes, {{vmf}} is a mess, which is the main reason why the request for an East Franconian Wikipedia is on hold, because it isn't clear exactly what language "vmf" is supposed to refer to. Part of the problem is that SIL is the only organization that defines ISO 639-3 codes, and SIL writes Ethnologue and so uses Ethnologue to define what language each code refers to. There isn't really any independent authority with which one can double-check the definitions of the codes. And Ethnologue (like everything human beings do) is imperfect and sometimes mistaken. In this case, however, Ethnologue's definition of {{frs}} as the variety of Low Saxon spoken in East Frisia is coherent and sensible and is adequately distinguished from the definition of {{stq}} as Saterland Frisian, so in this case I think it's Ethnologue that's gotten it right and the Linguist List that's gotten it wrong. —Angr 18:13, 27 March 2012 (UTC)
To add my two cents and give an overview:
  1. Seeltersk (stq) is a dialect of East Frisian, that is: a lect which developed from Old Frisian. There were formerly other East Frisian dialects which were not Seeltersk and which died rather recently. My first thought would have been that {{frs}} was made to refer to those. Frisian has rather strong dialectal differences due to the Frisians' insular style of living. (I'd hence assume the distinction frs/stq to be a mistake made by somebody confused by terms.)
  2. 'East Frisian Low German' is a Low German dialect spoken in the area known as 'East Frisia', because it was formerly home to people who spoke East Frisian dialects. East Frisian Low German has some East Frisian substrate, but is generally a 08/15 (that is: uninterestingly normal) Low German dialect typical to the 'Northern Low Saxon' group. It has not enough distinctive features to merit an own ISO code different from nds.
That said: I doubt that 'frs' refers to a Low German dialect, when there is no such code for much much more distinctive dialects such as West- and Eastphalian. I would have split it that way: stq=Saterland Frisian; frs=overarching/other East Frisian, and the rest is just bland nds. edit: Westphalian has an ISO, but my statement on EFLG still stands.ᚲᛟᚱᚾ (talk) 19:37, 3 April 2012 (UTC)
EFLG and Groningen LG (ethnologue:gos, linguistlist:gos) are even closer. GLG written using (High) German spelling conventions, looks a lot like EFLG, EFLG written using Dutch spelling conventions looks a lot like GLG. Both have plural present in -n, not -t. -- 16:15, 8 May 2013 (UTC)

Ancient Greek headline

This seems like too slight an issue for the mighty Beer Parlour, but I wanted some feedback, and couldn't think of where else to take it (if anyone else can, please take my blessing in moving it). Ancient Greek verbs have a rather large and complex inflection. They have something on the order of 500 forms, when all is said and done, and those 500 forms are created based off of six principle parts, which, while usually forming a predictable set with each other, can be somewhat independent as well. As an example, take a look at παρίστημι (paristēmi). As you can see, our current approach is to attempt to capture those principal parts in the headline of Ancient Greek verb entries. The full inflection is then given under the "Inflection" header. The more verb entries I do, the more I'm convinced that this is a bad approach. There is simply too much information to be reasonably placed in a single line. As you can see from παρίστημι (paristēmi), these six forms can have regional or temporal dialectical alternatives. I can't find any that are ridiculously crammed with information, but I suspect that this is because we Ancient Greek editors are leaving stuff out because it fits poorly in the current format. What I suggest is that the headline is cleared of the principle parts, and simply show the entry title and its transliteration (which you might notice is currently lacking, because where the hell would you put it?). The hidden form of the templates should be expanded to show all voices (active, middle, passive), or as many as exist within the template. This makes for a much more scalable, and still fairly digestible way to convey all the necessary information to the user. Thoughts? -Atelaes λάλει ἐμοί 03:02, 25 March 2012 (UTC)

I agree with your conclusion: if there's a full ====Inflection==== section, then the headword line doesn't really need to list any of the forms that that section covers anyway. (By the way, the term is "principal parts", as in "main"; I think the idea is that all the other forms are in some way "secondary", since they can be derived from the listing of principal parts.) —RuakhTALK 03:10, 25 March 2012 (UTC)
That's what I get for making a spelling mistake on a dictionary site.  :-) -Atelaes λάλει ἐμοί 03:15, 25 March 2012 (UTC)
As a newbie with an incomplete knowledge of the Ancient Greek verb, I find the status quo intimidating. If I want to create a verb entry, I'm either going to have to spend time looking up principal parts I'm not familiar with, or have the template show "unknown". I wish there was some way to have the equivalent of a stub, so that the information I don't know shows up as "not provided" rather than "unknown", or a note shows up that says "this entry is missing x, y, and z. If you know it, please provide it". Right now, you can't use the verb template without all the data unless you want to have the entry lie about the scholarly state of knowledge about the verb. The "all or nothing" nature of many headline templates probably inhibits a lot of gradual improvements that would be better than what we have now. Chuck Entz (talk) 15:42, 25 March 2012 (UTC)
FWIW, you could use {{head|grc|verb}} {{attention|grc}} as a headword line in those cases; that displays a headword, puts the verb in a verb category, and tags it for someone knowledgeable to improve with a more specific headword template. - -sche (discuss) 17:32, 25 March 2012 (UTC)
Well, actually, {{head}} does everything I want the new headline template to do, so I think I'll start using it for all my verb entries (as I've done on ἠχέω (ēkheō)). So, for the time-being at least, folks should feel free to use it on grc verb entries without tagging it for attention. -Atelaes λάλει ἐμοί 22:59, 25 March 2012 (UTC)

So, I've instituted my vision into the present and aorist templates, but, as is so often the case, my vision looked better in my head. You can see what they look like under the Inflection header of παρίστημι (paristēmi), among others. I'm going to try fiddling with the formatting, and get it looking less like a wordy shit salad, but design's never been my strong suit. If anyone has the capacity and interest, I'd love some assistance. -Atelaes λάλει ἐμοί 21:14, 26 March 2012 (UTC)

I could change {{grc-verb}} to look like {{head|grc|verb}}, without these forms, if that's what you mean. Currently {{grc-verb}} without parameters looks like this: "present unknown, future: unknown, aorist: unknown, perfect: unknown, perfect m/p: unknown, aorist passive: unknown". All these parameters should be optional, not obligatory. Maro 22:09, 26 March 2012 (UTC)
For the time-being I think we want to leave {{grc-verb}} as is, as a number of the verb entries don't have the inflection information anywhere else (i.e. no one has created full inflection tables under the Inflection header). Eventually, we'll want to run a bot through and change the entry code, but that's a ways off yet. What I need help with is changing the look of the hidden form of the inflection tables. -Atelaes λάλει ἐμοί 22:13, 26 March 2012 (UTC)

Wiktionary:Wiktionary for Wikipedians

I've created this page as a guide for new users coming here from Wikipedia. I hope it's useful and please improve it where you can and add links to it where appropriate. —CodeCat 16:38, 27 March 2012 (UTC)

Very well written. I think it will be helpful. Ungoliant MMDCCLXIV 22:47, 27 March 2012 (UTC)
I agree. —RuakhTALK 23:44, 27 March 2012 (UTC)

Order of semantic and etymological headings

Please see WT:ELE#Order of headings.

I'd like to change the order a little, from this current state:

  • Synonyms
  • Antonyms
  • Other allowable -nyms
  • Derived terms
  • Related terms
  • Coordinate terms
  • Descendants

...to this proposed state:

  • Synonyms
  • Antonyms
  • Other allowable -nyms
  • Coordinate terms
  • Derived terms
  • Related terms
  • Descendants


  • Keeping "synonyms", "antonyms", "other allowable -nyms" and "coordinate terms" all together, as sections of semantic relations
  • Keeping "derived terms", "related terms" and "descendants" all together, as sections of etymological relations.

By the way, in a number of entries, people already keep the semantic relations separate from the etymological relations, even if this decision contradicts ELE. Examples: axis, iron, quality, study, mother, penultimate.

For contrast, in joke and diary, the order of headings from ELE is obeyed. I don't know exactly how many entries obey the order and how many don't, as I simply checked some manually. In theory, some script can be written to count that in all entries, if needed.
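A minimal sketch of what such a script could look like, in Python, assuming the wikitext of each entry has already been obtained somehow (for example from a dump; that part is not shown). The function name and the "Other -nyms" placeholder are made up for this sketch. It also makes a simplifying assumption: any heading that is not one of the relation headings (a language name, a part of speech, "Translations", and so on) starts a fresh run, so it cannot tell a new section from a stray heading in the middle of the relations.

```python
import re

# Placeholder standing in for Hyponyms, Hypernyms, Meronyms, etc.
OTHER_NYMS = "Other -nyms"

# The two orders being compared in this thread.
CURRENT_ORDER = ["Synonyms", "Antonyms", OTHER_NYMS, "Derived terms",
                 "Related terms", "Coordinate terms", "Descendants"]
PROPOSED_ORDER = ["Synonyms", "Antonyms", OTHER_NYMS, "Coordinate terms",
                  "Derived terms", "Related terms", "Descendants"]

# Matches wikitext headings such as ====Synonyms==== (levels 2 to 6).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def follows_order(wikitext, order):
    """Return True if every run of relation headings respects the given order."""
    ranks = {name: i for i, name in enumerate(order)}
    last = -1
    for match in HEADING_RE.finditer(wikitext):
        name = match.group(2)
        if name.endswith("nyms") and name not in ranks:
            name = OTHER_NYMS
        if name not in ranks:
            last = -1  # any other heading starts a new run (simplification)
            continue
        if ranks[name] < last:
            return False
        last = ranks[name]
    return True
```

Counting would then be a matter of calling `follows_order(text, CURRENT_ORDER)` and `follows_order(text, PROPOSED_ORDER)` on each entry and tallying the results; entries with only one or two relation headings will trivially satisfy both.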

--Daniel 05:03, 28 March 2012 (UTC)

I support this proposal. Also, [[iron]]...wow. It looks like it uses every header there is! - -sche (discuss) 05:22, 28 March 2012 (UTC)
I support this proposal. ELE seems to have a contradiction anyway; the headers are ordered as follows:
3.3.4 Synonyms
3.3.5 Further semantic relations
3.3.6 Derived terms
3.3.7 Related terms
And Coordinate terms is listed inside WT:Semantic relations. So one could assume the order you propose. Ungoliant MMDCCLXIV 15:31, 28 March 2012 (UTC)
Note: This would entail a change to the order of headings established by a previous vote, so it would require a new vote to enact the proposal. --EncycloPetey (talk) 19:04, 28 March 2012 (UTC)
I agree. But I agree with the proposal and AFAICT would vote for it.​—msh210 (talk) 20:34, 28 March 2012 (UTC)

Wiktionary:Votes/pl-2012-03/Moving "Coordinate terms" up in ELE --Daniel 21:09, 30 March 2012 (UTC)

Minor edits in the "part of speech" paragraph

In WT:ELE#The essentials, I'd like to make these minor edits:

2. Part of Speech may be a misnomer, but it seemed to make sense when it was first chosen. It is the key descriptor for the lexical function of the term in question (such as 'noun', 'verb', etc). The definitions themselves come within its scope. In addition to the traditional “parts of speech” it has come to include entities that are less than words, such as initialisms and suffixes, and items that are more than words, such as idiomatic expressions, phrases and proverbs. This heading is nestable. It is most frequently in a level three heading, but may have a lower level for terms that have multiple etymologies or pronunciations.

2. Part of speech may be a misnomer, but it seemed to make sense when it was first chosen. It represents the lexical function of the term in question, such as "noun", "verb", etc. As less traditional examples, there are parts of words, such as initialisms and suffixes, and groups of words, such as idiomatic expressions, phrases and proverbs. Each entry has one or more part of speech sections, where the definitions themselves are found. The sections, most frequently, are level three, but may have a lower level for terms that have multiple etymologies or pronunciations.


  • In all titles of sections ("Entry name", "The essentials", etc.), only the first word has an initial uppercase letter. "Part of Speech" is not a title of section, but I think it should imitate that format. That majuscule "S" is kind of ugly.
  • Ordering the ideas: first, what is a part of speech; second, what is a part of speech section.
  • And some wording.

--Daniel 10:27, 28 March 2012 (UTC)

I'm not sure if the characterization of parts and groups of words is useful. An initialism isn't part of a word, but another kind of word (and what about abbreviations?). A compound word is a word, and also a group of words.
Shouldn't we describe this in terms of a term, which is our unit, rather than a word? Michael Z. 2012-03-28 17:57 z
Yes, "parts of words, such as initialisms" is actually wrong; this part should be rewritten.
I agree that mentioning "term" somewhere looks like a good idea. --Daniel 18:59, 28 March 2012 (UTC)

Rewriting the proposed text.

2. Part of Speech may be a misnomer, but it seemed to make sense when it was first chosen. It is the key descriptor for the lexical function of the term in question (such as 'noun', 'verb', etc). The definitions themselves come within its scope. In addition to the traditional “parts of speech” it has come to include entities that are less than words, such as initialisms and suffixes, and items that are more than words, such as idiomatic expressions, phrases and proverbs. This heading is nestable. It is most frequently in a level three heading, but may have a lower level for terms that have multiple etymologies or pronunciations.

2. Part of speech may be a misnomer, but it seemed to make sense when it was first chosen. It represents the lexical function of the term in question, such as "noun", "verb", etc. Some less traditional examples are initialisms, suffixes, idiomatic expressions, phrases and proverbs. Each entry has one or more part of speech sections, where the definitions themselves are found. The sections, most frequently, are level three, but may have a lower level for terms that have multiple etymologies or pronunciations.

--Daniel 10:36, 29 March 2012 (UTC)

I'd like to create a vote for rewriting that paragraph, as shown above, soon. If someone would vote for it, please let me know. --Daniel 17:58, 31 March 2012 (UTC)
Hm, I'd vote for it. - -sche (discuss) 17:38, 1 April 2012 (UTC)
OK. Wiktionary:Votes/pl-2012-04/Editing the "part of speech" paragraph in ELE. --Daniel 18:14, 2 April 2012 (UTC)

Etyls for borrowed words -- how far back to track?

So I just added the term ピンセット (pinsetto, tweezers), which made it into Japanese from Dutch, as I've marked in the etyl. The Dutch term comes from French pincette, from pincer, and so on back to PIE, as indicated by the etyl at pinch.

How much of this history should I include on the ピンセット page? Is it enough to give the link to Dutch?

(Incidentally, if the etyl at pinch is correct, at least some of that content could/should be added to the pincer entry.)

-- Cheers, Eiríkr ÚtlendiTala við mig 18:20, 28 March 2012 (UTC)

Why not include all of it? --Yair rand (talk) 18:36, 28 March 2012 (UTC)
I was about to when I found myself wondering if there was any community position on that. If I don't see any objection to doing so in the next few hours, I'll go ahead and add the full etyl as far back as we have here at WT. -- Eiríkr ÚtlendiTala við mig 18:48, 28 March 2012 (UTC)
I support having the whole etymology at ピンセット, so that people won't have to navigate four pages if they want to see it completely. --Daniel 19:03, 28 March 2012 (UTC)
I oppose having the whole etymology at ピンセット. Duplication makes it hard to improve and expand etymologies. --Vahag (talk) 22:00, 28 March 2012 (UTC)
I support having the whole etymology too, not only for navigation reasons but also to include the relevant etymology categories. Ungoliant MMDCCLXIV 22:03, 28 March 2012 (UTC)

Should this be turned into a vote? (I've taken the liberty of bolding the "support/oppose" above for clarity in case that's where this goes.) -- Eiríkr ÚtlendiTala við mig 22:40, 28 March 2012 (UTC)

Is there a way to get both? It certainly makes sense that if you copy the etymology from the Dutch page to the Japanese page and then the Dutch page etymology is changed, you don't get that change on the Japanese page. BenjaminBarrett12 (talk) 22:47, 28 March 2012 (UTC)
There are fancy ways to transclude just portions of a page, but they get kinda ugly and require some technical expertise. One is labelled section transclusion such as that described at the top of WT:ES, but that only works for whole sections. Another option that allows transcluding arbitrary portions of another page would be to use conditionals with parameters, such as {{#ifeq:{{{transcludesection|}}}|some_value|[wikitext to include]}}, which could conceivably be used in succession -- starting on the deepest root, maybe a PIE page -- such that any changes to etyls further down the chain would propagate automatically. So if there's a term in JA from EN from ME from proto-Germanic from PIE, any changes to ME or PIE would show up on the JA page, for instance.
My suspicion is that this is too hacky and fragile for broad adoption here at WT, but who knows.  :) -- Eiríkr ÚtlendiTala við mig 23:04, 28 March 2012 (UTC)
  • I spoke (wrote) too soon -- labelled section transclusion works on arbitrary portions of a page, and can be embedded in running text. Different sections can overlap, either nested or not, or even with one end tag after another section's begin tag, without screwing up transclusion. This might actually be a good way to go about what Benjamin proposes. -- Eiríkr ÚtlendiTala við mig 23:31, 28 March 2012 (UTC)
  • Could you create a sequence of example entries? I'm curious how this would work in practice, and whether or not small things like the fact that some etymologies begin with capital letters and end with [[.|dots]] would trip us up and result in "From Dutch pincet, From French pincette. from pince + -ette From pincer" (i.e. bad caps and the implication, due to placement, that -ette rather than pince is from pincer). - -sche (discuss) 23:52, 28 March 2012 (UTC)
There are a couple possible ways to do this that I can think of; I'll see what I can mock up. -- Eiríkr ÚtlendiTala við mig 00:44, 29 March 2012 (UTC)
  • Somewhat grotty markup. Initial caps are not a problem, thanks to the {{lcfirst:}} magic word. Final punctuation can be handled by leaving it outside the <section end="..."/> tag.
  • Complicated handling. The further up the etyl tree you go, the denser the information becomes. Branching etyls require some deciding. The sample tree linked from the sample edit above keeps the etyls inline at the pince + -ette branch, as the -ette etyl is rather short, but longer branches could be problematic.
  • Target language confusion. This approach works fine as-is in a single-target-language line, as this etyl is up until the French term pincette (i.e. all transcluded etyls are for French terms; older terms don't yet exist as WT entries), but once the etyl tree is transcluded into the Dutch term pincet, the Dutch entry has etyls categorized for French.
This cannot be worked around by including a "lang" param, as labeled section transclusion does not know how to handle named parameters.
This might be work-around-able by using the alternate approach for selective transclusion as described at w:Wikipedia:Transclusion#Selective_transclusion, as this does allow for named params -- but testing indicates that this may be tricky to get right. Once a workable approach is found, it can probably be templatized, so it should only be tricky to figure out the first time. -- Cheers, Eiríkr ÚtlendiTala við mig 06:32, 29 March 2012 (UTC)
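To make the capitalisation and punctuation points above concrete, here is a minimal Python sketch that only simulates the idea of labelled sections; it is not how MediaWiki implements them. The page contents, the "etym" label, and the helper names extract_section and lcfirst are hypothetical, chosen to mirror the pincet example. The final full stop is left outside the end tag, as suggested above, so that each piece can be chained.

```python
import re


def extract_section(wikitext: str, label: str) -> str:
    """Return the text between <section begin=label/> and <section end=label/>."""
    pattern = re.compile(
        r'<section begin="?{0}"?\s*/>(.*?)<section end="?{0}"?\s*/>'.format(
            re.escape(label)
        ),
        re.DOTALL,
    )
    text = "".join(pattern.findall(wikitext))
    # Drop tags belonging to other, possibly overlapping, sections.
    return re.sub(r"<section (?:begin|end)=[^>]*/>", "", text)


def lcfirst(text: str) -> str:
    """Lower-case the first character, like the {{lcfirst:}} magic word."""
    return text[:1].lower() + text[1:]


# Hypothetical etymology lines; the final dot sits outside the section.
pages = {
    "pincette": '<section begin="etym"/>From pince + -ette<section end="etym"/>.',
    "pincet": '<section begin="etym"/>From French pincette<section end="etym"/>.',
}

dutch = extract_section(pages["pincet"], "etym")
french = extract_section(pages["pincette"], "etym")

# Each borrowed piece is lower-cased and joined; one dot closes the whole line.
japanese_etym = "From Dutch pincet, " + lcfirst(dutch) + ", " + lcfirst(french) + "."
print(japanese_etym)
```

The sketch sidesteps the branching and language-categorisation problems described above; it only shows that bad caps and stray dots are avoidable.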

If we did not include the whole etymology, it would be a huge hassle for someone to find out a word's etymology. The pages would then look like this:

This means someone would have to open four entries just to look up the etymology of ピンセット! We cannot force this upon any reader. -- Liliana 23:11, 28 March 2012 (UTC)

I agree with Liliana. Furthermore, what would we do if we were missing an intermediate entry? Force the person adding etymologies to Japanese words to create any French entries? Put the whole etymology in the Japanese entry as long as the next entry back was a redlink, but try to remember to move it when the other entry was later created? - -sche (discuss) 23:17, 28 March 2012 (UTC)
I think it reasonable to go back a few entries, in order to give the user a fuller view of the word's history without a bunch of clicks, but I would caution against going too far. For starters, the more overlapping content we have, the more difficult it becomes to maintain our etymologies. Additionally, etymologies come before definitions, and consequently large etymologies make it even more difficult to see the defs at a glance. I would say that going back four or so steps is reasonable, as long as they're fairly simple and concrete. When dealing with more speculative etymologies, it's probably best to leave the speculation on a single page. -Atelaes λάλει ἐμοί 00:09, 29 March 2012 (UTC)
Each entry's etymology has its own needs for depth and detail. While the entry for ピンセット may trace its Dutch lineage all the way back to PIE, it probably needn't list all of the Old Latin cognates, or whatever. We should trust etymology writers' skills and judgment. Michael Z. 2012-03-29 07:27 z
Somebody has probably said this, but if you copy a whole etymology, and someone then edits one of the etymologies, they don't say the same thing anymore, and can even contradict each other. It's similar to the argument supporting having flavour as an alternative form of flavor, and nothing else. Mglovesfun (talk) 10:01, 29 March 2012 (UTC)
There are ways of using transclusion to ensure that the content is identical, even for terms such as flavour and flavor. The issue seems more one of policy than technology.
Besides, I didn't think concerns about content synchronicity trump concerns about entry completeness? Am I wrong? The last time the flavour/flavor issue came up, my recollection is that the main concern was how to make sure that both entries were as complete as possible, with the consensus leaning towards copying content from entry to entry if appropriate, but I'm happy to grant that it's been a while and my memory's been wrong before. -- Eiríkr ÚtlendiTala við mig 15:11, 29 March 2012 (UTC)
Flavour/flavor is different: the same lemma entry in two different places, because we are slaves to political correctness. The etymologies for two different terms, on the other hand, needn't, and often shouldn't, be the same. It is appropriate for a Latin root to have more detail about ancient ancestors and cognates, while the same information may be inappropriate in the etymology of a Japanese borrowing of its late Dutch descendant. I'd rather see good writing in etymologies than dumb transclusions. Michael Z. 2012-03-29 18:41 z
So then, strictly speaking, you are opposed to including complete etyls further down an etymological inheritance chain? (Just trying to clarify.) -- Eiríkr ÚtlendiTala við mig 19:02, 29 March 2012 (UTC)
Not necessarily, but I am opposed to mechanically duplicating the complete text of other entries' etymologies without any editorial judgment.
(But if one were seeking a problem for this solution, we need to find a way to reuse quotations. They are duplicated in entries' main sections and in the respective Citations: pages, and could also be reused to attest other terms appearing in them. This is a mainly-untapped resource, at our fingertips.) Michael Z. 2012-04-01 01:32 z

2/3 supermajority

Is it written somewhere that votes need a 2/3 supermajority to pass? --Daniel 15:27, 29 March 2012 (UTC)

No. In fact, we frequently used to hold votes to an even higher standard than that, usually between 70% and 75%; and a few votes have been passed with even less than a two-thirds supermajority, such as Wiktionary:Votes/2010-04/Voting policy (though that was an exceptional case). —RuakhTALK 16:16, 29 March 2012 (UTC)
AFAIK, there is no evidence that "we frequently used to hold votes to an even higher standard than that, usually between 70% and 75%"; I failed to find any the last time I tried. That is to say, there are very few votes that had slightly less than 70% support, and yet were closed as "no consensus". --Dan Polansky (talk) 17:11, 29 March 2012 (UTC)
Interesting. Did you find any votes that had less than 70% support, and yet were closed as "approved"? —RuakhTALK 17:16, 29 March 2012 (UTC)
A good question. The obvious answer is the vote you have just mentioned: Wiktionary:Votes/2010-04/Voting policy. I cannot recall any other such vote, though. The range 66,7-70% is rather small, so the overwhelming majority of votes falls outside of the range that would help test the hypothesis. --Dan Polansky (talk) 17:25, 29 March 2012 (UTC)
As regards the 75% threshold you've mentioned, that is not only lacking evidence in the form of votes closed as "no consensus", but there are also recent votes that were closed as "passes" with less than 75% support:
I have only gone through some of the recent votes, as the search is quite tedious. --Dan Polansky (talk) 17:41, 29 March 2012 (UTC)
The option 1 of this vote passed with 70% supporting votes, according to the "Decision" section at the end.
This vote failed with exactly a 2/3 (66,6666...%) supermajority:
--Daniel 17:58, 29 March 2012 (UTC)
Please see [[used to#Verb]]. —RuakhTALK 17:59, 29 March 2012 (UTC)
What is there to be seen? --Daniel 18:21, 29 March 2012 (UTC)
I don't think emphasis on used to changes much. The thing is, you have provided no evidence, whether on what recently has been the practice or on what used to be the practice a long time ago. --Dan Polansky (talk) 18:27, 29 March 2012 (UTC)
I'm really not sure what you want from me. Daniel asked if a certain threshold were documented somewhere; I replied that there wasn't, and noted that said threshold was not only not documented, but also not consistently the case (since formerly we sometimes applied a higher standard, and latterly sometimes a lower one). If you'd like to contend that we have consistently applied a threshold of exactly two-thirds, then you need only re-read this section to see that you're mistaken, since I've already given one example where a vote was passed at a lower threshold, and Daniel has added an example where a vote was not passed at that threshold. In addition, google:site:en.wiktionary.org votes 70 and google:site:en.wiktionary.org votes 75 will show you other discussions on the subject (which are not the sort of evidence you seem to want, but you haven't deigned to justify your insistence on that sort of evidence, nor your implication that there is some sort of onus on me to furnish it). —RuakhTALK 19:09, 29 March 2012 (UTC)
The searches that you have provided only show that 75% has been mentioned. I have looked at them, and many of them do not serve as evidence. If you want to provide specific evidence that we used to apply "75%", evidence of the form that you deem appropriate, I am looking forward to seeing that evidence. The Google searches are poor evidence; the first find (Wiktionary:Votes/bt-2009-12/User:JackBot) is a page containing 'Ruakh said 67.6% would fail, as "we generally require 70–75%'. This you may have said back then, but I am afraid you had as much evidence back then as you have now. --Dan Polansky (talk) 19:19, 29 March 2012 (UTC)
I think this discussion has as much evidence as it needs to: as Ruakh notes, he's "already given one example where a vote was passed at a lower threshold, and Daniel has added an example where a vote was not passed at that threshold". Citing such things shows (and such things are cited to show) that we haven't consistently required X% or Y% support to pass something. Because no-one is claiming that we now do or should require 70% or 75% support to pass things, I think it's chasing a rabbit (putting effort into a distraction) to be looking for more, specific proof of one former number or the other. - -sche (discuss) 19:49, 29 March 2012 (UTC)
I agree with -sche, but to clarify regarding "The searches that you have provided only show that 75% has been mentioned": Yes, that's really all that I meant by that part of my statement above. I think that 75% is really the absolute upper bound: a vote that concluded with 75% or higher in support has always been a clear and unambiguous "pass", and whereas "75%" has been mentioned a number of times in this context, I don't think any higher figure has. —RuakhTALK 20:37, 29 March 2012 (UTC)
As regards what has been mentioned, even 80% has been mentioned at least once, by you: Wiktionary:Votes/pl-2009-03/Removing_vote_requirements_for_policy_changes: "Of course, the current proposal doesn't solve the biggest problem, which is that ELE and CFI aren't actually policy at all, but rather a poor approximation to policy, such that requiring >75–80% consensus doesn't work (there's no stance that has anywhere near 75–80% consensus, at least none that's as detailed as CFI and ELE currently are); but I don't have a solution to that problem, and apparently Atelaes and Visviva don't, either." I don't know what made you think that 75-80% was the relevant gray area for threshold, back then. My point is that various mentions of the sort serve as poor evidence of common practice. By contrast, actually closed votes are fairly direct evidence. And AFAIK, there is no evidence of this direct sort that Wiktionary ever required 75% or more for a vote to pass. --Dan Polansky (talk) 06:32, 30 March 2012 (UTC)
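As a side note on the arithmetic, the candidate thresholds discussed here (two-thirds, 70%, 75%) differ only for a narrow band of tallies. A small Python sketch, using hypothetical tallies and assuming that abstentions are ignored and that "at least the threshold" counts as passing (neither assumption is documented policy; the vote reported above as failing at exactly 2/3 suggests the comparison has not always been applied that way):

```python
from fractions import Fraction


def support_share(support: int, oppose: int) -> Fraction:
    """Exact share of support among support + oppose votes (abstentions ignored)."""
    return Fraction(support, support + oppose)


def outcome(support: int, oppose: int, threshold: Fraction) -> str:
    """Classify a tally under an assumed 'share >= threshold' rule."""
    return "passes" if support_share(support, oppose) >= threshold else "fails"


thresholds = {
    "2/3": Fraction(2, 3),
    "70%": Fraction(70, 100),
    "75%": Fraction(75, 100),
}

# Hypothetical tallies: 10-5 is exactly two-thirds, 14-6 is exactly 70%.
for support, oppose in [(10, 5), (14, 6)]:
    for name, threshold in thresholds.items():
        print(support, oppose, name, outcome(support, oppose, threshold))
```

Exact fractions are used rather than floats so that a tally sitting exactly on a threshold is not misclassified by rounding.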

make User:MaEr a rollbacker

I propose we give MaEr, a long-established editor who most recently reverted bad edits to the entry [[wildcard]], the ability to roll back bad edits with the rollback button (which makes fighting vandalism that much easier). Does anyone object, or want to nominate MaEr for even more things (patroller, admin)? - -sche (discuss) 06:54, 31 March 2012 (UTC)

By extending rollback powers to more users, we make it easier for those users to fight vandalism in the Recentchanges and elsewhere, which we do (always) sorely need more users to do. - -sche (discuss) 07:33, 2 April 2012 (UTC)
Done. Actually, I thought he was an admin. Maybe he should be. —Stephen (Talk) 08:27, 2 April 2012 (UTC)
Shouldn't we wait for MaEr to comment before making him a rollbacker? Mglovesfun (talk) 11:02, 2 April 2012 (UTC)
Thank you for the rollback button, it's really convenient.
But in the near future, I will just be a part-time editor, since I'm full-time employed and sometimes I like to read a good book. So whatever buttons you might give to me, don't expect any miracles. --MaEr (talk) 16:44, 3 April 2012 (UTC)

Counting number of articles in a given language in any given Wiktionary

I am mostly active on the Norwegian Wiktionary. It is rather big, almost 128 000 words. However only 6 per cent of the articles are actual Norwegian words.

Is there a way to calculate how many Norwegian words there are? I am thinking in the line of {{NUMBEROFDEFAULTLANGUAGEARTICLES}} as a subset of the already existing template {{NUMBEROFARTICLES}}, which gives the total number. Or {{NUMBEROFPORTUGUESELANGUAGEARTICLES}} etc. Thanks in advance. --Teodor (talk) 16:35, 31 March 2012 (UTC)

We have Wiktionary:Statistics, which lists the amount of entries, definitions, gloss definitions and form of definitions by language. But this is only for the English Wiktionary. Ungoliant MMDCCLXIV 16:41, 31 March 2012 (UTC)
Does no.wikt not categorize words by their language? Then check how many words are in the category. Or if all such words are in subcategories of a category, use CatScan. —msh210 (talk) 16:21, 1 April 2012 (UTC)
I don't know about doing it automatically or on-the-fly, but as of the last database dump (which was just a few hours ago), no.wiktionary.org mainspace pages had the following L2 headers:
(long list excised 20:25, 8 April 2012 (UTC); the delayed collapsing was causing problems)
RuakhTALK 17:31, 1 April 2012 (UTC)
This was very interesting information. Is this kind of statistics I might be able to produce myself? Could be interesting to put this up somewhere on our Wiktionary. Could be updated once in a while. Also shows some incorrect languages, ie misspellings and blatant errors. --Teodor (talk) 15:03, 2 April 2012 (UTC)
Re: "Is this kind of statistics I might be able to produce myself?": Absolutely. All you need is the database dump (which you can download from http://dumps.wikimedia.org/backup-index.html) and Perl 5.10.1 or higher; and a simple Perl script to process the dump. Right now I'm not at the computer where I wrote the script, but sometime in the next 36 hours I'll post it in my userspace and comment back here. —RuakhTALK 17:43, 2 April 2012 (UTC)
I've now posted the script at [[User:Ruakh/count-L2-headers.pl]]. I've made a number of improvements to it, some of which correct bugs in the output, so I've updated the above list accordingly. (In particular: the previous version would consider ====Synonymer==''Kursiv tekst''== to be an L4 header, and ===Substantiv== to be an L3 header, when in fact they're both L2 headers; and the previous version would be confused by certain pagenames containing :.) —RuakhTALK 05:25, 3 April 2012 (UTC)
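The two examples in the comment above are consistent with reading a heading's level as the smaller of its leading and trailing runs of equals signs. The following Python sketch illustrates that rule only; it is not Ruakh's Perl script, and header_level is a hypothetical name.

```python
def header_level(line: str) -> int:
    """Return the heading level of a wikitext line, or 0 if it is not a heading.

    Assumed rule: level = min(leading '=' count, trailing '=' count), capped at 6.
    """
    s = line.rstrip()
    lead = len(s) - len(s.lstrip("="))
    trail = len(s) - len(s.rstrip("="))
    # No equals signs at one end, or nothing but equals signs: not a heading here.
    if lead == 0 or trail == 0 or lead + trail >= len(s):
        return 0
    return min(lead, trail, 6)


for line in ["==Norsk==", "===Substantiv==", "====Synonymer==''Kursiv tekst''=="]:
    print(line, header_level(line))
```

Under this rule all three sample lines come out as level 2, which is why unbalanced headings like the ones quoted above end up in an L2 count.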
Thank you, again, Ruakh, for the help you have provided. I can see how your script works (more or less) but I cannot reproduce its intended output. Is it possible to add comment lines to it? For someone without Perl experience (like me...) to be able to use it, I think it is necessary to state clearly where the input dump file goes, and where to look for the result, if that goes to a file too. I have tried to execute the script but I get no further than seeing a prompt saying "... >RESULTS.txt". --Teodor (talk) 20:47, 5 April 2012 (UTC)
If the dump is named foo.xml.bz2 and you want the output to go to bar.txt, you would type count-L2-headers.pl foo.xml.bz2 > bar.txt. —RuakhTALK 21:56, 5 April 2012 (UTC)
Yes, got it! Thank you so much! Now I hope your script can be of help to others too. I suppose the admins on a given Wiktionary have tools to find incorrectly entered articles but I can also see your script coming in handy here. There are quite a few articles in the Norwegian W that start with either too few or too many == and also some with an incorrect language spelling, eg Egnlish etc. Your script does identify these. I have noted however, that it seems your script counts the same articles more than once for each. The sum of Portuguese articles actually outweighs the number you get from {{NUMBEROFARTICLES}}. So I gather maybe all the different conjugations count in your script but not in {{NUMBEROFARTICLES}}.
In your first script I got
and in the new version I get
One suggestion, if you feel like experimenting some more, would be to enhance the output to make it easier to produce nice looking tables. It would be neat to see directly the proportion of "home language" articles to the total, compare that to the same in other Wiktionaries etc. Best regards, --Teodor (talk) 10:42, 6 April 2012 (UTC)
I don't understand your comment. I've only posted one version of the script, and it gives “Portugisisk — 61513” for the April 1st nowikt dump. Where are you seeing this 177885? —RuakhTALK 20:25, 8 April 2012 (UTC)
That is odd. Upon reading your reply I thought maybe I had done something strange. But I tried to reproduce it just now with a new output file and got 177885. I copied all the text in your script from the line use warnings; till the end. I ran that script on "nowiktionary-20120401-pages-meta-history.xml" which I had already downloaded and made a new output file on my disc. 177885 was the result. I will gladly try to assist in finding out why if you tell me what to do. Only I don't understand the parameters in your script. Sorry for the inconvenience. --Teodor (talk) 22:29, 8 April 2012 (UTC)
Ah, I see. That's because I designed the script to take the "Articles, templates, media/file descriptions, and primary meta-pages" file (currently nowiktionary-20120401-pages-articles.xml.bz2; it's the file I mean when I write of "the dump"). The script would also work on the "All pages, current versions only" file (currently nowiktionary-20120401-pages-meta-current.xml.bz2), since the only difference there is that it contains every namespace (and the script is already designed to ignore non-mainspace pages). The problem with the "All pages with complete edit history" file that you used is that it contains every revision of every page; so, for example, if a page has been edited five times since a ==Portugisisk== section was added, then that file will contain six ==Portugisisk==-s for that page. What's more, if one old revision contained ==Portugisk==, and then that was fixed to ==Portugisisk==, then that file will still contain the ==Portugisk==. So the script can probably be modified to work for that file as well, but it would be a bit tricky. Would you like for that file to be supported, or did you just choose it because it was the first file on the page? —RuakhTALK 03:06, 9 April 2012 (UTC)
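For illustration, here is one way the over-counting could be avoided when reading a full-history dump: keep only the newest revision of each mainspace page before counting. This is a Python sketch, not the Perl script discussed above. It assumes the dump uses the page, ns, revision, timestamp and text elements of the MediaWiki export format, and that timestamps sort correctly as strings. The name count_l2_headers is hypothetical, and the heading pattern is deliberately simple: it matches balanced ==...== lines only.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified pattern: balanced level-2 headings only, e.g. "==Portugisisk==".
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def local(tag: str) -> str:
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_l2_headers(stream) -> Counter:
    """Count L2 headers once per mainspace page, using only the newest revision."""
    counts = Counter()
    for _, page in ET.iterparse(stream, events=("end",)):
        if local(page.tag) != "page":
            continue
        ns, latest_ts, latest_text = None, "", ""
        for child in page:
            name = local(child.tag)
            if name == "ns":
                ns = child.text
            elif name == "revision":
                ts, text = "", ""
                for part in child:
                    if local(part.tag) == "timestamp":
                        ts = part.text or ""
                    elif local(part.tag) == "text":
                        text = part.text or ""
                # Older revisions (and their old misspellings) are discarded.
                if ts >= latest_ts:
                    latest_ts, latest_text = ts, text
        if ns == "0":  # mainspace only
            counts.update({h.strip() for h in L2_HEADER.findall(latest_text)})
        page.clear()  # release the page's content once it has been counted
    return counts
```

A compressed dump could be passed in as a stream with, for example, bz2.open(path, "rb") from the standard library. With the pages-articles or pages-meta-current files each page has a single revision, so the selection step simply has no effect.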


Thank you for pointing this out to me. I have to admit I just picked the first file without reading any comments:) -- 10:49, 9 April 2012 (UTC)
Last modified on 13 April 2014, at 23:41