Wiktionary:Beer parlour/2009/March

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

March 2009


I've started writing Wiktionary:About Serbo-Croatian, which should serve as a guideline for entering SC entries and merging existing B/C/S ones. Since about 99% of foreign-language (including English) publications (dictionaries, grammars) treat SC as one language, with standard-variety differences compared and explained in parallel, so should Wiktionary. There is really little point in duplicating 95-100% of the content and making it harder for Wiktionary users to spot the actual differences. Comments are welcome at the talk page. --Ivan Štambuk 14:50, 2 March 2009 (UTC)

Hmmm.....does this mean that all words which were formerly Serbian, Croatian, etc. will now be Serbo-Croatian? Not knowing enough about the language(s), I am not qualified to assess the merits of this move, but I imagine that perhaps some long-standing bitterness had to be overcome to accomplish this (if so let me be the first to offer my praise). I look forward to seeing what comes of it. -Atelaes λάλει ἐμοί 08:30, 3 March 2009 (UTC)
I received disinfection from nationalism disease.. --Ivan Štambuk 12:00, 4 March 2009 (UTC)
Since you talk of duplicating content, have you tried to address the issue of duplicating some sections between the Roman form and the Cyrillic form? I know nothing about Serbo-Croatian but I am interested in the Chinese entries and there is the same problem between simplified and traditional entries. I am not happy at all with the current system, consisting of copy and paste between the two versions. It means each time we want to add a sense, a derived term, an etymology, we have to do it twice. Most people won't bother and the two entries will become fairly different over time.
However, I am not able to think of a better one, except with clever use of templates and subpages, but it may be too complex for editors to understand. Other solutions are not as neutral (selecting a "master page" and making the other a soft redirect for example).
Koxinga 09:03, 3 March 2009 (UTC)
I can't see merging what ISO 639 treats as separate languages into one macrolanguage entry and then treating the languages only as dialects. I've got some specific problems with the proposed WT:ASC that I'll give on the talk page of the proposal. Carolina wren 22:35, 3 March 2009 (UTC)
No, unfortunately there is no other convenient way but to simply duplicate content and warn editors to keep them in sync. This applies both to duplicate spellings in one language (English -or : -our, -ise : -ize..), and one language in multiple scripts (this esp. pertains to ex-Soviet satellite states who use(d) Arabic/Latin/Cyrillic depending on the ruling regime, not to mention Sanskrit which is written in just about any Indic script as almost all of them are phonologically isomorphic, many having special extensions simply to represent "sacred" Sanskrit sounds..). The biggest problem would be English-language entries themselves, as they need to provide real definitions, where one should look to prevent duplication of stuff such as translation tables (via e.g. recently introduced {{trans-see}}), as variant spellings are gracefully handled by various "soft redirect" templates. Tho there are still messy entries like / left which need proper care.. Also, there is this trick with labeled sections to reduce duplication of definition lines or perhaps even etymologies, but for other headers that need native script-entries (derived terms, inflection..) you just need to do the dirty work manually. --Ivan Štambuk 12:00, 4 March 2009 (UTC)
I like this idea, as it would save a great deal of confusion. However, I think it needs more discussion and probably a vote, since it goes against our standard practice of deferring to ISO 639-3. And there is the fact that there are separate Croatian, Serbian, and Bosnian Wiktionaries, so merging our treatment of the languages here would break symmetry in an unfortunate way. -- Visviva 02:54, 4 March 2009 (UTC)
I support the idea as well. Serbian, Bosnian, Croatian Wiktionaries are no stringent argument, because there is also Serbo-Croatian Wiktionary. The two Russian grammars, which I am reading while studying Serbo-Croatian use сербскохорватский/сербохорватский in their titles. Additionally, what I am learning therefrom, suffices entirely to understand both Serbian and Croatian authors without wading through a separate Croatian grammar, which permitted me to add a couple of quotations here without being a native speaker. Therefore my personal experience convinces me of the oneness of that language. The uſer hight Bogorm converſation 09:35, 4 March 2009 (UTC)
bs/hr/sr/sh wikts are all dead projects with infinitesimal growth rate. Interwikis would be handled transparently, but {{t}}-generated superscript links would be problematic. I'll see whether User:Tbot can make an exception for this and generated all 4 wikts as links in the translation tables (tho it would prob. look very weird). --Ivan Štambuk 12:00, 4 March 2009 (UTC)


Hi there all. Seeing the number of oversight requests that this Wiktionary gives, I would like to propose to you all that certain editors of the English Wiktionary be given oversight status, particularly Dmcdevit and Rodasmith. Does anyone agree with me or feel that this Wiktionary has a need for our own oversighters, or is this just a bad idea? Cheers, Razorflame 20:27, 3 March 2009 (UTC)

I don't think we have a great need for oversighters. We've had relatively few cases of oversight (see log, which might not be complete, but I think it is complete starting with its earliest date). But they certainly can't hurt, and if someone proposes Dominic or Rod as oversighter, I'll be glad to vote in support. (I do think it would take a formal vote, like any other group change.) Meta policy says we'll need at least two oversighters if any, so they can check each other's logs, as for checkusers. (As to the choice of who, I have no objection to Dominic, but am not sure about Rod, merely because he's already a checkuser. Isn't there some sort of Meta policy separating jobs? Or am I making that up? Aside from that, though, I have no objection to Rod, either.)—msh210 20:38, 3 March 2009 (UTC)
If you are willing to identify to the foundation, I believe you would also make a great oversighter. Cheers, Razorflame 20:42, 3 March 2009 (UTC)
Thanks.—msh210 20:51, 3 March 2009 (UTC)
I agree with Msh. With only about a dozen uses in the last year, we are not yet at the point where there is a need for local oversighters beyond the services already provided by stewards. Regarding the separation of roles, there is no policy on it, and it is common on other projects for people (like yours truly) to have both. In normal circumstances, it may make sense to spread the responsibilities among different people, but there is certainly a good argument to be made that when you are talking about access to sensitive, personally-identifiable material, the fewer accounts with access the better. (And also, thanks for the vote of confidence. I imagine you might want someone who is more active at the moment, anyway. It's why I haven't asked for CheckUser back.) Dominic·t 21:55, 3 March 2009 (UTC)

I have no problem with adding some oversights. I trust all the names which have been put forth thus far, but I wonder if it might be better to spread the responsibilities out a bit, so that no one editor has too many responsibilities/too much power. Everyone knows that English Wiktionary admins have a penchant towards malevolence and tyranny. :-) -Atelaes λάλει ἐμοί 21:26, 3 March 2009 (UTC)

But since malevolence and tyranny are the current standard practice, we're actually required to concentrate power on fewer people, until and unless there's a vote to change that. ;-)   Seriously, though, I'd be down with any of these names given so far (Dmcdevit, Rodasmith, msh210), and can easily list another half-dozen admins I'd trust with it (yourself among them, Atelaes). It doesn't much bother me if the same person has both CheckUser and Oversight, since they seem like very different kinds of tools, but I think it would probably make more sense to have different people do it, just so the same people don't get badgered with all request types. —RuakhTALK 21:45, 3 March 2009 (UTC)
I can see the headline on News for Editors: "Vote to include sissy Wikipedia policies like 'assume good faith' and 'warnings before blocks' fails 2-14." :-) -Atelaes λάλει ἐμοί 21:52, 3 March 2009 (UTC)
I also haven't noticed any pressing need, but having someone available to conveniently perma-zap bad revisions of WT:FEED would be nice. An awful lot of personal info gets posted there. (Alternatively, we could just make a habit of deleting WT:FEED periodically and starting fresh.) -- Visviva 09:48, 4 March 2009 (UTC)
+1. (Alternatively, +1.) —RuakhTALK 22:04, 4 March 2009 (UTC)
Incidentally, the way I understand oversight, it's impossible to remove part of a revision: the only thing that can be done is the removal of an entire revision. So if personal info gets posted and then further edits are made before the info is removed, all of them will have to be zapped. The solution would seem to be (and this is partially based on a conversation I recently had with some stewards) to remove the personal info so as to get a "good" revision, and then zap all the intervening revisions: all the revisions with the personal info. This deletes the author info from the good intervening revisions, so that info should be posted somewhere (the talkpage, say) for GFDL purposes. (This explains my recent list of such info to Wiktionary talk:Feedback, by the way.) This is a bit of a pain in the neck, so the optimal thing to do is remove personal info as soon as it appears.—msh210 00:18, 5 March 2009 (UTC)
Re: "This deletes the author info from the good intervening revisions, so that info should be posted somewhere (the talkpage, say) for GFDL purposes": For discussion pages like Wiktionary:Feedback, I think any unsigned comments in oversighted (overseen?) contributions should be tagged with {{unsigned}} (with appropriate author information), rather than putting that info on the talkpage. But then, that's a lot of work for oversighters (overseers?). :-/   —RuakhTALK 00:47, 5 March 2009 (UTC)
I prefer "oversighter" and "oversighted," since, as with "administrator," "bureaucrat," and "steward," "oversight" is another misnomer for a user class. They just hide revisions, and don't actually oversee anything, nor have the extra authority "oversight" makes it sound like, so the made-up words seem a little more appropriate than the misleading "overseer." Dominic·t 01:50, 5 March 2009 (UTC)
Actually, this is no longer true, at least the attribution issue. Note that the "oversight" user class is currently undergoing major changes. The old oversight extension is essentially obsoleted by the new feature mw:RevisionDelete. Currently, the oversight user class is the only group with access to RevisionDelete, but the idea is that eventually all administrators will be able to use it to suppress revisions/logs/edit summaries/usernames, with the oversight class being the only ones able to review the ones hidden from administrator view. As you can see from this example, RevisionDelete is much more flexible than Special:Oversight. The record of the edit remains in the edit history, and will not disrupt attribution unless the username itself was used to post personal information, and that's what has to be removed. You can still only remove the entire text of a change, if the content is what is being hidden, though. Dominic·t 01:50, 5 March 2009 (UTC)
This is wonderful news for deleting revisions (right now we have to delete all of them and then undelete the ones we wish to, another pain in the neck). These RevisionDeleted revisions will be undeletable by admins, too?—msh210 19:19, 5 March 2009 (UTC)
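A rough sketch of the older whole-revision procedure msh210 describes earlier in this thread (clean the page, hide every revision that still carried the personal info, and keep a note of the authors whose good edits vanish with them). The Revision class, its fields, and the function name are hypothetical, invented for illustration; nothing here calls the real oversight or RevisionDelete tools.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical record of one page revision (not a MediaWiki structure).
    rev_id: int
    author: str
    contains_personal_info: bool


def plan_oversight(history):
    """Return (revision ids to hide, authors to credit on the talk page).

    `history` is ordered oldest-first. Every revision whose full text still
    contains the personal info has to be hidden, because a revision can only
    be removed as a whole.
    """
    hidden = [r for r in history if r.contains_personal_info]
    if not hidden:
        return [], []
    # Assumption: the first hidden revision introduced the info; later hidden
    # revisions are good edits that merely carried it along, so their authors
    # need to be recorded elsewhere for attribution.
    authors = []
    for rev in hidden[1:]:
        if rev.author not in authors:
            authors.append(rev.author)
    return [r.rev_id for r in hidden], authors


history = [
    Revision(1, "A", False),
    Revision(2, "B", True),   # personal info posted here
    Revision(3, "C", True),   # unrelated good edit, info still on the page
    Revision(4, "D", True),   # another good edit
    Revision(5, "E", False),  # info removed: the "good" revision
]
ids_to_hide, authors_to_credit = plan_oversight(history)
```

The longer the info stays on the page, the longer both lists get, which is why removing it as soon as it appears is the cheap option.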

(Is anyone going to explain what this is? Ƿidsiþ 19:59, 5 March 2009 (UTC))

Oversight allows specific editors on a Wikipedia/Wiktionary/wiki to hide from view revisions that contain personal information and the like. Cheers, Razorflame 20:03, 5 March 2009 (UTC)

Phrasal verb entries (again)

I would like to propose that phrasal verb entries follow this pattern -- {{infl|en|verb|phrasal verb '''to [[abide]] [[with]]'''}} The reasoning is that entries such as abide by (traditional entry format) are just plain messy, whereas entries such as abound in don't give enough info. However, abide with tells the user directly that this is a phrasal verb, and so he can look for the inflections from the verb link, if needed. Usage notes would be placed in those entries, such as hang up, to indicate that hanged is not normally used. Usage notes would be a good idea for all entries anyway, to indicate the separability , or not, of the particle. I would like to get on with this asap, and so would appreciate early comments (if at all possible ;-)) -- ALGRIF talk 11:41, 4 March 2009 (UTC)

With just the information given, I prefer the format {{infl|en|verb|phrasal verb|head=to [[abide]] [[with]]}}. There is no reason to repeat the verb if the only addition is going to be the infinitive particle. --EncycloPetey 14:51, 4 March 2009 (UTC)
I agree. Or maybe {{infl|en|verb|phrasal verb|head=to abide with}} to abide with (phrasal verb), so that only the verb itself is linkified. (Normally I support linkifying all words in the headword, but if we want readers to see the verb-entry itself for inflected forms, then I think it might make sense to linkify just that.) —RuakhTALK 22:29, 4 March 2009 (UTC)
I think I would go with your suggestion, Ruakh. I take it that the "head=" argument is important, then? -- ALGRIF talk 13:58, 6 March 2009 (UTC)
The "head=" argument is what allows an editor to display the inflection line with diacriticals, links, or particles not present in the PAGENAME, such as the infinitive particle to that is standard for English verb lemma entries. --EncycloPetey 14:37, 6 March 2009 (UTC)
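As a small illustration of the two proposals in this thread, here is a sketch that builds the inflection line for a phrasal verb. The function name and the link_particle flag are made up for this example, and the variant that links only the verb is one reading of Ruakh's "only the verb itself is linkified"; the template call itself follows the form EncycloPetey quotes above.

```python
def phrasal_verb_infl(verb, particle, link_particle=False):
    """Build an {{infl}} line for an English phrasal verb.

    By default only the verb is wikilinked, so readers are pointed to the
    verb entry for inflected forms; set link_particle=True to link both words.
    """
    part = "[[" + particle + "]]" if link_particle else particle
    return "{{infl|en|verb|phrasal verb|head=to [[" + verb + "]] " + part + "}}"


verb_only = phrasal_verb_infl("abide", "with")
both_linked = phrasal_verb_infl("abide", "with", link_particle=True)
```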

Good examples of "complete" pages?

Hi. I was trying to find out what a "good" Wiktionary page looked like, and have had considerable trouble determining which (if any) pages could be considered to be in a "good" or "complete" state. (I was specifically looking for something like Wikipedia's "Featured" status pages. Shining examples of the highest quality currently attainable...)

Should I assume that the 5 words linked permanently from the mainpage (etymology, wiki, free, English, and dictionnaire) are as good as can be, or are there better examples?

Thanks. Quiddity 21:55, 4 March 2009 (UTC)

Good question. Do we even have a benchmark describing a good entry, or helping assess its quality? Michael Z. 2009-03-04 21:57 z
The closest I could find was WT:ELE, which had a few examples of specific elements done properly linked, and a few seemingly random words wikilinked.
I'd suggest you adapt the EncycloPetey list into something that can be linked from a few of the topmost help pages. Perhaps also link it from the bottom of the "Word of the Day" mainpage template (where I initially got lost.. :) Quiddity 04:20, 5 March 2009 (UTC)
EncycloPetey has some; see User:EncycloPetey/Model pages. —RuakhTALK 22:06, 4 March 2009 (UTC)
We have a lot of adequate pages; hinder is one that comes to mind. On the other hand, I would be amazed if we have any entries for non-trivial words that can be considered truly comprehensive, accurate and accessible. This is largely because we haven't focused on quality very much, beyond establishing some very minimal standards; there are still so many words to be added that spending days or weeks on a single entry seems like a poor use of time.
On the one hand, we definitely need to do more in this area. On the other hand, given the loathsome political tar-pit that w:WP:FAC turned into, I have some reservations about trying to create anything resembling that process here. -- Visviva 05:59, 5 March 2009 (UTC)
To be fair, I think we could be a lot more formulaic than Wikipedia in determining which pages are 'complete'- citations for each sense spanning a set time period, inclusion of senses that other dictionaries have, translations for a set of languages, etc. Nadando 06:06, 5 March 2009 (UTC)
Let a thousand flowers bloom. We probably need multiple additional approaches to quality improvement. I don't think we have exhausted the potential of any of our existing tools.
But unofficial lists of model entries seem like a good idea for now. Some competing models wouldn't be bad. There are likely to be potential improvements for most types of entries and a need for finer classification of model entries for the foreseeable future. I doubt that we are yet happy with very many of the long entries for our most polysemic words, but we might have models for shorter entries. Some specific entry types should include entries for all legal L2 headers. The illegal headers give us some ideas of problematic classes of entries. Scientific names are an example. Some English language types that come to mind: linguistic entries (examples), past participle/adjective, present participle/adjective/noun, phrasal verbs, pro-sentences, non-proverb sentences, various types of proper nouns, attributive use of nouns, offensive terms, terms with offensive attestation quotes, terms with illustrations, terms with series templates. DCDuring TALK 15:41, 6 March 2009 (UTC)

My major beef with listen, parrot, and hinder is that, within Translations, they use another indentation for various purposes.

One purpose is to distinguish scripts when one of the scripts used is Roman (a.k.a. Latin. It should be pointed out that in Kurdish and Serbian this script is used to actually write the language, rather than being a romanization as with most other languages.) However, we already have a way of distinguishing scripts when none of the scripts are Roman, such as between traditional and simplified Chinese, by using slashes. There is no reason that this could not be done for Kurdish and Serbian. Romanizations use parentheses, various scripts whether Roman or otherwise use slashes, and the most compelling reason to do so is to avoid clutter at e.g. uncle.

The second purpose is to group a language family together, particularly Chinese (which is considered either a language or language family depending on who you ask. The subheadings Mandarin, Min-nan, Cantonese, etc. are considered dialects or languages, respectively.) In Wiktionary the subheadings used are all level-2 language headings in their own right and more appropriately belong at the same level as every other language. The language family grouping is as absurd as grouping the Scandinavian languages, listing English under Germanic, and putting Italian, Spanish, and Portuguese in proximity of each other. In fact that ridiculous suggestion makes even more sense than what's done now, since many of the so-called Chinese dialects aren't even that closely related.

In my opinion, the only good reason to indent would be to list dialects that, for Wiktionary purposes, are considered variations of the same level-2 language name, such as Flemish under Dutch. An example at I love you, the most widely translated phrase on Wiktionary, includes cities or regions for Arabic and German. But even that page has the same problems above for Kurdish and Serbian, Cantonese and Mandarin. I remember in a prior discussion of this an example of regional use of Min-nan that would have to be twice indented under the current, flawed system. Even if you don't agree with me on the proper use of indentation, it must be strange to have multiple purposes, no? 18:29, 11 March 2009 (UTC)

Glosses indicating English meanings

From ELE: "However, a translation into English should normally be given instead of a definition, including a gloss to indicate which meaning of the English translation is intended."

The problem is that there is no defined way to write such a gloss, not even a recommendation. The result is a number of varied styles used by different people and on different entries. Examples:

  1. alpha; the Greek letter Α, α
  2. alpha, the name for the first letter of the modern Greek alphabet.
  3. beta (letter of the Greek alphabet)
  4. the Greek letter beta

I realise that there can be no binding rule that everyone should follow, but a short example or recommendation would go a long way in making things consistent, pretty and easy to read and understand. I find it hard to write entries if I have to decide on a case-by-case basis which style I should use. Wipe 12:43, 5 March 2009 (UTC)

Myself, I really don't like disambiguation in parentheses, partly out of personal dislike and partly because automatic generation oversimplifies the definition by a large margin in most cases. Hence I always word it so as to remove them. I usually go for a formulation along the lines of "the Greek letter beta". Circeus 15:58, 5 March 2009 (UTC)
This had long been my feeling as well. However, a little project I've been fiddling with lately, to convert slices of Wiktionary data to MDF or LIFT formats, has changed my way of thinking. Using definitions in place of translations + glosses means that the translation data itself cannot be extracted, so no mapping of the FL lexeme back to English can be established. Ideally, I think we would provide all three: translation, gloss, and definition (not sure how that would be formatted). But in terms of importance, I would say translations, supplemented with necessary glosses, provide more value than full definitions. -- Visviva 05:13, 6 March 2009 (UTC)
FWIW, Tbot, if I'm not mistaken, uses parentheses.—msh210 19:14, 5 March 2009 (UTC)
  • I think 1— that a gloss is not always required, especially if that sense already has a context tag (which in my opinion is preferable); and 2— that when you need to gloss, brackets are the best option. Ƿidsiþ 19:57, 5 March 2009 (UTC)
There has been a discussion on this at Wiktionary:Beer_parlour_archive/2008/November#Formatting_of_glosses_of_non-English_entries. From what I have seen, using parentheses is the common practice at Wiktionary. There is now the template {{gloss}} that I have created, which creates the brackets for you, and, in future, can be made customizable to be presented in italics or any other preferred format depending on the user's personal taste. I do not know how many people are using the template, though. -Dan Polansky 20:26, 5 March 2009 (UTC)
Nice! Had managed not to notice this until now. -- Visviva 05:13, 6 March 2009 (UTC)
Good work, I added a sentence about the template to ELE (but it got reverted). Now I have something to go with. Most people here seem to think that a more or less direct translation, followed by a gloss in parentheses indicating the intended meaning (and explaining connotations or usage), is the best thing to do at the moment (until someone invents a better alternative). Wipe 01:08, 8 March 2009 (UTC)
Yes, but I would still want the possibility of adding a full phrase after the translation (and I'm not sure I'd like that within a parenthesis) should a gloss be "necessary but not sufficient". \Mike 10:36, 10 March 2009 (UTC)
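Visviva's point above, that a translation followed by a gloss can be pulled apart mechanically while a free-form definition cannot, can be illustrated with a short sketch. The regular expression and function name are assumptions made for this example; they are not taken from Tbot, {{gloss}}, or any existing conversion script.

```python
import re

# Matches "translation (gloss)" where neither part contains parentheses.
GLOSS_RE = re.compile(r"^(?P<translation>[^()]+?)\s*\((?P<gloss>[^()]+)\)\s*$")


def split_definition(line):
    """Split a definition line into (translation, gloss).

    Returns (None, None) when the line is a free-form definition with no
    parenthesised gloss, since the translation cannot then be isolated.
    """
    match = GLOSS_RE.match(line.strip())
    if match:
        return match.group("translation"), match.group("gloss")
    return None, None


with_gloss = split_definition("beta (letter of the Greek alphabet)")
free_form = split_definition("the Greek letter beta")
```

Of the four styles Wipe lists, only the third yields a translation this way; the others would need per-style handling or manual review.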

Filipino diacritic/stress pronunciation

{{Filipino diacritic}} (3 inclusions) turned out to be some sort of pronunciation template. Now I've also found 19 inclusions of {{tl-stress}} (Tagalog?) and 11 red links of {{fil-stress}}.

Are these all the same, and acceptably merged? Do they represent a respelling system, or just an indication of stress as used in dictionaries? Michael Z. 2009-03-07 16:08 z

As the number of Tagalog-speaking Wiktionarians is still quite small, it might be more productive to ask the creator directly. One doesn't see her around the fora much, but presumably she had something specific in mind for these. -- Visviva 07:02, 8 March 2009 (UTC)
Hello. The templates were supposed to act as pronunciation guides specific for the Philippine languages (as Tagalog, Filipino, Hiligaynon, as far as I edit). They are based from an existing diacritic system of pronunciation used in Philippine language dictionaries. I had been aiming to integrate the diacritic system on Wikt, but for now I am unsure if these templates are the best solution, hence the hiatus and the little mess I'd left. Recently I had asked EP for advice at the Information desk and am presently awaiting his answer.
I'll clean things up as soon as I get a clearer direction, no worries. --Icqgirl 17:14, 10 March 2009 (UTC)
No sweat. Have a look at the way I rewrote the first one. {{tl-stress}} could be rewritten similarly. Maybe better, the templates could be merged into one with the addition of a lang= parameter. Let me know if I can help. Michael Z. 2009-03-10 21:47 z
Thanks for offering, Michael. The lang= parameter is a great idea.
There are just a couple of concerns that make me hesitate to keep using the templates. I did see a stress template for each Phil. language as extravagant; other languages that also use marks don't get exclusive templates. :P What I want is for each word to have their marked counterpart, and since what it does is portray pronunciation, I thought it would be suitable under ===Pronunciation===. Then a link, like what the templates are doing, to a page that would explain usage of the marks.
So the templates seem to work fine to address this, until I found buwan. The page name takes normal Roman alphabet characters and have the stress underneath ===Noun===. It looks better as a whole, takes less space, and whoever understands the marks would have no problems, but what of the reference-link?
Also it must be noted that the marks are no longer used in everyday writing, and today may only serve as guides for ambiguous words with multiple ways of being said. Keeping marked versions within the Pronunciation parameter might lessen confusion, but may deny the fact that marked words may still be used in the vernacular.
Finally, I prefer layout of the buwan entry, if only a little link with stress usage may be provided.
I think I dabbled fairly much, and may have sounded confusing. I seek advice, in order that a best solution may be concluded. Well, thanks again. I think I'm glad you found the templates (People actually look through the Phil. words, yay! I feel less isolated), albeit in their raw state, and are willing to help fix them. --Icqgirl 16:55, 12 March 2009 (UTC)
The potential problem I see with this is that our readers who are not familiar with the Filipino dictionary conventions may think that the accent is part of the normal orthography.
Many English dictionaries mark stress in headwords – usually using primes as in syl′labi·cation – but Wiktionary has chosen not to do so. Our general solution is to provide IPA pronunciation which shows syllabic stress. I do see the advantage of simply marking stress. But since this specific method is a foreign dictionary convention, and we don't do this for other languages (as far as I know), my preference would be not to mark it thus for only the Philippine languages. Michael Z. 2009-03-12 20:01 z
Actually, marked words were used for a time and had teemed in old literature. I think marked words have to be included on Wikt in case these are searched from such books.
I agree with the normal orthography problem. I worry that the current template system portrays that the marked words are only used as pronunciation guides, when they stand on their own as, well, words.
What about entries for each marked word then? Not all words have marks, so it wouldn't amount to doubling the entirety of the Phil. language vocabulary. They all could sport a template that says An alternative/A dated spelling of (word).
I apologize for the multitude of ways to have the marking system around. I reason that this is healthy brainstorming! --Icqgirl 12:31, 14 March 2009 (UTC)
Then these could be listed under “Alternate forms,” and marked obsolete (or archaic, dated, etc.). The actual entries may contain a usage note that this form is still used in dictionaries.
The standard way to mark them up would be:
   === Alternate forms ===
    * [[rúta]] {{qualifier|obsolete}}
The {{qualifier}} is used to place a context label without putting the current entry into its associated category. Creating the actual entries would be lower priority, since the actual spelling is already documented by such a link. Michael Z. 2009-03-16 16:02 z
Cool. Okay, it shall be done. I guess I'll sweep out the templates now, and start including alternate forms. Hey, thanks for your time and your recommendations, Michael. I wouldn't have ended up with this without your experience and willingness to help. And if ever another template may be required I'll remember to call upon you or the fora. --Icqgirl 07:32, 18 March 2009 (UTC)

Category:Chakavian and Category:Kajkavian

May I move these to Category:Chakavian Croatian and Category:Kajkavian Croatian?

Then they would match others like Category:Ekavian Croatian, Category:Ikavian Croatian, Category:Ikavian Serbo-Croatian, Category:Ijekavian Serbian, and they could be standardized in the dialect context labels {{Chakavian}} and {{Kajkavian}}. Michael Z. 2009-03-07 17:24 z

Done. Michael Z. 2009-03-11 20:59 z

Category:Doric dialect and Template:Doric dialect

Any objection to renaming these Doric Scots? Then they would be clearly not Greek, and the template code could be standardized. Michael Z. 2009-03-07 17:34 z

Category should definitely be renamed IMO. But couldn't the template just have "regcat=Doric" and leave the user to specify lang=sco or lang=grc? -- Visviva 17:39, 7 March 2009 (UTC)
That would work fine. It feels like a weirdly pre-scientific taxonomy, but we're only making use of the specific word, and not implying that these Scots are related to these Greeks. Wilco. Michael Z. 2009-03-08 17:43 z
Done; see {{Doric}}, defaulting to lang=sco, and Category:Doric Scots. Michael Z. 2009-03-09 06:33 z

Request for importer flag

Hi there all. I would like to formally ask the community if I could get the importer flag here please. There are many pages on other Wiktionaries that I would like to be able to import here and I need to show stewards local consensus before I am able to request the flag. Therefore, I would like to ask if you trust me with the importer flag here. I already have the importer flag on the Simple English Wikipedia, and I would like it here to be able to transwiki some pages from other Wiktionaries here. Thank you for listening and reading this request, Razorflame 04:41, 8 March 2009 (UTC)

Can you give some examples? It's hard to see how importing could work in entry-space, except perhaps between here and Simple. But this whole "importer" thing is new for us. Would it allow importing from other projects (e.g. Wikibooks or Wikipedia), or would it be limited to Wiktionaries? -- Visviva 04:47, 8 March 2009 (UTC)
It would allow for transwiking between only the wiktionaries, but it also allows for file uploads of pages from other projects (however, I would not be doing this, unless the information can only be found there). And yes, I would mainly use it between Simple and here. Examples of pages would be pages that exist on the Simple English Wiktionary, but do not exist here, and pages that have English Wikipedia articles about things that should be definitions here, but aren't. I would fully format each and every page that I import, as well as give a link to the page that I import it from. Thanks, Razorflame 04:50, 8 March 2009 (UTC)
I need some more concrete acceptance from this community before I can go ahead and request the bit at Meta. Thanks, Razorflame 19:45, 8 March 2009 (UTC)
This is going to sound rather arrogant (and perhaps it is), but I wonder how much content the Simple English Wiktionary has that we do not. I don't suppose you could throw us a few concrete examples. -Atelaes λάλει ἐμοί 00:56, 9 March 2009 (UTC)
simple:protogalaxy, simple:protogalaxies, simple:bucking bronco, and sex partner and sex partners, just to name a few. There are several others that come to mind as well. Those are the best concrete examples that I can come up with for pages that could be transwikied. Thanks, Razorflame 01:23, 9 March 2009 (UTC)
Above :simple: links slightly edited so as to go to the correct page, hope that's OK. -- Visviva 06:23, 9 March 2009 (UTC)
Not a problem. Cheers, Razorflame 06:48, 9 March 2009 (UTC)
Support, seems worth trying out. Should we have a Vote? -- Visviva 02:06, 9 March 2009 (UTC)
Don't see any reason to not have a !vote. Cheers, Razorflame 02:22, 9 March 2009 (UTC)

(outdenting) I've gone ahead and made a !vote. Please let me know when it gets underway. Thanks, Razorflame 06:58, 9 March 2009 (UTC)

Incidentally, is there any reason all autoconfirmed users can't automatically have import status? It doesn't strike me as something that can really be abused any worse than what you can already do with an edit button. (Is importing on top of something dangerous? I forget.) Dominic·t 13:50, 11 March 2009 (UTC)

Yet another ginormous list

Created User:Visviva/Linkeration and subpages. The basic idea is that, as a rule, if the ==English== section of one entry links to another entry from a bulleted-list section such as "Related terms" or "Antonyms", that link should be reciprocated somewhere within the target entry (or else removed as inappropriate). So this is a list of English entries which have an incoming unpiped link from an entry for another word, but which don't contain that word anywhere in the entry text. By clicking the "edit" link, the unreciprocated-link info for that entry is transcluded above the edit box (handy if you're absent-minded like me).

So anyway, I know we've all got plenty of stuff on our to-do lists already, but feel free to take these lists for a spin if you're looking for a little variety. Items that have been dealt with can be deleted from the lists, if you wish (but any comments will just be overwritten in the next update).

Note that this is still the first draft, so there are various little bits of stuff in there that shouldn't be. Examples of exceptions I've found so far, that I will try to fix on the next pass:

  • SI units and other translingualities. If ZF is ==Translingual==, and zettafarad is ==English==, can "ZF" still be considered a synonym of zettafarad? Can't see why not...
  • Links from New, and any similar entries where it makes sense to refer to a completely unrelated word in the Derived terms section. A fairly unusual case, I think.

No doubt there are quite a few others; please post here or on my Talk if you notice any. -- Visviva 07:27, 8 March 2009 (UTC)
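
The reciprocity check described above can be sketched roughly as follows. This is only an illustration, not the script actually used to build the lists: the `entries` mapping, its field names, and the function `unreciprocated_links` are all hypothetical, and the wikitext parsing needed to extract links from real entries is left out. The test for "contains that word" is assumed here to be a plain substring match.

```python
def unreciprocated_links(entries):
    """Return (source, target) pairs where source links to target
    but target's entry text never mentions source.

    `entries` is a hypothetical mapping:
        title -> {"text": str, "links": [linked titles]}
    """
    missing = []
    for source, data in entries.items():
        for target in data["links"]:
            target_entry = entries.get(target)
            if target_entry is None:
                continue  # the linked entry does not exist; nothing to compare
            # Assumption: a simple substring test stands in for "contains the word".
            if source not in target_entry["text"]:
                missing.append((source, target))
    return missing


# Toy data, invented purely for illustration.
entries = {
    "hot": {"text": "Antonyms: cold", "links": ["cold"]},
    "cold": {"text": "Having a low temperature.", "links": []},
    "big": {"text": "Antonyms: small", "links": ["small"]},
    "small": {"text": "Antonyms: big", "links": ["big"]},
}
print(unreciprocated_links(entries))
```

With this toy data only the link from "hot" to "cold" is reported, since "cold" never mentions "hot", while "big" and "small" link to each other.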


Or, why I've been much less visible for a month ... I set out to improve the program we've been using to maintain the language interwiki links here, by adding reciprocal links to the other language wikts, and it has taken up much of my wiki-time for a month. I've not run lots of other things in the meantime, prod me on my talk page if there are specific things you want.

Interwicket runs as a bot on all 150 active wikts, updating whatever it can find. It works by reading recent changes, and by reading the indexes of all the wikts on a 3-4 day rotation. I was using the dumps, but they are completely broken yet again. It has been publicly stated that they are "low priority". (OT aside: I wonder what all the people putting work into WMF projects would think or do if they fully realized that the backups are "low priority". That's right: there are no backups of the WMF project data. The live database is in Tampa, with one replica in Amsterdam. If something took out one of those, there would be only one copy until they could build another replica; which given past performance might take months. If a software fault took out both, or a bad coincidence (possibly on purpose, as we have very dedicated vandals) took out both, the projects would just cease to exist; the Wikipedia simply could not be rebuilt.)

The Interwicket process is stable now, running, with a lot of work to do: a statistics run on 14 February showed that 525,001 entries needed links updated, and that was only the pages that already had at least one link; other titles that occur in more than one wikt but had never been linked are not in that count. You can see the present state of affairs at User:Interwicket/FL status.

Just so you know. Questions, ideas, etc.: write on Interwicket's talk page or mine. Robert Ullmann 13:57, 8 March 2009 (UTC)

Wow, thanks for the programming effort! If it took you a month, it can't have been trivial. What worries me is the WMF's stance on backups. Maybe it wouldn't hurt to make some more noise about this. There is WalterBE who comes by at least once a week to ask if there is 'news'. Well, I would call this OT remark of yours very newsworthy! It surely merits going higher up the priority list. --Polyglot 02:30, 11 March 2009 (UTC)


Does this indicate Roma carnival cant, or British gay cant, or both? Is the language a form of English or Romany? Cf. Polari, Category:Polari. Michael Z. 2009-03-08 17:33 z

After reading w:Polari, it appears to me that one is an adoption of the other by another group, and they're essentially the same slang. Michael Z. 2009-03-11 20:16 z


Categorized as a regional dialect label, but it is used for both English and Italian terms, and only applies a subject Category:Venice. If there's no objection, I'll turn it into a purely topical label with the text “in Venice”. Michael Z. 2009-03-08 17:37 z

Existing uses should probably be zapped, if they are topical-only. But there is a w:Venetian language, which makes me wonder if there isn't also Venetian Italian in the same way we have both Scots and Scottish English. -- Visviva 02:09, 9 March 2009 (UTC)
On closer examination, this is used strictly as a topical label, and it seems warranted in most of the entries (specific entities in Venice). I made it into a geographical topic label by changing the text to in Venice, and removing the regional (dialect) context category.
Dialect labels could still be created for both Venetian Italian dialect and the Venetian language, but they aren't needed for any of these entries. Michael Z. 2009-03-09 06:02 z
Hmm... that doesn't seem like the sort of topical label we would want to encourage, and most of the definitions have "Venice" in the text anyway. But I guess it's harmless enough. -- Visviva 06:19, 9 March 2009 (UTC)
Well, I'm not positive each of these entries needs that label, but someone placed it there.
But we do need geographic topic labels. A number of dictionaries use them; while others would write “in Venice” as part of the definition. Right now we use “regional context labels” to indicate both regional dialect and geographical context, which is just confused. Example: SAS is currently marked as British English (with the unfortunate label text UK), but it is not dialectal, rather it has this meaning in the context of the UK. Michael Z. 2009-03-09 06:31 z
Regional context labels are not intended to be used for geographical context, they are meant only for regional dialect. SAS looks fine to me because the abbreviation would probably not be understood outside of those areas, but this is a tricky problem and good to get cleared up in our documentation. A good example would be the Sphinx in Giza which should not be marked Egypt! At the same time, yankee meaning any American is best marked British, since inside the U.S. the meaning is different. (Or can someone find a better example?) DAVilla 01:39, 27 March 2009 (UTC)
Right in concept, but dead wrong in the details. Remember that regional dialect tags denote restricted usage. The British Special Air Service is known worldwide as the SAS; its proper name can't be considered restricted to British English, unless there were some other name for it used by North Americans. That's like saying FBI or US Army are examples of dialectal American English. Same for SAS (Scandinavian Airlines System) – whether we consider the initials well known or obscure, their use is not restricted to the “Scandinavian dialect of English”.
Right about Sphinx, and possibly yankee, but it would be nice to have some evidence supporting that. Michael Z. 2009-03-31 17:12 z
If I were British then certainly I wouldn't think FBI to be restricted to the US. I made the assumption on SAS because I hadn't heard of it. This is a tricky case because it's an acronym, and we don't have any rules on those. (Some have suggested that we are labeling them wrong in the first place, by not indicating their part of speech.) If an acronym would have to be cited out of context, that is without indicating what the acronym stands for, then there would be many that could be so cited only within a restricted usage, even though they are actually used more widely, but with definition provided. DAVilla 17:43, 8 April 2009 (UTC)

Request for comment on language for two up/down votes on definition format

I think the time has come for a WT:VOTE, or rather two votes, on capitalization and periods. I don't really care what policy we adopt, but the present inconsistency and back-and-forthing needs to end. With that in mind, and despite the fundamental evilness of voting, I'm proposing to run the following as separate up/down votes. If they both fail, we'll try an up-down vote on whichever alternate wording opposers found most agreeable. Both of the following pertain specifically to this sentence in WT:ELE: "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop. "

  1. That the wording of Wiktionary:Entry layout explained#Definitions be modified from "Each definition may" to "Each definition, even if it consists of a single word, should"
  2. That the wording of Wiktionary:Entry layout explained#Definitions be modified from "be treated as a sentence: beginning with a capital letter and ending with a full stop" to "begin with a capital letter. Definitions that are not complete sentences should end with a full stop only if they are followed immediately by a sentence or independent sentence fragment."

The first of these would solidify the existing de facto periods-and-capitals standard and clarify that it applies to one-word translations as well. (There have been reasonable disagreements on this score, but when translations and full definitions are commingled in an FL entry, lowercasing the translations and uppercasing the definitions looks awfully messy. At any rate, the current wording of ELE doesn't address this question at all; that needs to change.) The second would supplant our existing periods-after-everything practice with the periods-only-where-appropriate standard found in most style guides. It's my sense that the first will pass and the second will fail, but I could be wrong.

I don't propose to debate the substance of these here, or at all; they will either pass or fail and then we'll move on. But since so many policy votes run aground on wording rather than substance, please take a look at the above phrasing, and see if anything seems imprecise or poorly worded. Once we have the clearest possible language for people to either support or oppose, we'll take it to WT:VOTE. -- Visviva 10:49, 9 March 2009 (UTC)

First, what we have now is not broken and does not need to be fixed.
If a definition is a sentence, we capitalise and use full stop.
If it is a word or phrase, we (of course) do neither.
If a definition is not a sentence, capitalisation is wrong, and the full stop is wrong:
  1. Cat.
for Swahili paka is not a sentence, and the definition "Cat" is wrong. (And I don't care if you would insist on [[cat|Cat]], the presented form is still wrong.) A single word or phrase must not be capitalized. If something must be fixed, it is the sentences that should go, with all definitions as phrases. But please note that the distinction is very important: the word or phrase is something that might be substituted for the term in a sentence, the sentence definitions are descriptions of the word. Most dictionaries tend toward the former.
Finally: votes happen after debate and reasonable consensus, they serve to document reasonable agreement; this is not a democracy. Robert Ullmann 11:49, 9 March 2009 (UTC)
What you describe is a reasonable system, but it is not actually the system we have at present, because we do not have a system at present. What we have is a system of everyone doing their own thing, backed up by ELE which only describes how definitions "may" be formatted. There are, I believe, 8 possible permutations (capital/lowercase, period/no period, translations distinct/not distinct from definitions), and at present one can easily find all eight of these represented in various entries (often several in the same entry). We all have our preferred styles, and while I'm happy to say there is relatively little edit warring over these issues, everyone has moments where they have to enforce consistency in an entry, and at that point they enforce whichever of the 8 possibilities they find agreeable (or whichever one they believe, incorrectly, to be policy). This is broken, and needs to be fixed. I honestly don't care how we fix it; I'll write every definition in uppercase and end it with an exclamation mark if we can just have consistency. If you'd like to propose a vote on specific wording for your preferred system, as outlined above, please go ahead.
Beyond that, it seems to me that we have a system currently where consensus is never documented because it requires a vote, and votes on substantive issues never pass, largely because of disagreements on underlying issues which arise from a failure to establish and document underlying consensus. This doesn't work, can't work, and most definitely is not working. But that's a separate issue. -- Visviva 12:12, 9 March 2009 (UTC)
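
The figure of eight mentioned above follows from three independent two-way choices (2 × 2 × 2). A minimal sketch that enumerates them; the label strings are only shorthand for the three choices named in the comment, not terms defined anywhere in ELE:

```python
from itertools import product

# Three independent two-way style choices, labelled informally.
CASE = ("capital", "lowercase")
PERIOD = ("period", "no period")
TRANSLATIONS = ("translations distinct", "translations not distinct")

# Every combination of one option from each choice.
permutations = list(product(CASE, PERIOD, TRANSLATIONS))

for combo in permutations:
    print(", ".join(combo))
print(len(permutations), "possible styles")
```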
I'm thinking perhaps I should split #1 into two votes: one on "may" -> "should" alone, and a separate up/down vote on the distinction between translations and definitions (which is a perfectly reasonable distinction, but is not mentioned anywhere in policy at present AFAIK). -- Visviva 12:16, 9 March 2009 (UTC)
Do you get that there are several distinct kinds of definitions and that they can not be forced into the same format? (and please remember the aphorism about "a foolish consistency being a hobgoblin of small minds" ;-). The simple fact is that we have (1) words and phrases that might replace the word being defined, (2) sentences that describe the meaning of the word, (3) translations/glosses, (4) "form-of" redirects pretending to be definitions. When more than one class occurs in an entry, it is not possible to attain "consistency": they are different things. Sorry.
WT:ELE says definitions may be sentences (1) and (3), capitalised and ending with period, or not, and therefore not. It probably should clarify that a definition may be a word/phrase/glossed translation (2) and (4) and not be cap/period; but that only clarifies what it already says. You can not force them into some "consistent" format without expelling one or the other half of the definitions from the wiktionary, "cat" simply cannot be written as a sentence, and "A domesticated species (Felis silvestris) of feline animal, commonly kept as a house pet." is. The distinctions are real, and useful, and used.
By all means clean up cap/not period, and period/not cap; but that isn't a change to ELE. Sure there is lots of syntax variation out there. But ELE is not broken. Clarify it if you want, but don't change it.
(If one were to force it, the only possible way is to always use no-cap, no-period, phrases; lots of print dictionaries do this, but then we have to dumb-down the sentences; I don't think people would like that? It is, in any case, the precise opposite of what you suggest. ;-)
There is no distinction between "translations" and "definitions", as the concept of a 1-1 "translation" for a FL term is (as you certainly know!) completely bogus. In using "cat" for "paka", it is just giving a barely adequate definition. It really should be more complete. (and written as a proper definition) ("paka" is Felis silvestris and sometimes other small cats, it doesn't correspond to any one sense of cat ;-) Robert Ullmann 12:41, 9 March 2009 (UTC)
I agree that there are a lot of different ways to distinguish certain types of definitions from others, any of which can be used as a basis for stylistic distinctions. We can distinguish synonym-type definitions from quasi-sentence definitions, as you suggest (but I have to note that very few definitions are actually sentences); we can distinguish translations/glosses from full definitions, as Stephen has suggested elsewhere; we can separate non-gloss definitions like "form of" from real definitions; and so forth. I am quite happy to accept any and all of these distinctions, as long as they are documented and set forth in policy; otherwise I'm afraid that I, like many others, will ignore or simply fail to notice them. On the other hand, I don't think that any style is really more correct than any other; I don't accept that there is anything inherently wrong (or right) about "cat", "cat.", "Cat", "Cat.", "A cat", "a cat.", or any other of the similar permutations that can currently be found on the project. The choice between them is a purely stylistic one. At the moment these are all equally permitted by ELE, and ELE also makes none of the possible distinctions among types of definitions. As a consequence of this, the overall coherence and portability of our data is suffering. This is annoying and wasteful of effort, and I'd like it to stop. -- Visviva 15:03, 9 March 2009 (UTC)
My primary concern at the moment is just to find a way of breaking these issues down into individual, atomic elements that we can discuss and decide on their respective merits. It seems to me that most policy votes fail because they try to address too much; we Wiktionarians are an extraordinarily fractious bunch, so achieving even a simple majority on more than one issue at a time is virtually impossible. Thus we get stuck with a status quo that would never pass if it were put up for a vote today. Even the simplest clarifications invariably attract opposition (usually arising from issues unrelated to the clarification itself, as with the two recent ELE votes), so any change has to be split up as much as possible if we're ever going to make any progress at all. By the same token, this strategy will probably attract some "Oppose unless X also passes" votes, but the alternative has already been tried and has almost always failed. I just don't see any other way to proceed. -- Visviva 15:03, 9 March 2009 (UTC)
In my experience, Robert, the only actual sentences are written by total newbie contributors — such as "werebears are people who transform into bears whenever they like." — and experienced editors routinely fix these as part of our wikification process. I assume this isn't what you mean? Do you mean the non-gloss definitions, like defining as "The definite article."? —RuakhTALK 14:45, 9 March 2009 (UTC)
  • From what I've seen on browsing various FL-English dictionaries on archive.org, "Cat."-like definitions are very frequent in older dictionaries (first half of C20 and earlier), but today appear to be completely obsoleted by "cat"-like (no capitalisation and no period) forms. Are there any modern FL-English dictionaries that use "capitalise always, even if a single word, or words separated by commas" strategy at all? I've seen lots of editors using this type of formatting for definition lines, but I've never seen it in any modern dictionary personally. --Ivan Štambuk 14:20, 9 March 2009 (UTC)
    I think this is mostly overflow from the style for English entries. I'm not at all sure that this is a bad thing; in an open wiki, it makes a lot of sense to take the path of least resistance, and maintaining two separate styles is easier said than done. At any rate, ELE currently makes no distinction between the format of English and FL definitions, so the confusion/conflation of the two is not surprising. -- Visviva 15:06, 9 March 2009 (UTC)
There is one thing about which I am confused. From what I understand, in cat, "A domesticated species (Felis silvestris) of feline animal, commonly kept as a house pet." is not a sentence but a long term. It is not a sentence as it has no explicit subject; "A cat is" is implied before the term. Correct me if I am wrong, though. --Dan Polansky 14:47, 9 March 2009 (UTC)

Apologies to Robert U., but style of capitalization and punctuation is just style, and there is no absolutely “wrong” form. You just have to look at cat in Dictionary.com and a couple of paper dictionaries to see this is the case.

Our style guide does need improvement, but whatever we settle on should be as simple and consistent as possible, treating all entries the same. We don't need to be lecturing novice editors on the difference between sentences and phrases so they can go back and change capitalization. To improve ELE is to make it shorter, not longer. Just one of the following (either is used in some mainstream dictionaries):

Each definition should begin with a capital letter and end with a full stop (a period).


Each definition should begin with a small letter and end with a full stop (a period). Any sentence or phrase following a full stop should begin with a capital letter.

Adding any more qualifiers just appeals to a Victorian fetish for formal grammar, and doesn't improve the production, usefulness or delight of the dictionary in any way at all. Michael Z. 2009-03-09 16:21 z

My two cents (or 14p)'s worth: English definitions should start with a capital letter and end with a full stop, even if they are only one word. FL definitions, being glosses, should be uncapitalised and un-full-stopped, except in those cases where no English translation exists and explanatory sentences are necessary. Ƿidsiþ 08:52, 10 March 2009 (UTC)
A question, then: if separate votes were raised on A) changing the "may" to a "should" and B) introducing the English/FL distinction, would you vote yes on both, or would you oppose A because it would be unacceptable for A to pass without B? I'm just trying to get a feel for the best way of structuring the vote(s). I've been assuming that breaking these down into individual issues is the best approach, but I don't know if that might prove to have just as many pitfalls as the other way. -- Visviva 10:14, 10 March 2009 (UTC)
If I understand you rightly, I think I would vote yes to both. Personally. Ƿidsiþ 11:46, 10 March 2009 (UTC)
I'd vote for A. I'd vote against B, because it introduces an unnecessary complication for the editor (and I totally don't see anything about English/FL distinction in the proposal – if it's there, why not state it explicitly?). Michael Z. 2009-03-10 15:09 z
It isn't, at the moment; I'm thinking of this as an additional vote (maybe replacing the one I proposed as #2 above). I guess the language would be something like That a sentence be added to ELE#Definitions, reading "Those definitions which consist only of a synonym or translation, or a series of synonyms or translations, may begin with a lowercase letter". Or something like that? I would add it to the list above, but I think this discussion is already long and messy enough; I had probably better just put up the draft WT:VOTE subpages and request comment there. -- Visviva 15:31, 10 March 2009 (UTC)
I think you should indicate a style for translations (only in foreign language entries) and a style for explanatory definitions (in either English or FL entries) but not mention synonyms or equivalent phrases at all, unless you think there might be consensus on that. BTW, one major difference that may have been overlooked is that the first is comma-separated and the second semi-colon separated. 17:49, 11 March 2009 (UTC)
Well, it seems to me that the issues for synonyms and translations are identical, so anyone who supports an exception for one ought to support it being extended to the other. I'm not sure I understand what you're saying in terms of punctuation; in my experience both semicolons and commas are currently used in both cases, though I would consider semicolons the standard (commas mostly being used to separate words within larger semicolon-delimited groups).

Proposed ELE votes

Nos. 1 and 3 are contradictory. The first implies that definitions phrased as sentence fragments should be treated as full sentences, or that most definitions are full sentences, or both. The third sets up an exception for the case of “a term”. But is a term restricted to “just one word,” or “just one compound word,” or “a term and its modifiers,” or “an equivalent phrase,” or what?

Since many definitions are written with substitutability in mind, and reflecting the grammatical function of the defined term, then we have many long definitions which are not sentences. In fact, I would say that the majority of definitions are not subject-verb sentences. I have no idea whether the following examples “should be treated as a sentence” or are “a definition line that consists solely of a term that can be directly substituted for the headword:”

  • buttonhole: A hole through which a button is pushed to secure a garment or some part of one.
  • Ivanhoe: The hero of this novel.
  • Stakhanovite: An extremely productive or hard-working worker, especially in the former USSR, who may earn special rewards.
  • Empire State Building: A skyscraper in New York City, the tallest in the world in 1931–72.
  • go: To leave; to move away.
  • despicable: Fit or deserving to be despised; contemptible; mean; vile; worthless

If these are passed, will we have to undertake a massive campaign to add capitals and periods, or strip them? Or are we going to argue for weeks on how to interpret this? Michael Z. 2009-03-13 17:47 z

And how would the text read if both 2 and 3 are passed? Michael Z. 2009-03-13 23:04 z

"Terms" are defined in WT:CFI#"Terms" to be interpreted broadly, aren't they? We could replace "term" with "term eligible for inclusion under Wiktionary:Criteria for inclusion", I guess. I didn't think that would be necessary, but I guess it is clearer. Anyway, none of your definitions would meet the definition as written, but these would:
  1. leave; depart
  2. contemptible; mean; vile
If those who more strongly support this distinction disagree (e.g. if "to" or "a" should be accepted as part of a term), I'd be happy to adjust the wording. But I don't see any ambiguity.
Unfortunately, any actively-enforced standard will require changing a large number of entries, for the simple reason that there is currently no standard, and one can find tens (hundreds?) of thousands of definitions that follow any of the four possible combinations of capitals and periods, with no particular regard for the distinctions proposed here. How actively we go through existing entries to enforce this standard will be, as always, up to the energy and interest of the community's members. But I'd rather settle this going forward now, when we have a few million senses, rather than wait until we have billions of them. -- Visviva 05:45, 14 March 2009 (UTC)
If all three votes passed, the passage would read as follows, up to the end of the paragraph:

Each definition should begin with a capital letter. Because most definitions are not sentences, they should not end with a full stop unless this is necessary. For example, if a definition is followed by a short qualification, such as "Used mostly in the plural", then both the definition and the subsequent qualification should end with a full stop. However, in the case of a definition line that consists solely of a term that can be directly substituted for the headword, such as a synonym or English translation, or a list of such terms, the definition should not be capitalized, and should not end with a full stop. The key terms of a definition should be wikified.

I think I will change "term that can be directly substituted for the headword" for "term that is eligible for inclusion under the criteria for inclusion"; it is more the terminess than the substitutability itself that is key. Thanks for pointing that out. And perhaps the "general principle-proviso" relation should be clarified, to minimize the appearance of contradiction. ("However, there is one exception to this: ") -- Visviva 05:45, 14 March 2009 (UTC)
I wouldn't support the proposed changes, mostly because "definition" is too fuzzy a term when used here. Does it mean the definition line, even if the line is not (strictly speaking) a definition, or does it only mean items that actually define the word? Consider that wiki-lawyers will argue as to whether "# fire" is or is not a definition of Latin and must therefore be treated as a sentence with a capital letter and period. For definition lines that merely translate an FL entry to English, such formatting is overkill. I think the portion of ELE to be modified needs a more serious and intensive reworking than modifying a few words and punctuation. --EncycloPetey 22:11, 14 March 2009 (UTC)
There are too many qualifications, it is real work to parse it all out to figure out what it really means, and I think the wording actually fails to adequately convey its intentions.
The worst example: when to use periods? When necessary. When necessary for what? There's an example, but does it represent only one situation where this would be necessary, or is it defining the entirety of when using periods is considered necessary? Or are we supposed to know what is necessary from some universal rules of English orthography (the grade-school rules we were taught are not universal or even necessarily always good).
I also still question the intention. So in dog, should definitions 4 (“A man.”), 9 (“A hot dog.”) and 10 (“Underdog”) remain capitalized and punctuated as is? If we (correctly) remove the article “A” from 9, we are supposed to drop the period too, because it is changing from something else into a “term?” I don't see the point of such arbitrary, formalistic rules. Michael Z. 2009-03-15 19:24 z
That is my understanding of the criterion supported by Robert and others, which I have attempted to formalize in vote 3 -- except that "underdog" would be lowercase. If that is not correct, I hope someone will either edit the vote or clue me in. -- Visviva 16:09, 17 March 2009 (UTC)
In terms of vote 2, I will try to clarify it a bit, but this is basically the same criterion found in the Chicago Manual of Style, among many other style guides: don't end a sentence fragment with a period, especially in a vertical list, unless there's a darn good reason for doing so. Heck, even MS Word enforces that (MS Word is not an authority on anything, of course; but this is a very widespread and widely-accepted rule of style, and I don't understand our reasons for abrogating it). -- Visviva 16:09, 17 March 2009 (UTC)

If I understand the intention correctly, then here's a simpler rephrasing which gets the same point across:

A definition line which only lists one or more terms should have no capitalization or periods. Any phrases or sentence fragments should be capitalized. If there is more than one phrase or sentence fragment, then each should end with a period. Examples:

  1. leave, depart
  2. Someone who is morally reprehensible
  3. All humans collectively; mankind. Also Man.

Just a suggested wording – I am not in favour of having three different cases, especially what I see as a major change by removing periods from solitary sentence fragments, which appears to be the way the majority of definitions are currently formatted. By the way, the “Used mostly in the plural” example doesn't appear to occur even once. Michael Z. 2009-03-15 19:43 z
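
To make the three cases in the suggested wording concrete, here is a rough sketch of how they could be applied mechanically. It is an illustration only: the function name `format_definition` and the `terms_only` flag are hypothetical, and the sketch assumes the editor (not the code) decides whether a line merely lists terms, since that judgment is exactly what the discussion is about.

```python
def format_definition(fragments, terms_only=False):
    """Format one definition line following the suggested wording.

    fragments  -- the terms, phrases or sentence fragments on the line
    terms_only -- assumption: the caller states whether the line only lists terms
    """
    # Drop surrounding whitespace and any trailing period, and skip empty pieces.
    cleaned = [c for c in (f.strip().rstrip(".").strip() for f in fragments) if c]
    if terms_only:
        # Case 1: a bare list of terms gets no added capitals or periods.
        return ", ".join(cleaned)
    capitalized = [c[0].upper() + c[1:] for c in cleaned]
    if len(capitalized) <= 1:
        # Case 2: a single phrase is capitalized but takes no period.
        return "".join(capitalized)
    # Case 3: several phrases are each capitalized and each ends with a period.
    return " ".join(c + "." for c in capitalized)


print(format_definition(["leave", "depart"], terms_only=True))
print(format_definition(["someone who is morally reprehensible"]))
print(format_definition(["all humans collectively; mankind", "also Man"]))
```

The three calls reproduce the three example lines given in the suggested wording.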

Just to clarify, I'm not proposing that we have three separate cases. I'm proposing that we have a vote on a) the general rule that is already present in ELE as a suggestion, and b) two modifications of that rule that may or may not have sufficient support. If all three votes pass, then we would have the three cases you describe, because that's what we as a community will have decided on. On the other hand, if only the first vote passes -- which I suspect is the most likely outcome -- then we will have the simple always-a-capital, always-a-period rule. If no votes pass, of course, we'll be left with the same godawful mess we have now, and somebody else can try their hand at this farcical procedure.
The reason for this approach is that I am hoping people will use the rationale "I support this rule, but with qualification X. Therefore, I will vote for this rule together with qualification X; then even if there is not enough support for my preferred phrasing, at least we will have a coherent policy." This would be more constructive than the usual course of thinking, to wit: "I support this vote, but only with qualification X. Since qualification X is not included, I will oppose." (The same rationale is then followed by supporters of qualifications Y and Z, the upshot being that a rule that everyone basically agrees with fails, as do any subsequent attempts at a compromise.) Several people have expressed support for the "exception for translations/glosses" rule; therefore, IMO we should vote on that together with the main rule -- hence vote 3. And personally, I think we should consider whether it is really constructive to dump unnecessary punctuation all over the page, in violation of pretty much every style guide ever written -- hence vote 2. -- Visviva 16:09, 17 March 2009 (UTC)
I admittedly didn't have the time to read all of the above as I need to get on a plane pretty soon -- my two cents:

Please let's go for "no capitals, no stops" as a general rule-- for example I added to differ and had to capitalize

"Disagree."

to fit in with the rest of the layout.

Then I decided to add an explanatory definition part, and had to redo/undo the various capitalizations -- do other editors not find this roundabout, redundantly spent effort? It pushes my RSI arms over the pain barrier for sure, sad smiley.

And then, when even one-word definitions are capitalized, then why not the related terms? (It almost feels like discrimination against them, smiley); and why not then capitalize the context specifications before definitions etc. etc.?

In my view, I see what I propose/vented as modern, clear and clean, and yes, I pretty much prefer a low frills layout approach, and let our content speak for itself! Smiley.

The capitalizations etc. just remind me, and perhaps other users, of old-fashioned and bad dictionaries, even worse than what one has to deal with these days.

See you all in two weeks, smiley--史凡 08:00, 17 March 2009 (UTC)

PS in case a more complicated capitalization and punctuation pattern 'd be preferred, please please could a bot then be created to whack also new entries/ edits in shape, so as to keep editing accessible to disabled users like me, begging smiley--史凡 08:09, 17 March 2009 (UTC)

I do agree with your point about the general painfulness of the current system, in which one is always trying to either adjust one's additions to match the existing format of the entry, or to adjust the existing format of the entry to whatever one thinks it should be. If we can just agree on something, hopefully that will reduce the quantity of wasted effort considerably. Your preferred no-dot, no-cap approach hasn't seemed to have much support as a general approach, but I think it makes perfect sense and is really the simplest of all possible options. If the above votes don't work out, maybe we can try that instead; I'd vote for it. -- Visviva 16:19, 17 March 2009 (UTC)
What we should do is wrap all definitions with {{defn|}} so that capitalization can be styled :-P according to user preference. DAVilla 02:00, 27 March 2009 (UTC)
Oh, please no. That would be an admission that we can't muster up consensus for even the most trivial decision. Like the misbegotten inaccessible code blemishing every entry, so that editors aren't forced to suffer double quotation marks? This kind of “user choice” signals failure and paralysis. Michael Z. 2009-03-27 14:40 z

Revising Template:policy

The current wording of Template:policy forbids any changes whatsoever without a vote. This has played a considerable role in the current brokenness of our policy documents, as even the simplest, most trivial changes get bogged down in the procedural mess of WT:VOTE and often never get accomplished. Since I'm on a vote-proposing roll tonight, I'd like to suggest some language for a vote on changing the language of this most dysfunctional of all our dysfunctional policy documents. Here's my initial thought:

That the wording of Template:policy be changed from "It should not be modified without a VOTE" to "It should not be modified without discussion and consensus. Any substantial or controversial changes require a VOTE."

Any thoughts on this? I don't know if there's any hope for this, but I'd like to give it a shot. I can't speak for anyone else, but this has been driving me crazy for years. -- Visviva 12:39, 9 March 2009 (UTC)

Because people will persist in making "simple clarifications" that entirely change the meaning ;-) Robert Ullmann 12:45, 9 March 2009 (UTC)
Well, as they say, that's why God invented rollback. :-) I mean if a change is reverted, that should be pretty strong prima facie evidence that it requires more discussion, if not a vote. -- Visviva 13:42, 9 March 2009 (UTC)
I couldn't possibly agree more, Visviva! I don't know how the present wording got to be policy, but I refuse to believe that it was through voting. People should be allowed to use common sense ... --Eivind (t) 12:53, 9 March 2009 (UTC)
Support. Unlike Eivind, I'm sure I could possibly agree more, but for the life of me I can't see how. Furthermore, I'd support making this change only on the basis of discussion and consensus, without even a VOTE. :-)   —RuakhTALK 15:09, 9 March 2009 (UTC)
Not sure whether you're kidding, Ruakh, but, seriously, I don't think that that can be done. If current policy forbids modification of the pages without a vote, then certainly we can't without a vote allow such without-a-vote modification.—msh210 15:44, 9 March 2009 (UTC)
Well, it would be hard to argue that this change is anything but substantial and controversial (at least, I hope we haven't been forcing each other to jump through these hoops for so long out of mere inertia), so even by the new wording a vote would be required. :-) -- Visviva 03:14, 10 March 2009 (UTC)
Ruakh, Eivind, anyone else: are there any specific changes to the wording that you would recommend? -- Visviva 03:14, 10 March 2009 (UTC)
I find your wording to be quite sufficient, though I'd like the last sentence to be: "Any substantial or controversial changes require a vote or unanimous consensus." … --Eivind (t) 07:51, 10 March 2009 (UTC)
I'd like to explicitly include the bit about rollback specifically defining the issue as contentious and vote requiring. But, yes, I also think this idea has merit. Only time will tell if any changes can be made non-controversially. :-) -Atelaes λάλει ἐμοί 08:41, 10 March 2009 (UTC)
Indeed it will. I'm hoping this will at least allow people to better understand which changes are controversial and why, before they go to WT:VOTE; all too often that learning takes place after the vote is already in process, leading to the vote author trying to revise the text, leading to a bunch of legalistic carping about how the vote is suddenly "invalid" because somebody actually edited a wiki page.
Would substituting "disputed" for "controversial" do the trick, do you think? I mean, if a change is reverted, people may reasonably argue that it isn't really controversial (maybe editor X is just being difficult), but they can't argue that it isn't disputed. -- Visviva 10:05, 10 March 2009 (UTC)
I guess I find it difficult to envision a substantial (let alone a controversial) change to a policy document that would get unanimous consensus. Someone will always object, unless the issue is just too trivial to care about, in which case it wasn't substantial in the first place. :-) (I would tell the one about the group of Wiktionarians who tried to agree on dinner, but I don't know if some folks would consider starvation jokes a bit off-color.) And frankly, for really substantive changes I think that votes can be a good thing -- they provide a clear historical record of the consensus that was reached. -- Visviva 10:05, 10 March 2009 (UTC)
I'd be happy to rewrite it as above, and am not too fussy about wording. (It still needs to be clear that, unless you are an admin who's been here for years, your changes will not be right :D). Seriously, if you look at the WT:ELE history you get people who "just don't understand" why their changes are not acceptable; there needs to be a quantifiable metric. (Changes should not be made without strong consensus in WT:BP or a formal WT:VOTE.) On a related note, it would be good, in my opinion, if we did a quick recap on our Policy documents (many of them are complete tosh cloned from Wikipedia), but I don't know if anyone has the inclination. Conrad.Irwin 00:39, 11 March 2009 (UTC)

Gender in derived terms

I have been adding gender mark such as "m", "f" or "n" to terms listed under the head "Derived terms" and other heads such as synonyms, antonyms, and related terms, in foreign-language entries of course. My reasoning was that if the gender marks are there with the lists of translations in English entries, they should probably also be there with other lists of terms.

An example: in the Czech den, "deník m" is listed with "m" under the head "Derived terms".

But I have got some doubt. Is this really wanted or needed? After all, the gender is already marked at the deník entry; why replicate it at places that are linking to "deník"? Any thoughts? --Dan Polansky 14:56, 9 March 2009 (UTC)

I'd say the marks are welcome wherever the gender actually makes any difference for the use and meaning of the word. In a language such as Norwegian, where most of the time the word's gender doesn't really have anything to say meaningwise, it can be left out. But, as an example, when listing ting as a derived term or synonym, a gender mark would be very welcome, since the word has two totally different meanings when used as a neuter or male word. The Norwegian Wiktionary equivalent to the Beer parlour is no:Wiktionary:Tinget – not meaning "Wiktionary:The thing" :D --Eivind (t) 15:05, 9 March 2009 (UTC)
I think it's fine to do what you're doing. I would personally file this under things that are a) harmless and b) somewhat helpful to the user, but c) not really worth going out of your way for. So if you don't mind doing it, and you're editing the entry anyway, there's no reason not to. The same logic would apply to transcriptions for non-Roman languages, I think (but not to glosses). Most editors won't bother to add this info, but I don't think I've ever seen anyone removing it. -- Visviva 16:19, 9 March 2009 (UTC)
You know, I've wondered for some time about the merits of gender. It seems to me like a fairly unimportant part of a word. I certainly wouldn't include the declension of an Ancient Greek translation, and yet declension is just as important (in my mind) as gender. Clearly we want the word itself, and glosses (where appropriate) are invaluable. Transliterations are rather nice, as they allow a person unfamiliar with the script to (roughly) say the word aloud in their head. But gender......? If this project were a dictatorship headed by me, I would remove gender from all places except a word's own entry. There are many situations (translations being the big one) where I think we really want to work on trimming things down a bit, and gender seems like a reasonable thing to go. -Atelaes λάλει ἐμοί 09:21, 10 March 2009 (UTC)
You make good points. I honestly haven't given much thought to the matter, so I will leave the issue to those who have. -- Visviva 16:05, 10 March 2009 (UTC)
Since I'm the one 'guilty' for having proposed to add the gender to the translation tables, I'll try to explain why I did so back in the early days of Wiktionary.
Back then there were not many words/translations with their own entries and having to add full entries for each and every translation, just to be able to add the genders as well, was a bit much to ask of contributors who simply wanted to add a translation. OTOH, I do think the gender is important in many languages in order to be able to know which (form of the) article/adjective to use and how to flex the past participle (in Romance languages, French in particular). So it's nice to see the information without the extra click.
Anyway, if people want to drop gender from translation tables when they know it's in the term's entry, I won't oppose, nor endorse that. It's something that I would show along with the translation if the data came out of a DB, but maybe it's not wise to duplicate the information. --Polyglot 02:04, 11 March 2009 (UTC)
In the translation tables, I find the gender useful as a marker that there are other forms without having to list each form. It's certainly preferable to having each form given individually in the translation table. As for derived terms, I might could see it for cases where the gender differs from the root term, but not as a general rule. Carolina wren 03:42, 11 March 2009 (UTC)
In this case I feel we have to rely on Wiktionary conventions for various languages. It would be a bit silly to put (present active indicative first person singular) after every Ancient Greek verb translation (although not nearly as silly as listing the 500 odd forms which most verbs have). Each inflected language needs to have a standard for lemmata for each POS, and that should suffice. -Atelaes λάλει ἐμοί 09:35, 11 March 2009 (UTC)
Current practice is to only mark the gender in Translations tables if the gender is fixed (i.e. for nouns). Marking a translation with m will imply that it exists only as a masculine form. --EncycloPetey 22:02, 14 March 2009 (UTC)
I never bother to mark the gender for Derived terms or Related terms, because it's often given only for nouns (which have fixed gender) and for adjectives (which have variable gender). The information isn't relevant for the current entry in any way. However, I also don't usually bother to remove such gender notes when I come across them, since I don't recall the community ever deciding whether or not they should be included. It's currently a matter of preference to include/exclude them, as far as I know. --EncycloPetey 22:02, 14 March 2009 (UTC)

Standardizing English dialect names

I'd like to standardize some template names and category names, for English dialects. Since we're a multilingual dictionary, the dialect category names should include the language name, for clarity. This would also allow the template code to be standardized (as per template talk:context#Creating label templates).

Let me know if any of these proposed moves sound wrong. Michael Z. 2009-03-09 16:52 z

"England and Wales English" feels iffy, but there are likely no other good ways to express it, so... Circeus 17:20, 9 March 2009 (UTC)

Also, for these local names should we prefer a traditional adjectival form which may be unfamiliar to many international readers (Bristolian English, Cornish English, Cumbrian English, Liverpudlian English), or a clearer attributive (Bristol English, Cornwall English, Cumbria English, Liverpool English)? Since these are native English names, I'm thinking there's nothing wrong with the former, since each category page may carry a description. But if, e.g., Bristolian is a synonym for Bristol English, would that make Bristolian English sound silly because it's redundant? Michael Z. 2009-03-09 20:12 z

No. "Bristolian" means "of Bristol", so "Bristolian English" is the English of Bristol, but equally "Bristolian" is the language of Bristol (as English is the language of England). Since Bristol is in England and uses some dialect of that language, I don't think there is any redundancy. You can use either term. Equinox 01:27, 14 March 2009 (UTC)
Thanks. I'll use the adjectival forms where appropriate.
I've started moving these, and I'll mark them off as I go. Michael Z. 2009-03-14 16:19 z

Template:London is not a regional dialect template, but a geographic context label. I'll change its text to in London, and create a new dialect label to accommodate the two or three entries which are actually London slang. Michael Z. 2009-03-14 20:37 z

Maybe I'm a little dense, but why should London or any other geographic location have a geographic context label? That doesn't seem to me to be the sort of thing that should be handled by context labels at all, but rather as part of the written definition itself if appropriate. Carolina wren 20:45, 14 March 2009 (UTC)
Print dictionaries do it one way or the other; some both ways. I figure if the locale actually defines the term, then it belongs to the definition. If it's a case of usage being restricted to references to the thing in the locale, then it is appended as a label. Using a label is also well suited to Wiktionary's way of structuring entries.
Not sure if this is warranted in every entry for {{London}}, though. Michael Z. 2009-03-14 23:05 z
Exactly right. To put it another way, some words carry a meaning only in a limited part of the language's range. There are many Spanish words, for example, that have additional or different meanings in the Western Hemisphere and which are not found used in Spain. For these terms, we tag them as having a restricted geography in which the term is applied. Some words don't even occur outside of a narrow geographic range, and this information is context of its usage just as much as saying it is restricted to the jargon of a particular field or restricted in use as slang. It's merely a restriction in geographical use rather than in register. --EncycloPetey 23:46, 14 March 2009 (UTC)
To be a little more explicit: I'm not referring to regionalisms which are only used in a particular region or belonging to a regional dialect. Rather I mean restricted referent: E.g., SAS is used in English worldwide and not restricted to speakers of British English, but refers to the Special Air Service which is in the UK, or to the Scandinavian Airlines System which is in Scandinavia. Michael Z. 2009-03-15 00:04 z
One problem I've noted: Category:Northern English does not identify which north. The US has northern English dialects as well ("Lots of places have a north."). The category ought to be named Category:Northern UK English or something similar. --EncycloPetey 23:41, 14 March 2009 (UTC)
Hm, actually that would be Category:Northern England English (Scotland is north of that, and still in the UK). No conflict yet, because no one has been labelling US regionalisms at all. I'll add this to my long list. Michael Z. 2009-03-15 00:04 z
No time like the present: see #Move Category:Northern English > Category:Northern England English, below. Michael Z. 2009-03-27 20:23 z

circumfixes and circumpositions

I'd like to know how to name circumfixes and circumpositions. I've found a- -ing and en- -en, and I think using "- -" is good for circumfixes in languages with an alphabet. You can easily create a page for the German ge- -t. But how do you write those in Japanese or Chinese? They don't use a hyphen for prefix and suffix entries.

And is there any rule for circumpositions? In English, what should be the title for the phrase for ...'s sake? We should have a rule for interwikis' sake... The two existing entries omheen and vanaf don't help much. — TAKASUGI Shinji 01:45, 10 March 2009 (UTC)

Why not use hyphens for Japanese and Chinese affixes? We use them for every other language, and I've been using them routinely for Korean even though most Korean dictionaries do not. This seems like a matter of style rather than correctness, and all other things being equal, it makes sense to apply the same style across all of our languages. -- Visviva 03:10, 10 March 2009 (UTC)
See this conversation about the use of a hyphen or a "Hebrew hyphen" in Hebrew prefixes' entries' titles.—msh210 22:01, 11 March 2009 (UTC)
Our usual approach is simply to redirect all common variants to the most common or unmarked form (e.g. "for God's sake"). But there are cases where that is awkward or impossible. For such cases, I don't think any resolution has been reached. Both "..." and "X","Y" have been used, to mixed reviews. -- Visviva 03:25, 10 March 2009 (UTC)
Okay, using "..." (or "…" ?) as a redirect seems nice. Thanks. — TAKASUGI Shinji 04:17, 21 March 2009 (UTC)

Razorflame making trouble on meta

See meta:Requests for comment/Sysop abuse on English Wiktionary. Unfortunately with no attempt whatsoever to understand policy and procedure here.

Needs to be replied to; I've added my 0/02. Robert Ullmann 15:10, 10 March 2009 (UTC)

(oh, and he wants to de-sysop SemperBlotto? ROTFL) Robert Ullmann 15:13, 10 March 2009 (UTC)
I withdraw my request for comment, and I have retired from this Wiktionary. You will hear no more from me again. Razorflame 15:15, 10 March 2009 (UTC)

Robert, please consider your wording before pressing "save page", so that you avoid making people look like fools when they do something with good intentions. Frankly spoken, though I don't want our blocking policy to be like Wikipedia's, I would consider it more correct to warn first, then block. --Eivind (t) 15:27, 10 March 2009 (UTC)

I'm sorry. But it can be hard to keep someone from being foolish, or looking that way. Please look at the item on SB's talk page, and note that Razorflame did not ask about the blocks, not ask about policy or convention, but proceeded directly to berating SB and threatening de-sysopping. The softest response is (and was) snark. A serious reply would have been to request censure of Razorflame for making personal threats, but clearly neither SB nor I thought that that was called for! Robert Ullmann 16:00, 10 March 2009 (UTC)

Well, that was all rather unfortunate. :-( I hope Razorflame comes back some day; we've had editors who have caused a good deal more trouble than that and still gone on to make good contributions to the project. But, yeah, that's not really the way we do things 'round here. -- Visviva 15:46, 10 March 2009 (UTC)

If this is the way that people get treated when they try to show people what they do wrong, and give them suggestions on how to fix it, I probably won't be coming back. Razorflame 15:47, 10 March 2009 (UTC)
If you first try to find out if it is wrong, and then make suggestions, that would be wonderful. When the first thing you bring up includes a direct threat, not so much, eh? Robert Ullmann 16:04, 10 March 2009 (UTC)
If I wanted it to be a direct threat, I would've said I want you desysopped, not I believe that your use of the block tool is bordering on abuse of the blocking tool, and because of it, I believe that you should be desysopped. There is a difference. Make sure you learn it before you shove that in my face. Razorflame 16:07, 10 March 2009 (UTC)

I would like to formally apologize to the community as a whole, and to whoever was involved in this fiasco. Sorry for the drama that I caused, and you won't have any of this from me in the future. Thanks, Razorflame 16:51, 10 March 2009 (UTC)

I would be more than happy to have more from you in the future. Though I am not one of them, this project has a lot of contributors who have been here quite some time and who have their ways of doing stuff. I reckon some feedback on these ways is always welcome. But, of course, I reckon both parties can do things better when communicating next time. --Eivind (t) 22:12, 10 March 2009 (UTC)

Propose change in the blocking policy

Hi there all. I would like to peaceably propose that we change the blocking policy here to say warn first, then block, instead of block first, ask questions later. I believe that the latter method is highly destructive and I believe that you would be better off with warning users first, and if they repeat the action a second time, then block. Thanks, Razorflame 17:11, 10 March 2009 (UTC)

Oppose. Blocking clearly malintentioned anonymous edits, even if first edits, even if without any warning, is an effective way to administer a wiki site with many entries and few administrators. --Dan Polansky 18:10, 10 March 2009 (UTC)
Oppose per Dan. -Atelaes λάλει ἐμοί 19:03, 10 March 2009 (UTC)
Oppose, sorry. I do think we should be giving warnings more often than we are, but it seems counterproductive to modify the blocking policy in a way that doesn't reflect common practice: all it will do is confuse newcomers. If you can convince the most active patrollers to do as you suggest (which, TBH, I'm certain you can't), then I think the next step after that would be to codify the change. —RuakhTALK 20:48, 10 March 2009 (UTC)
While I oppose the proposal per others' comments, I can't agree with your reasoning, Ruakh. If the consensus is that a policy should be implemented, then implement it, and those who don't agree should conform anyway.—msh210 21:52, 11 March 2009 (UTC)
I think that the best way to deal with vandals is to revert (once or a few times), without any other action, not even warning. If the vandal insists, either a block or a very short warning may be appropriate (blocking without warning is not a real problem with clearly malintentioned editors). This is (probably) much more effective in the long term than blocking at the first offense. Lmaltier 21:49, 10 March 2009 (UTC)
Oppose - Jonathan Webley 21:53, 10 March 2009 (UTC)
Very well, then. If you like the policy the way it is, fine with me. Cheers, Razorflame Public 22:01, 10 March 2009 (UTC)
Oppose. The ratio of admins to contributors is far too small to afford such a luxury. —Stephen 22:10, 10 March 2009 (UTC)
Oppose --Polyglot 01:47, 11 March 2009 (UTC) You don't want to see sysops burn out. Be very glad that the active ones are doing the ungrateful job they are doing. You can't imagine how quickly the project would deteriorate without their constant efforts to weed out the vandalism and spam. I know what I'm talking about. I was almost alone 3-4 years ago when this project was starting to attract the attention of vandals and spammers and it quickly becomes boring to be policing instead of being able to concentrate on the far more interesting linguistic part of the work. I have great respect for people like Semperblotto, who is finding the strength to keep fighting against this for more than 3 years now already! And I would like to thank him and all the others for it. Making their/our life harder by complicating policy is not at all what is needed. It may be harsh that innocent contributors get blocked as well this way, every once in a while, but that's the price we have to pay, I'm afraid.

Our administrators have at least 8 times as many pages to look after. We don't have the luxury of spare time to coddle some asshole who wants to vandalize the dictionary. I don't know what incident spurred this discussion, but the guideline seems aimed at only punishing those who are intentionally harmful, so it looks good to me.

  • Wikipedia: 1,626 admins,[1] 2,787,468 articles[2] (1:1,714)
  • Wiktionary: 83 admins,[3] 1,186,749 entries[4] (1:14,298)

 Michael Z. 2009-03-11 15:35 z
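As a quick check of the arithmetic behind the "at least 8 times" figure, here is a minimal sketch that recomputes the pages-per-admin ratios from the counts quoted in the two bullets above. The variable and function names are illustrative only; the numbers are the ones given in the post.

```python
# Counts as quoted in the post above (March 2009).
wikipedia = {"admins": 1_626, "pages": 2_787_468}
wiktionary = {"admins": 83, "pages": 1_186_749}


def pages_per_admin(project):
    """Average number of pages each admin would have to look after."""
    return project["pages"] / project["admins"]


wp_ratio = pages_per_admin(wikipedia)
wikt_ratio = pages_per_admin(wiktionary)

print(f"Wikipedia:  1:{wp_ratio:,.0f}")
print(f"Wiktionary: 1:{wikt_ratio:,.0f}")
print(f"Wiktionary admins cover about {wikt_ratio / wp_ratio:.1f} times as many pages each")
```

This only compares page counts per admin; it says nothing about how often those pages are edited.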

It's not quite as simple as that. While we have more pages per admin, most of those pages are verrry quiet. I currently have more than 14,000 pages on my watchlist, and even with bot edits visible it's dead quiet most of the time. If those were Wikipedia pages, I'd be tearing my hair out just trying to keep up (I speak from experience). I'm far too lazy to figure out the actual daily edit rates for the two projects, but if we extrapolate from total page revisions, there are about 180,000 page revisions per admin on the pedia, vs. about 77,000 revisions per admin here. That suggests that the total workload in terms of admin patrolling is actually a bit lower here (and that rings true IMO; I would blame it mostly on WP's constipated RFA culture, which keeps admin numbers dangerously low). I agree that the pedia approach wouldn't work well, but I think it has more to do with the nature of our project, and the kinds of edits that are made/needed. -- Visviva 16:26, 11 March 2009 (UTC)
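The comparison above can be put in the same back-of-envelope form. This sketch only uses the two rounded per-admin figures quoted in the comment (about 180,000 and about 77,000 revisions per admin); the variable names are illustrative, and total revisions are a rough proxy for patrolling workload, not a measured daily edit rate.

```python
# Rounded figures as quoted in the comment above.
revisions_per_admin_wikipedia = 180_000
revisions_per_admin_wiktionary = 77_000

# How many times more revisions each Wikipedia admin "covers".
workload_ratio = revisions_per_admin_wikipedia / revisions_per_admin_wiktionary

print(f"Wikipedia admins cover roughly {workload_ratio:.1f} times as many revisions each")
```

On this measure the per-admin load comes out lower on Wiktionary, the opposite of what the pages-per-admin comparison suggests, which is the point being made.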
No, but most people qualified to be vandal-checking on Wiktionary are admins (or are likely to be so soon). On Wikipedia, lots of people who watch for vandalism are not admins and never will be. So WP has a large pool of help that we don't. Wikipedia also uses bots to patrol for vandals, which we do not do (for many reasons). As a result, the per-person workload is indeed greater for Wiktionary. --EncycloPetey 21:56, 14 March 2009 (UTC)
Oppose and see below Robert Ullmann 15:56, 11 March 2009 (UTC)
Oppose, although I admit that I'm a little bit late in seeing this discussion. However, all of the reasons stated previously are still good reasons. --Neskaya kanetsv 16:41, 16 March 2009 (UTC)

on warnings

Let me see if I can explain a bit more. As I see it, "newbie" contributions can be put in 4 groups:

  1. Contributions from users familiar with en.wikt format, or taking enough time to learn a bit: often people from other wikts, or the pedias, etc, or "brand new" but clearly looking at what goes in an entry. These people should get our welcome message, and are not a problem.
  2. Contributions that are misguided, badly formatted, mis-understand something. These people get a nice welcome message, and a note about what they are not doing correctly.
  3. Junk from people paying no attention, using entries as the sandbox. These get reverted.
  4. Outright vandalism: deleting content, replacing it with obscenities (one wishes—forlornly—that these would exhibit more imagination ...), and very frequently direct personal attacks. ("Look, you're in the dictionary under 'dork'! Ha ha!") These get temporary IP blocks, or permanent logged-in-user blocks ("vandal-only account"). And the damage reverted.

See? Note that there are no "warnings". We are much friendlier than the 'pedia! We don't put big STOP! signs on people's talk pages. If they show any interest at all in contributing, they receive a nice welcome (if a bit impersonal) and some help. If they are vandals, there is no need to bother with them.

If someone does persist in edit-warring on a page, etc, we protect the page for a while, rather than warn and block the user. Much better than the draconian (and immensely overblown) disciplinary process on the pedia. (Blocks aren't supposed to be punishment, right?) Maybe the pedia could learn from us?

(So how does one tell the difference between 2 and 3 and 4? I listened to a talk by the Assistant Warden of Cedar Junction state prison years ago. One of the things he said was that too many people were being sent by the courts into the prison system. His words were: "Two thirds of the inmates in my facility should not be there and one third should never get out." It isn't hard at all to tell which is which. If you wonder about a specific IP block, go look at the contribs; if it was something deleted and you can't see it, an admin will probably be happy to tell you what sort of thing it was, if a legitimate request.)

One more little thing: a reason for blocking, even if just for an hour or 3, is that it appears in the block log; when the vandal re-appears a day or a few later, it is visible when re-blocking, so persistent vandals are more easily identified. Robert Ullmann 15:56, 11 March 2009 (UTC)

This is an excellent summary IMO; it should be posted somewhere semipermanent. But I do think we have a little trouble telling the difference between 2 and 3 sometimes. For example, when I'm creating entries from my newspaper lists, I often find that the entry has been previously deleted. I don't always bother to check the deleted revs, and most of the time it was just vandalism, or something completely unrelated, but maybe 20% of the time the deleted entry turns out to have been a poorly-formatted version of the one I was about to add. And I think I've made some slips of this type myself. As long as we don't mistake 2's for 4's, it's not a severe problem, but we should be aware of our fallibility on this score. -- Visviva 16:35, 11 March 2009 (UTC)

Transwikis from other Wiktionaries.


I'd like to propose that we allow transwikis from other Wiktionaries. Admittedly, most other Wiktionaries aren't in English (the exception being simple:), but I think it would be useful to be able to bring over an FL entry into our Transwiki: namespace, then translate it to English and format it our way. I often take content from he:, just acknowledging it in the edit summaries; but starting with the he: entry here, then translating it to English, might be smoother.

I believe we have to have a vote on this before it will actually get implemented, but I figured I'd start a discussion first. Does anyone have any opinions?

RuakhTALK 21:51, 10 March 2009 (UTC)

I would be in support of this idea. I like the idea and definitely think that it would help to streamline the process of transwiking pages from other Wiktionaries. Cheers, Razorflame Public 22:01, 10 March 2009 (UTC)
I agree as well! I would like to be able to import some good entries from no.wikt on Norwegian words. --Eivind (t) 22:15, 10 March 2009 (UTC)
Why not? I would appreciate it if the transwiki items remained separated by source wiki. DCDuring TALK 22:39, 10 March 2009 (UTC)
I'd only support this if the entries did not get put into Transwiki. They should be immediately re-formatted and saved as proper entries. We have way too much stuff languishing in that space as it is. (besides, as noted somewhere recently, the longer the page is in transwiki the more out of date it becomes). Conrad.Irwin 00:15, 11 March 2009 (UTC)
Well, stuff languishes there because the current use cases are a "push" process: Wikipedia decides they want us to take something off their hands, so we bot-import it and no one necessarily ever looks at it again. These inter-Wiktionary transwikis would be a "pull" process: an administrator would only transwiki an entry that he was specifically interested in, or that a non-administrator was specifically interested in and had asked him to transwiki. So I'm not terribly worried about that. But if you'd rather we brought them straight into namespace 0, I'd be down with that. (Really, I think the ideal would be to transwiki them into a subpage of the user who wants them, but Special:Import doesn't seem to support that kind of thing.) —RuakhTALK 02:15, 11 March 2009 (UTC)
Note that the namespace is entirely up to the importer. We could decide by policy to limit import targets to main namespace, but that's a separate matter from enabling the tool. Dominic·t 13:55, 11 March 2009 (UTC)
Yeah, but it makes sense to figure it all out at once, rather than enabling the tool but not letting people use it until we determine ground rules. :-P   —RuakhTALK 14:10, 11 March 2009 (UTC)
What actually needs to be done to enable this? Is there a specific extension involved, or is this a setting that just needs to be flipped on? Is the feature documented somewhere? (I went looking for transwiki documentation once and found pretty much zilch; maybe I wasn't looking in the right places.) -- Visviva 14:12, 11 March 2009 (UTC)
In general, import is a function that just needs to be turned on. And it's probably too trivial to need a vote of any sort. However, this case is a little different, since we'd be asking for import from all Wiktionaries (and, really, for the same reason, I'd say all Wiktionaries should be able to import from any of the others), since it's a list of project names that could get long; it may necessitate a software change first. Dominic·t 14:27, 11 March 2009 (UTC)
It seems to be controlled by the $wgImportSources configuration variable. I imagine we would need to give a developer the list of interwiki prefixes. —RuakhTALK 15:03, 11 March 2009 (UTC)
I think that this process is not trivial enough to not have a !vote on the issue. I believe that since this would be very involving, that we should involve the community of the English Wiktionary in on it, and that that would also allow for other users to voice their opinions on the matter. I, for one, would love it if people would not transwiki pages into the Transwiki namespace as that namespace has quite a few articles hanging around in there with nobody to tend to them. I would be alright with them being transwikied either directly into the mainspace and then having the importer in question immediately then simplify and format the entry for our uses here, or have the importer in question transwiki the page directly into their userspace (a subpage) and then simplify/format the entry there and then move it into the namespace when it is correctly formatted and simplified for our uses here. Everyone agree or disagree on this? Cheers, Razorflame 17:19, 11 March 2009 (UTC)

I've now created Wiktionary:Votes/2009-03/Transwikis from other Wiktionaries. It's set to start in a week; please take the opportunity to look it over. This is a wiki, and I make no claim of ownership over that page; if you feel that the options presented are the wrong ones, please be bold. —RuakhTALK 18:45, 11 March 2009 (UTC)

I don't really like seeing histories cloned and existing in more than one Wiktionary at the same time and will probably vote that way when it gets rolling. Couldn't one just cut and paste the code for the word into a fresh entry spot or your sandboxes, then use your expertise to translate it into English, then save it into an English entry page (meaning, the first one to bring it in this way would be the one starting a new history here)? Goldenrowley 02:15, 16 March 2009 (UTC)
Yes, you can, provided you indicate that in your initial edit summary (otherwise it's a copyright violation). —RuakhTALK 01:06, 21 March 2009 (UTC)

The vote has now begun. —RuakhTALK 01:06, 21 March 2009 (UTC)

Category:Wiktionary policies Spring cleaning

Here is a list of the pages in that category and a perceived view of the state they are in. It'd be great if we could make Wiktionary space tidier (but obviously don't waste time that would otherwise go into writing the dictionary!). Please amend and annotate these lists as you see fit.

Actively maintained.
Probably accurate, but may need tweaking/expanding
WT:BLOCK First paragraph should be consistent with the rest.
WT:BOT Should be clearer and mention WT:VOTE.
Wiktionary:Capitalization Reads more like a help page.
Wiktionary:Citations Not sure that the bit about redirecting is consensus.
Wiktionary:Obsolete and archaic terms Seems informative, if messy
WT:DELETE Needs converting to Wiktionary terminology, though seems to reflect policy
Wiktionary:No personal attacks merge with WT:AGF and Wiktionary:Civility
Wiktionary:Policy - Abbreviations Could be useful, but not without some serious thought
Wiktionary:Signatures delete after copying relevant sections into WT:USER
Wiktionary:Spam Most of this under WT:CFI, WT:DELETE, WT:BLOCK. might be good to have an explanation, but not really policy
WT:USER A bit about CentralAuth would not be amiss
Wiktionary:Reconstructed terms
Wiktionary:Translations and Wiktionary:Translations/Wikification
Things I can't guess about
Please move things from here into appropriate other lists.
A bit of a mess, needs serious work.
Wiktionary:Civility Dumped from 'pedia. Merge with WT:AGF and Wiktionary:No personal attacks.
Help:Creating a Wikisaurus entry Do we know what a Wikisaurus entry should be like yet? Wiktionary:Thesaurus considerations
Wiktionary:English Phonemic Representation Delete; should be covered by Wiktionary:Pronunciation
Wiktionary:Languages with more than one grammatical gender Delete; should be covered by Wiktionary:Translations / the "About Language" pages
Wiktionary:Links some good ideas, but needs expansion (should probably not even be policy, just a document to help newbies)
WT:NPOV Remove the bit about userpages, rewrite to reflect Wiktionary-specific issues.
Wiktionary:Policies and guidelines Hopelessly out of date
Wiktionary:Referencing dictionaries The bits about CFI should be in CFI, the rest should probably be somewhere (if it's accurate)
Wiktionary:Spelling variants in entry names Horrendously out of date, should merge with redirects probably
Wiktionary talk:Template documentation Umm, might be nice to have some policy about templates into which this could slot.
Wiktionary:Thesaurus considerations. Needs updating, merge with Help:Creating a Wikisaurus entry.
Wiktionary:Transliteration and romanization Needs updating; whether we need or want transliterations in inflection tables etc. has been disputed

I don't know what other people's thoughts on this are, but I'd certainly be ecstatic if some of these pages could be made to look loved, even if only for a week. There are many other pages in Wiktionary space which are "policy-ish", feel free to add to the lists above and to add extra annotations. If you have lots of time on your hands, please integrate the above suggestions into the documents (for the ones marked as true policy, create a new copy that we can vote on). Conrad.Irwin 01:40, 11 March 2009 (UTC)

Strong support. Your characterizations seem about right to me (though I would question whether anything but ETY is "actively maintained" at the moment). A template policy/guideline should be a top priority IMO; there are a lot of unwritten rules that newbies and oldies alike run afoul of.
Note that WT:AGF, having been ratified as {{policy}}, is currently in the same can't-be-modified Catch-22 as CFI and ELE. I also wonder if, in the process of merging and updating it as you suggest, we should move it to a more Wiktionary-specific location, calling it Wiktionary:Good-faith editing or maybe Wiktionary:Tact or Wiktionary:Interacting with humans or something like that. We function very, very differently from Wikipedia (and from most of the other projects, which tend to model themselves on Wikipedia)... anything we can do to draw attention to that fact is probably good. -- Visviva 04:31, 11 March 2009 (UTC)


I'm new and I was wondering what to do? —⁠This unsigned comment was added by Sgaetano1 (talk • contribs).

Welcome! We've got lots and lots of stuff that needs doing, but it depends what your skills and interests are. You might want to start with Wiktionary:Requested entries or Category:Requests. -- Visviva 17:05, 11 March 2009 (UTC)
I often link new folk to Help:Example sentences. Conrad.Irwin 21:04, 11 March 2009 (UTC)
That's a very spiffy page. (But what's up with the "Official policy" bit?) -- Visviva 05:18, 12 March 2009 (UTC)
It's just a quote from WT:ELE. Mainly there, "because I could", I think. Conrad.Irwin 09:48, 12 March 2009 (UTC)

Request for transwiki bot approval - User:HersfoldBot

Hello everyone,

I am Hersfold, an administrator over at Wikipedia (although I know that doesn't count for much here :-) ). I know I haven't made any actual edits here, but I am here to make a request nonetheless. I have written a bot, User:HersfoldBot, that is designed to import articles marked to be transwikied from the English Wikipedia to here using the Special:Import function. Currently, this process is done manually, as the only previous bot fulfilling this purpose is no longer running. The problem with the manual process is that it does not copy over the entire history of an article unless one of your administrators uses the import tool I previously mentioned.

HersfoldBot works by gathering the list of articles on Wikipedia to be transferred here, then one by one checking to see if there is already an entry by that same name in your Transwiki: namespace. If there is not, the bot will use the import API to copy the entire history of the article over here, while appropriately logging the transwiki both in our log at w:Wikipedia:Transwiki log/Articles moved from here/en.wiktionary and your log at WT:TW. The article will be updated accordingly on our end, and the bot continues until it runs out of articles or runs into a problem. There are multiple fail-safes built into the bot to prevent it from breaking anything. In order to operate, however, this bot will require both a bot flag and the import right; while this is most commonly given along with the rest of the admin tools, it seems unnecessary to do so that way since the import right can be individually given by a steward. On the other hand, that involves going several rungs up the ladder.

There is more detail about how the bot operates at the Wikipedia request for approval, located at w:Wikipedia:Bots/Requests for approval/HersfoldBot. It is not yet approved there, but should be soon. The bot's source code (in Java) can be viewed at w:User:HersfoldBot/Source, and the results of tests conducted at Wikipedia and the Test Wikipedia can be reviewed at the bot request I linked and off of the bot's Wikipedia userpage. If you have any questions, please let me know - I'll be checking this page regularly. Thank you for your time and consideration. Hersfold (talk) 01:49, 12 March 2009 (UTC)

I don't see any intrinsic problem with this, as it is simply speeding up and improving a process already in place. However, I wonder about the utility of the 'pedia transwikis in general. It sometimes feels as though we're getting a bunch of junk that we really don't want, simply because the 'pedia doesn't want it either, and feels bad about flat out deleting it. I'll ask Goldenrowley to comment on this (being probably the only user currently engaged in the daunting task of cleaning up and organizing the transwikis), and probably defer to his judgment on the situation. -Atelaes λάλει ἐμοί 02:12, 12 March 2009 (UTC)
It sounds like a great time saver, if it accurately does the steps explained above. The previous bot did not screen the imports very well & that seems to be where the problem lay in previous years. It's an improvement over our last bot if it will be "checking to see if there is already an entry by that same name in your Transwiki: namespace" ...MAKE SURE it ALSO looks in the main Wiktionary namespace, which is even more important. I cannot comment on the Javascript as I am just a novice to Javascript. Once it does the import ^You did not mention but I see you have plans to also use the after handling cleanup flags (TWCleanup and TWCleanup2) correctly^ Goldenrowley 02:24, 12 March 2009 (UTC)
One more thing I suggest if the article is over about a certain number of characters long, the bot asks for a manual review first because it may be more appropriate on Wikipedia if it goes beyond the definition, or more likely a glossary that we already have under another name. It is easier to define a word from scratch than to pull it out of super long articles. Goldenrowley 02:34, 12 March 2009 (UTC)
Responses so far: When the idea for the bot was given to me, I was told not to check the main namespace as content from the imported article could possibly still be merged in. If this isn't what you're after, I can certainly change that - it's a matter of adding one line of code.
The bot is actually coded in Java, not Javascript; Javascript is a web-based language, and this runs off of my computer.
Yes, the bot will be tagging the articles with the appropriate templates on Wikipedia's end once the import is complete.
I could add a character limit, although since it'll be manually reviewed afterward on both ends anyway, I'm not quite sure how useful that would be. Hersfold (talk) 02:51, 12 March 2009 (UTC)
Re: "Javascript is a web-based language": not true! The most popular use of JavaScript is in browsers, but it's actually a general-purpose hosted language, and quite a few different kinds of tools support it. —RuakhTALK 03:12, 12 March 2009 (UTC)
Please do add a line of code to compare it to mainspace entries, as merging the articles is a very painful and difficult process, and we don't keep the 5000 Transwikis that we already imported to mainspace, so we'd get them back again. Sometimes it's a good idea but that should be a manual decision. Duplicate words that did not add value accounted for about 35-50 percent of imports from the last bot....on the character limit, I think the character limit will ensure a little manual control on longer articles, because once handing them off to Wiktionary, I think it should truly be a hand-off. Otherwise it sounds like a good bot. Goldenrowley 03:25, 12 March 2009 (UTC)
Ruakh: True, you're right. I should know better. :-) Goldenrowley: I'll add the code in for the mainspace comparison now, then. I'm still uncertain on the character limit, though, because especially in wikimarkup, a lot of the characters are formatting. Also, some articles that get sent to our Articles for Deletion process can be fairly longish, and those have a community consensus on our end to be sent here. Whereabouts would you be looking for a character limit to be set? Hersfold (talk) 05:40, 12 March 2009 (UTC)
N.B.: If comparing to mainspace titles, make sure to compare also to the lowercase-first-character version (since, as you may already realize, our PAGENAMEs are case-sensitive, unlike enWP's).—msh210 18:55, 12 March 2009 (UTC)
Hey, I think this is an interesting idea, and would support it, but I wonder whether it might be better to run this with no bot flag - there are a few people, including me, watching recent changes who might be motivated to deal with something that they saw coming across but who find Special:AllPages/Transwiki: a bit daunting, and given that there are only 37 entries in that category, it can presumably be slowed to one transwiki every five minutes? Also, better behaviour for pages that exist in the Transwiki: space, but not yet in mainspace, would presumably be to re-import (though I don't know if that's possible). Just throwing stuff around, would support either way. (Heh, and congratulations for ploughing through that much Java (pretty much my least favourite language), the previous bot was written in shell script calling pywikipediabot functions.) Conrad.Irwin 10:24, 12 March 2009 (UTC)
I would support this if you promise to do some weeding first. Very many of the articles transwikied from -pedia are just rubbish and should have been simply deleted. SemperBlotto 10:33, 12 March 2009 (UTC)
Conrad: I can change the throttle speed if needed; you do have a point there. Everything is currently set to the default speed, which seems to make one edit roughly every ten seconds or so, which should be slow enough it wouldn't need a bot flag anyway unless the API got finicky. The bot will log all of its imports to the easier-to-navigate WT:TW in any event. Also, while the bot can import a page into an existing article, the documentation says that the result often isn't pretty, and "Therefore this should not be done except to reconstruct the true page history". I'd prefer to avoid this since it could easily make a mess of things and has the potential to cover up content which could be much more useful.
SemperBlotto, what exactly do you mean? The bot is incapable of making such a judgement by itself, and the articles it will transwiki will have already been reviewed and tagged by a human editor. Our deletion policy does not support the deletion of dictionary definitions without a discussion on the matter. Hersfold (talk) 18:51, 12 March 2009 (UTC)
Thanks Hersfold for your consideration. I don't want to hold up this idea; requesting a character limit is negotiable. I was thinking that anything longer than a screen full of text is a pain to edit down to a dictionary definition. Per SemperBlotto's concern, a bot can't really change the quality of articles chosen to send to Wiktionary; that will take helping Wikipedia make their instructions clearer. For example, perhaps we can beef up their instructions with our Criteria For Inclusion, WT:CFI. Goldenrowley 01:19, 13 March 2009 (UTC)
One question, will the bot be able to strip out the "copy to Wiktionary" commands from Wiktionary and add "[[Category: Move to Wiktionary 2009-0X]]" at the top of the import - with X being the month of import? This will be very helpful. Goldenrowley 01:24, 13 March 2009 (UTC)
Having looked at the articles currently slated to be moved here, it looks like you'd be looking for a character limit of about 5000 - this seems to be a little more than a screen of text on my computer. Most of the articles are well under this, being one-liners. The two really big articles we currently have are the two lists, and I can see why you'd want to deal with those manually. I'll wait to add this until you confirm it.
I can have the bot remove the template and then categorize it on your end, yes; that shouldn't be too difficult. Hersfold (talk) 04:34, 13 March 2009 (UTC)
5000 sounds safe... thanks. SemperBlotto, I worked on your concerns by changing a template at Wikipedia see [[5]] -- this template and five variants used to flow words into the "Move to Wiktionary" category without any documentation, hence inflating it with unchecked words. I also went ahead and assessed the words there tonight so this template is currently empty but will be useful in the future. I also worked on Wikipedia's moving instructions to explain our criteria etc. Goldenrowley 05:45, 13 March 2009 (UTC)
No problem. Should I have the bot log on this end that articles weren't imported for the character limit, so you know to check it eventually? Tagging it on our end helps to some degree, but our editors won't have a clue what to do about it. Once we've got that clarified, though, I'll create a w:Template:ManualTranswiki that the bot will use to tag those articles with. Once they've been reviewed and cleared for importing, someone can set a |import=yes option to have the bot carry through with it anyway. Hersfold (talk) 15:50, 13 March 2009 (UTC)
I would love if you could take that extra step to have the bot list items that were over the character limit in some place to us... perhaps on a fresh page so it can be put in an attention category and handled a little differently? It sounds like teamwork! Although we have a backlog so you may find we don't act very quickly on new Transwiki's, if that's okay. Goldenrowley 04:52, 14 March 2009 (UTC)
Where at and how? As for the backlog, that's not a big deal; the bot will ignore these articles until it gets the OK to import them anyway. Hersfold (talk) 21:20, 14 March 2009 (UTC)
To clarify, what page should these reports be made at and what format should the bot use? Hersfold (talk) 03:20, 15 March 2009 (UTC)
This is not very original but how about if we started a page named "Wiktionary:Transwiki_log/Long articles for review". For syntax, how about just use the pound sign (#) and Wikipedia crosslink and the bot's timestamp-datestamp? Goldenrowley 23:11, 15 March 2009 (UTC)
(outdent) Ok, I'll go make the page now and add some directions to it. Is there anything else the bot could potentially do? It seems like we've covered most everything. Hersfold (talk) 01:14, 16 March 2009 (UTC)
Ok. Our old bot used to clean up a lot of the Wikipedia codes that look ugly when imported here, but I can't advise you what magic it used to do it. I'd be happy as long as you can do the other things we already mentioned. For clarification I think we should wait to create the page named Wiktionary:Transwiki_log/Long articles for review until the very day you're ready to import. I can at that point add it to the Transwiki Table of contents, and give it a header and intro. Goldenrowley 02:26, 16 March 2009 (UTC) ... Oh, you already created the page. 02:28, 16 March 2009 (UTC)
Oops, too late, I already made the page. Sorry about that. As for cleaning up the wikimarkup, there's so many different things we add to our articles it's probably easier to leave it the way it is so that it's more noticeable when you go to edit it. There's no way the bot could possibly catch everything, and it's very likely it'd make a mess of things at some point no matter how accurate I tried to make it. I'm just not sure that's really feasible; possibly in the future, but right now I think the main priority is to get it running. Hersfold (talk) 03:18, 16 March 2009 (UTC)
Ok. I'd like to just note before we close, out of fairness, that a lot of things coming from Wikipedia sent to us are very good, and fitting to be here ... Wikipedia sends some real nuggets our way, like hypodigm and well done glossaries. Now perhaps you might help encourage Wikipedians to come over here and clean up their words after the imports to fit our style... Goldenrowley 05:23, 16 March 2009 (UTC)
While it's not exactly my area of expertise, I'm sure we've had some good things come from you as well. Hopefully if people recognize they don't have to do so much work getting the articles here, they'll help out more fixing them up. :-)
If you don't have any more suggestions, would it be time to open a vote? I understand for bots these take two weeks or so, and I can't get the bot approved on Wikipedia's end until I get a final trial run finished. Hersfold (talk) 07:10, 16 March 2009 (UTC)
Sure it is a good time to have a vote. Goldenrowley 05:44, 18 March 2009 (UTC)
Vote posted at Wiktionary:Votes/bt-2009-03/User:HersfoldBot for bot status Hersfold (talk) 17:11, 18 March 2009 (UTC)
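
For reference, the screening rules agreed in this thread (skip titles already in the Transwiki: namespace, skip titles that already exist in mainspace under either capitalization, and hold articles over about 5000 characters for manual review unless cleared with import=yes) could be sketched roughly as below. This is only an illustration in Python, not the bot's actual Java source; the function and parameter names are hypothetical, and the title sets are assumed to be supplied by the caller rather than fetched from the wiki.

```python
from enum import Enum

# Figure settled on in the thread: a little more than one screen of text.
CHAR_LIMIT = 5000


class Decision(Enum):
    IMPORT = "import"
    SKIP_EXISTS = "skip: title already exists"
    MANUAL_REVIEW = "log for manual review"


def lowercase_first(title: str) -> str:
    """Return the title with only its first character lowercased."""
    return title[:1].lower() + title[1:]


def screen_article(title, wikitext, transwiki_titles, mainspace_titles,
                   import_override=False):
    """Decide what to do with one Wikipedia article (hypothetical helper).

    transwiki_titles and mainspace_titles are assumed to be sets of page
    names, without namespace prefix, that already exist on Wiktionary.
    """
    # 1. Already sitting in the Transwiki: namespace.
    if title in transwiki_titles:
        return Decision.SKIP_EXISTS
    # 2. Already a mainspace entry. Wiktionary page names are case-sensitive,
    #    so the lowercase-first-character form is checked as well.
    if title in mainspace_titles or lowercase_first(title) in mainspace_titles:
        return Decision.SKIP_EXISTS
    # 3. Long articles wait for a human unless someone set import=yes.
    if len(wikitext) > CHAR_LIMIT and not import_override:
        return Decision.MANUAL_REVIEW
    return Decision.IMPORT
```

Under these assumptions, a Wikipedia article titled "Hypodigm" would be skipped if Wiktionary already had an entry at "hypodigm", and a 6000-character list would be logged for review until its import=yes option was set.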

Specific entities

This is a request to rewrite WT:CFI as it pertains to the inclusion of specific entities, that is, proper nouns that refer to a single, unique object or place. Currently "a name should be included if it is used attributively, with a widely understood meaning". Sears Tower and until recently Empire State Building were named as failing this test. Many other current entries like Golden Gate Bridge and even France would fail as well, though there is community support to keep. In fact, there is good reason to include specific entities such as these. They have exact translations into other languages, sometimes based on the translation of certain words, or a transliteration of the spelling, or phonetic approximation, or some mixture. This can often be inaccurate from historical times but preserved to the present, so assumptions cannot be made. While an encyclopedia is useful for detailed information on the entity, translations are our linguistic duty.

In my mind the question is whether the term has entered the lexicon, which is weighed through citation, and hopefully not arduously. I propose to expand the criteria to "attributively or generically" and/or "metaphorically". I would also like to clarify what is meant by this with examples, as is done with a subpage for brand names. However, I am not certain myself on what would count, or even what does count now, particularly as there had been debate on the meaning of attributive. Your thoughts and examples on this would be greatly appreciated. Personally I don't believe that notability should play into the determination, as it does on Wikipedia but never here. However, I would like to add a clause similar to "clearly widespread use" to prevent overburdening with citation requests, perhaps with "clearly widespread knowledge of" or something similar.

What I mean by generic/metaphoric are uses that don't mean the entity itself, but an entity with analogous characteristics. For instance, use of a(n) instead of the is a great example, and thankfully not usually so difficult to search for: "the energy needed to keep a Sears Tower or a World Trade Center functioning". Also the...of as in "the Washington, DC, of Western Europe", possibly other constructions. Let me know what you think. 08:15, 12 March 2009 (UTC)

I'd be in favour, although I'd like to give everyone a chance to point out any obvious flaws we've missed. This is a likeable proposal because it reinterprets or relaxes the existing rule, rather than adding new ones.
However, I think we could go even further, within our mandate, although I can't think of any suitable criteria. To support toponymy, the division of onomastics which studies place names, we could certainly allow a much broader range of geographic names. This is done in etymological dictionaries, and specialized geographic dictionaries.
But we must stick to information about the terms, and not about the things they represent. Etymology, pronunciation, usage are in. Encyclopedic and gazetteer information are not. Michael Z. 2009-03-12 19:42 z
Wikigazetteer would be a great project within WMF. We could incubate it here. Initially, we could focus on the terms alone, which would build on the linguistic diversity among Wiktionarians. We would thereby, together with WP, provide a core for any eventual independent Wikigazetteer. We should be happy to serve the WMF community and eventual Wikigazetteer enthusiasts by providing the maintenance during the incubator period. Even in the long run we could provide the much-appreciated service of etymological support as we do for Wikipedia and Wikispecies. We probably should provide a category structure ASAP. Who wants to run with this? DCDuring TALK 20:20, 12 March 2009 (UTC)
Let's concentrate on a dictionary of geographic names, on etymological principles. A gazetteer is fundamentally different – it's about places, not the names of places. Of course someone's WikiGaz project would be welcome to grab our data and structure, but let's not duplicate Wikipedia by making a gazetteer here. Michael Z. 2009-03-12 21:35 z
Do you mean, Mzajac, that London should be defined not as "1. The capital city of the United Kingdom and of England, situated near the mouth of the River Thames in southeast England, with a metropolitan population of more than 12,000,000. 2. A city in southwestern Ontario, Canada, with a population of approximately 300,000. 3. A city in Ohio, USA." but only as "1. A place name." or perhaps "1. A city name."?—msh210 21:54, 12 March 2009 (UTC)
Do you mean, Msh210, that if the (uncited, undated) population figures were removed, you wouldn't know which city was being referred to? Michael Z. 2009-03-13 04:33 z
Do you think that there should be separate entries for London, England and London, Ontario? Or just a single entry for London with separate senses? DCDuring TALK 10:24, 13 March 2009 (UTC)
Definitely one entry about the place name “London”, no entries about particular localities. Heck, for dictionary purposes we could list all of the places under a single sense as far as I'm concerned – but perhaps that would become unsatisfactory in the long run, as more information is added (e.g., Londonderry derives from the sense of London, England, not London, Ontario). Michael Z. 2009-03-13 15:03 z
So this would be an entry with a definition that needed no definition, being a locus for pronunciations and translations? I suppose you might need some ability to discriminate between pronunciations by referent, say, between Cairo, Egypt (chiro), and Cairo, Illinois ('kay-ro). Would you want the entry to indicate whether the name was in current use with respect to a specific referent (see Salisbury) and, if not, what the successor was? DCDuring TALK 17:45, 13 March 2009 (UTC)
And etymology, of course. This would add depth to the remainder of our dictionary, as words spring from place names and vice versa (Tyndall stone, chicken Kiev, cf. Київ). I'm not sure what you mean about the referent, but currency of usage is relevant, and that a place is named after another belongs in the etymology.
Well, London can be defined as the city that is located near the mouth of the Thames, being the capital and largest city of the United Kingdom. In practice, I suppose many places would be identified rather than “defined”. This isn't much different from our special handling of entries on individual letters, numerals, symbols, etc.
I don't think the separate pronunciations for Cairo warrant two main headings – this could be done with a qualifier label for the pronunciation, or a usage note.
Perhaps such entries can be distinguished with a Toponym heading instead of just Proper noun. Michael Z. 2009-03-13 22:51 z
(Belatedly responding to Mzajac's reply to me.) I don't mean anything. I was merely trying to clarify your intent. That said, I would not mind if only placenames that met strict criteria (such as our current attributive-use criterion) had full definitions (like "A city in England"), and all others were allowed in if attested but with only quasi-definitions like "A city name" or "A place name". I think that that's workable.—msh210 16:06, 19 March 2009 (UTC)
It could be definitionless. The entry could just be something like a Wikispecies entry, putting each place in hierarchies of places, eg, EU/Great Britain/England/London/Knightsbridge, providing etymologies, alternative spellings, synonyms, translations, and links to other projects. DCDuring TALK 22:44, 12 March 2009 (UTC)
I personally think that most place names should go in Wikipedia. I'd be fine expanding upon the "used attributively" clause but I don't want them all here. Wikipedia handles them much better (e.g. the pop-out WikiMiniAtlas was created for them). One reason people often mention for including place names is that someone needs to catalogue translations of place names, and since bilingual dictionaries often do this we should as well. But when I want to translate a proper noun (whether place name or other), I almost always go to Wikipedia and find the appropriate iwiki link. To add one that's missing, create the article (potentially a "stub") in the target language wiki and insert it in the iwiki links. While it's not perfect, it does a good job. Wikipedia is different from standard encyclopedias, and analogously just because bilingual dictionaries provide translations of proper nouns doesn't necessarily mean we should. We can reinterpret our goals in the context of the wikifamily. --Bequw¢τ 01:33, 13 March 2009 (UTC)
How many clicks does it take you? How many clicks could we offer? How likely would a user be to find us rather than Wikipedia from Google or similar starting point? DCDuring TALK 01:46, 13 March 2009 (UTC)
As a non-native speaker, knowing the standard pronunciation of, say, place names would be paramount for me, to avoid ridicule from native speakers less knowledgeable [I speak from experience unfortunately, sad smiley] than one finds here in the Wiktionary community -- so please, please, whether in the form of links or direct entries, help make such information available! 史凡 07:43, 13 March 2009 (UTC)

The general principle is: all words in all languages. In my opinion, New York, Phoenix, Stratford-upon-Avon, William and Shakespeare are words, and should be accepted. On the other hand, Phoenix, AZ or William Shakespeare are names, but not words, and should not be accepted. Lmaltier 21:35, 13 March 2009 (UTC) And Wikipedia does not include pronunciations, etymologies, derived terms, anagrams, etc. Lmaltier 21:40, 13 March 2009 (UTC)

Well, in point of fact, Wikipedia articles routinely contain pronunciation and etymological info. See e.g. w:Chicago. And thanks to interwikis, they contain translations into most languages that users are likely to need (and the FL wiki article will often contain useful contextual information that one would be unlikely to find on Wiktionary). As for derived terms, I'm not sure how much value they add, but I would support expanding the attributive use criterion to include any place name that is an etymon of one or more valid words. (In fact, that was my original understanding of the criterion, though I have since been disabused of that.) -- Visviva 04:48, 14 March 2009 (UTC)

Wiktionary:Place names

I've tried to summarize various proposed criteria for toponym inclusion at Wiktionary:Place names. I believe each of those has been proposed by at least one editor at some point. Please edit and expand, as appropriate; please also feel free to go into content issues such as definitions and translations, which I didn't feel quite up to dealing with on the first pass. Incidentally, looking at the list, I find that I personally would be happy with all of the three "strong" criteria, even in combination -- inclusion in a dictionary or primary division of a country or used metaphorically -- but the moderate and weak options give me serious qualms. -- Visviva 06:46, 14 March 2009 (UTC)

Wikipedia articles routinely contain pronunciation and etymological info? Well, this information is absent for most place names (especially small ones), and belongs to Wiktionary more than to Wikipedia (it's about words). How do you find it for very local place names, not always deserving a Wikipedia article, but which are words nonetheless? About translations: how do you find the translation of India into Bavarian, if it is not here (note that you can find it in the Inde page, on fr.wiktionary)? How do you find all translations when several translations are possible (e.g. for Serbian Cyrillic and Latin alphabets)? But my main argument is that (most) place names are words, and that we should include all words. In my opinion, encyclopedic criteria, such as primary division of a country, are relevant to Wikipedia, but not here. Lmaltier 08:18, 14 March 2009 (UTC)
Well, yes, many WP articles don't have that information (just as many Wiktionary entries don't), but the information is welcome there and its addition encouraged. So we would only be providing unique value, in this respect, if we choose to be more inclusive of place names than Wikipedia. This would be a sensible resolution of the problem, but it is more or less the opposite of current policy; the entire section of CFI would need to be stricken (which might not be a bad thing).
In terms of Wiktionary:Place names, am I correct in thinking that you would accept nothing stricter than a combination of the three weak criteria (attested + verifiable + present in an official list of toponyms)? This would position us to add unique value, so I would be hard-pressed to oppose it if it comes to a vote; but I have to admit that the thought of an entry for something like Indian Hills Mobile Home Park (which meets all three criteria, probably for more than one location) is somewhat disturbing. -- Visviva 09:24, 14 March 2009 (UTC)
Of course, I don't propose to include Indian Hills Mobile Home Park, because it's not a word! When something is a word, it should be included. When it's difficult to consider it a word, but we think that we can bring linguistic value nonetheless, then it might be accepted too (but clear CFI would be needed). Actually, current CFIs might be appropriate, provided that, in addition, any name which is clearly a word is also includable, including smallest villages... Information about the place, but not about the word, such as country flags, population, etc. should be strictly forbidden (the only information about the place should be the definition and a location map, to be able to understand the word). Lmaltier 10:36, 14 March 2009 (UTC)
How would your criteria treat Indian Hills and North Carolina? They, too, are not words. I look forward to seeing the maps for London (and Springfield). Is search-and-delete patrol for non-approved types of graphics botable? DCDuring TALK 10:56, 14 March 2009 (UTC)
I consider North Carolina, New York, Le Havre or past perfect as words. But not Phoenix, AZ, nor Indian Hills Mobile Home Park. The difference seems rather clear to me, but clarifying it might help. I'm sure you can see it. For maps, I was meaning a map showing where the place is located, because it may be useful to understand the word, not a map with London streets, of course. Lmaltier 17:10, 14 March 2009 (UTC)
It makes sense to me that "Phoenix, AZ" is sum of parts, while "North Carolina" isn't. On the other hand, "Indian Hills Mobile Home Park" isn't really sum of parts -- at least, none of the places by this name that I looked at were part of any larger "Indian Hills" community; it seems that the respective developers just chose this name from The Big Book of Trailer Park Names (or wherever). So is it rather the qualifier "Mobile Home Park" that makes this not a word? I guess that makes sense; similarly we wouldn't want "State of North Carolina" or "City of Chicago". Would this also apply to more conventional geographic labels, such as "River" or "Lake" or "Island"? Given that many of these are often referred to without their label (e.g. "the Chicago flows slowly"), that also makes a good deal of sense. This would argue for deletion in the case of WT:RFV#Aleutian Islands, if I understand correctly. (Do I?) -- Visviva 07:18, 15 March 2009 (UTC)
Yes, you understand me well. For rivers (e.g. Seine), the important word is Seine, and one page may be sufficient (except when this name is always followed by River, which may make River a part of the word, just as in Mexico City). For Aleutian Islands, I would keep Aleutian (I assume it's an adjective), but I would also keep Aleutian Islands, because it's the normal name for the place, and it is as much a word as Ireland, I think. Lmaltier 14:20, 15 March 2009 (UTC)
Your criterion as applied to river names bothers me. When one says Miami River, "River" is a part of the name. It is not truly a separate word, but part of a compound proper noun. Yes, one could say "the Miami", but it wouldn't be clear whether you meant the river or the Native American people by the name (or actually meant the city of Miami and simply made an error in speech). This is a regular problem in US English, where a river and another geographic entity were both named after the same Native American tribe. "Mississippi" is both a US state and a major US river, so the river is often called Mississippi River as a set phrase name to distinguish it from the state. The same applies to Missouri, which is a river, a state, a culture, and even a historical territory. It is more often called the Missouri River to clarify meaning. So, a criterion that says it's always followed by "River" doesn't work for river names. The issue is fuzzier than that. --EncycloPetey 15:23, 15 March 2009 (UTC)
You must be right, about rivers. In French, it's rather unusual to add fleuve before a river name, but the English usage is different.
Another thing to be pondered: odonyms. Some of them could be accepted, when they are actual words (e.g. interesting linguistic information can be provided for Canebière or Champs-Élysées) but obviously not all of them. This restriction can be justified by the fact that almost all street names can be analysed as the sum of two (or 3...) words rather than a single word and, therefore, their presence would not bring linguistic value. This is different from city names, which are single words in most cases, which justifies the inclusion of names such as New York. But in languages such as German, does this analysis also apply or not (e.g. in Seestrasse)? Lmaltier 17:39, 15 March 2009 (UTC)
Some compound street names in English are probably worth entries as well, especially those in New York and London. Consider High Street (which is idiomatic) versus Fleet Street (which isn't, but carries connotations). Never having lived in London, but often coming across names of London streets in television and reading, I am often bewildered trying to understand what connotations may (or may not) be implied in the mention of a particular London street. For the US, Pennsylvania Avenue is worth an entry, certainly. --EncycloPetey 17:48, 15 March 2009 (UTC)
Yes, they are includable for a different reason (Fleet Street has a special meaning, too, I think = London newspapers). Lmaltier 18:53, 15 March 2009 (UTC)
The Thames, the Thames River and River Thames are just alternate forms of the same noun. Presumably the first would be listed as an abbreviated form, and the last chosen as the main entry (is it technically a “lemma?”). It's not just indigenous American names which suffer from multiple meanings. I live close to the Red River, which locals often refer to as the Red. It flows from the US into Canada. Usonians call it the Red River of the North, to differentiate it from the bigger one in Texas. This is just a matter of context and usage. Michael Z. 2009-03-15 20:02 z

How about prohibiting the addition of toponym entries merely for the sake of inclusion, and requiring some lexicographical function? Toponyms could be required to have at least one of an etymology, a non-obvious pronunciation, a usage note, or a list of derived non-toponyms.

And what do you think of the specialized subheading “Toponym”, to denote a particular type of proper noun? Michael Z. 2009-03-15 20:02 z

No pronunciation is obvious. And a definition is something lexicographically useful, too. Also note that different places with the same name may have different etymologies and/or different pronunciations and/or different demonyms, etc. Therefore, each place should get its own definition. Lmaltier 18:01, 19 March 2009 (UTC)
Indeed, different places with same spelling may have different pronunciations, even when one is derived from the other. Cairo and Vienna in Georgia (US) are pronounced very differently from their source cities in Egypt and Austria. Carolina wren 18:45, 19 March 2009 (UTC)
The pronunciation of Red + River is obvious from its components to anyone who reads English – for this reason many dictionaries omit pronunciation of such compounds. The geographic description “a river flowing north through North Dakota, Minnesota, and Manitoba, into Lake Winnipeg” is not a lexicographical definition. An entry with only this information is purely encyclopedic, and doesn't belong in the dictionary.
On the other hand, if you add an etymology explaining that Red comes from Ojibwe, or a label about the regional usage of Red River and Red River of the North, then it would have lexicographical value. Whatever guideline we come up with should reflect this. Michael Z. 2009-03-19 20:07 z
I prefer lexical criteria, and not the weak lexical criterion which almost matches the moderate factual criterion. Factual information would be wonderful, but it diverts our attention in a very noticeable way, primarily that the context information (such as US/UK) would be confused with factual information of where that place is located. For instance, "Fredericksburg" is understood very differently in Central Texas, which would warrant a more local context, {Texas}. Something similar could be said of a number of cities named Jacksonville. However, Jacksonville in Florida does not warrant a local context because it is understood across the country to be the city in Florida. The problem is that the temptation to label the Jacksonville in Florida as {Florida} is just too great.
If we are both to indicate regional context labels and to include obscure place names in definitions, our focus should be on a lexical criterion, since only lexical criteria focus our attention on how the term is used, rather than if it is correct. Information on other places would have to be left to Wikipedia. And yes, that's regardless of whether the pronunciation is unusual or the term has etymological information. The strongest reason to include place names is for translation purposes, but we have never included any term on the sole basis that it has non-trivial translations, and that even applies to terms like older brother where the grooves run deep and long. The only on-goal method is lexical, and the primary means citation. Any secondary means are going to need to be much more indirect than addressed, such as differing pronunciation depending on location, e.g. Houston, or being the etymology of another term like New York strip, but not just on the basis of having pronunciation or etymology sections.
My proposal above to use genericization or metaphor is along the lines of a strong-medium lexical criterion, being more inclusive by allowing citation but also addressing the problem of identifying legitimate quotations without the entire out-of-context hassle. At the same time it is not so weak that citation gathering amounts to fact checking, although a stricter clause could be based on that, or on authority, used to avoid having to cite so many place names and to eliminate some of the holes that make results seem arbitrary. 02:01, 20 March 2009 (UTC)
I disagree with Mzajac about the fact that a geographical description is purely encyclopedic: the definition must be present to explain what the word means, and understanding where a place is located is necessary to understand what the place name means. But adding the population, or the length of a river, would be purely encyclopedic. Why do you consider that mentioning that Miami is the name of a town in Florida is more encyclopedic than mentioning that cat is the name of an animal? It's normal that a Wikipedia page and a Wiktionary page have something in common: the definition. Lmaltier 20:10, 20 March 2009 (UTC)
Of course the object of an included place name must be identified. But I don't think identifying a place qualifies its name for inclusion in the dictionary, any more than “An actor (1915–85) who played the lead in Citizen Kane” qualifies Orson Welles for inclusion.
Dictionary entries represent terms, that is, words and, in some cases, names (while encyclopedia articles represent things, including people, and places). It should be some quality or aspect of the name which qualifies it for inclusion, not merely the existence of a corresponding place. Michael Z. 2009-03-20 22:11 z
I don't think anyone objects to defining some terms as just place names, much as we define family names and male and female given names. If you think that's the most that should be defined then that's a legitimate viewpoint, though not one with general agreement. This entire argument is centered around the question of whether to define specific entities, such as Lincoln which in nearly any context refers to the former president, or Athens which in nearly any context refers to a city in Greece. While there is willingness to include some of these entries, there is also opposition to including every place name (although opposite to your views some would just as well do so). The question is where to draw the line. 01:25, 21 March 2009 (UTC)
Sorry, misread your response. To say that the name itself must qualify the entry, rather than an understanding of its meaning (e.g. Berlin to stand for the seat of the German government) is a fairly strict lexical criterion. 01:36, 21 March 2009 (UTC)
"Encyclopedic" can refer to a couple of things: the length or content of the article, or the entry title or topic. What is considered encyclopedic in either case is an open question. Although other ideas have been floated, in my opinion there should always be a definition line. At the same time, I have had reservations about including e.g. the exact weight of deuterium or a measurement of the length of a year in place of the existing definitions. The definition is the fundamental explanation in those cases. For people and place names, the definition should answer the question of why the term is worth including. We don't need the history any more than the scientific details. If they're well-known, what are they known for? 01:25, 21 March 2009 (UTC)
By encyclopedic, I mean information about the thing, like a city's population, etc., as opposed to information about the name of the thing. No, a definition does not need to explain why a term is included. Wikipedia has notability guidelines and usually explains the significance of the things in an introductory paragraph. Wiktionary has attestation requirements, which may or may not be documented by quotations.
We haven't really established how to “define” a place name. If you ask me, London is well defined as “The capital city of the United Kingdom and of England.” Arguably, most places will have to be geographically identified by something like “situated near the mouth of the River Thames in southeast England”. But “with a metropolitan population of more than 12,000,000” is strictly encyclopedic, and is out of place in the dictionary – a better alternative would be a specific Wikipedia link at the end of the definition line, where you can learn a thousand such facts about London if you choose to. Michael Z. 2009-03-21 20:55 z
There is also the issue of commonly used place names like Springfield and Boone County - if a place name is used in multiple places, it is worth an entry stating that this is a place name used in multiple places. In the case of London, Paris, etc., there should be one definition line for the primary use, and one indicating that there are multiple other places named for the primary. bd2412 T 02:41, 22 March 2009 (UTC)
If being the capital is the reason for inclusion, then name it as the capital. If one would be expected to know that it's situated near the Thames, then say so. If the population qualifies it, then list the population. And frankly I don't think something as fickle as the population is a good criterion. In some cases we might have to say "once had a population of..." because once qualified as a term, always qualified. DAVilla 08:48, 26 March 2009 (UTC)

syllable stops/IPA slashes/phonetic system order of listing

  1. Could the stops be put in, say, the respelled word at, say, the beginning of the pronunciation section? For example, gon.or.rhe.a has an obvious problem [unsplittable ɹ in IPA], as for now the stops are/seem to be put in the actual IPA.
  2. Do we really need those "//" around the IPA? In paper print dictionaries they'd delineate the pronunciation from the rest of the entry. But since we are not paperbacks/bound, and so have the luxury of space to have a dedicated subsection, could we spare ourselves the dictation/typing effort for what seems a superfluous addition/symbol?
  3. Would it be an idea to have the IPA standard/pronunciation first and then let all the other phonetic systems follow alphabetically? -- I am rather an inclusionist, so I argue not so much against the latter as for not having to look through the clutter each time I want/need to check the pronunciation, as all the competing phonetic systems are unhelpful to me [and perhaps others] and finding what I need can become a little tedious, especially with variant pronunciations; so having what I guess is the international standard [ISO didn't accept them as standard yet?] at a standard place, ideally in my view up front, would seem an improvement to me -- any thoughts? Has this been discussed already in the past? --史凡 07:04, 13 March 2009 (UTC)
fixed linebreaks so numbering works. -- Visviva 12:56, 13 March 2009 (UTC)
Ah, again I learned something more, also about number two [so much reading to do, and in short, I'm inactive for two weeks in Macau, sigh] and beyond, thank you!! smiley --史凡 14:42, 13 March 2009 (UTC)
Re 2: The slashes are used to indicate a broad transcription (phonemic) rather than a close transcription (phonetic). See Wiktionary:Pronunciation for more. In principle, either brackets or slashes can be used, but in practice we generally go for a phonemic transcription (easier to define, easier to understand).
Re 3: I agree that IPA should be first, but current policy (as set forth in WT:PRON) is for enPR to go first if present. I believe this is mostly because many (especially American) users find anything other than enPR difficult or impossible to understand. Also certain editors have a chip on their shoulder(s) against IPA, for reasons I've never really understood. -- Visviva 12:56, 13 March 2009 (UTC)
Why would enPR come first? It can't be used for other languages, and even in English, no offence to anybody but, it's mostly dated and parochial. Even for those who prefer enPR, wouldn't it be more convenient if the order was consistent, with IPA always in the first position? Michael Z. 2009-03-13 21:59 z
I agree; I'm just pointing out the existing convention. -- Visviva 05:49, 14 March 2009 (UTC)
Part of the rationale for the current sequence (enPR, IPA, SAMPA/X-SAMPA) is that it is alphabetical. This makes the decision independent of perceived importance, and therefore sidesteps the issue or preference some people have with one or more of the systems. --EncycloPetey 21:45, 14 March 2009 (UTC)
That is what I thought/surmised, not yet having read my way through the respective Wiktionary pages; I asked, nevertheless, so as to find out whether most users, like me, seem/appear to use the IPA as the phonemic system of choice, in which case moving it up front could be considered, but I guess such would be subject to a proposal and vote, for which I consider myself by far too "green" and inexperienced in the Wiktionary community -- I am happy the IPA's there, in contrast with quite a few paperback dictionaries, smiley --史凡 10:58, 15 March 2009 (UTC)
In #1, I'm not sure what you mean. The stops appear in different places for different pronunciations of the word, and so must be included as part of a phonetic transcription. They are not independent of dialect or even of variation. In , for example, the placement of the stress differs depending on how one pronounces the first vowel. So, the stops can't be given independently of the pronunciation. --EncycloPetey 21:45, 14 March 2009 (UTC)

Please bear with me, I'm still proofreading this; I only posted it already because my computer is unstable, and I was afraid of losing all of it in a crash.

Okay, this was the main impetus for my post, and since I have to do it without pictures, I'll have to go for a thousand words, sad smiley

Even before my time here, I noticed that my Longman dictionary, though having some aspects I like very much, such as its explanatory-style definitions, has some weaker points, as I might have mentioned here somewhere before. Its IPA sections are one of them, as some entries actually have the wrong IPA, wrong in that, for example, the given syllable stress is wrong, etc.

Having said that, having recently often noticed the syllable stops in the IPA of some entries here, I went to the only English dictionary I have here to see what they do with the syllable stops [I was actually convinced that they, Longman, wouldn't carry them], but see, they actually are there, though not in the IPA as such but in the entry name [the only part of the entry that cannot be changed or edited here in Wiktionary].

<perhaps superfluous, skip next three paragraphs, shy smiley> Now perhaps I need to provide some background information about this non-native user, me myself: in school, where, notwithstanding many North Americans' apparent conviction [and I mean that purely as an observation from my experience and in no way as a reference to anybody in the Wiktionary community I have recently had the pleasure of having contact with; I cannot count on my hands how many North Americans have come to me and told me as such, to the Middle European case in point], language education was a disgrace in general -- there were of course the exceptional teachers, who understood how to motivate and to whom I feel grateful, but they were a rare occurrence.

In this context, syllable stops were given very short treatment: "it's too hard for you guys/it's a very hard topic, just don't split words, and ignore the concept." [In my native language, Brabantian, you just know where the syllable stops are; you kind of feel that as a native speaker. And even if you went into the rules, I guess they are pretty straightforward.] This is a roundabout way of saying that now, at almost 39 years of age, I am finally looking into syllable stops in English, so please forgive me when I don't seem to be very knowledgeable about them, as that reflects reality.

But as most of the time, I'm happy to learn, so I gave it a go. Now, I noticed in my dictionary that gonorrhea was the first word that seemed to confirm my suspicion that having the syllable stops in the IPA proper might be inconvenient/fraught with drawbacks. My dictionary renders the syllable stops as follows:


which impelled me to the above post [to which I appended sections 2 and 3]. So perhaps now it is visually clear what I meant by the here-paraphrased "there is need to split the r".

I did quite some preparatory stuff before this post/answer, but "despotic" slipped my attention, and I will attempt to tend to that later on; for now, I can say that my dictionary in no entry seems to proffer/provide alternative syllable stop placements, so may I be forgiven for having deduced from that observation that syllable stops as such would be a fixed feature of the English language, one stop pattern per word -- an observation in which I might well have been/seem to have been very wrong!

Another point where I would see concern/unclarity, though, is that if you take the UK-English spelling, the ending we see/notice is the following:

--rhoea, for which the corresponding IPA is --i:.schwa [the latter from memory; I will go back and check afterwards]

What seems to be a problem here is the very fact that IPA symbols relate in a one-to-one fashion with the sounds of the spoken language, but in general do not do so with the letters of the actual English spelling, given the very unphonetic nature of the latter. In this case that means that when, for some reason, say, I have to use UK spelling as the only accepted or available one, and additionally I need to know where exactly the syllable stop is when inputting text, I would still not know for sure whether it's between the "o" and the "e", or between the latter letter and the "a".

Going by the little I know of Greek, I would opt to keep the "o" and "e" together, and separate them from the "a", but still, you may get my gist, namely that this is quite a roundabout and inexact way of guesstimating where to split into syllables. Actually, if I had for some reason to actually split-spell a word in an official letter, it may, in "good" old Flemish tradition, lead to one's job application letter being chucked in the rubbish bin, only partially read, namely up to where the spelling mistake is located!

I guess I'm not the only one who would hope that the latter occurrence is something of old times, or at least that other parts of the world are not so narrow-minded [which I can say I hope, since I hail from there], but still, if you feel the above examples/elaborations on this particular word, "gonorrhea", hold water, then you might now see what my original point was, and perhaps further agree with me that Wiktionary should be able to provide a more precise service.

Now again, if the above/what you said holds true, then I do not claim to have a readily available idea how to amend this; my initial proposal was based on what then seems to be my erroneous assumption that syllable stops are invariably located, but at least I hope that the lengthy explanation above clarifies what I tried to put in a nutshell in the original post under section 1, smiley.

My apologies re the length of this post; I was afraid it would take me a bit, but it seems to have exceeded even my own projections, sorry for that, shy smiley --史凡 11:55, 15 March 2009 (UTC)

I had a look at "despotic" now, word of the day, which was very convenient, and noticed that the IPA got amended recently for that purpose, and that only the "ic" was split off. My own dictionary doesn't list this word as a separate entry, and so does not provide information on how to split it into syllables, so I do not exactly know what you meant by differing syllable stop positions.

I tried to look up "syllable stops" but could not find it, whether in my own dictionary, Wiktionary or Wikipedia, so now I'm even doubting whether I understood the technical name rightly (confused smiley)

I spent considerable time redacting this in the hope to clarify; forgive me if I didn't reach my goal (apologetic smiley) --史凡 16:19, 16 March 2009 (UTC). PPS: what I meant all along is the placement of those "./interpoints".

In my opinion, your intuition of where the stops should be is more correct than that of anyone who speaks only English and no other language, although they will argue it to death because of the way our minds are trained to consider the final stop as part of the syllable of a single word, e.g. for words like "topped". However, this is an unusual feature for language, and it tends to disappear when words are put together. For instance, there isn't any difference between the pronunciation of "top top topped" and "topped topped topped", or "slow down" and "slowed down", because stops are not repeated. If you can believe that the T or D is part of the second word, then you'd have to conclude that it's part of the second syllable. But in the isolated world of English, it isn't always seen that way.
The reason the r would be repeated is to emphasize the point that the vowel could be rhotic, but in my opinion this is incorrect as well, since rhotic vowels have their own symbols. 03:02, 20 March 2009 (UTC)

Species names

How do we define the second part (not the genus part) of species names? e.g. benthamiana is used in three different names "Pinus benthamiana", "Nicotiana benthamiana" and "Pourthiaea benthamiana". Do we call it Latin or Translingual? I assume they are adjectives. What can we say about them other than an etymology? (from the surname in this case) SemperBlotto 12:13, 13 March 2009 (UTC)

I would say that all species epithets should have a ==Translingual== section, and only those species epithets that are also attested in use as words in Latin text should have a ==Latin== section. As for the translingual POS, I'm not quite sure, but I guess "Adjective" makes as much sense as anything. -- Visviva 12:46, 13 March 2009 (UTC)
I say we define it as "used in various Latin names of taxon" and link to them, also link to closely related forms (Benthamia, benthami, benthamii, but also benthamianus and benthamianum), which will be used in different names. Circeus 20:27, 13 March 2009 (UTC)
Wouldn't each of these three unrelated plants be referred to as benthamiana, for short? – then shouldn't each be a separate sense of the term? (If the species name cannot be attested on its own, then there should only be entries for the genera.) If they are named after different Benthams, then they need individual Etymology X headings, too. Michael Z. 2009-03-13 21:33 z
I don't think we would need separate etymologies even if there were multiple Benthams; as long as the derivation is uniformly from the surname Bentham (rather than a Bentham town, for example), it just seems like needless clutter. The different honorees can just be listed in the Etymology, or in Derived terms, or both. -- Visviva 06:53, 14 March 2009 (UTC)

OK. Second question: What language code should I use as the second parameter of {{etyl}} in Translingual entries. I see that in Homo someone has used "mul", but it doesn't seem to generate the correct category. SemperBlotto 18:02, 13 March 2009 (UTC)

As I see it, "lang=mul" put Homo in category:Translingual proper nouns. DCDuring TALK 18:27, 13 March 2009 (UTC)
mul (multiple) and und (undetermined) are standard codes, but IETF recommends not using them unless absolutely necessary.[6] Michael Z. 2009-03-13 21:43 z
But you were asking about {{etyl}}. I think there it is lang=la because we have treated International Scientific Vocabulary as New Latin, which does not have a separate ISO code yet. DCDuring TALK 18:32, 13 March 2009 (UTC)
Not when it is used in {{etyl}}. I always thought that this was intentional. Nadando 18:30, 13 March 2009 (UTC)
The entry for Homo is Translingual, and should be in Category:Translingual proper nouns, but the entry for homō is Latin and is in Category:Latin nouns. The former is an internationally-used genus name; the latter is largely restricted to Latin. We use mul deliberately and frequently in both {{etyl}} and in {{infl}}. --EncycloPetey 21:30, 14 March 2009 (UTC)

We are having exactly the same discussion on fr.wiktionary. My proposal is: translingual for scientific names (+ additional language sections only when useful), and Latin only for words such as benthamianum, because they are not translingual, but they are Latin (even when they were created recently, and don't exist in classical Latin). The la code is not dedicated to classical Latin. Lmaltier 21:24, 13 March 2009 (UTC)

I have created an entry for the above example - taking onboard some of your suggestions and ignoring others. Feel free to improve / correct etc. SemperBlotto 22:29, 13 March 2009 (UTC) Incidentally,

Looks spiffy to me. -- Visviva 06:53, 14 March 2009 (UTC)
Doesn't work for me. There is no grammar for "Translingual", and this is the feminine form of the Latin adjective benthamianus. I agree with Lmaltier's approach. --EncycloPetey 21:31, 14 March 2009 (UTC)

p.s. I now notice that we already have (and a few others) - defined rather differently. SemperBlotto 22:32, 13 March 2009 (UTC)

My approach and opinion is that species names (the full two-part name, e.g. Calochortus elegans) are Translingual, the genus name (e.g. Calochortus) is Translingual, but the species epithet (e.g. elegans) is Latin, although it may also be used as slang or jargon in some languages. English is one of the few languages to occasionally (and incorrectly) refer to a species simply by its epithet. Calling Calochortus elegans just elegans is calling it "elegant", which is a description and not a name. --EncycloPetey 21:30, 14 March 2009 (UTC)

Patroller rights

So, I see the user group "Patroller" is now available here. Though I have never been active enough to ever think about being given the sysop rights, I would find it very convenient to become a patroller. I have plenty of experience as a patroller on no.wiki, before becoming an admin, and I find it to be a very useful tool. Is it so that this right is granted to anyone? How is this right thought to be used? --Eivind (t) 13:12, 13 March 2009 (UTC)

I would prefer it that we keep this group to sysops. Otherwise things become even more complicated than they are already, and besides, if you have enough knowledge to patrol, you have enough to be an admin (in my view). If you'd like to be an admin (note, you wouldn't have to do anything except patrol if you didn't want to) then I'd happily nominate you. Conrad.Irwin 17:05, 13 March 2009 (UTC)
+1. (at least on the EivindJ for sysop thing). Someone who has been here for more than a year, engages actively in editing, patrolling, and discussion, and who is already an admin on another project, is someone we would do well to have as an admin. No need for an intermediate step.
In theory, it seems like a separate patroller group might be useful, but in practice it's hard to think of a case where we would trust someone to patrol but not to handle the whole toolbox. Separating the two roles seems more suited to a Wikipedia-type situation, where becoming an admin is such a stressful process that most established users won't bother. -- Visviva
Due to several exams and lots of work in the years to come, I have totally abandoned Wikipedia and retired from all time-consuming user rights (like Wiki-admin, Global Rollbacker/SWMT, OTRS, Wikt-bureaucrat). For a while I was totally inactive on all Wiki projects, but since my exams happen to be language-orientated (several of them in English, German and Nynorsk), I have conscience to use some time contributing to Wiktionary. And I must say: I find it very rewarding to do so. However, I can not promise an exceeding activity level in the time to come. So, preferentially I'd like to become just a patroller, but if that is not possible, I will humbly accept your nomination. --Eivind (t) 17:34, 13 March 2009 (UTC)
We asked for "patroller" in the same technical request as "autopatrol" so that we wouldn't have to go back to the devs again if we wanted to use it; but the idea was not to use it at the time. So we would need something like a policy vote to use it. As noted, there isn't (as we've seen so far at least) any reason not to nominate someone for sysop/admin. Also, to really patrol effectively, you need delete rights, there is a lot of crud. (Different from the pedia, since they don't allow IP-anon or unconfirmed to create articles.) And block is useful (even if you use it with great restraint), else patrolling will involve requesting blocks fairly routinely. Should just nom for admin. (not nearly as brutal as RfA ;-) Robert Ullmann 18:04, 13 March 2009 (UTC)

Etymology of compound words

I have a question about compound words that have more than two components. In Hungarian, I usually show only the two main components in the etymology section, e.g. vakbélgyulladás = vakbél + gyulladás, then at the component's page, the further components are shown: vakbél = vak + bél. To me this system is cleaner and simpler, since I also add the compound word to its components in the derived terms section. Is there a recommendation regarding this? Would it be better to show all components, e.g. vak + bél + gyulladás? Thanks. --Panda10 23:15, 13 March 2009 (UTC)

Personally, though I am no master of etymologies, I don't prefer the latter. Of course, common sense is valuable, and one should consider how one thinks the word was initially created. But anyway, sometimes it's impossible to say. E.g., did the term come from un- + necessarily or unnecessary + -ly? Here we should probably divide the word into prefix + stem + suffix in the etymology section. --Eivind (t) 23:47, 14 March 2009 (UTC)

Printer's apostrophe (or should I say Printers apostrophe)

I recently came across this and was curious what is meant by the terms "garbage entries" and "nonsense". The article in question seems to espouse a convention redolent of the kind of fastidiousness of which Dr. Johnson was quite wont. Any elucidation would suffice. Thank you.—Strabismus 00:42, 14 March 2009 (UTC)

If there is a particular sort of apostrophe that is "always wrong", we should either forbid it or automatically convert it to the desired kind. I don't think this "list of apostrophes I don't like" is very productive, unless (which might be the case!) it's going to be processed in some fashion to standardise the punctuation. I think ideally we should treat the various apostrophe forms as equivalent, so that searching for a word with one form would find it under another, but we still probably want a unified format underneath, and the straight quote is best for that in English. In other situations (Hawaiian?) the straight quote may be totally inappropriate. Equinox 01:34, 14 March 2009 (UTC)
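The idea of treating the apostrophe forms as equivalent for search, while keeping one unified form underneath, can be illustrated with a minimal Python sketch. The helper name `search_key` and the choice of which characters to fold are assumptions made for illustration only; nothing here describes how MediaWiki's search actually behaves.

```python
# Hypothetical helper: fold typographic apostrophes to the ASCII apostrophe
# so that a lookup matches whichever form the reader typed.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (the "printer's apostrophe")
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
}

# U+02BB (the Hawaiian okina) is deliberately NOT folded, since the
# discussion treats it as a letter where the straight quote is inappropriate.
_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def search_key(title: str) -> str:
    """Return a normalised key used only for matching, not for display."""
    return title.translate(_TABLE)
```

With such a key, "hors d’oeuvre" and "hors d'oeuvre" would compare equal, while an entry beginning with the okina would keep its own key.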
I would not interpret it as a "list of things I don't like." Connel hasn't been on much for awhile now, but when he was he was one of the best (the best?) at taking formatting problems and fixing them en masse. The issue with the standard apostrophe ' and the fancy apostrophe ’ is a divisive one, and one not likely to be resolved soon. See Wiktionary:Votes/pl-2008-12/curly quotes in WT:ELE and User talk:Doremítzwr#DISPLAYTITLE for some bedtime reading. -Atelaes λάλει ἐμοί 01:50, 14 March 2009 (UTC)
O.K., thanks for clarifying. Yes, I think that the printer's apostrophe <’> ought to be replaced with the plain ASCII apostrophe <'>. Interestingly, many, many French websites and books insist on the former's usage. I'm not too fond of it for two reasons: 1) it's so small that it looks almost the same as the ASCII apostrophe, and 2) it can (as was mentioned before) make searching difficult. As per the Hawaiian ʻokina, that's more a typographical convention, which nonetheless ought to be observed. It always appears like a "6" apostrophe. One more thing I'd like to point out: in the past (and sometimes the present) computers have had difficulty executing commands for queries with the plain ASCII apostrophe. When you use such a system you will usually get something like this:
No results.
We obviously encounter no problems like that here, so the ASCII apostrophe seems to work fine for lemmata in general.—Strabismus 22:03, 14 March 2009 (UTC)
Many French books insist on using typographical apostrophes?¹ In any language, no professionally-typeset books use the typewriter apostrophe or typewriter quotation marks.² If the two punctuation marks look the same³ to you, I'd guess it's because your OS uses no antialiasing or poor antialiasing.⁴ The two are distinguishable in most fonts even at small sizes on Mac OS X.⁵
If correct punctuation⁶ breaks search (if indeed it does), then so do ligatures, accents, and unusual Latin characters, and all non-Latin text. Let's just use ASCII for the whole encyclopedia.⁷ Michael Z. 2009-03-15 15:11 z
¹Yes, e.g., check out the apostrophes at the French WT.
²You're joking, right? As a matter of fact I find very few books even in English which do not use ’. I almost NEVER see '
³Rather, ALMOST the same. At smaller sizes, the bend in the printer's apostrophe is very faint; not because of my display but rather because of my eye's lack of perfect resolution.
⁴No. Not for me.
⁵I'm a Mac.
⁶Not punctuation, instead the apostrophe alone. Other characters are usually recognizable unless there is an option to turn off accented characters, in which case the interface will just ignore them.
⁷Oh, let's not. Unicode has come late enough in the technological timeline as it is. ASCII is to Unicode as rounding off to the nearest hundred (200) is to precise numbers (162.7).—Strabismus 18:18, 15 March 2009 (UTC)
I think you misunderstand. My last sentence was meant to be ironic – it's impossible to write an international dictionary without the huge range of Unicode characters which we use routinely, so I think it's silly to mandate using ugly typewriter apostrophes and justify this with fallacious technical arguments like search problems. Your argument about French typography is an understatement. I wrote that virtually every book in every language uses typographical punctuation marks ( “ ” ‘ ’ ), none of them use typewriter punctuation ( " ' ).
My point was that in almost all books I've read and looked at, the standard “curled” look is used. ‘’“”.—Strabismus 21:20, 15 March 2009 (UTC)
Perhaps it's premature, and perhaps it would be facilitated by new features in MediaWiki's wikitext parsing, but sooner or later we should strive for the professional presentation of curly marks. Michael Z. 2009-03-15 19:00 z
Yes, I wonder if the characters could be pre-processed by wiki to automatically appear curled.—Strabismus 21:20, 15 March 2009 (UTC)
Of course. Fortunately, we already use different characters for 1 and l (old typewriters include 2, 3... but not 1, because l (the letter) was considered to be OK). I think that ' was created for the same reason: fewer keys. Lmaltier 20:06, 15 March 2009 (UTC)
That's long been a typographical bugbear. Since the lowercase el kerns more closely than does the Arabic numeral one; e.g., 1911—vs.—l9ll. Another thing I hate seeing is stuff like this: Now that's what I call `expensive'? ARRGH! It's tacky and it makes me FFFRAGE! GRRRRR! @#^&! O.K., I'm better now…—Strabismus 21:20, 15 March 2009 (UTC)
I forgot about using L for 1! But that was remedied in computer keyboards, and, I think, in better electronic typewriters too. The “dumb” forms of raised punctuation however, are forever fixed in digital typescript, and have tremendously infiltrated digital publishing. MS Word's smart quotes do not always get them right, but there is software which does (e.g., Markdown) – I'd love to see Wikitext incorporate this. Michael Z. 2009-03-15 21:06 z
Me too. From my experience Corel's Word Perfect usually gets smartquotes right. But being a Mac user, I usually just have to type ALT+[ for <“>, ALT+SHIFT+[ for <”>, etc.—Strabismus 21:20, 15 March 2009 (UTC)
Same, I've got used to typing real quotation marks without thinking, but not apostrophes (I use a Dvorak keyboard layout). MS Word fails to put an apostrophe into abbreviated years, like “back in ’08,” and instead enters a single opening quotation mark. Michael Z. 2009-03-16 15:50 z
As far as entries go, given the relative ease of inputting ' over ’, the entry itself should be at the dumb apostrophe. Ideally there should be an auto redirect from equivalent entry with the smart apostrophe unless that entry actually exists, just as we have now for capitalization. However, we don't have that at present. As for what gets typed in the entries themselves, if people want to go the extra effort, I don't mind. Carolina wren 21:59, 15 March 2009 (UTC)
Oh, but what I was suggesting is that somehow the display of the dummy apostrophe/quotation marks (which, you will note have only ONE form each; i.e., neither has a beginning nor end form 'apostrophe', "quotation marks") would be rendered by WT syntax as their equivalent book forms. I'm not certain how this could be accomplished given the various uses, but I would suggest it be set by default and THEN when you DON'T want it to display you would suggest so thus <nosmart>'tis</nosmart>. Granted, noobs might complain and therefore need proper introduction, but then again, hey, what the heck. At any rate, at larger sizes, the æsthetics of punctuation becomes, dare I say, critical.—Strabismus 00:06, 16 March 2009 (UTC)
The standard <nowiki> would be sufficient. Michael Z. 2009-03-18 19:29 z
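The pre-processing idea discussed above (typing plain quotes and having them displayed curled) can be sketched roughly in Python. This is a naive illustration under stated assumptions, not how Markdown-family tools or MediaWiki do it: the function name `curl_quotes` and its rules are invented here, and no opt-out such as <nowiki> is modelled.

```python
import re


def curl_quotes(text: str) -> str:
    """Naively convert typewriter quotes to typographical ones."""
    # Abbreviated years such as '08 take an apostrophe, not an opening quote.
    text = re.sub(r"'(?=\d\d\b)", "\u2019", text)
    # A double quote at the start, or after whitespace or an opening
    # bracket, is treated as an opening quote; any other one is closing.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Same rule for single quotes; whatever remains is an apostrophe or a
    # closing single quote, which are the same character.
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

A rule set this simple still mishandles a leading elision such as 'tis, which it would open with a left quote; that is the kind of case where an explicit opt-out would be needed.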
Typewriter and curly apostrophe look almost the same on a computer screen, but there is a big visual difference in a printed text. I have noted that both French Wiktionary and the French Wikipedia use the curly ones: d’une. It would be helpful if we at least used curly ones for French words so that interwikis would match up. —Stephen 19:43, 18 March 2009 (UTC)
Matching up is very convenient. But on the off-chance that another WT uses a different spelling convention from ours, we could link to the appropriate article, along with the regular spelt interwiki article titles.—Strabismus 00:06, 19 March 2009 (UTC)
We have our own style guidelines instead of mixing 'n' matching according to each of the dozens of foreign-language Wiktionaries. It would be silly to have hors d'oeuvre#English but hors d’oeuvre#French.
But I would be in favour of adopting the French WT's style for apostrophes in general. We already use the typographical okina for Hawaiian, so why not do justice to the rest of Latin and Cyrillic typography too? Michael Z. 2009-03-19 01:02 z

For anyone here who has yet to check the aforegiven link, I direct you to User talk:Doremítzwr#DISPLAYTITLE, whereat {{DISPLAYTITLE:}} is discussed as a way to display the “Printer’s apostrophë”, but without having the entry actually located at the typographical spelling. I believe that to be an excellent compromise solution; others, specifically in that discussion Atelaes, disagree. The points brought up by both sides in the discussion are worth consideration.  (u):Raifʻhār (t):Doremítzwr﴿ 15:28, 19 March 2009 (UTC)

"Honey, I accidentally struck out half of the beer parlor!"< mistake corrected

actually, it was everything following somewhere middle in my lengthy post where I was trying to strike out my correction -- placeholder (if that's the right word) -- it was like a bit lucky I immediately noticed actually, as I was about to leave for a longer break and didn't assume anything having gone wrong (though I went back to check whether I had succeeded in striking out my paragraph, which instead disappeared from display entirely, and all the following comments, that is the rest of my entry/post and following comments of others, stricken through).

I think I realized what the offending wiki code snippet was, but will not post it here lest it give some lurker or another the wrong idea -- perhaps I shouldn't even post this here, but I wanted to apologize to anybody inconvenienced by my attempt seriously-gone-wrong. (I somehow had to think about that story about a new admin on Wikipedia who apparently deleted the main page, whatever that is; I hope my gaffe fell short of that apparently pretty outrageous event (shy smiley)... I would have never thought I would have been able to "achieve" the like "feat", being such a computer idiot myself, but perhaps that actually "helped", huh...)

sincere apology,--史凡 13:26, 15 March 2009 (UTC)

I must say this on behalf of all fellow Wiktionarians....What on earth are you talking about!? --Jackofclubs 15:13, 15 March 2009 (UTC)
Or maybe I shouldn't have asked. --Jackofclubs 15:14, 15 March 2009 (UTC)

Since it unwittingly happened again tonight, poor newcomer me being unfamiliar with markup language: if you go to the post "syllable stops", one diff before my last one should show you what I meant.

I take the occasion to thank you for having explained to me "hok'em" a while ago in the tea room, thank you once again!--史凡 17:32, 16 March 2009 (UTC)

Dutch adverb template

Per a request, I have set up a template for Dutch adverbs ({{nl-adv}}). The coding is simple, and I haven't written documentation yet because I'd like editors experienced in Dutch to see if it does everything it needs to do. The core coding is a switch that (1) checks whether the adverb is comparable, (2) checks whether the adverb has only one superlative form, (3) if neither (1) nor (2) is true it uses a default code to generate the inflection line. It is possible to specify the comp and sup forms for irregular adverbs and for adverbs whose stem adds a consonant, doubles a consonant, or shortens a vowel in the comparative form. I have demonstrated use of the template on the entries for lang, snel, veel, and haast.

Obviously, since the template is not yet checked over, it is not yet accelerated. Should it be found to work properly, then acceleration will be tricky, since the adverb forms are usually the same as the adjective forms. --EncycloPetey 16:35, 15 March 2009 (UTC)
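The three-way switch described above can be mirrored in a short Python sketch. This is not the template's code: the function name `adverb_line`, its parameters, the output wording, and the default endings (-er, -st, -ste) are all assumptions chosen only to make the branching concrete.

```python
from typing import Optional


def adverb_line(head: str, comparable: bool = True,
                comp: Optional[str] = None, sup: Optional[str] = None,
                single_sup: bool = False) -> str:
    """Build a hypothetical inflection line for a Dutch adverb."""
    # Branch 1: the adverb has no comparative or superlative at all.
    if not comparable:
        return f"{head} (not comparable)"
    # Irregular forms, or stems that change in the comparative, are
    # passed in explicitly; otherwise assumed default endings are used.
    comp = comp or head + "er"
    sup = sup or head + "st"
    # Branch 2: only one superlative form is shown.
    if single_sup:
        return f"{head} (comparative {comp}, superlative {sup})"
    # Branch 3 (default): show both assumed superlative forms.
    return f"{head} (comparative {comp}, superlative {sup} or {sup}e)"
```

For example, `adverb_line("snel", comp="sneller")` supplies the doubled consonant by hand, and `adverb_line("veel", comp="meer", sup="meest")` covers an irregular adverb.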

Simple English Wiktionary needs active editors

Hello there all. I would just like to ask if there are any active editors here that would be interested in helping out a fellow WMF Wiktionary. The Simple English Wiktionary only has a few active editors, and we could always use some more. I would therefore like to ask any active editors here if they would be willing or interested to participate in the Simple English Wiktionary. Thank you, and I look forward to seeing you there, Razorflame 19:57, 15 March 2009 (UTC)

What advantages can Simple English Wiktionary offer, compared to our (Complicated?) English Wiktionary? --Volants 11:02, 16 March 2009 (UTC)
Those arguments were given a long time ago, at the very beginning of simple Wikipedia and Wiktionary. Simple English Wikipedia and Wiktionary are there to help those of us whose English is not so good (: We should consider it an ingenious way of making the English language slightly more understandable for people in general. For those of us who contribute to Wikipedia with the motive to make knowledge available for everybody, this should be a very interesting project! --Eivind (t) 12:55, 16 March 2009 (UTC)
My problem was figuring out which words were simple and should be used in a definition (I couldn't find the master list of simple words). What I thought I needed was a program that would color-code words in a proposed definition based on how 'simple' they were (simple words in green, slightly harder words in orange, really hard words in red). But that begs the question of how to determine simpleness of a word. RJFJR 14:48, 16 March 2009 (UTC)
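A program of the kind described could look roughly like the following Python sketch. It assumes two word lists are supplied by the caller (for instance, drawn from whatever lists the Simple English Wiktionary maintains); the function name `colour_words` and the list contents in the usage note are hypothetical, and the open question of how to decide simpleness is left entirely to those lists.

```python
import re
from typing import List, Set, Tuple


def colour_words(definition: str, simple_words: Set[str],
                 medium_words: Set[str]) -> List[Tuple[str, str]]:
    """Tag each word of a proposed definition with a colour."""
    result = []
    for word in re.findall(r"[A-Za-z']+", definition):
        key = word.lower()
        if key in simple_words:
            colour = "green"    # on the simple list
        elif key in medium_words:
            colour = "orange"   # slightly harder
        else:
            colour = "red"      # on neither list
        result.append((word, colour))
    return result
```

Calling `colour_words("A big inkhorn word", {"a", "big", "word"}, set())` would tag "inkhorn" red and the other three words green.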
RJFJR, on the Simple English Wiktionary, there are several big lists of words that are considered simple. In the Recent Changes, check the word list under the "please focus on words in this list" section of the box at the top of the Recent Changes and it should give you a big list of words, some of which are missing :). Cheers, Razorflame 17:56, 16 March 2009 (UTC)
I will try to be more active there. My problem...Spirit is willing, but... -- ALGRIF talk 17:09, 16 March 2009 (UTC)
I hope to see you there :P. Razorflame 17:56, 16 March 2009 (UTC)

Thank you all who have noted this and I hope to see you all on the Simple English Wiktionary! Razorflame 17:56, 16 March 2009 (UTC)

RJFJR, you might also want to check this out. Ogden's system has its advantages and its disadvantages. But for simple WT practice, most people who have a good grip of the English language should be able to distinguish words in their own vocabularies that are "simple" from those that are "inkhorn" or "million-dollar".—Strabismus 22:48, 17 March 2009 (UTC)
Also, please note that the Simple English Wiktionary does not put a space in between the template for the part of speech and the definitions. Please don't add spaces in between these two things anymore. Thanks, Razorflame 16:04, 18 March 2009 (UTC)

Appendix:Variations of "c"

I'm wanting to add "c-with-a-macron-over-it" as a character, but don't know how to type it, or even if this is possible. The symbol is used frequently in medical shorthand for "with" (from the Latin cum, "with"). Could someone provide this character, perhaps in the form of a link? --EncycloPetey 22:02, 15 March 2009 (UTC)

According to this page it's not in the universal character set, no idea if that means it can't be produced or not. Nadando 22:09, 15 March 2009 (UTC)
Also see [7]- the code è that they give doesn't work. Nadando 22:15, 15 March 2009 (UTC)
In the fonts Alphabetum and TITUS Cyberbit the codepoint E455 produces a lowercase Latin c with a macron on top. For the uppercase it is at E055 in TITUS Cyberbit. But these are in the private use area so the glyphs assigned to these codepoints are technically not canonical (for web and DTP usage, anyways). For the time-being, you could consider using [c̄] or [C̄] until(?) they are finally unicoded, at which point you can simply "move the page".—Strabismus 00:26, 16 March 2009 (UTC)
It is unlikely that it will ever be added as a single codepoint to Unicode, so you'll need to use a combining diacritical mark since private use characters are not appropriate here. Are you certain you want C with macron above (C̄ c̄) or C with overline above (C̅ c̅)? The two are similar looking, with the difference supposed to be that if used on adjacent characters the macrons shouldn't touch (c̄c̄), but the overlines should (c̅c̅). Carolina wren 01:10, 16 March 2009 (UTC)
That distinction I do not know. I've seen it done primarily in handwriting up to this point, where any typographical distinction is lost. Interestingly, the samples you typed do not display correctly for me in the text, but do display correctly when I view the same text in an edit window. --EncycloPetey 02:33, 16 March 2009 (UTC)
Amusingly enough, for me they appear fine in the text, but not in the edit window where they appear as if the macron or overline is to the side over a blank. It's a font issue in both cases. Some fonts can't gracefully handle unexpected combinations of combining characters and base characters, and it can be exacerbated if the display engine falls back to a different font for the combining character, altho since both the combining macron (U+0304) and the combining overline (U+0305) are part of the core Unicode 1.0 character set, you'd expect support for both would be fairly decent if not always typographically precise. Carolina wren 04:12, 16 March 2009 (UTC)
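The two combining marks mentioned above can be inspected with Python's standard unicodedata module. The sketch below only builds the sequences and checks their names and normalisation; the variable names are illustrative, and how the result is drawn still depends on the font, as discussed.

```python
import unicodedata

COMBINING_MACRON = "\u0304"
COMBINING_OVERLINE = "\u0305"

# A base letter followed by a combining mark forms one displayed character.
c_macron = "c" + COMBINING_MACRON
c_overline = "c" + COMBINING_OVERLINE

# The character database distinguishes the two marks by name.
macron_name = unicodedata.name(COMBINING_MACRON)      # "COMBINING MACRON"
overline_name = unicodedata.name(COMBINING_OVERLINE)  # "COMBINING OVERLINE"

# NFC composes a base and mark into one codepoint only where a precomposed
# character exists: "a" + macron becomes U+0101, but "c" + macron has no
# precomposed form and stays as two codepoints.
a_composed = unicodedata.normalize("NFC", "a" + COMBINING_MACRON)
c_composed = unicodedata.normalize("NFC", c_macron)
```

Because `c_composed` remains two codepoints, a page title using it would consist of the letter plus the combining mark rather than a single character.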
Technically, it is easy for fonts to correctly position macrons, but no font vendor has really bothered with it until very recently, so font support is still bad. In Arial, the macron is too low and appears inside the capital C (something which could be fixed if positioning correction was done for 11pt). -- Prince Kassad 13:19, 16 March 2009 (UTC)
M–W's Medical Dictionary includes it as c or c̄ (p 89). Michael Z. 2009-03-16 15:44 z
À propos the display of combining diacritics: if your browser/OS supports stacking, then it ought to work. I suggest that you, EncycloPetey or anyone else, go ahead and use [c̄] or [c̅] since, as I stated earlier, you can always move the page later IF c-macron is ever unicoded. The latter [ ̅] should be used if the character in question is going to be frequently used in combination with other characters of the same type; i.e., characters that are used in juxtaposition. Otherwise, the plain macron [ ̄] should be used.—Strabismus 18:59, 16 March 2009 (UTC)
In the odd case such as this, why not put the entry at c (macron), since likely no one will be able to type the proposed stacked character? Make a picture of one in MS paint and upload it to the Commons for use in illustrating what the thing looks like in the entry and in the Appendix. bd2412 T 19:25, 22 March 2009 (UTC)
"no one will be able to type the proposed stacked character" has never been a reason against a certain name. I mean, we use Avestan script, which "no one will be able to type" either¹. Though it might possibly make sense to use an image for the actual entry (and thus t-image (or a fork) on the appendix page). -- Prince Kassad 19:29, 22 March 2009 (UTC)
I beg to differ, as I have created and installed an Avestan keyboard layout (pre-U5.2) on my computer. It saves A LOT of trouble. I'm a Mac, BTW.—Strabismus 19:54, 23 March 2009 (UTC)
How about our sign language entries? As in, 4@RadialPalm-OpenB@CenterChesthigh Contact 4@UlnarPalm 4@BasePalm Contact 4@TipFinger. bd2412 T 20:56, 24 March 2009 (UTC)
What about them?—Strabismus 00:58, 25 March 2009 (UTC)
They accommodate terms not readily reproduced in a typed format. Which is what c (macron) would be useful for, as a title. bd2412 T 01:14, 25 March 2009 (UTC)
Oh, in that case, they're merely paraorthographic. That is, their descriptive appearance which happens to appear in text-form (as opposed to gesture-form) is not much to worry about, I think. I don't know if there's a better method of notation for ASL than pictures or (as they appear here) names of files which describe their execution.—Strabismus 03:11, 25 March 2009 (UTC)

Position of "Related terms" heading

This one has been for some time on my mind. I wonder whether the heading "Related terms" should be placed as L3 heading, a subheading of the language heading such as "English", instead of as L4 heading, a subheading of the PoS heading such as "Noun". AFAICS whether a term is related to the lemma of the entry depends exclusively on the etymologies of the terms in question, not on the PoS.

The current WT:ELE guideline's take on the position of "Related terms" is hard to apply when there are more PoS, for me anyway, as I do not see on what basis I should place the related term under, say, the "Noun" PoS or the "Adjective" PoS.

My recent edit to picture entry shows the problem, I hope.

With derived terms, I usually do not have this problem.

I know that changing this would require the change of WT:ELE, and that this change would have painful consequences, requiring an update of great many entries. I do not propose anything yet; I wonder what the other thoughts are on this subject. --Dan Polansky 14:35, 16 March 2009 (UTC)

Yes. "Related terms", as we define it, is supposed to be about etymological relations.
There would still be issues in the relatively few (but not rare) cases where terms are "related" through a common distant root, but we show distinct etymologies for, say, verb and noun, where one is a coinage or borrowing of a different vintage. Nevertheless, this would be a desirable improvement, IMO. DCDuring TALK 14:54, 16 March 2009 (UTC)
I too, would consider this a correct improvement. I have sometimes had the same problem, precisely because the relationship is really about etymology, not PoS. Note to Dan: "No pain, no gain." ;-) -- ALGRIF talk 17:07, 16 March 2009 (UTC)
Sounds good to me too. I'm not sure an amendment of ELE is actually required, though it wouldn't be a bad idea. As was noted in the course of the recent unsuccessful attempt to pass a simple proofreading vote, ELE doesn't really go into detail on which headings are subordinate to POS and which are not. I believe AF's code would need to be changed slightly, as it currently moves Related terms from L3 to L4 unless there are multiple POS sections.
As an aside, the current position of Related terms in the middle of a series of semantic-relations sections is far too confusing IMO. Even experienced editors get tripped up by this arrangement, and the newbies don't have a prayer of getting it right. I doubt if there is much interest in making such a sweeping change, but if I had my druthers we'd turn RT into a bullet point under Etymology. -- Visviva 15:31, 17 March 2009 (UTC)
I could dig that. A horizontal list of English cognates. Cool. DCDuring TALK 16:53, 17 March 2009 (UTC)
Word. This had never occurred to me before, but Dan's point is 100% correct, and Visviva's suggestion seems spot-on. —RuakhTALK 17:00, 17 March 2009 (UTC)
Ick. That would work provided the number of related terms was always small. Unfortunately, the list can be quite long for many words, especially in languages like English and Latin where affixes and endings generate new words from standard roots. —⁠This unsigned comment was added by EncycloPetey (talkcontribs).
Then we would just link to the root page, something like "see (also) -ize#Derived terms", much as we already do, correctly, in many entries. I'm not actually proposing this change, as it seems like it would involve far too much grief relative to the potential gain. That said, I really don't see any reason for the present approach other than inertia; following RU's criteria below, mixing etymological with semantic relations makes no sense in terms of either logic or presentation. I can't imagine that we would choose the current arrangement if we were starting from scratch; and given that our work has scarcely begun, that is an important consideration. -- Visviva 14:33, 20 March 2009 (UTC)
Yes, ick. The present method puts it somewhere at the end. We have too much above defs already. Robert Ullmann 02:45, 20 March 2009 (UTC)
Dan: ELE says under Derived terms: "If it is not known from which part of speech a certain derivative was formed it is necessary to have a "Derived terms" header on the same level as the part of speech headings." This has been understood (at least for several years) to also apply to Related terms, immediately following. If there is more than one PoS, just put it at L3 as you did in picture. If there is one POS, don't worry about it. (It goes at L4, in the L4 sequence.) Is not broken. Robert Ullmann 02:45, 20 March 2009 (UTC)
I shouldn't be trying to write this at 5AM, having only gotten up in the middle of the night to make spaghettini marinara (the real kind, with the fish ;-) and BP takes forever to load. Keep in mind that the data we are representing is not a strict hierarchy; we have chosen a hierarchy that fit the software being used (which fortunately is quite flexible): from spelling (exact form), to language, etymology, PoS, and various attributes.
The layout we use is a compromise between the chosen hierarchy, the structure of the data, and the presentation order that (we believe) is useful to the reader. Trying to force each attribute into the "correct" place in the hierarchy faces two problems: there is no "correct" place, as the fundamental data is not fully hierarchical, and it borks the presentation order. A "good" logical, pedantic representation of the data in an ordered hierarchy would place the definitions themselves very nearly last. (!) enough for now, I finish my midnight pasta and sleep Robert Ullmann 03:12, 20 March 2009 (UTC)
I'm just not seeing how it makes sense to have RT at L4 if an entry has only one POS and L3 if it has more than one. This seems almost certain to result in many entries with a flawed structure, as POS sections are added over time. For example: 1) a noun entry is created, 2) an RT section is added at L4, 3) a verb section is added below the noun, leaving the RT section incorrectly in a noun-specific location. That seems (mildly) undesirable in terms of both logic and presentation, and if my own random experience is any guide, it happens quite frequently. Granted, the problems thus created are not particularly serious, but I don't see what the existing system gains us. Unlike DT, POS-specific RT are the exception rather than the rule; in fact if one sticks to etymological relations I cannot think of any examples at all (if we use etymological and semantic relation as the basis for RT inclusion, there are many examples -- but while I've been doing that in some of my own additions, I didn't think it was standard procedure). I mean, two POSs can't have different etymological relatives unless they have different etymologies, can they? Hope your pasta and rest were both satisfactory. :-) -- Visviva 14:33, 20 March 2009 (UTC)
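The placement rule debated in this thread (Related terms at L4 when a language section has a single PoS heading, at L3 when it has several) can be sketched roughly as below. This is only an illustration in Python, not AutoFormat's actual code; the function name `related_terms_level` and the short list of PoS headings are assumptions made for the example.

```python
import re

# Assumed, deliberately incomplete list of part-of-speech headings.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb"}

# Matches a wikitext heading such as "===Noun===" and captures its title.
HEADING_RE = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")


def related_terms_level(language_section: str) -> int:
    """Return the heading level suggested for 'Related terms'.

    One PoS heading -> level 4 (under that PoS);
    several PoS headings -> level 3 (alongside them).
    """
    pos_count = 0
    for line in language_section.splitlines():
        match = HEADING_RE.match(line)
        if match and match.group(2) in POS_HEADINGS:
            pos_count += 1
    return 3 if pos_count > 1 else 4
```

The sketch also shows the weakness Visviva describes: the answer changes as soon as a second PoS section is added, so a "Related terms" section placed earlier at L4 ends up in a PoS-specific location.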

Move Category:US to Category:United States English

To match everything else in Category:Regional English, and make it clear that this is a regional dialect category, and not a geographical subject category. Any objections? Michael Z. 2009-03-16 17:49 z

No, because this move actually makes sense. Cheers, Razorflame 17:56, 16 March 2009 (UTC)
Depends. If the question below ends up with us renaming Category:Commonwealth English to Category:British English on the basis of most common usage then this should be renamed to Category:American English on the same basis. On a side note, and without making any statement of superiority, the equivalent categories on Wikipedia are w:Category:American English and w:Category:British English. It would be nice but not essential to use the same name for the same concept on both Wikipedia and Wiktionary. Carolina wren 00:44, 17 March 2009 (UTC)
In an international perspective, isn't American English actually Category:North American English? In my experience most dictionaries use the label American for terms used in the USA and Canada, but ignore the distinction. The ones which recognize it use North American, Canadian, and US. Michael Z. 2009-03-17 00:52 z
Support. Clarifies the existing setup. Not sure what to do about cases such as {{US|politics}} or {{US|_|politics}}, where the label is in fact regional rather than dialectal (as many terms from US politics are used routinely in international reporting on the US, sometimes even --horrors!-- with a "metric spelling"). I suppose the best solution is probably just to create {{US politics}} and enforce its use in the relevant entries. -- Visviva 15:20, 17 March 2009 (UTC)
The mix-up of dialect and topic labels is a big problem, and I hope that renaming categories and templates may help keep it from spreading. Your example would belong in category:United States of America, rather than category:US; I don't know if there's a label for it, but it should say in the USA rather than US.
I'm apprehensive about creating any more combining categories, like US politics, US informal, informal politics, chiefly Appalachian offensive endearing political written slang, etc. ad infinitum. By some authorities, there are nine axes of context labelling, so I don't care to calculate the potential number of individual categories we would be asking for. Michael Z. 2009-03-17 16:25 z
It doesn't seem to me that there is any particular difference between creating {{US politics}} (et al.) out of {{politics}}, and creating {{chemical engineering}} out of {{engineering}}. Politics and law happen to be fields in which the key contextual boundaries coincide largely with national boundaries. No one would seriously propose creating or tolerating {{US chemical engineering}}, for example. Even if there were terms used specifically by US chemical engineers, these wouldn't be used by non-US chemical engineers when discussing US chemical engineering, because there is nothing unique (AFAIK) about chemical engineering practiced in one country rather than another.
On the other hand, consider an Irishman and an Australian sitting at a bar in Seoul; it's January 2008 and perhaps the TV is tuned to CNN, so the conversation turns to the US election. Provided they have some familiarity with the topic area, the term "caucus" is as likely to come up in their conversation as it would between a couple of guys from Grand Rapids. This situation is not captured by {{US|politics}} and it sure isn't captured by {{politics|in the USA}} (actually, I don't really see any difference between "US English" and "(English) in the USA"). Likewise if a couple of US folk were discussing Canadian politics -- and happened to be among the elite few who actually know the first thing about the subject -- they wouldn't start calling "ridings" something else just because they're American. So AFAICS, this isn't really an intersection of dialect or region with topical field, because the field isn't actually defined in a dialectal or regional way; it's just a particular topical subdivision of a topical field. -- 17:06, 17 March 2009 (UTC)—⁠This unsigned comment was added by Visviva (talkcontribs).
Neither caucus nor riding#Etymology 2 needs a “politics” label at all, because they are inherently political terms, and in general use, not restricted to use by political specialists. Caucus is used worldwide, and has additional specific meanings in the speech of North America and New Zealand, and of the USA, so the respective senses should be marked with dialect labels North American and New Zealand, and US. A political riding has no other name outside of Canada, so arguably this sense is used worldwide, to refer to something which is in Canada.
Of course many regionalisms are used outside of their place of origin, to small or great degrees (as indicated by your phrases “provided they have some familiarity with the topic area” and “who actually know the first thing about the subject”). Only recently have dictionaries started to account for this thanks to corpus lexicography, and it's difficult for us to do. Ideally, we need to mark the place of origin of a term in its etymology, and possibly mention the geographic context it applies to, as well as the dialects or regions in which it is used, in the definition.
And the label isn't “(English) in the USA”, it's just in the USA. This is considered clear enough in dictionaries like the COD, so why wouldn't it be good enough for us? The in... labels are much rarer, and stand out in contrast to the familiar US labels. Michael Z. 2009-03-17 17:47 z
Yes, arguably US politics is a subtopic of politics. But this may apply to many fields, especially law, but also any field which is regulated or legislated, including immigration, trade, pharmaceuticals, finance, taxation, the environment, construction, child care, tobacco and alcohol, pornography, etc. Where does it end? If you ask me, this is a case of crossing the geographic context in the USA with the subject context politics. Even if you disagree, we had better be pretty cautious in defining subtopics like this. Michael Z. 2009-03-17 17:54 z
Well, if we got to a point where it made sense to have a "Pornography law" category, then subsequent "US/UK/Canada/etc. pornography laws" would be a logical next step. I wouldn't see anything more problematic with such categories than with, say, an "Applied nanorobotics" or "Paleopalynology" category. These categories would be serious overkill now, but if we get to the point where such divisions are useful, good for us.
I would agree that any subcategory of law, government, or politics can be usefully divided along national boundaries; were it not for the broad similarities between the various Anglo-American systems, this would be inescapable. I don't see how this would extend to fields such as finance, tobacco production or childcare, except in relation to law ("tobacco law", "childcare law"). -- Visviva 05:23, 18 March 2009 (UTC)
Glancing at the contents of category:Politics, I think you may be right, that there are enough terms which could be subclassified to justify the category. It should be clearly labelled as a topical label template and category. (I mistakenly converted the legal tag {{England and Wales}} to a regional dialect label, because its nature was not clear from either its text or its category's membership.) What if the template's name and text were {{politics in the USA}}? {{US politics}} (US politics) is easily confused with {{US|politics}} (US, politics), which means something rather different. Michael Z. 2009-03-18 16:44 z
I just don't see how the reader is supposed to interpret "in" as referring to the location of the referent rather than, for example, the location of the speaker. If I had ever come across (in the USA) in an entry, I would have just assumed it was an error for (US). If we also change the {{US}} output to "US English" (or something similar), I guess the difference would be somewhat clearer. But again, I don't really see the need for geographical context labels at all. In the case of geographical terms, the location belongs in the definition itself, just like taxonomic info in the case of an animal or plant name; on the other hand, in the case of political or legal terms, it is really a topical rather than a geographical distinction. -- Visviva 05:23, 18 March 2009 (UTC)
The 3200 entries have a good number of errors and omissions, including especially use of the "US" tag in the alternative spellings, no PoS category, no inflection template. It would be as good a cleanup list as any. We need to have an alternative tag for items that are US by context rather than linguistically unless we choose to let the two be always confounded. In many cases it is difficult to distinguish between the two, but sometimes it is obvious (White House).
I suggest that all 3200 {{US}} tags be converted to {{USE}}, which would place items in the proposed new category and display as (US English). I suggest that, upon completion of the conversion, the tags {{US}} and {{USA}} be used for the geographic context, which would display as (USA). I believe (rebuttably) that more entries made by infrequent contributors add the tag as meaning the geographic context. DCDuring TALK 16:46, 17 March 2009 (UTC)
This scheme wouldn't be consistent with our other labels, nor with most dictionaries.
All of our dialect label templates are named after just the country or region name, and adding “English” makes the label redundant with the language heading. Abandoning this scheme for only one template would cause more confusion. This would also make it impossible to use the standard dialect label template, which is generalized to work with other languages, e.g., to apply Category:US Spanish, Category:US French, etc. (which don't have individual categories yet, but both are present in Category:Languages of the United States of America, so it's reasonable to see this coming).
Dialect labels in dictionaries are always in an adjectival form when possible (e.g., American for American English, British for British English), otherwise attributive (United States, US, for United States English) – evidence at User:Mzajac/Dialect labels. The much rarer geographic context labels are usually in the straight noun form, as In the USA, but many dictionaries just incorporate this into definition text. This text is self-evidently not a dialect, while just USA isn't (but at least I haven't heard of any dictionary using the dialect label U.S.A. or USA). Michael Z. 2009-03-17 17:12 z
My principal concerns are:
  1. that we not merely assume that the majority of the items with the {{US}} tag are about language rather than geography.
  2. that the new uses of {{US}} and {{USA}} don't introduce error making more work.
  3. that we have tags for the geographic contexts ready to go so we can start separating language from geographic context.
  4. that the tags not require too many keystrokes.
  5. that the new context displays and category names be intelligible to more than just us.
This last concern would probably be met by learning from what other dictionaries do. DCDuring TALK 17:43, 17 March 2009 (UTC)
  1. Who knows? Renaming the category will only help by alerting editors to its nature.
  2. These are not new uses. As far as I can tell, category:US has been for US usage since its creation in January 2005, {{US}} has always supported it, and {{USA}} has always redirected to that one; category:United States of America has been for the subject since its creation, in February 2006.
  3. Sounds good.
  4. How about standardizing {{in USA}}? (With a convenience redirect from {{in the USA}}.) This form would work with other topical labels, which may not be anglophone places or correspond to linguistic domains (e.g., in the UK, in Scandinavia). The simpler form {{US}} ought to be reserved for the much more common dialect/vocabulary labels, and also corresponds with their text (British, Mid-Atlantic US). [edited]
  5. I've made some notes at User:Mzajac/Dialect labels. If you have access, read the Norri paper.
 Michael Z. 2009-03-17 18:21 z

Carolina's comment has prompted me to reëxamine this proposal. I've rebooted it at #Move Category:US to Category:American English, below. Michael Z. 2009-03-19 15:52 z

Category:Commonwealth English

Some, but not all, of the subcategories of Category:UK have been added to Category:Commonwealth English. Since the other members of the 'Commonwealth English' category are mostly politically independent countries, it seems that 'Guernsey English', 'Manx English', 'Welsh English', and 'Scottish English' should remain sub-categories of 'UK', otherwise why exclude 'English English', 'Cornish English' and 'Alderney English'?

You can also expect to meet with strong objections to Ireland being classified as a Commonwealth country (from the description: "This category is for English words used in several Commonwealth countires [sic] but not in the United States,") as Ireland left the Commonwealth in 1948.

This seems to be 'a one size fits all' approach to English, handily categorising all 'foreign' (to speakers of American English) usage as 'not used in the United States'.—⁠This unsigned comment was added by Kaixinguo (talkcontribs).

I added every dialect outside of England to Commonwealth English. No one says that Ireland is in the Commonwealth. This category is for the range of the language that most dictionaries call British English (evidence at User:Mzajac/Dialect labels). “Commonwealth” is a convenience label and a Wiktionary novelty: there is no such dialect as “Commonwealth English”, and the dialectal range has nothing to do with membership in this political organization (Canadian English actually doesn't belong here), and no other dictionary uses this.
There is no such dialect as UK, either, and no other dictionary uses this Wiktionary innovation either. Michael Z. 2009-03-16 18:48 z
Putting the questions surrounding 'Commonwealth English' to one side, I feel I have not made my point sufficiently clear.
The category 'Commonwealth English' has 17 sub-categories, most of which are independent countries. As constituent parts of the UK, Guernsey, the Isle of Man, Wales, Jersey and Scotland were already included in 'Commonwealth English' by virtue of being sub-categories of 'UK'. Similarly, regional English sub-categories of the other nations (such as 'Newfoundland English', a sub-category of 'Canadian English') have not been directly added into the 'Commonwealth English' category. Furthermore 'Guernsey English' and 'Cornish English' have not been added to the 'Commonwealth English' category though 'Jersey English' and 'Manx English' have. Therefore 'Guernsey English' and 'Cornish English' should also be added to 'Commonwealth English', meaning that any other UK dialects including 'English English' will also be added; or the component parts of the UK should be removed from direct listing in 'Commonwealth English'. Kaixinguo 19:36, 16 March 2009 (UTC)
Kaixinguo, the subcategories represent more-or-less main independent dialect regions, not countries. There is no “UK English”, so I made sure dialects of the main nations were represented: Scotland, England, and Wales. Cornwall is part of England. Guernsey English has been there all along. There is no category for English of England, but I just remembered Category:England and Wales English, and added it. Michael Z. 2009-03-16 20:58 z
Kaixinguo, I fixed the problem with listing Éire in the Commonwealth category, of course it should not be there. Irish English is the English spoken on the island of Éire and has nothing to do with Commonwealth or Britain, Éire has been independent from Britain for the last 8 or 9 decades and deserves exactly the same attention as US regional variant. So, although mine opinion is not too authoritative, I would opt for distinguishing three main varieties of English - Commonwealth, Irish and US. The uſer hight Bogorm converſation 19:46, 16 March 2009 (UTC)
Are you serious? Please don't invent new taxonomies of English for Wiktionary. Michael Z. 2009-03-16 20:37 z
Completely serious. Please, look at Éire in MW online, this is the veritable name of the country which has gained its independence 8-9 decades ago. Why did you expurgate the remark that Commonwealth/British English cannot refer by any means to the variety of English spoken in Éire? How are we supposed to deal with Irish English?? It would be no doubt contumelious for Irish people to præsent their variant as British/Commonwealth and I strongly contest that. I suggest you add "and Irish" in the expression in the category: "as opposed to North American and Irish English". The definition below does not encompass Irish English. The uſer hight Bogorm converſation 21:44, 16 March 2009 (UTC)
I looked at the M–W link, and it doesn't redefine the way linguists and dictionaries use British English. This is not about anyone's national pride, this is about the conventional classification of English language varieties, in which Irish or Hiberno English is a subdivision of British English (as opposed to (North) American English, which includes Bermudan, Canadian, and US English). Of course the definition below includes all parts of the British Isles. Michael Z. 2009-03-16 22:06 z
Well, but by no means can the isle of Éire be called British isle, this country has been independent for almost one century! There are the isles of Ireland and Great Britain, but British isles may refer either to all isles except the isle of Éire or, when the speaker includes Éire in this notion, irredentism is to be inferred. On w:Talk:British isles (and the archives) there were numerous efforts to move the page under a neutral name, but hitherto it has misfired unfortunately. In Gaelige the name is Oileáin Iarthair Eorpa which translates as Isles of Western Europa and sounds far more neutral than B. I. The uſer hight Bogorm converſation 14:28, 18 March 2009 (UTC)
Here's how dictionaries treat English of the Commonwealth of Nations

‘Words tend to be specified as being exclusively British English especially when British and American English are compared. In CED, the annotation Brit. is used "mainly to distinguish a particular word or sense from its North American equivalent or to identify a term or concept that does not exist in North American English" (CED p. xvi). COD's label Brit. means that "the use is found chiefly in British English (and often also in Australian and New Zealand English, and in other parts of the Commonwealth) but not in American English".’ —Juhani Norri 1996, “Regional Labels in Some British and American Dictionaries,” in International Journal of Lexicography, v 9, n 1, pp 1-29.

I've adjusted the category description appropriately, and removed Canadian English. If you don't like the name Commonwealth English, then let's move it to Category:British English so that our labels can correspond with every English dictionary ever written. Michael Z. 2009-03-16 20:47 z

I realise now that there was already an ongoing 'Commonwealth English' debate higher on this page. Anyway, shall I move 'Guernsey English', 'Manx English', 'Welsh English' and 'Scottish English' back to being sub-categories of 'UK English' or shall I add all the other varieties of English from the UK ('English English', 'Cornish English', 'Alderney English', ad infinitum) to 'Commonwealth English'? Kaixinguo 21:05, 16 March 2009 (UTC)

They are in Category:UK. Michael Z. 2009-03-16 21:17 z
Edit conflict- yes, I do realise they are in Category:UK. It has been difficult but I can just about grasp that point.
The point is: why are regions which are only parts of the UK listed directly in the category when those of other countries are not? BTW I made Category:England and Wales English for legal definitions, not to represent an 'independent dialect region'. Kaixinguo 21:27, 16 March 2009 (UTC)
Also, if there is no such thing as 'UK English' then why the hell is there a 'UK' label? 'UK English' just means 'British English', but including Northern Ireland and the Channel Isles. It may seem surprising to some as Americans are always saying 'British this' and 'British that' but we don't tend to use 'British' very much. If something is English, we call it English. If something is Scottish, we call it Scottish. Kaixinguo 21:40, 16 March 2009 (UTC)
This is the exact definition, Kaixinguo - British English = UK including the islands in La Manche, but by no means the Republic na hÉireann. The uſer hight Bogorm converſation 14:28, 18 March 2009 (UTC)
Why not use {{legal|England and Wales}}? Otherwise, the template and category should be renamed, recategorized, and removed from a sense of cookie.
Our UK and Commonwealth labels don't follow the practices of any English dictionaries, and should be changed. Until we see a consensus to change them, we should organize their contents as sensibly as possible. I don't accept your definition of UK English as representing anything real – it implies that one kind of English is spoken in London and Belfast, and another in Dublin.
But dictionaries from England do use the label British for the broad variety of the language taught in schools in England, the British Isles, and most of the former Empire and British Commonwealth, in contrast to the variety of North America. Michael Z. 2009-03-16 21:56 z
British isles is a delicate and ambiguous notion. Probably you mean Ireland and Britain? The uſer hight Bogorm converſation 14:28, 18 March 2009 (UTC)
It is just as wrong to imply that one kind of English is spoken in London and Belfast, and another in Dublin through using the label 'UK' as it is to label Scottish, Welsh and Cornish English, and all the English (of England) dialects and accents, as 'British English'. One includes Northern Ireland and the Channel Isles and the other doesn't. Kaixinguo 22:43, 16 March 2009 (UTC)
You're misrepresenting me. British English is the English of Britain, the Commonwealth, and other countries, but not of North America. It includes the English spoken by the Irish.
Anyway, perhaps they're all wrong. But Wiktionary should get it wrong in the same way that American Heritage, Oxford, Cobuild, Random House, and Webster's do, rather than the novel way you are proposing. Michael Z. 2009-03-16 23:08 z
I most certainly am not misrepresenting you. I put forward a valid point that demonstrated that 'British English' varies from 'UK English' by the inclusion or exclusion of Northern Ireland and the Channel Islands. The use of 'UK' in Wiktionary and the definition of 'UK' in Wiktionary are in no way 'my definitions' or 'my proposals'; by those assertions it is you who has misrepresented me. Furthermore you have yet to sort out the mess that is the 'Commonwealth English' category. Kaixinguo 23:46, 16 March 2009 (UTC)
No, man. British English is recognized in language studies as a variety of English, and in the global perspective it happens to include Irish English. We can turn to practically any general or etymological English dictionary to see this is so; a number of them explicitly explain the label.
The phrase UK English is a little-used sum-of-parts construction, equivalent to “English in the United Kingdom”; it has almost no linguistic currency, and no place here. Perhaps in the UK is useful to indicate legal or government-related terms, or usage restricted to national establishments.
Why do I have to sort out a mess you perceive? The contents of the category look fine to me, and forgive me, but I don't think you've clearly explained what's wrong with it. If the explanation depends on your own view of Commonwealth English differing from mine, then obviously you have more convincing to do.
In my opinion, the existence of these two categories under these names is out-of-place, and no amount of shuffling their contents will fix this problem. Michael Z. 2009-03-17 00:37 z
Firstly, it is too familiar to address me as 'man'. Secondly, it ought to be clear from my first comment that I am also opposed to the 'Commonwealth English' categorisation. Thirdly, you ought to sort out the mess as you are the one who has just added some, but not all (until I pointed out your mistakes) of the sub-categories of 'UK' to 'Commonwealth English' because you don't believe that 'UK' is a valid label. Whether or not you believe it is a valid label, it is the one in use at the moment on Wiktionary. Therefore the constituent parts of the UK do not need to be listed explicitly in the category 'Commonwealth' as they are listed implicitly by being sub-categories of 'UK', just as all the other countries are listed in that category. Even if the 'UK' category were changed to 'Britain' or 'British English' this would remain the case. It is in fact a very simple point. Kaixinguo 01:19, 17 March 2009 (UTC)
Category:England and Wales English ought to be re-named Category:English and Welsh English Kaixinguo 01:23, 17 March 2009 (UTC)
The category:Commonwealth English is international, so I added entities at a national level: dialects of independent countries, dialects of the w:Countries of the United Kingdom (England, Scotland, Wales), and dialects of islands outside of Britain – if I have misunderstood the relationship then please correct me, but I didn't think of the Channel Islands as in England, nor the Orkneys as in Scotland, and it's best to be liberal here in case a reader has the same understanding. It would be redundant to include national dialects and their local subdivisions in one flattened hierarchy.
I don't know what's right to do with category:England and Wales English and the corresponding template – does it constitute an integral dialect at all, or should it be replaced with separate categories for England and Wales, and restore a category for legal terminology? I can't judge from the contents, which include only one clearly legal term. Michael Z. 2009-03-17 02:52 z
I've taken the initiative and made a proposal below, at #Move Category:England and Wales English > Category:England and Wales Law. Michael Z. 2009-03-27 20:42 z

Bot flag request for User:Mzajacbot

I'd like to get a bot flag for User:Mzajacbot. I would use it occasionally for mass replacement of templates. Example test run visible in the bot's contributions. Michael Z. 2009-03-17 20:54 z
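For readers unfamiliar with what a mass template replacement involves, here is a minimal sketch in Python. It is not Mzajacbot's code, which is not shown on this page; the function name `rename_template` is hypothetical, and the {{US}} to {{USE}} rename is used only because it was suggested earlier on this page.

```python
import re


def rename_template(wikitext: str, old: str, new: str) -> str:
    """Rename {{old}} or {{old|...}} to {{new}} / {{new|...}}.

    Arguments after the first pipe are left untouched, and templates
    whose names merely start with `old` (e.g. {{USA}}) are not matched.
    """
    # The lookahead requires the name to be followed by "|" or "}".
    pattern = re.compile(r"\{\{\s*" + re.escape(old) + r"\s*(?=[|}])")
    return pattern.sub(lambda m: "{{" + new, wikitext)
```

A real bot run would also need to fetch and save pages and to skip nested or commented-out templates, none of which is attempted here.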

I may also use it to make Old Church Slavonic spellings comply with recommendations and new characters from Unicode 5.1, like changing ы, я, ѹо+у. Michael Z. 2009-03-18 21:42 z
Is that as alternate/archaic spellings or a complete change? Support for OCS is not something we could expect anyone other than an Old Church Slavonicist to have on their computer right now. Indeed, I would think it would be safe to say that the percentage of computers that have any Unicode 5.1 support is very low, so if keeping the non-archaic variant as an alternate spelling is at all correct, it probably ought to be. Carolina wren 22:59, 18 March 2009 (UTC)
Support for new Unicode 5.1 characters is provided by installing the Code2000 font (v 1.17 or newer), on Mac or Windows, or any one of a half-dozen¹ more specific fonts. I believe this is a lower threshold than for a number of our living languages. We've also discussed creating redirects for Unicode 5 spellings.
Details of font support and updating entries have been discussed at Appendix talk:Old Cyrillic alphabet. Michael Z. 2009-03-18 23:49 z
¹I have quite a number of high-end multi-plane Unicode fonts on my computer and so far only one of them has glyphs assigned to the code-points running from U+A640 to U+A673 and that is Code2000. If you know of others which exist (preferably freeware), by all means, please tell me! Thank you.—Strabismus 20:20, 19 March 2009 (UTC)
All I know of are listed in Appendix:Old Cyrillic alphabet, with test charts on the respective talk page. Michael Z. 2009-03-19 21:25 z
Great. I now have some more (good) Cyrillic fonts. Thanks!—Strabismus 20:16, 20 March 2009 (UTC)

I have received the bot flag for User:Mzajacbot. Michael Z. 2009-03-25 20:21 z

Categorization and definitions

Just found out about a poorly documented policy, Wiktionary:Votes/2007-05/Categories at end of language section. Since it's policy, it would be nice if it were listed someplace other than a vote page. That policy is not on Wiktionary:Categorization, while Wiktionary:Entry layout explained gives that placement as only a recommendation, not as a requirement. However, this isn't about that directly.

There ought to be a way other than {{context}} to associate categories with specific definitions since not all categorization is or should be associated with context labels. Using plain categories at the end of definition lines would work, but given how overwhelming that vote was, I doubt the policy could be easily revised. That leaves the following idea. Having a template, say for example {{topic}} to do the job. Since {{topic}} and {{context}} would probably both want to use some of the same categories, probably the simplest way to implement this idea would be to add a parameter to {{context}} to disable adding the label and have {{topic}} pass its arguments to {{context}} along with the nolabel parameter.

(I chose {{topic}} as the proposed name because we already have the {{topic cat}} series of templates to organize topic categories, so it seems logical for this usage, but any name would work for me, even {{xyzzy}}. :D ) Carolina wren 04:41, 18 March 2009 (UTC)
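The delegation described above can be modelled outside of template syntax. The following is a minimal Python sketch, not MediaWiki template code: the functions `context` and `topic`, the `nolabel` flag, and the small `LABEL_CATEGORIES` table are all hypothetical stand-ins chosen to mirror the proposal, and the category names are illustrative only.

```python
# Hypothetical model of the proposal; none of this is real template code.
# A tiny stand-in for the label-to-category lookup that {{context}} performs.
LABEL_CATEGORIES = {
    "legal": "Law",
    "botany": "Botany",
}


def context(*labels, nolabel=False):
    """Return (displayed_label, categories) for one sense line."""
    categories = [
        LABEL_CATEGORIES[label] for label in labels if label in LABEL_CATEGORIES
    ]
    # With nolabel set, the sense is categorized but nothing is displayed.
    displayed = "" if nolabel else "(" + ", ".join(labels) + ")"
    return displayed, categories


def topic(*labels):
    """Categorize only: pass every argument on to context() with nolabel set."""
    return context(*labels, nolabel=True)
```

Under these assumptions, `topic` adds no lookup logic of its own, so both would always agree on which categories a given label produces.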

I think this would be a good thing, but would require considerable thought and planning to implement well. Ideally would incorporate some of the functionality of the moribund {{jump}} template, i.e. not just categorize senses but allow them to be linked directly and associated unambiguously with the glosses used elsewhere in the entry.
On the other hand, if we're considering purely semantic categories (like, say, Category:Fruits), it might be better to integrate this with revamped, transcludable Wikisaurus pages. In that case the category would be tied to the "Coordinate terms" or "Hypernyms" line, rather than the sense line. (Or we could scotch categories of this type entirely, and replace them with 'saurus links). Just thinking out loud... -- Visviva 04:56, 18 March 2009 (UTC)
Carolina, can you describe some examples where this is needed? Michael Z. 2009-03-18 15:44 z
Here are a couple of edits (pi tau) made by AutoFormat that undid what I had intended. As you can see, the categories that got moved to the end of the language section each belonged to different senses, and in the case of pi even to different etymologies. I've also once or twice come across in my edits what look like ghost categories left behind after a sense was deleted, tho I can't give a link at present, since I didn't make note of that at the time. Carolina wren 19:51, 18 March 2009 (UTC)
I see what you mean. It doesn't make sense to complicate the process with a template, whose only function is to fool the bot. I think the policy should be revisited, and categories which are restricted to individual senses should be allowed at the tail end of the sense line. Michael Z. 2009-03-18 21:33 z
Agreed. (Note that the reason given in the aforementioned vote for having all categories at the end of the L2 section is that doing so "facilitates section editing by language, and results in all category references, including those in templates, appearing in language order at the bottom of the page", which would allow for them to be in the middle of the L2 section, too; but see EP's comment in the BP discussion that preceded the vote for another viewpoint. Oh, and you might want to see also one and another discussion that preceded that BP discussion.)—msh210 16:17, 19 March 2009 (UTC)

Bot flag request for Darkicebot

Hi there all. I've finally worked out the kinks in my interwiki bot Darkicebot, which should allow it to successfully maintain the interwiki links on pages here using the Python wikipediabot interwiki.py. It should work for the wiktionaries now. I looked up and learned enough programming to successfully modify my interwiki bot to correctly change the entries here. I would therefore like to request that my bot get a test run. It does remove links to other wiktionaries such as pt:Dick from the interwikis for en:Dick, but I believe that this is the correct removal of the interwikis because pt:Dick redirects to pt:dick. Please discuss here. Thanks, Razorflame 15:36, 18 March 2009 (UTC)

It's been my impression that Interwicket and VolkovBot have this mostly under control, and that the Pywikipedia approach has been tried and found wanting for the wikts. If this is wrong, hopefully someone more knowledgeable will chime in. -- Visviva 16:11, 19 March 2009 (UTC)
See User:Interwicket/redirects for a discussion of why we link to redirects, and why all wikts should. In any case, if you look at User:Interwicket/FL status you'll see there is little or nothing to do, here, or in NS:0 on any wiktionary; Interwicket is keeping them within ~2 hours of current (except for blocks of entries added by bot, which are queued over a few days). (Also the "framework" code has problems with iwiki sort order that have been reported for many weeks with no action; at this time the un-modified framework code should not be run.) (!) Robert Ullmann 12:01, 9 April 2009 (UTC)

Move Category:US to Category:American English

See #Move Category:US to Category:United States English, above, for previous discussion. Carolina Wren's comment prompted me to do some research and modify my proposal.

[Edit: as before, the rationale for the move is to match everything else in Category:Regional English, and make it clear that this is a regional dialect category, and not a geographical subject category, like Category:United States of America.]

It appears that until recently much academic writing about (North) American English has ignored Canadian English. I've been unable to find any attestation which clearly defines both American English as English of the USA and North American English as that plus Canadian English. Some authors imply this relationship, but others seem to use the two terms more-or-less interchangeably [Edit: or as explicit synonyms; see Citations:American English. Is this dated?].

In any case, it's clear that although various dictionaries use either the dialect label American or US, and North American or U.S. and Canadian, the term United States English is almost absent from linguistics and lexicography. So I'm proposing this category move, which would explicitly establish our usage of American English as US, and North American English as US plus Canadian. Michael Z. 2009-03-19 15:50 z

Sounds good to me; something has needed to be done about these categories for a long time, and this seems like the most straightforward solution in this case. But I'm sure someone will object. :-) -- Visviva 16:25, 19 March 2009 (UTC)
I like it, and it would also cause our categories to share the names already in use on Wikipedia which is a nicety, though certainly not a necessity. Carolina wren 16:36, 19 March 2009 (UTC)
If I understand correctly, I suppose it depends on whether we are going for precision or convention. "American English" is still liable to confuse—since other countries on the continent can also be considered American—while using "United States" seems clearer, even if it isn't lexicographical convention. Also, you say that other dictionaries use North American or U.S. and Canadian. As long as we're choosing, isn't "U.S. and Canadian" more precise? North America includes at least one other English speaking nation in Central America, and several others if you interpret North America to include Caribbean nations (as our North America does). Dominic·t 22:54, 19 March 2009 (UTC)
If you think that Template:North America has the wrong name and text, then that is a different issue.
If you pick up your dictionary and find that it has an essay about varieties of English, then dollars to doughnuts says it will talk about the dialect American English. Only some business seminar books and the like use United States English.
(Belize has English as an official language, but most people there speak Creole, and I don't believe that small number of English speakers there can be clearly defined as speakers of either American or British English. I also believe that Bermuda is considered to speak a variety of American English, while English in most of the Caribbean is based on British English.)
The entire field of linguistics uses American English for a dialect which predates the establishment of the USA. Whether the label is very precise or not, we should be referring to the exact same thing. It would be hubris to think we are smarter and can improve on their practice. Michael Z. 2009-03-20 22:32 z

I would have liked to move forward with this, but DCDuring opposes renaming the category, for reasons expressed above, and at his talk. I won't push for such an important change without consensus, so this is dead.

I thought this would be a no-brainer, and hoped to move on to sort out the misnamed category:UK and category:Commonwealth English (presumably representing the mythical dialects United Kingdom English, or UK English, and Commonwealth English), but that is a much more complicated knot. So the most important members of category:Regional English will remain in an embarrassing state. Michael Z. 2009-03-24 21:47 z

Hi all, I have a suggestion regarding North American English, but I am not sure whether this is the correct place for it. I would like to suggest that on English language pages the differentiation between the various spellings in all branches of English be made much clearer. For example, American English could be placed as a subtitle on pages for words spelt in that manner. If more than one region or country uses that spelling, I believe they should all be listed. It is a big job, I understand, but I am continually sent to American English pages with spellings I don't recognise. Is this something people agree with, and are there any other ideas on how to make it clearer? Rainyova 15:28, 26 March 2009 (UTC)

Remember that the biggest group of terms or senses are universal, and will remain unlabelled. Thus, dialectal labels are the exception and not the rule. Also, many regional spellings and regionalisms are known and used beyond their original range.
British and American are the two main types of regional English/regional spelling, according to pretty much every dictionary. (Some US and British dictionaries use their own variety as the baseline, so they don't explicitly label it; for example, the OED doesn't generally use the label British.) Exceptions are:
  • Canadian spelling, which is almost always superficially (but accurately) described as “a combination of US and British spelling”, and is a bit more receptive to variation, although the dialect is usually considered a variety of (North) American English.
  • Regionalisms, which belong to some more-specific place; e.g., Manchester, Scottish, Irish, New England, Newfoundland, Australian, Anglo-Indian.
  • Countries with a small number of English speakers where no one has been able to describe the English accurately (whether because it is mixed, in transition, or simply not studied). E.g., Belize, which apparently has a British tradition but the younger folks watch a lot of US TV. Michael Z. 2009-03-26 17:49 z
We can't list all countries consistently, because there are no comprehensive sources for this. Regional English outside of the British Isles and the United States was barely studied until the late 20th century.
I agree that readers may benefit from more consistent and understandable dialect labelling, but it's hard to imagine how exactly to improve it. One of Wiktionary's biggest defects is that we have very poor standards for what gets labelled and how, inconsistent with other dictionaries, so I'm trying (with much frustration) to improve this situation. Michael Z. 2009-03-26 17:49 z

Does "Original Research" from Wikipedia still apply?

Although the headline is slightly misleading, it still applies to the circumstances I am in. Wikipedia has a "no original research" policy which is self-explanatory. A year ago, I made a Wiktionary article for the word "ablexxive", which was deleted, and quite rightfully so, since it had no results on a Google search and it was a word a friend of mine made up. I do, for the record, regret putting it there. But now, it shows up on Google Autocomplete, has a blog named after it, has an entry in Urban Dictionary, but, most of all, has over 200 links to a forum in which I post, where I added it and its definition to my signature. My question is this: if, eventually, the word gets widely known enough, will I be able to write its entry here on Wiktionary? It would be something I have wanted to do all this time.

Thank you for taking the time to read this, and I hope for an answer.

--Freiberg, Let's talk!, contribs 22:29, 19 March 2009 (UTC)

See WT:CFI#Attestation.—msh210 22:48, 19 March 2009 (UTC)
(ec) You'll do wisely in reading WT:CFI, I think. The word might be worthy of inclusion when it has become a more natural part of the language, used by several people on different occasions, over some time, with a consistent definition. I would advise you to start gathering citations of the word actually being used, not just defined in Urban Dictionary and at a blog. --Eivind (t) 22:52, 19 March 2009 (UTC)
Thank you for the links, and by no means am I planning to put it in yet. I was just wondering for the long run. --Freiberg, Let's talk!, contribs 23:14, 19 March 2009 (UTC)

Broken RSS feed for Word of the Day

Hey! Does anyone know what's wrong with the Word of the Day RSS feed? To me it seems like it is stuck at December 23. Same for you guys? If so, we should remove the link from the front page. --Eivind (t) 10:16, 20 March 2009 (UTC)

Hi, seeing as he seems to have much less time at the moment, I've just snaffled Connel's script and set it up to run under my account at http://toolserver.org/~conrad/wotd/ and we can update the link to point to that. I don't know how to handle the fact that many people will already be subscribed to the old one. Conrad.Irwin 11:58, 20 March 2009 (UTC)

Sorting of translations by language family

While our current translations system works fine for most cases, on pages which have very many translations (say, water) it becomes very hard to find the translation you want because of the sheer amount of translations, which leads to people adding duplicate translations. My idea, either as an addition or as a replacement to the normal translations table, is to sort the translations by language family after a certain amount of translations, like the PIE entries. This adds some structuring to the translations and makes it much easier to find the translation you want. Any comments? -- Prince Kassad 13:08, 22 March 2009 (UTC)

That's a great idea! But only, of course, after the number of languages exceeds, say, 50ish-100ish.
In some of the larger translation sections of English nouns I've seen some doubt raised as to whether or not some of the languages therein actually exist! I can't think of an example but it's pretty funny, when you consider that it's not too difficult to research the name of the language that you're wondering might not exist, right?—Strabismus 19:08, 22 March 2009 (UTC)
Being someone who spends a fair amount of time sorting through the translation languages we have on Wiktionary, rest assured it can be more difficult than you expect. -Atelaes λάλει ἐμοί 19:30, 22 March 2009 (UTC)
I think that would make it too difficult for most people to find the language(s) they want, and I also think it would only invite more duplicate translations. It’s a good idea to require the use of a language code using {{|xx|}} in long translation sections such as water, and that might help stem duplication a little.
For some closely related groups, we already do this. For example, the various Chinese languages are indented under *Chinese; the Apachean languages go under *Apache; the Arabic languages under *Arabic, etc. —Stephen 19:21, 22 March 2009 (UTC)
Of course, the use of language codes helps a lot, but from my experience, it does not really prevent duplicates. See, for example, this IP edit, adding a duplicate K'iche' translation. -- Prince Kassad 19:25, 22 March 2009 (UTC)
I would love to see languages sorted by families, but Stephen does make a good point about some users having a hard time finding what they want. What would be ideal would be to have customized sorting, so that users could view them sorted by alphabetical order, as well as by family (no idea how to do this, mind you). However, what we really do need is an official standard. Nowhere does it officially mandate that Arabics and Chineses go together. I think there is definite merit to such groupings, but if we do them, we need to know what languages are grouped and what aren't, and have a policy which enforces it, so that this is done consistently. -Atelaes λάλει ἐμοί 19:30, 22 March 2009 (UTC)
But in your K'iche' example, language codes were not used. Therefore, I don’t see the duplication at all. Where is the duplication? —Stephen 19:37, 22 March 2009 (UTC)
That was only because the anonymous editor appears to have been unaware of language codes. In fact, K'iche' was already in the translation list, and he just added it again, thus it's a duplicate. (Try searching for K'iche', you should find two results in the translation table.) -- Prince Kassad 20:08, 22 March 2009 (UTC)
Another thing that would help would be a setup like that of the Russian Wiktionary. In the Russian Wiktionary, they only use language codes, not spelled out, and somehow or other they are automatically alphabetized. For example, in ru:давать, you can move the different translations around anyway you like, but the result will still be correctly alphabetized. If we could do this, then duplicate translations would be immediately obvious. —Stephen 19:30, 22 March 2009 (UTC)
Well, they've done what I had considered, but had decided was too complicated to implement. Their whole translation section is one big template, as opposed to a list with some templates at the extremities, as we have it. Each language uses the ISO code as the parameter name and everything else as the value. Such a system would doubtless confuse some users, but it would be exceptionally powerful (it could do everything that I wished it would in the previous paragraph). What are people's thoughts on this? -Atelaes λάλει ἐμοί 19:34, 22 March 2009 (UTC)
I say, let's bring back the ISO codes or at least utilize them more fully. I bring this up because a couple of years ago I was raked over the coals for using ISO codes too much. I have NO idea what disadvantages they present. At any rate, we DO need more automated tasks. I'm a bit of a perfectionist and when I see unalphabetized entries in the translation section of a lemma I feel compelled to alphabetize them.
Kassad's original idea has something going for it. I noticed that all of you brought up that it would be confusing for less-informed users to find the right translation. Well, remember the find function? They could use that. Furthermore, many of the languages in, say, water#Translations are probably all but new to most people's eyes. So they wouldn't know to look for them in the first place and the ordering by family would be immaterial. Whaddaya think?—Strabismus 22:04, 22 March 2009 (UTC)
What's the limit (either a hard limit, or in terms of performance hit) on the number of parameters that can be passed to a template? Even if we limit the translations section to just the ISO 639-2 languages, that's still over 400 parameters to support, and no way we could possibly support the close to 8000 parameters needed for full support of ISO 639-3. The only way I could imagine it even coming close to working with full flexibility would be if it took alternating pairs of arguments (Example: {{transfusion|de|fü ßär|gr|{{Grek|φυ βαρ}}|ru|{{Cyrl|фю бар}}}}). Even then it might be a performance hog if set up to handle the very large translation sections we're discussing here. Carolina wren 22:41, 22 March 2009 (UTC)
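As an illustration of the alternating-pair idea, here is a minimal Python sketch of how a flat argument list could be grouped into (code, translation) pairs, sorted by code, and checked for duplicates. It is not MediaWiki template code and says nothing about template performance; the function names `pair_arguments` and `sorted_with_duplicates` are hypothetical.

```python
from collections import Counter


def pair_arguments(args):
    """Group a flat argument list into (language_code, translation) pairs."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments")
    # Even positions hold codes, odd positions hold the matching translations.
    return list(zip(args[0::2], args[1::2]))


def sorted_with_duplicates(args):
    """Return the pairs sorted by code, plus the codes that occur more than once."""
    pairs = sorted(pair_arguments(args), key=lambda pair: pair[0])
    counts = Counter(code for code, _ in pairs)
    duplicates = sorted(code for code, n in counts.items() if n > 1)
    return pairs, duplicates


# The arguments from the {{transfusion}} example, with one code repeated on purpose.
pairs, duplicates = sorted_with_duplicates(
    ["de", "fü ßär", "gr", "φυ βαρ", "ru", "фю бар", "de", "fü ßär"]
)
```

Because the sort is keyed on the code alone, repeated codes end up next to each other, which is what would make a duplicate translation easy to spot.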
The number of arguments isn't going to be a problem except that the performance hit is substantial. Templates are not really "code", or at least it could be said that they're not efficient because they evaluate all of their parameters, even if empty, even unused branches. This might be debatable if it could be done efficiently but it really can't. This isn't going to be the way to go. 03:12, 23 March 2009 (UTC)
Please elaborate.—Strabismus 19:58, 23 March 2009 (UTC)
The template argument size limit is 2 megabytes. We're highly unlikely to hit that any time soon. -- Prince Kassad 20:07, 23 March 2009 (UTC)
It's possible but highly unpreferable. Addressing the previous comment, before reaching the argument limit there are cutoffs on the amount of text substitution that can be done in a single template call, and that will be easily maxed out unless written well. But even if the template is as efficient as possible, by the very fact that it's written to take so many arguments, it will cause a significant performance hit even if there is only one translation. 01:35, 24 March 2009 (UTC)
Then maybe we should stick to our regular layout but still work on what to do about the 200+ translation sections.—Strabismus 20:06, 24 March 2009 (UTC)
This can be a problem, but I think the solution proposed would only add more complexity. It makes sense to me to have a clear policy —that every listing should be a legitimate ==Level 2== language name, standardized in cases like Persian/Farsi, that these should be alphabetized— and then enforce it with a bot. DAVilla 08:31, 26 March 2009 (UTC)
We already have standardized language names which are automatically sorted by bot. But it seems the bot's sorting is not community consensus at all. -- Prince Kassad 15:37, 26 March 2009 (UTC)
Ah, I see. Well that's a very different problem though. When you see an example of a bot change you don't like, certainly raise that point. DAVilla 00:52, 27 March 2009 (UTC)
I can of course do this, but this does not solve the main problem: not being able to find the translation you want in the sheer amount of translations. -- Prince Kassad 10:08, 28 March 2009 (UTC)
What about systematically adding bullets like:
  • Chinese: See Mandarin
  • Farsi: See Persian
whenever those languages occur? What examples did you have of translations that are difficult to find, especially those that wind up being duplicated? DAVilla 17:49, 8 April 2009 (UTC)

User categories?

Can users claim categories for themselves? I'm referring to the category Category:User:Maikxlx/la, which I probably shouldn't have created .--Jackofclubs 13:21, 22 March 2009 (UTC)

Well there's no need for a category for these - if you want a single page listing all of the proposals that have been drawn up in someone's userspace, you can make a page in your userspace (or theirs, if they wish) containing that list. bd2412 T 18:59, 22 March 2009 (UTC)
As long as you're dealing with subpages, you can use {{Special:Prefixindex/User:Maikxlx/Template:la-}} for an automatic treatment - though that admittedly requires some judicious use of the naming of the subpages... \Mike 13:57, 30 March 2009 (UTC)

Bot flag request for Opiaterein Inflectobot

Lame name, I know :D But the bot will be used for batch-loading of articles from text files, starting with Lithuanian entries - especially the bagillion forms of dalyvis participles. I've started testing with nouns, which have a significantly lower number of forms, for evaluation purposes. These entries, so far, are forms of smulkmena and šypsena and can be seen at Special:Contributions/Opiaterein Inflectobot. — [ ric ] opiaterein — 12:42, 24 March 2009 (UTC)

  • Add to above examples forms of respublika — [ ric ] opiaterein — 12:48, 24 March 2009 (UTC)
  • Add forms of kūnas, for a masculine noun example. Had a minor source change needed with this one, just two superfluous letters. — [ ric ] opiaterein — 13:05, 24 March 2009 (UTC)

I'll just list the test runs at the bot's talk page :D Opiaterein Inflectobot 13:42, 24 March 2009 (UTC)

IPA template issues

Template:IPA in its current format is not always very helpful. A link called "IPA" links somewhat haphazardly to IPA charts, orthography articles and phonologies over at Wikipedia. Even in the instances where there are local pronunciation guides (Latin, Spanish, Swedish), the links go to Wikipedia articles. I've started a thread about this over at template talk:IPA.

Peter Isotalo 10:05, 25 March 2009 (UTC)

I think this is an excellent proposal. We can take what we want from the 'pedia and create our own appendices. Additionally, local appendices offer the luxury of using "ifexist," a luxury which 'pedia links do not afford. This is one area where I think that some duplication between us and Wikipedia is desirable (this and grammar appendices). We will inevitably have slightly different needs than them. -Atelaes λάλει ἐμοί 05:56, 26 March 2009 (UTC)

Logo Again...

Please participate: meta:Wiktionary/logo/refresh. Conrad.Irwin 16:55, 25 March 2009 (UTC)

For further reading, see the mailing list archive for this month, and also the visual identity proposal that was written for Wikimedia. Conrad.Irwin 09:59, 26 March 2009 (UTC)

How to handle words with one etymology but multiple pronunciations?

I want to start writing preload templates for some Lithuanian adjectives and participles, and I would love to be able to include pronunciations... but for form-of words that obviously are derived from the same word, I can't format them nicely without using ===Pronunciation 1=== and ===Pronunciation 2=== headers. Conrad has suggested that Etymology and Pronunciation sections be listed underneath POS headers as level 4 headers, as we do with synonyms, antonyms, derived terms, etc. and frankly, I think that's a pretty good idea - especially if we can't use ===Pronunciation 1=== and ===Pronunciation 2===. Otherwise, there would be no way to do it. I want to get started on adding these forms-of as soon as possible, but this is a topic we absolutely have to settle on first if I'm going to make them as complete as I can. Opiaterein Inflectobot 18:15, 26 March 2009 (UTC) That wasn't supposed to be a minor edit, and I forgot I was logged in as the Bot. But yeah. Discuss :p — [ ric ] opiaterein — 18:21, 26 March 2009 (UTC)

This happens in Ukrainian, where just the place of syllabic stress can sometimes change meaning. I formatted горілки with two noun headings, but that doesn't seem perfect.
Another option might be to enter the main pronunciation normally, and enter it in a context label for a sense with unusual pronunciation. Michael Z. 2009-03-26 18:26 z
It's slightly more extreme in Lithuanian, where the movement of stress often changes the pronunciation of the word drastically, as in varlės, which uses the Pronunciation 1/2 headers with the nouns at Level 4. — [ ric ] opiaterein — 18:48, 26 March 2009 (UTC)
On a tangent, it would be nice to have the gloss “frog” in that entry, so the reader can ascertain the basics without clicking. I'd even be tempted to enhance the glosses for a reader unfamiliar with the cases; something like “(of a) frog,” “frogs,” and “(o,) frogs.”
Just remembered, I treated the same thing differently in ковбаси, simply putting two headwords under one heading. I kind of like this. It may work to put a pronunciation line between the headword and definitions, but that would probably break down if you had to add multiple transcriptions and regional variations in pronunciation. Michael Z. 2009-03-26 19:31 z
I've long since given up on glosses as they're too difficult to manage. In some cases, such as the gloss being too simple or too "exact", they would serve only to confuse the reader more. Better to learn the grammar terms, if you want to learn to use the words correctly. Especially in Lithuanian, which is highly synthetic, an exact gloss can be ridiculous. One example is the word baltesniąja - pronominal feminine singular instrumental form of baltas, comparative degree. So what gloss should that take? "with that particular/specific whiter..." Many Lithuanian adjectives have around 154 forms. I'm simply not going to do that :p On the other hand, simply saying "white" doesn't tell the person who knows nothing about grammar anything useful.
Aaanyway, if this topic isn't worked out, I can't wait forever to add these words. It's like an itch. Since our current formatting puts Pronunciation above the POS, if we can't figure out how we want to do this in the long run, I'll simply format these form-of entries I have waiting in the same way as varlės. I really would like for this to be discussed and handled before I start this part of the project, because I work fast and I'm not crazy about the idea of going back and fixing things that I've already done because nobody cared enough to nip this issue in the butt :D — [ ric ] opiaterein — 00:32, 27 March 2009 (UTC)
Note that you can qualify the pronunciations within one pronunciation section. May not be very pretty, but is standard practice. But one understands looking for a better way. (Also: if glosses, or doing anything else right, is too hard, and you are "simply not going to do that", perhaps this task should be left to someone who will do the job properly?)

There are several substantive problems with "Pronunciation N":

  1. WT:ELE does not permit it, and no-one has presented a workable proposal to modify policy. It must be proposed, voted on, and added to ELE before (continuing to) use it.
  2. There has been no workable proposal for N etymologies with M pronunciations.
  3. It prohibits sub-sections under pronunciations, as the next level is the POS/L3 level (shown at level 4); for example, EP's proposed use of a Homophones L4 header under Pronunciation would not have worked. Note that we already have this restriction with Etymology: you can't use a sub-section header under Etymology, because if there is more than one, the sub-section is at POS level. (!)
  4. It splits (in many cases) POS into odd divisions. If it were to be used at defense for example, it would split off the US sports noun imperative form. ( "DEE-fence!" ;-)
  5. Lastly, it isn't the "right" hierarchical structure (which I'll explain a bit here)

Consider the Etymology N headers: these aren't there to separate words that differ in some attribute, like pronunciation, they are there to separate homographs.

We have a hierarchy that we use: each unique spelling (entries), within that by language (L2 headers), then words in the same spelling that are not the same word: homographs (Etymology N headers and nesting), then POS (L3 headers), then attributes, semantically related terms etc (L4).

Pronunciation is not a structural division in the hierarchy at the level of homographs/etymology. (A Gedankenexperiment: imagine entries with "Homograph N" instead of "Etymology N", and you will see the structure clearly.) It is simply that the various POS and forms with the same spelling often (in some languages always) share the same pronunciation, and sometimes not.

In view of the structure, pronunciations should either be listed with qualifiers in one section, or perhaps as L4 ("attribute"). The first of these is the default standard permitted by ELE. Figuring out something better is a fine idea, if then proposed as a coherent standard. Robert Ullmann 08:01, 29 March 2009 (UTC)
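
The hierarchy described above (spelling, then language, then homograph, then POS, then attributes) can be sketched as nested data. This is only an illustration, not how entries are actually stored: the class names, the placeholder entry "word", and the origins "origin A" and "origin B" are all made up for the example, and putting Pronunciation at L4 is just one of the two options mentioned.

```python
from dataclasses import dataclass, field


@dataclass
class PartOfSpeech:
    name: str  # POS header, e.g. "Noun"
    attributes: dict = field(default_factory=dict)  # L4 "attribute" sections


@dataclass
class Homograph:
    etymology: str  # one distinct word sharing the spelling
    parts_of_speech: list = field(default_factory=list)


@dataclass
class LanguageSection:
    language: str  # L2 header
    homographs: list = field(default_factory=list)


@dataclass
class Entry:
    spelling: str  # the page title
    languages: list = field(default_factory=list)


def outline(entry):
    """Return the entry's headers as indented lines, one per level."""
    lines = [entry.spelling]
    for lang in entry.languages:
        lines.append("  " + lang.language)
        # Headers are numbered only when there is more than one homograph.
        numbered = len(lang.homographs) > 1
        for n, homograph in enumerate(lang.homographs, start=1):
            lines.append("    " + (f"Etymology {n}" if numbered else "Etymology"))
            for pos in homograph.parts_of_speech:
                lines.append("      " + pos.name)
                for attribute in pos.attributes:
                    lines.append("        " + attribute)
    return lines


# Hypothetical entry: two homographs, one with a pronunciation attribute.
sample = Entry("word", [LanguageSection("English", [
    Homograph("origin A", [PartOfSpeech("Noun", {"Pronunciation": "(placeholder)"})]),
    Homograph("origin B", [PartOfSpeech("Verb")]),
])])

print("\n".join(outline(sample)))
```

In this shape there is no level at which a "Pronunciation N" split could sit without either duplicating a POS or adding a new layer between homograph and POS, which is the structural objection raised above.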

If we were a traditional dictionary that did not list separate entries for inflected forms, I might buy your arguments. However, we include entries for inflected forms, which have their own etymologies. Consider that Spanish estoy (which is the 1st-person singular present active of estar) comes etymologically from Latin sto, and not from Spanish estar. With this in mind, your argument concerning homographs falls apart. --EncycloPetey 08:08, 29 March 2009 (UTC)
(Nonsense; methinks you are in too much of a hurry to knock something down, please read it again. Has nothing to do with lemma/non-lemma.) The word estoy has no homographs, and (therefore) one etymology; the fact that the infinitive has a different etymology is utterly irrelevant to the entry structure for estoy (or for the infinitive). The structure we are representing with the "Etymology N" headers is differing homographs (which are necessarily in the same entry). Robert Ullmann 08:22, 29 March 2009 (UTC)
Robert, you always seem to be in favor of the solution that requires the most mess. If Pronunciations aren't L4 headers under POS, the only option left for words with the same etymology and one header that matches anything close to the present ELE standards is Pronunciation N. To have Wiktionary so inflexible in its possible layouts is ridiculous.
While you think Petey is in "too much of a hurry to knock something down", I think you're fighting to keep all the entries that we have listed where the etymology is more important than the POS the way they are so they don't have to be fixed.
Traditional dictionaries often have the pronunciation right after the headword. Even that would be something that could be developed for Wiktionary. — [ ric ] opiaterein — 11:08, 29 March 2009 (UTC)
Opiaterein, that solution isn't any better. Look at palma#Latin and alba#Latin to see why. (alba is the simpler case, with a single etymology) In some inflected languages, the spelling is the same for different inflections of the word, but the pronunciation differs. Whether the different pronunciations are given under the headword or after the definitions in L4, we have the same problem of tying specific pronunciations to specific senses.
Robert, I'm not "in too much of a hurry to knock something down". I've already spent several years weighing out the several proposed solutions. I am stating opinions that I have developed over those several years, so my response was not a "hurry" at all. In fact I had started writing what I posted even before you posted your comments, but had an edit conflict. I modified my comments to be a reply to yours instead of the independent comment it originally was. --EncycloPetey 15:39, 29 March 2009 (UTC)
I think that Robert is right that the major structural subdivision “Etymology” is used to separate distinct terms, which happen to be spelled the same. But what we're talking about here is variations in pronunciation of, and perhaps inflections of, a single term. Even if it's a bit awkward, it makes more sense to use qualified pronunciations, exceptional pronunciation entered in a context label, or usage notes, or a combination of these, if possible – whichever is most suitable for each particular case.
Using a “Pronunciation 1/2/...” fork complicates the structure. What if a homograph has multiple pronunciations? Do we add yet another level by entering “Pronunciation 1” under “Etymology 1?”
An intermediate and more compatible alternative may be to divide the structure at a lower level, by introducing redundant POS headers, as in горілки (does ELE prohibit this?). If necessary, they could be “Noun 1” and “Noun 2”, as for distinct etymologies. This introduces branching at a level that's already accommodated, rather than adding two different trunks to the tree. Michael Z. 2009-03-29 15:49 z
The problem is that those solutions aren't "a bit" awkward; they're incredibly awkward. Qualified pronunciations would require the definition to precede the pronunciations to which it applies, which would produce results most Wiktionarians would (rightfully) object to, since it would make the Pronunciation section of some articles ridiculously long and complicated without real cause. Would you be able to interpret a pronunciation qualifier that read (Classical, ablative feminine singular) and match it quickly with the appropriate definition line? What happens when the display form of the word must also differ because of optional diacriticals, as is true for Arabic, Hebrew, and Latin (among others)? The other proposed solutions have similar difficulties with structure, readability, logic, and implementation. I maintain that the use of Pronunciation N headers is the simplest to implement and most easily understood by our readers. --EncycloPetey 16:02, 29 March 2009 (UTC)
EncycloPetey is clearly more skilled at expressing my thoughts than I am :D In other words, especially after checking out the Latin entry he linked to... I have to agree with every point he makes.
Either Pronunciation N headers are needed in some cases, or etymology and pronunciation (and just about everything) should probably be nestled underneath the POS, which is a kinda big change, which might actually in some cases not be a good idea. It's hard to say at this point. — [ ric ] opiaterein — 00:44, 30 March 2009 (UTC)

How do the paper dictionaries handle this situation? Our adding an Etymology N header is exactly equivalent to a monolingual dictionary adding a redundant main headword; having separate bold entries word1 and word2. The only Ukrainian examples I've been able to find are inflections found within a single entry – but I'll keep looking. How do Lithuanian dictionaries handle this? Michael Z. 2009-04-06 19:02 z

Category:English uncountable nouns

If we have this category, shouldn't it include all English uncountable nouns? By which I mean, shouldn't the {{en-noun}} template add them? Either that or {{uncountable}} shouldn't include a category and we should delete the whole thing? Nadando 05:45, 27 March 2009 (UTC)

I've thought of that before, but didn't really think to bring it up... but yeah, {{uncountable}} isn't generally going to be used except on nouns that are both countable and uncountable in different contexts, so it's kinda strange that that template should be the one to add the category without {{en-noun}} doing the same thing. — [ ric ] opiaterein — 11:20, 27 March 2009 (UTC)
I believe that it is intentional: the idea being that we should take the trouble to mark individual senses as uncountable. Very many of the items using {{en-noun|-}} are wrong. The "-" sometimes is used because an editor didn't know the plural form or thought the plural form was unattested, non-existent, or rare. Nouns ending in the various suffixes that form abstract nouns are among the most commonly erroneous in this regard, but there are many others. A complete inflection-line solution would have to recognise many possibilities, include a major cleanup, and probably amend {{en-noun}}. Achieving consensus seems difficult. DCDuring TALK 15:41, 27 March 2009 (UTC)
Since just recently, we have {{en-noun|?}} for the indeterminate cases. Michael Z. 2009-03-27 20:14 z
Yes, prospectively that might work, unless editors stick to old habits (as they will). Adding all entries with "-" to :category:uncountable will include many entries wrongly. Perhaps we should have a hidden maintenance category inserted in all noun entries that do not have uncountable/countable sense markers and check each one. DCDuring TALK 11:00, 29 March 2009 (UTC)

Wiktionary:Easter Competition 2009

Feel free to contribute to this year's Easter competition at Wiktionary:Easter Competition 2009. SemperBlotto 11:18, 27 March 2009 (UTC)

It is so early with 23 more days remaining, but anyway, better betimes than belatedly. The uſer hight Bogorm converſation 11:22, 27 March 2009 (UTC)
Well, having read Easter Monday, April 13., I at least realised why you initiated it now. Could you kindly change the italicised expression to Catholic Easter Monday, since there are also Orthodox participants, who will be celebrating on April 20. Just to be exact. The uſer hight Bogorm converſation 11:26, 27 March 2009 (UTC)
Umm, I think Easter Monday, April 13 is clear enough. The fact that there are two is irrelevant (we might just as well have put Monday, April 13th - there are a lot more Mondays than Easter Mondays). This could be quite a hard competition; would a single rhyming couplet be sufficient? Conrad.Irwin 14:40, 27 March 2009 (UTC)
A rhyming couplet would be acceptable. But it might not get too many votes (who knows?). SemperBlotto 17:08, 27 March 2009 (UTC) p.s. We might make it a "spring" competition next year.

Move Category:Northern English > Category:Northern England English

(Recently moved from Category:Northern English dialect; see #Standardizing English dialect names above).

EP pointed out correctly that in the future this could conflict or be confusable with Category:Northern US English, etc. The proposed name is a bit awkward-sounding, but perfectly correct and unambiguous, and harmonized with other names in Category:Regional English. It is better than the only alternative, Category:Northern English English. Michael Z. 2009-03-27 20:22 z

Question, are you planning on special casing the use of English in {{Northern}} or coming up with a replacement template? While I have not needed to place it yet, I was planning on using {{Northern|lang=ca}} for usages in the Northern Catalan dialect. As things currently are, I only need to add the category Category:Northern Catalan once I have an entry that needs it. Carolina wren 23:32, 27 March 2009 (UTC)
Excellent question! I knew there was a region I post all my crazy plans here.
Strictly speaking, the regional templates should each represent a region, not a direction, so we would need {{Northern England}} and {{Northern Catalonia}}. The exact text of the template would depend on the case, but should probably refer to the region too, unless the region is very strongly associated with the language in English (e.g., Northern might be suitable for English, but Northern England is better anyway – and the template could theoretically be used for another language, like Northern England Romany or something).
I could make a template for you in the next day or so. Michael Z. 2009-03-28 00:42 z
Thanks, but I can manage the template creation when I need it. However, Northern Catalonia is subject to some confusion, especially for English speakers, since w:Northern Catalonia is in France, not Spain so I'm doubtful that would be the best template name. Probably should be {{Roussillon}} if a stand alone template is deemed proper. Carolina wren 01:42, 28 March 2009 (UTC)

Done. Moved {{Northern}} to {{Northern England}}, and changed the text from Northern dialect to Northern England. Now to manually check that every entry is English. Michael Z. 2009-03-31 17:41 z

Also deleted the “directional” template:Northern and created template:Northern Crimea, to support Category:Northern Crimean Tatar. Michael Z. 2009-03-31 18:17 z

Move Category:England and Wales English > Category:England and Wales Law

I mistakenly moved this from Category:England and Wales, before Kaixinguo pointed out that it was created for the legal code of England and Wales. The judicial websites use both “England and Wales” as an attributive and “English and Welsh” as an adjective, but the first version makes it a bit more plain that this refers to the law of a combined entity.

There are three transclusions:

  • commonhold, the only relevant legal term.
  • cookie (2) duplicates sense no. 1; that one's usage outside of North America is already covered by the qualified chiefly North American, but perhaps it should be chiefly North American, also British, the separate sense for Scotland making the exception clear.
  • whisky: this could be labelled British, Canadian, and the note or definition amended to specify that whiskey is Irish and US.

 Michael Z. 2009-03-27 20:40 z

Done. The template is now {{England and Wales law}}, and the correctly-capitalized topic Category:England and Wales law. Michael Z. 2009-03-31 17:25 z
'England and Wales law' sounds odd, it ought to be called 'English and Welsh Law' if you must insist on turning everything into an adjectival form Kaixinguo 23:06, 5 April 2009 (UTC)
It's the England and Wales Court of Appeal, not English and Welsh Court of Appeal. It's correctly the attributive form of a single noun representing a political entity, not the application of two adjectives representing distinct nationalities or languages. Law of England and Wales might even be better. Michael Z. 2009-04-06 16:14 z

British English

My English dictionaries label many entries as Brit., for British. I bet yours does too. British English, British, Brit., or Br, etc., is used in virtually every dictionary, including the CCE, CED, LDE, OAL, COD, AHD, RHD, and W3. (The ChD and OED assume British English as a baseline, e.g., the latter only applies a regional label “when the word is not current in the standard English of Great Britain.”) Source: Norri 1996.

British in dictionaries:

  1. Collins English Dictionary: “mainly to distinguish a particular word or sense from its North American equivalent or to identify a term or concept that does not exist in North American English”
  2. Concise Oxford Dictionary: “the use is found chiefly in British English (and often also in Australian and New Zealand English, and in other parts of the Commonwealth) but not in American English”
  3. Canadian Oxford Dictionary: “the use is found chiefly in British English (and often also in Australian and New Zealand English and in other parts of the Commonwealth except Canada) but not in North American English” [my emphasis for differences from no. 2]
  4. Merriam–Webster Online: “The label British indicates that a word or sense is current in the United Kingdom or in more than one nation of the Commonwealth (as the United Kingdom, Australia, and Canada).”[8]

There is no such dialect as United Kingdom English (UK English) or Commonwealth English, and dictionaries and linguists don't use these terms. They speak of British English, which originated in Britain and was brought to British overseas territories, and also of various specific regionalisms, like Australian, South African, Indian, etc. British English predates the establishment of the United Kingdom (1707 or 1800) and Commonwealth of Nations (1931).

Category:UK (3,139 entries) could be renamed Category:British English, to conform to the usage in linguistics and lexicography and in every dictionary of English, and to fit with every other name in Category:Regional English. Its description can be rewritten to comply with the usage in English dictionaries.

{{UK}} could be moved to {{British}}/{{Brit}}, and converted to a standard regional template with the text British. If anyone knows of any words whose use actually stops at the state boundaries of the United Kingdom, let me know.

Category:Commonwealth English: the 79 entries can be moved to appropriate regional English categories and the category deleted. {{Commonwealth}} can be redirected to {British} and orphaned.

Somebody had to bring this up. We really ought to fix this now, or it becomes an embarrassment. If there are no serious objections, then I will get on with it as soon as I think I can get away with it. Michael Z. 2009-03-28 00:20 z

"If anyone knows of any words whose use actually stops at the state boundaries of the United Kingdom, let me know." Surely this must apply to a whole load of regional and dialect terms. I'm thinking of UK-subcultural words like pikey, chav, and grebo; Scots and Welsh borrowings and regionalisms like twp, cwm, canna, bevvy... No doubt some of them have spread beyond the UK, but I bet many have not. Equinox 00:34, 28 March 2009 (UTC)
So those are all used in England, Scotland, Wales and Northern Ireland, but absent from the vocabulary of the Republic of Ireland and elsewhere? Which dictionary cites them as such? Michael Z. 2009-03-28 01:27 z
Twp. for “township” is used in western Canada and USA, and I don't see dictionaries listing it with any dialect restrictions at all.
According to my dictionary (CanOD, based on the COD), cwm is not regional, but refers to a coomb “in Wales”, or a glacial cirque in geography (“Geog.”; and Coomb is labelled “Brit.”). Per M–W, cwm is “Chiefly British”.[9] Cwm's origin is “[Welsh]”.
One of the (informal) references for chav says it is widely known in southern England, with ned being the Scottish synonym.
I wonder if canna is really Scottish, or merely used by others to evoke Scottishness in speech – our only citation was spoken by a Canadian actor, written by an American writer for a U.S. TV show. In any case, that might be Scottish or chiefly Scottish, but not UK.
I can't find evidence that any of these are labelled UK by any dictionary or other source. Michael Z. 2009-03-29 19:46 z
Checked the OED. None of these is marked as UK. Michael Z. 2009-03-31 19:51 z

I'm getting started. I've updated the description of category:UK to better describe British English in the dictionary context.

Next, I'll change the template text to British, and move {{UK}} to {{British}}, with redirects from {{Brit}} and {{brit}}. Eventually I'll replace {UK} with {British} in entries. I'll also start gradually removing the easy entries from category:Commonwealth English.

I'll not merge and rename the categories yet. May as well give people a chance to take notice and make a fuss before things get fixed so well that it is hard to restore the current shambles :-). Michael Z. 2009-03-31 19:51 z

You really should have offered people a chance to vote on this before wading into it in your size twelves. Kaixinguo 23:09, 5 April 2009 (UTC)
Is there a problem with the proposal, or is this just a point of order? Michael Z. 2009-04-06 05:29 z

New {{American}} and Category:American English

See #Move Category:US to Category:American English above for prior discussion.

It seems to me that DCDuring's objections to the move were to a large extent based on the dual usage of {{US}} as both a regional dialect template and a geographic template. Since we seem to be adopting a pattern of using capitalized adjectives for the dialect templates, it seems to me that creating a new category with the current template redirect (used by 7 articles at present) repurposed as a pure dialect template would work here to generate the desired separation. If repurposing {{American}} is objectionable, perhaps {{Am.}} would work instead. Carolina wren 00:42, 28 March 2009 (UTC)

US for Category:American English and American for Category:United States of America isn't logical to me – both look like dialect labels. The template names in Category:Regional context labels, and their several redirects, vary quite a bit, referring to regions, either by noun or adjective.
Subject labels tend to refer to political rather than linguistic divisions. They would be more clearly distinguished if they followed the form used for geographic subject labels in the COD: {{in the USA}}, {{in the UK}}, {{in Ireland}}, etc.
But frankly, why introduce subject labels into this discussion at all? This is an independent discussion. We could choose to introduce subject labels, or we could choose to treat the subject as part of the definition, as is very common in dictionaries and in our existing definitions, or we could leave things as they are. Whichever of these we may choose, we still have to fix bad category names anyway. Michael Z. 2009-03-28 05:23 z
First, I think you misunderstood my intent which was:
  1. {{American}} → Category:American English
  2. {{US}} → Category:US, tho since Category:United States of America already exists I suppose we could repoint it there at least temporarily. (However, I think we should use Category:United States as do both Commons and Wikipedia.)
Second, this is not an independent discussion. Editors will expect some degree of similarity between templates, labels, and categories. If the category is American English I believe most editors with some knowledge of how we tend to do things around here will expect {{American}} and/or {{American English}} to populate it and if it doesn't we'll be confusing them or at the very least be forcing them to engage in an extra step. I agree that {{in the USA}} would be clearer for the geographic template, but that argues in favor of using {{in the USA}} (or {{in the US}}) and {{American}} while deprecating and eventually deleting {{US}} as ambiguous. Carolina wren 06:00, 28 March 2009 (UTC)
Okay, but Category:US is intended for terms and usage restricted to the USA, and not for the subject “about the USA,” and it contains 3,200 items. I'm sure that there are a few, or dozens, or maybe even hundreds of dialect labels misused as topical labels – but whether we rename the category or not, they will continue to be incorrect and require the exact same cleanup. This is why it's unreasonable to make renaming these badly named categories contingent on making a plan for geographic subject labels. Let's fix what's wrong, now, then consider adding new optional features.
Regarding the name of the dialect template, dictionaries use either American English, American, AmE, U.S., or US for their dialect label. So any of these should be able to serve as dialect labels (and in fact, {{US}} already has redirects {{American English}}, {{American}}, {{America}}, {{U.S. English}}, {{US English}}, {{USA}}, {{U.S.}}, and {{us}}). And that's why geo subject labels, if we ever do choose to start using them systematically, should take a significantly different form than just the name of a country or region, or its adjectival form.
Short version: dialect labels are common, so each “ambiguous” form should be usable as a dialect label. Geo subject labels are relatively uncommon in dictionaries, so they should take an unmistakably distinctive form, even if it takes typing an extra word or two. Michael Z. 2009-03-28 06:50 z
While in "theory" {{US}} and Category:US are for dialect usage, in "practice" they are a mixture of dialect and geographic usage at present, and need to be separated out. Unlike print dictionaries, we don't have only editors who will know what was intended by each label. Since practice has shown that {{US}} will be interpreted ambiguously that argues in favor of deprecating the use of ambiguous terms as dialect labels regardless of what theory might say. Real world experience should always trump all other considerations. Carolina wren 15:05, 28 March 2009 (UTC)
Is this a real-world problem resulting from some attribute of this label, or is it also only in theory? 1) In practice, what proportion of {{US}} labels are applied as subject labels – 50%, 10%, 1%, 0.1%? 2) In practice, is this any worse for {US} than for a dialect label with an unambiguous term? 3) Which labels have ambiguous and unambiguous terms? Michael Z. 2009-03-28 15:48 z
Quick survey: I picked ten transclusions of {{US}} (the 1st, 51st, 101st, 151st, etc.), and evaluated how they are used: pound, pip, pants, gull, hectometer, wayleave, up, closet, professor, stickshift. Zero problems.
So I scanned through the entire category looking for USA-related topics. The only misuse of the template I could find was in particular proper nouns, which are very obviously things in the USA, but that are internationally recognized: Independence Day, Founding Father, Ivy League. In my opinion, this wouldn't be difficult to weed out.
(I did find a huge proportion of terms marked US which are used outside of the US, certainly in Canada and possibly beyond, but that is a very different problem)
Unscientific, but I don't see the use of US as a topical label as a problem of huge proportions. Michael Z. 2009-03-28 16:35 z
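
The quick survey above (the 1st, 51st, 101st, ... transclusions) is a systematic sample. A minimal sketch of that selection follows, assuming the transclusion list is already available as an ordered Python list; the function name and the placeholder page titles are invented for the example, and nothing here fetches data from the wiki.

```python
def systematic_sample(items, step=50, count=10):
    """Return every `step`-th item starting with the first, at most `count` items."""
    return list(items[::step][:count])


# Placeholder titles standing in for the ordered list of transclusions.
transclusions = [f"page{i}" for i in range(1, 3201)]

picked = systematic_sample(transclusions)
print(picked)
```

A sample of ten out of a few thousand can only show that misuse is not overwhelming; it cannot estimate a small misuse rate with any precision, which matches the "unscientific" caveat above.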
I added a basic description to the label templates {{North America}}, {{UK}}, {{US}}. This should help prevent misuse. Michael Z. 2009-03-28 16:49 z
I did my own survey (using the category rather than the template), and agree that proper nouns are the largest problem, which isn't surprising since whether you call it a tire or a tyre you still need four on your car no matter which side of the pond you are on. A second unrelated problem (affecting 3 of the 40-odd entries I looked at) is miscategorization because of people using {{US}} in the Alternative forms/spellings section or in the glosses of foreign language words to mark the linked-to term as American, a problem that I expect afflicts all dialect templates. Still, I'm not objecting to the change from Category:US to Category:American English with or without the template change, but trying to come up with a solution to the objections raised by others that caused you to decide to back burner this for lack of consensus. Carolina wren 18:48, 28 March 2009 (UTC)
In the second case, do you mean editors using {US}, where they ought to use {qualifier|US} to prevent inclusion in a category? That shows up regularly. I don't think context labels should ever occur in such a list of links, so a bot could correct or at least flag such occurrences.
I do appreciate your trying to moderate. Still, this project has resolved to use dialect labels. Our dialect label usage has serious problems which must be corrected.
But Wiktionary has not resolved to use geo labels. Although I had planned to write up a proposal myself, we are not using them now, and it is not preordained that we ever will, because there are other solutions based on broad precedent in print dictionaries (e.g., in the definition).
To hold fixing dialect label problems hostage by insisting that we adopt geo labels is not an acceptable way to move forward. Michael Z. 2009-03-28 20:41 z
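
The bot check suggested above (flagging a bare {US} in a list of links, where {qualifier|US} was meant) might look roughly like the sketch below. It is an assumption-laden illustration, not an existing bot: it treats any line starting with "#" as a definition line where a context label is expected (so quotation lines under a definition are skipped too), it only knows the two labels named in the pattern, and the sample wikitext is made up.

```python
import re

# Hypothetical pattern: only two bare labels are checked in this sketch.
BARE_LABEL = re.compile(r"\{\{(US|UK)\}\}")


def flag_bare_labels(wikitext):
    """Return (line number, line) pairs where a bare label sits outside a definition line."""
    flagged = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # definition line: a context label belongs here
        if BARE_LABEL.search(line):
            flagged.append((number, line.strip()))
    return flagged


# Made-up entry text for illustration only.
sample_text = (
    "==English==\n"
    "===Alternative forms===\n"
    "* [[tyre]] {{UK}}\n"
    "===Noun===\n"
    "# {{US}} A rubber covering for a wheel.\n"
)

print(flag_bare_labels(sample_text))
```

Flagging rather than rewriting keeps a human in the loop for cases where the label was placed deliberately.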
I wonder if we could modify all regional/dialect labels specifically so that they could be used outside of the definition line, and would only categorize when used explicitly with {{context|...}} or even, ideally, categorize differently when used with a different meta-template in the pronunciation section. This is a relatively large technical feat though so I'd want to be sure it's the way to go. DAVilla 09:07, 5 April 2009 (UTC)
For pronunciations we have {{a}} (for “accent”). Does that fit the idea, or need improvement? Michael Z. 2009-04-06 05:57 z
Yes, that's the exact example I wanted. If {{US}} were redirected and changed to "American English", would it make sense that not only {{context|US}} but also {{a|US}} reflected that, each properly categorized? Or similarly for any other region. In other words, is there a 1-to-1 correspondence between regional labels of restricted use and pronunciation, either broad or narrow? DAVilla 17:57, 8 April 2009 (UTC)


I was wondering why "lang=zh" is rendered as Mandarin?


Says "Mandarin Wikipedia", but that's patently false.

Then there are the zh categories which are described identically to the cmn categories.

Isn't zh the umbrella for all Chinese? 06:40, 29 March 2009 (UTC)

It isn't "patently false"; it is quite correct: zh.wikipedia.org is in Mandarin. There is also zh-yue in Cantonese, and zh-min-nan in Min Nan. Yes, they should be named cmn.wikipedia.org, likewise yue and nan. But they are not (yet, there is a request that has been pending for ages ...) So our {{zh}} template reads "Mandarin", and that is the default meaning of the code within Wikimedia. Using lang=cmn with the {{wikipedia}} template would be better, but the WMF software doesn't recognize "cmn" as an interwiki prefix (yet), so we can't. Robert Ullmann 06:59, 29 March 2009 (UTC)
I thought we followed ISO 639 language codes in our templates – why on Earth would we follow Wikipedia's website naming? Michael Z. 2009-03-29 14:40 z
We have to consider it any time there is a discrepancy between the two, because there are some cases where a project exists under the wrong ISO code. If we don't allow for that, then we end up with visitors from other projects confused, or with entries or translations given under the wrong language name. The Norman language has no ISO code, but the MW projects use nrm (e.g. [10]), even though nrm is the ISO code for the Nimbari language. This problem does not exist solely on the Wikipedias; it also affects Wiktionary interwikis when a Wiktionary project is under an altered or incorrect MW code. --EncycloPetey 15:28, 29 March 2009 (UTC)
Still, when an ISO code does exist, shouldn't we use it? After all, much of the world is still not visitors from other projects. If it doesn't interfere with something else, then a redirect should bring those visitors to the correct code, instead of encouraging a non-standard with little justification, mixed in where an ISO code is expected. Michael Z. 2009-03-29 18:36 z
Well, we certainly always give preference to ISO codes over MW codes, but there have been some compromises. For example, {{als}} has been deleted, as it had almost no correct use (we have very little Tosk Albanian), and was constantly being confused with the MW code for Alsatian. Once the Alsatian projects are given proper codes, we can undelete it. However, having it exist as "Alsatian" was certainly not acceptable. -Atelaes λάλει ἐμοί 18:43, 29 March 2009 (UTC)
Ouch. You can't really make things more clear than making the template's entire contents be “Tosk Albanian,” can you? Michael Z. 2009-03-29 19:48 z
That won't help. For comparison, {{law}} is a language code, yet people continue using it as a context template. -- Prince Kassad 20:27, 29 March 2009 (UTC)
For context templates why don't we just have 4-letter codes (all lowercase) or 3-letter codes with an initial cap? e.g., {{law.}} or {{Law}}. Eventually we will be representing many, many more languages whose ISO codes happen to correspond with words or prefixes commonly used as topical specifiers.—Strabismus 20:38, 29 March 2009 (UTC)
For this and other reasons of clarity, it may be helpful to have standard prefixes for all usage templates. We have our category:Context labels sorted into five subcategories (of which “grammatical context” is not context at all), but this could be sorted out as POS plus the various kinds of restricted usage (e.g., currency, regional variation, technical terminology, taboo, insult, slang, style or register, and status, per Landau 2001).
This would avoid the kinds of ambiguity we have inherent in {{US}}, by having {dial-US} or {dial-American} for dialect and regionalisms, and {topic-USA} or {subj-USA} for geographic subject. Maybe editors typing in {tech-medicine} might get a hint that it's not meant for commonly used words like or cough (“a sudden, usually noisy expulsion of air from the lungs, often involuntary”). Michael Z. 2009-03-29 23:08 z
From my experience, there are too many unnecessary uses of the label Med. (for medicine) in dictionaries nowadays. Most of these uses would be better represented by physiol. (for physiology) or path. (for pathology). The label Med. should be reserved for strict medical usage as in names of remedies, medical procedures, and the like.—Strabismus 00:36, 30 March 2009 (UTC)
Cough is not a technical term. Its usage is not limited to the field of medicine, physiology, or pathology. It shouldn't have any usage label. A restricted usage label – “context label” – is not like a Wikipedia subject category. Michael Z. 2009-03-30 01:40 z
Well, it could be seen as a physiological phenomenon. At any rate, technical or not, most words probably fit into at least one category. As far as "cough" goes, we haven't reached a consensus. But, yes, there are many words which are difficult to categorize.—Strabismus 13:16, 30 March 2009 (UTC)
Do you understand that cough mustn't have a context label, because its usage is not restricted to a specialized context? Michael Z. 2009-03-30 15:35 z
Mustn't? Well, just because it's not restricted to a specialized context doesn't mean that it shouldn't be assigned ANY labels or categories. Otherwise, bat would need to get rid of its mammals, baseball, etc. labels and categories.—Strabismus 20:22, 30 March 2009 (UTC)
Yes, mustn't. I see that bat rightly doesn't have these labels. Perhaps there is vocabulary used by experts in mammalogy, but dictionaries don't use “usage” labels like {{mammal}} (terms shunned by the cold-blooded?). This is why that template is up for deletion.
I'm not talking about categories at all – just labels – although our ad hoc use of categories doesn't support any dictionary functions that I can see. It would be better to apply them with labels only, thus accumulating restricted vocabularies, rather than blocks of terms vaguely related by the random whims of editors. Michael Z. 2009-03-30 21:23 z
When the context makes the sense clear, a label seems superfluous. Some context templates also add the appropriate category to the lemma in question, thereby doing two jobs in one. But as a general rule, CTs should be used only when context doesn't do the job. Wouldn't you concur?—Strabismus 20:31, 31 March 2009 (UTC)
If the usage of a sense is restricted to a certain context, then it should have a label indicating this. Of course this can be done by a phrase in the definition or by a usage note instead, but then the information is less structured, less consistent, and requires a category to be manually added too.
But it's the restricted usage which determines whether the label belongs, not whether the definition is clear or not. Michael Z. 2009-04-01 02:50 z
We've run into this problem before and weren't quite sure what to do. For instance ballpark once had just baseball as the context, but this was extended to other sports, because when talking about sports generally it could mean a park for baseball specifically. (Then someone decided it could include other types of sports as well.) I wonder though if we shouldn't just tag the term with "in general" as well. Coughing is a technical term, but it's not just a technical term. 00:33, 31 March 2009 (UTC)
Yes, general is a feasible label as many vocables have different senses which are used, more or less, at the same frequency.
Cough can be (and many times is) used in technical contexts but the technical term is tussis.—Strabismus 20:31, 31 March 2009 (UTC)
In dictionaries, things in general are the ones without labels. ballpark means the same thing to everyone, in any context, or in isolation. The fact that the sport of baseball is played there is part of the definition. Since its use is not restricted to a sports context, it mustn't be labelled as such. A run on the other hand, means one or more things in general, but it has a specific meaning in the context of baseball, so this sense should have a label.
If a term is only used in restricted settings, like I suppose tussis, then it won't have unlabelled senses.
But for many terms you can include separate non-technical and technical senses, without and with a label respectively, if the lay understanding of something is different than its definition in a specialized field. I don't know if there's a special medical definition for cough. But, for example, we all refer to basically the same thing when we say clay – but while to most of us it's an earthy substance that's pliable when wet, to a geologist it's defined as stone particles 4–5 µm in diameter, and to a potter as a substance with a sufficient quartz content to vitrify at a particular temperature. Each of these could be a sense with (or without) a context label. Michael Z. 2009-04-01 02:50 z
EP, I think you mixed up Nimbari (code nmr) with Narom (code nrm). The problem still remains though with how Norman was coded. --Bequw¢τ 00:55, 30 March 2009 (UTC)
Actually, we have plenty of Tosk Albanian, we just call it Albanian ({{sq}}/{{sqi}}) since it's the standard dialect. Angr 09:32, 30 March 2009 (UTC)

WT:ELE#Context labels

I wrote a new section, because there is some misunderstanding. Please review. Michael Z. 2009-03-30 18:03 z

I draw your attention to the banner across the top of that page. Please revert. DCDuring TALK 19:36, 30 March 2009 (UTC)

Are you against any parts of the addition? If so, let's start a vote now, but better to discuss the detail first. Michael Z. 2009-03-30 19:53 z
I have labelled the section as a proposal, so there are no misunderstandings. Michael Z. 2009-03-30 20:01 z
Please revert. What you did is a "modification". It is not minor. Modification requires a vote, as the banner indicates in so many words. DCDuring TALK 20:42, 30 March 2009 (UTC)
Oh my goodness – we really need to change that "vote for everything rule" soon. I happen to like common sense … but I guess rules are rules, until we change them. --Eivind (t) 21:11, 30 March 2009 (UTC)
While I feel bad for doing so, I feel I must agree with DCDuring on this one. First, let me say that I do appreciate your taking the time to try and improve policy. Contags are something very clearly in need of such improvement. However, while I'm currently leaning to a "BP consensus should be sufficient to effect such changes", there isn't even that for your edits. I feel very confident that contag policy is going to be contentious, and a very long discussion. As much as I'm not looking forward to having it, it needs to be done. If you'd like to move your content to a personal page, so that we can refer to it, by all means do so, but I think you should revert. -Atelaes λάλει ἐμοί 21:18, 30 March 2009 (UTC)
Personally, I think it's fine to add proposals, provided they're clearly set off as proposals, and provided they don't take over the page (not too many, none too long, all under active discussion, each reasonably likely to be accepted, etc.). —RuakhTALK 23:51, 30 March 2009 (UTC)

So we wait a month or more to have ELE say what we should already be doing? Fine. Vote at Wiktionary:Votes/pl-2009-03/Context labels in ELE. I'd like to mention that perhaps we do have consensus, since all I see is “point of order” discussion, and not one jot about what I've actually proposed. Please register your contentions promptly, so we can get on with it. Cheers. Michael Z. 2009-03-30 21:41 z

The vote needs to follow discussion, but I wonder if there's another matter which has already been discussed and may be ready for a vote...... -Atelaes λάλει ἐμοί 22:16, 30 March 2009 (UTC)
Well, the vote's discussion links point here. When we're happy with the wording, I'll remove the traffic light and we can vote. What else needs voting? Michael Z. 2009-03-30 22:34 z
Whether anything else needs voting. :-) -Atelaes λάλει ἐμοί 22:41, 30 March 2009 (UTC)
Anyone who was wondering what I meant by that should see Wiktionary:Votes/pl-2009-03/Removing vote requirements for policy changes. -Atelaes λάλει ἐμοί 06:26, 31 March 2009 (UTC)
We have no shortage of "places" to put proposals, guidelines, etc in Wiktionary space. Why would we put a proposal on one of the tiny number of places where we specifically say not to? So we can show that "We don't need no badges" ? DCDuring TALK 00:35, 31 March 2009 (UTC)
Agreed. Proposals do not belong in official policy documents. -Atelaes λάλει ἐμοί 00:51, 31 March 2009 (UTC)
Welcome to wikibureaucracy! Just like real bureaucracy except anyone can join in!
Oh, and thank you for this positive contribution. 01:37, 31 March 2009 (UTC)

So, now that we've had a big argument about the comment's insertion, perhaps we could move on to the content itself? Personally, I saw nothing in there that I disagree with. However, I haven't given the topic as much thought as some, and thus reserve the right to change my mind if someone gives a particularly convincing argument. I am initially supportive of including the content as it was originally written. -Atelaes λάλει ἐμοί 00:51, 31 March 2009 (UTC)

The formatting could be better, and the wording of that final sentence might be taken by some as giving official sanction to Template talk:context as policy (which it doesn't), tho I can't think of a clarifying substitute that wouldn't sound hopelessly bureaucratic. The proposal itself seems common sense. — Carolina wren discussió 02:09, 31 March 2009 (UTC)
I changed it to “These templates are based on {{context}},” letting the reader discover herself that the linked template is documented. What about the formatting needs improvement? Michael Z. 2009-03-31 03:27 z
Main format improvement would be to also show the output of some or all of the examples similar to WT:ELE#Homophones and WT:ELE#Rhymes. — Carolina wren discussió 04:00, 31 March 2009 (UTC)
I added one example, but adding them all in standard form makes it hopelessly cluttered. The silly boxes on preformatted text add a lot of noise, and rendered definition lines look like part of the text. Maybe I'll try to redo them all in manual HTML, but not today. Michael Z. 2009-03-31 05:12 z
Okay, added all the examples in side-by-side tables. Not bad. Michael Z. 2009-03-31 06:09 z

Thinking on it some, another thing that is needed is a foreign language example. Granted ELE is often written as if it were EELE (English Entry Layout Explained) with WT:ELE#Inflections the most flagrant example of a section which is in need of a rewrite to reflect actual practice. — Carolina wren discussió 04:09, 31 March 2009 (UTC)

Go ahead and give me a good example. I've chosen all real example code from WT entries. I don't think it makes much difference, since the labels and definition text are all in English anyway. Michael Z. 2009-03-31 05:12 z
Thanks for that. I forgot all about the lang parameter. That's important, so I moved it up (and edited). Michael Z. 2009-03-31 16:11 z

{{European French}}

Do this template and the contents of Category:European French represent the French spoken in France, Belgium, and Switzerland, or just in France? Michael Z. 2009-03-31 20:26 z

Just found Wiktionary:Information_desk#Varieties_of_French. Answer: sorta both, chiefly the latter. Michael Z. 2009-03-31 21:03 z


Wikisaurus - non-English entries

Jyril and I differ in how to treat non-English entries in Wikisaurus. I propose to treat them in close analogy to the treatment at Wiktionary: each non-English headword of a word cluster should get its own page, as is now done at Wikisaurus:příbuzný. The translations between Wikisaurus entries can be entered under an L5 heading "Translations", as now done at Wikisaurus:relative, that is:

 * Czech: [[Wikisaurus:příbuzný]]. 

I oppose creating translation subpages of Wikisaurus entries, such as Wikisaurus:sound/fi and Wikisaurus:drunkard/translations. Instead, I think the Wikisaurus entry for the Finnish cluster for drunkard should look like Wikisaurus:deeku, that is, much like the English Wikisaurus:drunkard, just that the words are Finnish. --Dan Polansky 19:10, 23 March 2009 (UTC)

The only reason I can see to use subpages would be as a means to differentiate between two languages if they would most naturally use the same headword. However, since the standard Wikisaurus format is already using language as an L2 header, just like the Wiktionary entries, subpages aren't needed for that purpose. Carolina wren 20:03, 23 March 2009 (UTC)
Foreign languages in Wikisaurus? What? We don't do foreign languages in Wikisaurus. If we did I certainly wouldn't want it to be on another page, not a subpage or separate entry, rather on the very same page as pertinent to the topic. However, I was told a while back that "we" didn't want that. 00:54, 24 March 2009 (UTC)
First, I think every wikisaurus page should provide the "gloss" of each synonym, since most every synonym has a gloss of some kind. Second, I see no reason for us not to have wikisaurus entries on foreign words, but I would disagree with having translations in wikisaurus entries on English words (except for words that have entered the English language, like siesta for rest, or aloha for goodbye). However, if we have a wikisaurus entry on a foreign word (such as Wikisaurus:deeku, noted above), we should have a gloss in English for each of those words. This is the English language wiktionary, after all. bd2412 T 21:06, 24 March 2009 (UTC)
I've added glosses to Wikisaurus:juoppo, to which I have moved Wikisaurus:deeku, as Jyril thought "juoppo" is a better headword. --Dan Polansky 08:10, 25 March 2009 (UTC)
What on Earth would we do this for? What would Wikisaurus:juoppo have that fi:Wikisaurus:juoppo didn't, or vice versa? Either launch Wikisaurus as a separate, language-independent project, or leave the foreign language synonyms to the foreign language Wiktionary. DAVilla 08:20, 26 March 2009 (UTC)
This is a good point. Let us have a look at Wikisaurus:příbuzný, which, having a lot of hyponyms, is more interesting than Wikisaurus:juoppo currently is. Let us compare Wikisaurus:příbuzný with the imaginary cs:Wikisaurus:příbuzný. The differences: (a) all the headings such as "Synonyms" are in English, (b) glosses that appear on hovering the mouse over words are in English, (c) the indication of lexicon such as "slang" is in English, (d) the words are one click away from the English Wiktionary. For (a), it seems worthless for Czech, as the Czech entry would have "Synonyma", which is easily guessed at, but the rendering of "Synonyms" into other languages, such as Japanese 同義語, is harder to guess.
The rationale for having non-English entries in English Wikisaurus seems to come close to the rationale for having non-English entries in English Wiktionary, to me anyway.
Whether these differences bring enough added value to justify the existence of Wikisaurus:příbuzný and the like I do not know. At least, as it now stands, Wikisaurus:příbuzný can be used by an English speaker to test his knowledge of Czech, quickly verifying the guessed meaning by hovering over Czech words with the mouse. --Dan Polansky 19:28, 26 March 2009 (UTC)
Not that I'm complaining, but can the glosses just be laid out in text (rather than having to hover over each word with the mouse to see the meaning)? bd2412 T 23:46, 26 March 2009 (UTC)
Also, regarding Wikisaurus:příbuzný, is there no more nuance to these words? If I say someone is an alcoholic, a drunkard, and a lush, I mean three different but closely related things. bd2412 T 23:53, 26 March 2009 (UTC)
Oops - I meant Wikisaurus:juoppo, with respect to the nuance question. :-/ bd2412 T 04:50, 27 March 2009 (UTC)
There probably are nuances, but all the Wiktionary entries of Finnish words listed at Wikisaurus:juoppo just say "drunkard". --Dan Polansky 19:27, 28 March 2009 (UTC)
This brings to mind the old saw about Eskimo words for snow. bd2412 T 19:45, 28 March 2009 (UTC)
Thank you for doing the comparison. We should ask ourselves what kind of users need to be able to distinguish between shades of meaning in a foreign language, and if that's worth the effort of translating a thesaurus into every Wiktionary. Considering how watered down translations can be in general, where several definitions given in a dictionary of that language compare in many cases to just a single line or word here, in my opinion our efforts in the Wikisaurus space would be better focused on creating the as yet non-existent entries in the native language. DAVilla 00:33, 27 March 2009 (UTC)
I've focused my efforts on creating English entries in Wikisaurus, and am planning to stay with that.
Given that you disfavor Wikisaurus:juoppo and the like, what is your take on the existence of Wikisaurus:sound/fi and Wikisaurus:drunkard/translations, then? --Dan Polansky 19:16, 28 March 2009 (UTC)
It's a mess, I admit that. --Jyril 20:28, 28 March 2009 (UTC)
If it's going to be on English Wiktionary, it may as well be incorporated into the page, rather than on a subpage. I'm just not sure that's worth encouraging. What I would encourage is the creation of Wikisauruses on other Wiktionaries, and populating them! And in that case, why not interwiki links? 20:32, 5 April 2009 (UTC)
Couple of comments: firstly, I don't agree that using language-specific headwords is a good idea as this is an English-language Wiktionary. However, maintaining synonym lists for other languages can become really frustrating if we don't have Wikisaurus entries for them. My suggestion is that we use language-specific subpages for different languages, for example Wikisaurus:Drunkard/cs for Czech equivalents for the word "drunkard". --Jyril 12:27, 28 March 2009 (UTC)
I can understand that you can oppose having non-English headwords in Wikisaurus, but the reason that you have given is beyond my understanding. Yes, this is an English language dictionary, and yet, we do have entries for juoppo and příbuzný. I don't see the connection. In fact, this is a multilingual dictionary, whose metalanguage is English, rather than just an English dictionary. --Dan Polansky 19:23, 28 March 2009 (UTC)

Suggestion: Why not insert an ISO language prefix after the namespace for all non-English Wikisaurus entries? That is, use Wikisaurus:cs:příbuzný instead of Wikisaurus:příbuzný. This will follow what we already do for Categories.--EncycloPetey 08:12, 29 March 2009 (UTC)

I was about to suggest the same thing. We already do it for the Rhymebook (though differently, and I'm not sure why English also takes one), and it would help with things like homonyms: crime, art etc. Furthermore we don't want foreign-language Wikisaurus entries to be detected by {{ws}}, which would be bound to happen. Circeus 23:55, 16 April 2009 (UTC)

Dictionary of American Regional English

I was reading this article, and noted that on page two it says, "After the final volume is published, the next phase of the project will be to put the dictionary online. Hall envisions an online edition that will be updated constantly", which is, of course, what we do. I thought that I might as well try to contact these folks and ask if they'd like Wiktionary to "host" their definitions. I realize that there may be some CFI issues, but at the same time it seems that these people have put a tremendous effort into researching and verifying regionalisms. Would anyone object to my tendering the proposal? bd2412 T 19:49, 25 March 2009 (UTC)

It's very brave of you, as w:Sir Humphrey Appleby used to say. DCDuring TALK 20:18, 25 March 2009 (UTC)
Okay then, hearing no objection, I have emailed the coordinator of that project the following:

Dear Professor Hall,
I am an administrator of Wiktionary, a sister-project of Wikipedia working towards the creation of a free, open content dictionary containing all words in all languages. In reading the news of your near-completion of the Dictionary of American Regional English, I was struck by the description of your goal to put the dictionary on the internet in the form of "an online edition that will be updated constantly." At Wiktionary, that is precisely what we do. Our project has, with the input of thousands of volunteers, amassed over 1.2 million entries. We work constantly to expand and improve our content. We employ a wide array of tools and processes to verify the attestability of our entries against our criteria for inclusion, and although we include regional and colloquial terms, we are well-policed against the addition of made-up words that have not effectively entered any vocabulary.
If you would be willing to provide us your work under the GFDL (an open-source license which essentially allows free downstream use of a work so long as the initial author of the work is identified and credited), we will upload all of your definitions, etymological information, usage maps, and verification information into Wiktionary. We will identify authorship by your project of each term, and index and categorize the entries for easy searchability. We will do all of this for free, out of our love for words and the expansion of access to information about them. Please let me know at your convenience if you are amenable to pursuing this opportunity.
I look forward to hearing from you.

So, the die is cast, let's see if anything comes of it! bd2412 T 23:44, 26 March 2009 (UTC)
Just out of curiosity, did you really sign the email


?—Strabismus 14:17, 27 March 2009 (UTC)
I did. Met with glum failure, too (at least it was quick). Here is the response:
Thanks very much for your interest, but Harvard University Press holds the copyright and intends (perhaps with the cooperation of another Press) to host the online edition.
Best wishes,
Joan Hall
Such is the reaction of many publishers to online entrepreneurs/non-professionals. It's a shame, really. I guess they don't want to lose any profit they might get from just "doing it all by themselves". It's hard to get people born in the Baby Boom Generation and before to understand the concept of "charity publishing"… Oh, well. At least you tried, I'll give you that, BD2412. :)—Strabismus 21:06, 28 March 2009 (UTC)
The point is moot. Eventually, we'll get all their words, even if we have to write our own definitions for some. It's just absurd that in this day and age people think they will profit (financially, that is) from an online dictionary. bd2412 T 21:28, 28 March 2009 (UTC)
It would be even more absurd to think they could profit from a print dictionary without a good accompanying website, no? Michael Z. 2009-03-29 19:53 z
I think their response is reasonable. Yes, we're an online dictionary, but our content is publicly licensed under the GFDL. For us to host the DARE content, they'd need to release it under the GFDL, which would let anyone publish a print DARE. —RuakhTALK 13:36, 29 March 2009 (UTC)
Good point.—Strabismus 20:27, 29 March 2009 (UTC)
This brings to mind a feature we could introduce that the “competition” is unlikely to: links to other definitions! At the bottom of each entry, we could systematically add links to the definition at Dictionary.com, M–W, EtymOnline, Google Definition search, maybe Urban Dictionary, etc. Michael Z. 2009-03-30 00:00 z
I really like this idea! Other ideas: Princeton University's WordNet (a semantic tree), WordReference.com. For French, the Trésor de la Langue Française is online, but I don't think you can link to its entries. There are multiple online copies of out-of-copyright versions of the Académie dictionary, though. The OED can be linked to, but only accessed via paid access. Circeus 22:22, 18 April 2009 (UTC)
Re: "For French, the Trésor de la Langue Française is online, but I don't think you can link to its entries.": Right, so {{R:TLFi}} uses the copy at http://www.cnrtl.fr/definition/, which does let us link to individual entries. :-)   (It doesn't work so well if there are multiple entries for identically spelled words — you can only link to one, and you need JavaScript even for its internal links to the other ones — but by and large, very useful.) —RuakhTALK 00:35, 19 April 2009 (UTC)