Wiktionary:Beer parlour/2007/April

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

Lewis & Short

As some of you may have gathered from the Grease Pit, we're getting the Lewis & Short Latin dictionary. While this should help to quickly fill out our Latin section, it does pose some interesting problems. First and foremost, if anyone knows of ANY reason why this could possibly be copyrighted, this should be noted. Dumping the entire L&S onto Wiktionary and then finding out that it's copyrighted or that certain parts of it are copyrighted would certainly be a rather large mess. Wikipedia says it's copyright-free, and by every conceivable copyright law that I can find, it should be in the public domain, but I could have missed something. Secondly, the L&S has a LOT of material, much more than any Wiktionary entry I've ever seen. As an example, take a look at wikisource:A Latin Dictionary (they have the first two entries). This would probably be a good time for the WT:AL page to be expanded and clarified. Personally, I rather doubt that the L&S is being extraneously verbose, and most of the information is probably quite worthwhile, but this is something that should be discussed. A third issue is what kind of disclaimer the bot should put on the articles. I was thinking that it should put a note similar to the 1913 Webster disclaimer, and include it in a category such as Category:L&SLatin entries requiring format. No matter how much the bot is tweaked, the entries will undoubtedly require some gruntwork. However, I don't think that the part about things being out of date is as necessary. The English language has certainly changed in the past 100 years, but the Latin manuscripts, in large part, haven't. It is my understanding that many university Latin courses still treat the L&S as one of the best dictionaries out there, regardless of its age. What does everyone else think? Atelaes 19:38, 28 February 2007 (UTC)

Our understanding of Latin may not have changed much, but appropriate English translations might have; for example, a word formerly translated as "gay" might now be better translated "festive". (I don't know if such problems are likely to be sufficiently widespread as to warrant a message on every entry taken from L&S, but it's something to think about.) —RuakhTALK 06:13, 1 March 2007 (UTC)
Actually, our understanding of Latin has changed. In particular, scholarly opinion about the pronunciation (and hence the placement of macrons) has changed for more than a few words since Lewis and Short. There is also additional scholarship in the case of some words, which has altered our understanding of the use of certain words (sorry, but I'm blanking on examples right now). The translations are less likely to have undergone significant change than you might think, but that's partly because Lewis and Short seldom give a single word as the entire translation. Usually, there are several equivalent translations given, and sometimes an explanation of the meaning as well, particularly when there isn't a good single-word translation.
I've got extensive notes for the content of Wiktionary:About Latin and will try to tackle a major expansion of WT:ALA this weekend, but this has been a very busy week for me at work. --EncycloPetey 06:28, 1 March 2007 (UTC)
If the edition we are contemplating is the 1879 edition (and it appears that there is no other) then the work is in the public domain under just about any copyright regime, and definitely under those of the U.S., UK, and Australia. bd2412 T 19:18, 3 March 2007 (UTC) (your friendly neighborhood copyright attorney)


Currently the IPA template links to the Wiktionary "IPA chart for English". What is good about that is that this table covers UK, US and Australian English. However, we need to link to a chart that explains the symbols in other languages too. I thought we had that once? What happened to it?

I think that IPA and SAMPA should each link to the same page on Wiktionary that explains all the symbols of both schemes alongside each other, given that Wiktionary is a multilingual dictionary. I like the style of the Wikipedia page, which explains the sounds used in a language rather than the sound of each IPA or SAMPA symbol. We would probably want separate tables for other languages. — Paul G 12:50, 3 March 2007 (UTC)

The IPA chart for all languages is at Wiktionary:IPA pronunciation key. Both have a purpose. Personally, I use the "English..." version most of the time, since it's usually easier to understand, but check on the big one if I'm dealing with a non-English pronunciation or am unsure what is meant on the "English..." one. --Enginear 16:22, 3 March 2007 (UTC)


User:Eric_Utgerd: w:Borat, or just User:Dangherous? Kysztte 04:58, 4 March 2007 (UTC)


This IP user has been editing policy and near-policy docs and making changes to entries (almost all of which have been rolled back), and will not create an account and log in. They have also copied some templates from Wikipedia that don't belong here. Seems awfully aggressive for a newbie. (See Wiktionary:Modular Wiktionary, which was created in one edit.) Don't know what's up. Blocked temporarily (feel free to change that at will) while I revert some things. (Another person from the 'pedia who thinks everything there should apply here?) Robert Ullmann 13:35, 4 March 2007 (UTC)

Oh, specifically, modifying Wiktionary:Translations without a vote (or discussion) is sufficient cause for a block (at least technically). Robert Ullmann 13:37, 4 March 2007 (UTC)
User name is Mac, see User talk:Mac. Hasn't been around in a while ;-) Robert Ullmann 14:00, 4 March 2007 (UTC)
I'm going to downgrade the policy status in the template on Wiktionary:Translations, it should still be in some sort of draft status as it says. Robert Ullmann 14:06, 4 March 2007 (UTC)

Wiktionary:Requests for verification

Why do so many widely used terms get listed? Currently on the page we have "phwoar", "châteaux", "boo-ya", "confusticate", "fo’ shizzle" (sans apostrophe), "Mr. Big" (sans full stop), the phrase “Houston, we have a problem”, "F-", "decider" and "retcon."

Some of these have passed, others are still up there. What I'm wondering is: why were they listed at all, given how widely they're used? Regardless of what's in print, surely it's pedantic to question words that everyone hears on a daily basis? RobbieG 21:41, 5 March 2007 (UTC)

Of those, I've never heard phwoar, wouldn't spell châteaux that way in an English context (preferring to omit the circumflex, and possibly to use an -s instead of the -x, since it's pronounced differently anyway), wouldn't spell boo-ya that way (instead writing boo-yah), have never heard confusticate, would be torn on whether to include the apostrophe in fo shizzle, have never heard Mr Big (in either spelling), am not sure if I've ever heard F-, and am not sure I've heard retcon. The only ones I couldn't imagine RFV-ing are Houston, we have a problem and decider — and it doesn't shock me that someone with different experiences than mine could RFV them. —RuakhTALK 02:54, 6 March 2007 (UTC)
Also, remember that sometimes we'll RFV a term not because we don't believe it exists, but because we want to request some citations for it. That's what the verification process is all about. Widsith 09:36, 6 March 2007 (UTC)
Be thankful that so many of these have shown up on RFV before someone else deletes them. There have been a number of entries that I have completely cited that were deleted the same day they were listed. DAVilla 19:55, 7 March 2007 (UTC)


Connel MacKenzie blocked this user on February 13 along with the network. The network is in Los Angeles. I do not live in California. I left a message on MacKenzie's talk page, but he seems to be ignoring it. Both C.M. and TheDaveRoss are checkusers, so perhaps TheDaveRoss can verify that WilliamKF is not me. C.M. also reverted this user's edits, which seems to be inappropriate given the circumstances.--Primetime 20:46, 6 March 2007 (UTC)

I didn't see any reverted edits, or any suspicious edits for that matter. Are they just not showing up?
Perhaps the user was banned because he claimed that certain editions of the OED are out of copyright. Pretty harsh. I mean, it's got to be true, you know. DAVilla 19:53, 7 March 2007 (UTC)

This is User:WilliamKF. I just found out that my account has been locked out for being a sock puppet of User:Primetime. This is a false accusation. I have a few requests:

  1. How can I get my account unblocked?
  2. How can I get my IP address unblocked (
  3. How can I get the sock puppet accusation cleared up? Where is the evidence to support this claim? On Wikipedia, there is a formal process which I could refer to, I'm not sure how that is done over here.
  4. How was I expected to contact anyone about these issues with a blocked account and blocked ip address? I had to go to a computer on another network to post this message, rather inconvenient and probably beyond what many would be willing or able to do.


FKmailliW 03:58, 8 March 2007 (UTC)

Connel unblocked me! Thanks Connel! WilliamKF 06:32, 8 March 2007 (UTC)

Why we should abandon AHD

I know we've just had a vote on AHD, but, as someone who enters a lot of pronunciations, I feel we should be dropping it altogether.

I understand that many people prefer a "respelled" pronunciation, as it is more readable than IPA. However, there are a number of reasons why I think that AHD has no place in Wiktionary.

  1. It is only really any use for American pronunciations, since that is what it was designed for.
  2. It therefore does not represent any other variety of English. For example, ä represents the sound of the vowel in the American pronunciation of the word "ah", which is equivalent to the IPA /ɑ/, but the British pronunciation of "ah" is /ɑː/. AHD can't therefore be used accurately for any other variety of English as it simply does not represent any other variety of English.
  3. We should therefore be asking why American English deserves its own pronunciation scheme. Having a scheme for American English only is POV and unhelpful to speakers of other varieties of English.
  4. AHD is phonemic rather than phonetic (I hope I've got those round the right way). It represents sounds as used in American pronunciations, not sounds as articulated with the vocal tract (which IPA does, which makes it language-independent).
  5. It contains symbols for varieties of US English that we do not give in Wiktionary; for example, ŏ, apparently equivalent to /ɒ/, the vowel heard in "pot" in British English, is not used, because we don't give pronunciations for American accents (New England accents?) that use that sound. General American uses ä (as in "pät") rather than ŏ.
  6. Symbols can be pronounced differently when used in combination with others; for example, "s" is IPA /s/, but becomes /ʃ/ ("sh") when followed by an "h". Although a pronunciation scheme is available, this is a potential pitfall for those whose first language is not English.
  7. Similarly, diphthongs are given as single symbols (eg, ā for /eɪ/), meaning that the individual components of these sounds are not available. These are needed in some varieties of English (see my illustration below).
  8. It lacks symbols for the rarer sounds of English, especially in words adopted from other languages that are only partly naturalised. For example, the IPA /ɬ/ (Welsh "ll" - used in words adopted from Welsh, such as penillion) is absent. Similarly absent are /ç/ (the sound at the beginning of "huge", although this is usually transcribed as /h/), /ɱ/ (the sound of the "m" in "emphasis", although this is usually transcribed as /m/), /ɾ/ (the sound of "d" and "t" in accents with flapping that makes "medal" and "metal" homophones) and /ʔ/ (the glottal stop, heard in uh-oh). Although these are rarely used, we need to have them available. Most of these also have SAMPA equivalents.

To illustrate my points, see bread. The UK and US pronunciations are broadly the same. However, one Australian pronunciation of the word is /breːd/. That cannot be transcribed in AHD, as /e/ is not available (ĕ is equivalent to IPA /ɛ/, not /e/, and ā contains the sound but is a diphthong, not a monophthong). There is no equivalent to the lengthening symbol /ː/ either.

While some complain that IPA is hard to understand and difficult to learn, this is somewhat of a red herring. Clicking on the "IPA" link next to any IPA pronunciation takes the user to a pronunciation chart. The user need only look up what they need at any one time and does not have to learn the system.

If we are to continue to use a "respelled" pronunciation scheme, we need either to extend AHD, or to invent some alternative that covers the sounds of all varieties of English, as IPA already does. Any invented alternative would however still be phonemic rather than phonetic (or vice versa if I've got those round the wrong way). In the absence of a better solution, it would therefore be better to abandon AHD altogether. — Paul G 11:13, 3 March 2007 (UTC)

Hear, hear. Widsith 11:23, 3 March 2007 (UTC)
I never write AHD myself, but I would rather see an AHD pronunciation than none at all. If there are many editors who find AHD simpler to write than IPA, we might lose out overall by banning it.
And obviously, no AHD pronunciation should be removed unless the AmE equivalent in IPA is already present. --Enginear 16:35, 3 March 2007 (UTC)
Now that we've voted to call the system enPR, we need to decide exactly what symbols it will use and how. As things currently stand, there is no key to AHD pronunciation anywhere. So, before we consider abandoning the system, we ought to have a go at creating a pronunciation guide to see just how practical (or impractical) it is to set up an international version.
I think it would be very useful to have a broadly phonemic pronunciation system -- one that doesn't try to hit all the details of pronunciation, but provides a quick impression of the pronunciation by comparison with common sounds in common words. Consider that the word rob in the US is pronounced /rɑːb/, but in the UK the same word is /rɒb/. This difference is a consistent one, so that the phoneme pronounced in the US as /ɑː/ is usually pronounced /ɒ/ in the UK. A system like enPR could consistently use /ô/ for both, because the difference is a consistent one. The essential way the system works is by comparison with known words. We would simply need to ensure that we have a table of pronunciation values that is keyed to both flavors of English pronunciation. --EncycloPetey 22:52, 5 March 2007 (UTC)
Just a fleeting thought, could a bot be written to scan for AHD pronunciations, and then convert them to IPA in the entry (assuming that the AHD portion is in a standard format)?

A-cai 14:28, 4 March 2007 (UTC)

No. Unlike SAMPA, AHD symbols do not pair one-to-one with IPA. Also, there has been no consistency in the past as to how stress was marked. --EncycloPetey 22:52, 5 March 2007 (UTC)
I'm no fan of AHD enPR and wouldn't mind that much if it were abandoned. However, the arguments you present, Paul G, I'm afraid I find unconvincing.
  1. If it is "only really any use for American pronunciations," it's a simple matter of extending it. We should feel free to adapt it to our purposes since it's our system not the American Heritage Dictionary's: AHD was a misnomer from the start.
  2. Perhaps it "does not represent any other variety of English" but this wouldn't mean it cannot (after suitable modification). For example, ä can be (re)defined to represent the sound of the vowel in ah regardless of dialect.
  3. American English doesn't deserve its own pronunciation scheme, no, but enPR doesn't have to be American-only.
  4. enPR is phonemic rather than phonetic (yes, you've got those round the right way) and this is not a bad thing. In fact IPA, as used in this dictionary, is also phonemic. The advantage of IPA (& similarly X-SAMPA) is that it can be used for phonetic transcriptions; however, it can be used for phonemic ones too. When giving a pronunciation of a word it is generally better to give a phonemic one: this way you don't have to concern yourself with which allophone is being used, nor do you have to worry about different realisations of a given phoneme dependent on accent.
  5. "It contains symbols for varieties of US English that we do not give in Wiktionary;" I think we should strive to include all varieties of English ... but perhaps this is just my opinion. Your example, ŏ, could be used for the vowel in pot in whatever dialect you speak. For a speaker of General American ä and ŏ could represent the same vowel; for an RP speaker these would be different.
  6. "Symbols can be pronounced differently when used in combination with others;" this is a disadvantage somewhat but could, if necessary, be fixed.
  7. "Similarly, diphthongs are given as single symbols (eg, ā for /eɪ/)," I don't see this as a problem since these are single phonemes. Indeed, some of these phonemes can also be realised as monophthongs.
  8. "It lacks symbols for the rarer sounds of English," it can, where necessary, be extended. However, what we don't need are different symbols for allophones. For example, the m in emphasis may be [ɱ] but this is an allophone of /m/.
Also the Aussie vowel in bread is the same as in care. This can be transcribed in AHD (i.e. the American Heritage Dictionary's pronunciation scheme): brâd. Jimp 03:57, 9 March 2007 (UTC)

Addresses in Mexicali...?

Who can give me some addresses in Mexicali?

Perhaps I can help you if you can clarify. What do you want? The address of a particular business or theatre located in Mexicali? —Stephen 18:58, 8 March 2007 (UTC)

Captions of Pictures

Recently I came across the entry for Российская Федерация, and there, the Russian flag is pictured, but captioned in Russian.  I wondered if it was standard to caption pictures in foreign languages, even though this is the English version of Wiktionary.  My initial inclination is that even on the entries for non-English words, the pictures should be captioned in English, but I'm open to other ideas, and what I'm really hoping for, as with everything, is a standard so I know what to edit in the future. — V-ball 18:06, 7 March 2007 (UTC)

I’ve done a fair number of these, and the intended audience is people who know a little bit about Russian, or are studying Russian, or at least are interested in the Russian language. The word Российская Федерация is clearly translated for anyone to see, and if someone is then also interested in Russian Federation, they can go there to see what it says. On the Russian pages I put translations, examples of usage, grammatical notes, etc., as I think are needed and useful to beginning students of the language. Advanced students don’t need any of it, and those who have no interest in the language can go on to the linked translation at Russian Federation. The captions explain the picture in clear but simple language, and each element of the caption has a separate entry for anyone interested. I think it might be a good idea to include a Russian flag on the Russian Federation page, but I don’t usually make such additions to English pages. —Stephen 19:08, 8 March 2007 (UTC)

About Greek

I hope that people will visit Wiktionary:About Greek and Wiktionary:About Greek/Transliteration and make comments about them on their respective talk pages. Thorough knowledge of Greek is not necessary in order to criticise the Inflection lines and most of the other suggestions made there! Thanks, Saltmarsh 15:51, 9 March 2007 (UTC)

en-noun & English plurals

I've noticed that we are just using {{en-noun}} these days and will be deprecating {{en-noun-irreg}} etc. This means that all nouns with irregular plurals will fall into category:English nouns - this I can live with.

However, recently I've noticed a general trend to place all English plurals in category:English plurals and the irregular plurals and ones that end in "-es", "-ies", "-en" etc. are all going to end up in one category making them harder to find.

Can we get a steer, or at least a vote, on what we are doing with the English plurals categories?--Williamsayers79 22:12, 8 March 2007 (UTC)

My first suggestion would be that if we're going to have a catch-all category of category:English plurals, then we could list all the irregulars in appendices.
If we don't have a catch-all category and decide to separate the various regular and irregular English plurals into subcategories, then we need some way of embellishing the {{plural of}} or {{irregular plural of}} templates to auto-categorise irregular plurals accordingly.--Williamsayers79 22:12, 8 March 2007 (UTC)
I suggest we put all English plurals in Category:English plurals; the irregular ones can then be placed also into a category for irregular plurals (by hand, I suppose, or by a robot identifying the appropriate instances). -- Beobach972 22:47, 8 March 2007 (UTC)
I agree, and I've been going through the plurals linked from that template and doing exactly that, adding both the |lang=English identifier which forces them into the plurals category, and where necessary adding Category:English plurals ending in "-es", Category:English plurals ending in "-ies", Category:English plurals ending in "-a", Category:English irregular plurals ending in "-ae", Category:English irregular plurals ending in "-i", and Category:English irregular plurals ending in "-en". I've finished plurals starting with a and I'm moving on to the b's tomorrow. bd2412 T 06:12, 9 March 2007 (UTC)
Re: The switch from {{en-noun-irreg}} to {{en-noun}} causing all nouns to be in one category: it doesn't have to be that way. {{en-noun}} is already fairly intelligent; there's no reason it can't choose its category intelligently as well. —RuakhTALK 05:19, 9 March 2007 (UTC)
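As a rough illustration of Ruakh's point that categories could be chosen automatically, the Python sketch below picks categories from a singular/plural pair using plain suffix rules. The function name plural_categories and its matching rules are hypothetical (this is not how {{en-noun}} or any existing bot works); only the category names come from bd2412's list above, and rules this simple will certainly miss some irregular plurals.

```python
# Category names are taken from the list in the discussion; the suffix rules
# are hypothetical heuristics, not the logic of any actual Wiktionary template.
BASE = "Category:English plurals"


def plural_categories(singular: str, plural: str) -> list[str]:
    """Return the categories a bot might assign to an English plural."""
    cats = [BASE]  # every plural goes into the catch-all category
    if plural.endswith("ies") and singular.endswith("y"):
        cats.append('Category:English plurals ending in "-ies"')  # cherry -> cherries
    elif plural == singular + "es":
        cats.append('Category:English plurals ending in "-es"')  # box -> boxes
    elif plural.endswith("ae") and singular.endswith("a"):
        cats.append('Category:English irregular plurals ending in "-ae"')  # alga -> algae
    elif plural.endswith("i") and singular.endswith("us"):
        cats.append('Category:English irregular plurals ending in "-i"')  # cactus -> cacti
    elif plural.endswith("en") and plural.startswith(singular) and plural != singular:
        cats.append('Category:English irregular plurals ending in "-en"')  # ox -> oxen
    elif plural.endswith("a") and singular.endswith(("um", "on")):
        cats.append('Category:English plurals ending in "-a"')  # datum -> data
    return cats
```

For example, plural_categories("cactus", "cacti") returns the catch-all category plus the "-i" one. A pair that matches none of the rules (cat/cats, woman/women) gets only Category:English plurals, which is where a human, or a smarter rule, would take over.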

A bigger problem I have is that the {{plural of}} template assumes that the entry is a plural noun. While this is fine for English, other languages have plural adjectives. --EncycloPetey 03:49, 12 March 2007 (UTC)

Faroese vs. Faeroese

Both these spellings seem current here on Wiktionary, and the categories Category:Faroese language and Category:Faeroese language both exist, with some subcategories. A standardization is needed, so that a single spelling be used in all section headers and categories for the language here. Someone seems to have had deletion of Category:Faroese language in mind and a move to Category:Faeroese language, but the spelling with ae seems less common, not only on Wiktionary and on Wikipedia (see w:Faroese language, w:Faroe Islands), but also on the rest of the Internet: Faroe Islands vs. Faeroe Islands, Faroese vs Faeroese. To me it seems that the best choice is "Faroese", although the other spelling may be more original. – Krun 23:10, 10 March 2007 (UTC)

Faroese is the only spelling we should be using, as per ISO639-3. --Connel MacKenzie 10:56, 11 March 2007 (UTC)
Thanks, I'll move everything over, and perhaps you'll help with deleting the relevant categories, since I'm not an admin. – Krun 12:04, 11 March 2007 (UTC)
Remember template {{fo}} ;-) Robert Ullmann 18:55, 11 March 2007 (UTC)
I've updated the template. What does that ever get used for, anyway? I see from WhatLinksHere that it's included in a couple translation sections... I thought we were opposed to using templates in translation sections (or was that just in headers?), weren't we? -- Beobach972 19:51, 11 March 2007 (UTC)
The template is used in other Wiktionaries, and exists here primarily so that it can be subst'ed. Because such templates are regularly subst'ed, you won't see them used. However, if the template didn't exist, we couldn't subst it. --EncycloPetey 03:44, 12 March 2007 (UTC)

Oxford English Dictionary Fascicles

I looked through the archives here and found no definitive statement of official policy on using the Oxford English Dictionary fascicles, which are out of copyright (the first fascicle was published in 1888). I have heard it stated that there are copyvio concerns, plus outright errors, and that on that basis the OED is not to be used; instead one should use the 1913 edition of Webster's, which is out of copyright and corrects many errors in the OED. Can someone please point me to the policy? Is it the case that Webster's 1913 contains all entries from the OED? Thanks. WilliamKF 06:37, 8 March 2007 (UTC)

I say we should make use of all resources. To state that there are errors in an old OED which are not in a less old Webster's does not imply that the Webster's had no errors. If we are willing to make do with a Webster's which doubtless contains errors, we should also be willing to make use of an OED which may contain errors. We should do our best to spot errors in all our sources rather than embark on an impossible quest for an error-free out-of-copyright resource. — Hippietrail 14:54, 13 March 2007 (UTC)

Common misspelling vs alternative spelling

How do we determine if something is a common misspelling or a valid alternative spelling? RJFJR 16:36, 8 March 2007 (UTC)

  • If a spelling appears in a dictionary it's valid. If it appears in publications it is likely to be correct, especially if used multiple times in one text. — Hippietrail 14:38, 13 March 2007 (UTC)

Is it correct to assume that the English Wiktionary has no intention of implementing the new logo? -- Zanimum 18:11, 12 March 2007 (UTC)

Almost nobody likes it, it was "decided" by a process that didn't include any visible fraction of the people affected, and it is a total trademark violation that will last a New York minute when the lawyers from the trademark holder see it and call WMF General Counsel. Forget it. Robert Ullmann 19:27, 12 March 2007 (UTC)

Placeholders in article names

Did we resolve the issue of placeholders in article names after all? Here are a couple of examples that illustrate what I mean:

  • take someone to task
    Here, the pronoun "someone" is a placeholder for a noun or another pronoun. This is fairly simple to handle: just create take someone to task. But "someone" and "somebody" are interchangeable, so there must also be take somebody to task. Is this solved by duplication or cross-referencing? That has already been discussed elsewhere and is not quite the issue I am asking about. (At the time of writing, we have the latter but not the former. This is not good.)
  • to n decimal places
    Here, n is a mathematical placeholder for "any integer". The letter n is commonly used in this way in mathematics, but is it appropriate in English? Should the article be named "to n decimal places", "to X decimal places", "to ... decimal places", or something else? Of course, when n is 1, we have "to 1 decimal place" (singular). (There is a need for this article, incidentally, as it has non-idiomatic translations.)
  • do a ...
    We have discussed this one before. While some are set phrases (for example, "do a Reggie Perrin" is a well-known idiom in UK English, or at least, was a few decades ago) and so can have individual entries, just about any noun can be substituted for the ellipsis in "do a ..." as desired, with the meaning understood. "Do a ..." is therefore a useful construction and requires an entry of its own, but how do we name its article?

Perhaps the answer is to write these using the placeholders that you would expect to see, namely "someone", "n" and "...", respectively. — Paul G 12:03, 20 February 2007 (UTC)

I believe the standard method (OK, the method that I use) is to choose one version and add definitions, translations and all the rest. Then to add simple redirects for as many of the other forms as are used. It can get messy. SemperBlotto 12:30, 23 February 2007 (UTC)
I agree. One complete form, many redirects. bd2412 T 11:23, 15 March 2007 (UTC)

Scots Gaelic and Scottish Gaelic

We have entries in both; there is discussion in the Tea Room and in the Grease Pit request for bot fix, ready to go. Change is to standardize on "Scottish Gaelic" (ISO639/SIL is either or, literature prefers Scottish Gaelic). Comments? Robert Ullmann 15:31, 11 March 2007 (UTC)

Incidentally, let it be noted that having it as Scottish Gaelic will help avoid confusion with Scots. :) -- Beobach972 19:46, 11 March 2007 (UTC)
Agreed. When I see "Scots Gaelic", I always think someone has missed the fact that Scots and Gaelic are different languages. —RuakhTALK 23:20, 11 March 2007 (UTC)
Agreed : Scottish Gaelic is the name preferred on Wikipedia and by SIL and Ethnologue. The name Scots Gaelic (while valid) is easily confused with Scots, which is an entirely different language. --EncycloPetey 03:41, 12 March 2007 (UTC)
Agree we should use Scottish Gaelic to avoid confusion with Scots.--Williamsayers79 13:35, 13 March 2007 (UTC)
Having been confused on precisely that issue here in the past (despite the very Scottish name "MacKenzie", I do not speak any flavor of Scottish), I fully support this "confusion reduction" effort. Do you need someone to do the bot run? Or is it waiting for a vote? --Connel MacKenzie 06:59, 14 March 2007 (UTC)
I have the bot all set up; tested on a couple of dozen entries. Just was leaving it for a few days to see what comments there might be. Robert Ullmann 11:51, 14 March 2007 (UTC)
Just to check: will the bot change all the Scots Gaelic instances in headers, translations and categories?--Williamsayers79 13:17, 14 March 2007 (UTC)
And section references, don't forget those! Yes. Robert Ullmann 13:28, 14 March 2007 (UTC)

Done. 818 headers, 525 translations, 12 ttbc, 23 wikilinks, 231 cats, 70 section references. There are some new entries since the last XML dump that will get caught when I run the recheck on the next one. Also a few exceptions (badly formatted translations). Robert Ullmann 09:33, 16 March 2007 (UTC)
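The kind of substitution such a bot run performs can be sketched in Python with regular expressions. The patterns and the function name rename_language below are assumptions about typical wikitext (a language header, a "* Scots Gaelic:" translation line, category links and #Scots Gaelic section anchors); this is not Robert Ullmann's actual bot code, and it does not handle plain wikilinks or ttbc lines. Oddly formatted lines simply fail to match, much like the "few exceptions" left over in the run above.

```python
import re

# Hypothetical patterns for four of the occurrence types listed in the report.
PATTERNS = {
    "headers": re.compile(r"^(=+)[ \t]*Scots Gaelic[ \t]*\1[ \t]*$", re.MULTILINE),
    "translations": re.compile(r"^(\*+:?[ \t]*)Scots Gaelic:", re.MULTILINE),
    "categories": re.compile(r"\[\[Category:Scots Gaelic"),
    "section references": re.compile(r"#Scots Gaelic\b"),
}
REPLACEMENTS = {
    "headers": r"\1Scottish Gaelic\1",         # keeps the header level (==, ===, ...)
    "translations": r"\1Scottish Gaelic:",     # keeps the bullet prefix
    "categories": "[[Category:Scottish Gaelic",
    "section references": "#Scottish Gaelic",
}


def rename_language(text: str) -> tuple[str, dict[str, int]]:
    """Rename Scots Gaelic to Scottish Gaelic; return new text and per-kind counts."""
    counts = {}
    for kind, pattern in PATTERNS.items():
        text, n = pattern.subn(REPLACEMENTS[kind], text)
        counts[kind] = n
    return text, counts
```

Keeping a separate count per kind is what makes a summary like "818 headers, 525 translations, ..." easy to produce after a run.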

Determiner vs Determinative

We have had prior discussion about allowing the use of Determiner as a POS header. Opinion was divided, but not strongly opposed since it would be a closed set of terms. The guide for understanding English determiners recommended by proponents is the Cambridge Grammar of the English Language (hereafter CGEL), which I ordered and have begun reading (in small portions). I have discovered that, by their terminology, Determiner is not a part of speech.

What the CGEL does is to recognize several grammatical levels between "part of speech" and "clause", and discussions are careful to identify which level the discussion treats. For instance, there are separate levels of discussion for "noun" (which is a part of speech); "nominal" (which is a higher category also including a noun and associated modifiers or a pronoun and associated modifiers); and "noun phrase" (which is the meta-category for a structure including a noun or pronoun, together with modifiers and determiners). Note: By their definition of "noun phrase", it is not a part of speech and so should not be used as a POS header.

As a result, I was struck by their discussion of determiners. The CGEL defines determiner functionally, and explicitly so. That is, a "determiner" is any part of speech functioning in a certain capacity. For example, possessive nouns (e.g. Mary's) and pronouns (e.g. my) may function as determiners. Therefore, Determiner should not be used as a POS header on Wiktionary. However, there is a part of speech recognized in the CGEL called a determinative, and this is the part of speech previous discussions have centered upon. We have been using the wrong term as a POS header.

What this means for the mechanics of editing is that all instances of ===Determiner=== as a level 3 header (or lower in a few cases) ought to be changed to ===Determinative===. Words currently classified in categories such as Category:English determiners should be recategorized in Category:English determinatives. The Category:English determiners (and similar categories in other languages) should continue to exist, though. It is an important meta-category that includes Articles, Numbers/Numerals, some possessive Pronouns, indefinite Pronouns, demonstratives, as well as the Determinatives themselves.

The alternative is to continue to implement a terminology that flies in the face of the CGEL. --EncycloPetey 19:19, 25 February 2007 (UTC)

We have not been using the wrong term as a POS header. The CGEL terminology is somewhat idiosyncratic. They make an excellent case for it being a more appropriate way to describe (specifically English) grammar, but that doesn't make it even the most common way, let alone the only acceptable way. Personally, I don't think either "determiner" or "determinative" is familiar enough to a lay audience to matter which one we use, so I lean to "determiner" as simpler and more English-sounding. If the community makes a conscious decision to move wholesale toward the descriptive framework of CGEL, that would be a Good Thing, but there's no reason to do it higgledy-piggledy. -- Keffy 20:35, 25 February 2007 (UTC)
Yes, I think Keffy has it right. I did make this distinction back in January.--BrettR 13:34, 18 March 2007 (UTC)
That's good to know. The class of determiner is new enough that I haven't encountered it much before, and have little information outside of the CGEL to rely on for understanding the preferred terminology. If the linguist community norm is for determiner, then using that term should be fine. --EncycloPetey 22:48, 25 February 2007 (UTC)
Well, we should be striving for the correct term. While my objections to the heading were not well expressed, I still have reservations about it. It is something of a relief to hear that CGEL was misrepresented in those previous discussions. IIRC, it was called a POS in those discussions, with the misleading implication that CGEL said it was.
All in all, a determiner is simply a type of noun, or a type of pronoun. I can see the inflection line(s) being specialized for en-noun-determiners (etc.) I can see those templates including a Category:English determiners, but only in addition to the correct noun or pronoun categories. I am much less convinced this merits a separate third level heading, now. --Connel MacKenzie 02:38, 26 February 2007 (UTC)
No! A determiner is not a type of noun! Some pronouns do function as determiners, but the CGEL recognizes them as "pronouns functioning as determiners". However, there are other words that function as determiners. These other words include Articles, Numerals, and Demonstratives. In the noun phrase "the big red bus", the is a determiner. In the NP "one big red bus", one is a determiner. In the NP "each well-groomed little boy", each is a determiner. In the NP "that ferociously charging lion", that is a determiner. Traditionally, these words have been lumped into the adjectives when they functioned this way, but they're serving a different function from adjectives. Adjectives present attributes of the noun (or pronoun) they associate with (big, well-groomed, charging). By contrast, the Determiners "point to" a particular noun (or pronoun) instead of providing attributes. Ruakh has more information given below. As I've read the examples and discussion in the CGEL I've become convinced that this is a new POS worth recognizing on Wiktionary. My comments were intended to question the terminology. Note that while the CGEL does not call the category "Determiners", they do recognize the category; they simply call them "Determinatives". Keffy has noted (above) that this is not the usual terminology, which is why they painstakingly differentiated their peculiar choice of name. --EncycloPetey 00:25, 27 February 2007 (UTC)
My Collins English Dictionary (2005) uses "determiner" as a POS name; thus it defines the as "determiner (article)". It also includes a, some, any, this, that + possessives (my, your) and numerals —Saltmarsh 06:35, 28 February 2007 (UTC)

What the CGEL calls determinatives and other modern linguists call determiners are traditionally classified doubly as adjectives and as either nouns or pronouns, with a few exceptions (notably a, an, the, and every, which always require a noun and therefore have traditionally been considered adjectives exclusively, or in the former three cases articles by sources that didn't consider articles to be adjectives). If you look up all, both, each, some, two, many, and so on in most dictionaries, it will give a part-of-speech heading along the lines of "adjective and pronoun" or "adjective and noun". Since Wiktionary seems to object on principle to such headings, feeling that each part of speech warrants its own definition, it's in our best interest to use a heading like "determiner" for determiners, rather than defining each determiner twice, once as noun/pronoun and once as adjective. —RuakhTALK 06:59, 26 February 2007 (UTC)

Back in January, I posted the following:

Differentiating determinatives (determiners) from English adjectives

   * Both adjectives and determinatives modify nouns in phrase structure.
   * Both adjectives and determinatives can participate in fused-head constructions.
   * A determiner is an obligatory part of many noun phrases; Adjectives are always optional
   * Determinatives alone can modify singular countable nouns; Adjectives alone can't.
   * Most adjectives can be used predicatively; Determinatives typically cannot.
   * Adjectives are usually gradeable, determinatives non-gradeable.
   * Determinatives identify nouns and mark them as definite or indefinite while adjectives describe properties attributed to them.
   * Core determinatives cannot co-occur with the, a, and an; adjectives can.
   * Many determinatives are licensed only for specific singular/plural countable/uncountable nouns, while adjectives are generally licensed independent of these considerations.
   * Determinatives can often function in the slot (det) of them; adjectives can't
   * Determinatives can often function in the slot so (det) (noun); adjectives can't
   * Determinatives can be modified by only a very limited set of adverbs; adjectives are less limited in this way.

Most of this information is taken from The Cambridge Grammar of the English Language (CGEL).--BrettR 13:34, 18 March 2007 (UTC)

Numerals and their categories

What is the correct category naming style for numerals -- full language name or language code only? Looking at Category:Numbers and the siblings of Category:ja:Cardinal numbers it seems to be mixed.

Also, could someone give me suggestions on how to handle an entry such as 二百五 205? Should I make a POS template that links to 二百四 204 and 二百六 206?

Cynewulf 15:37, 5 March 2007 (UTC)

If "number" is the POS heading, then I'd imagine it should use the full language name (like Category:English nouns, Category:Hebrew prepositions, etc.); but if "number" is just a description of the topic, and the POS heading is, say, "determiner", then it should use the language code (like Category:ja:Horses, Category:fr:Days of the week, etc.). According to WT:POS, it's a matter of some debate when and whether "number" should be a POS heading, so … —RuakhTALK 18:37, 5 March 2007 (UTC)
The category names for the numbers are very mixed. I initiated a move to standardize the numeral POS headers some time ago, but the move had strongly entrenched opinions that were incompatible with each other. Part of the problem lay in debate over whether the part of speech should properly be termed "Number" or "Numeral". There was a general feeling at the time that the header should simply be Number (or Numeral) instead of Cardinal Number or Ordinal Numeral and the like, but we couldn't agree on which shortened form should be used.
The additional problem is that there are names for cardinal numbers that do not function grammatically as numerals/numbers. For example, aleph-null is a cardinal number mathematically, but the word aleph-null functions only as a noun, never as a numeral. My take has therefore been to treat the cardinal numbers and ordinal numbers as topical categories, but include them within a grammatical super-category. Thus, the Japanese cardinal numbers would be in Category:ja:Cardinal numbers and the Japanese ordinal numbers would be in Category:ja:Ordinal numbers, but both of these would be subcategories of Category:Japanese numerals (or Category:Japanese numbers according to some). This way, the numerals/numbers are listed within a grammatical category, but the topical category within Category:ja:Mathematics could exist as well. See Category:Afar numerals and the contained subcategories and entries to see how I would set it up. There is a template {{cardinal}} and one for {{ordinal}} that does automatic categorization.
The idea of a template to link backwards and forwards among the cardinals (and ordinals) is a good one. I've considered the same idea myself, but I haven't figured out exactly how to include all the useful information you'd need to have in a reasonable format. It would mean inserting the template into thousands of existing pages, so it ought to be designed for easy editing and insertion. --EncycloPetey 23:04, 5 March 2007 (UTC)
(not commenting on the cats for the moment ;-) A template would be very good; this is presently done in lots of ad hoc ways, which often break ordinary parsing of the page, since the links are put in odd places. (The only standard place would be under "See also", with appropriate gloss—next, previous, whatever—but people seem to do anything else but!) A fairly simple template plus various magic options for fancy things people will want to do would be good, it can be a float-right box? There are several examples, one is used for the Greek letters, but whoever did it subst'd it (arrgh!) so I don't know what it was and the entries are a mess. Robert Ullmann 12:09, 8 March 2007 (UTC)
There's one I created for the signs of the {{Zodiac}} that might serve as a model. However, I'd want to see both the word and numerical form of each number name in such a template. --EncycloPetey 03:53, 12 March 2007 (UTC)
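As a rough illustration of the previous/next linking idea, with both the word and the numerical form shown, here is a minimal Python sketch that spells 1 to 9999 in kanji and builds a navigation line for an entry such as 二百五 (205). The function names ja_numeral and nav_line and the output layout are hypothetical, not an existing Wiktionary template, and the converter deliberately ignores larger units such as 万.

```python
# Hypothetical sketch: spell Japanese numerals 1..9999 and build a
# previous/next navigation line. Not an existing Wiktionary template.
DIGITS = "〇一二三四五六七八九"
UNITS = [(1000, "千"), (100, "百"), (10, "十")]


def ja_numeral(n):
    """Spell 1..9999 in kanji, e.g. 205 -> 二百五 (no zero marker)."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only handles 1..9999")
    out = []
    for value, unit in UNITS:
        d, n = divmod(n, value)
        if d:
            # A leading 1 is dropped before the unit: 100 is 百, not 一百.
            out.append(("" if d == 1 else DIGITS[d]) + unit)
    if n:
        out.append(DIGITS[n])
    return "".join(out)


def nav_line(n):
    """Wikitext line linking to the neighbouring numbers, word and digits."""
    parts = []
    if n > 1:
        parts.append(f"[[{ja_numeral(n - 1)}]] ({n - 1}) ←")
    parts.append(f"'''{ja_numeral(n)}''' ({n})")
    if n < 9999:
        parts.append(f"→ [[{ja_numeral(n + 1)}]] ({n + 1})")
    return " ".join(parts)
```

For 205 this yields `[[二百四]] (204) ← '''二百五''' (205) → [[二百六]] (206)`. Generating the line from the number, rather than typing links by hand, is one way to keep thousands of such entries consistent.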
The CGEL model has cardinal numerals as either nouns or determinatives. Ordinals are adjectives.--BrettR 13:43, 18 March 2007 (UTC)

Wiktionary scalability, usefulness, et alia

Great work all of you, still a long way to go. I have a comment on scalability and usefulness and the method for adding new entries.

1. For this to be truly useful, all words in other wikis need to be automatically tied to the wiki dictionary. Doing that manually would be very tedious, so it would be great if admins (whoever you are) created a 'search and link' macro that runs every time a word is added, and vice versa, i.e. each word definition is linked to all articles that use it, in order of importance (articles, headers, text). A new kind of link (right click or on hover) would need to be created so that these links are not confused with ordinary article links.

2. As a polyglot I don't care if the article is French, Spanish, English, German, Russian, Arabic or... The dictionary (and all wikis in general) should unify these fields. Currently there is a lot of duplication. If you want a truly scalable and maintainable wiki dictionary, translations need to work differently from how they work now. Instead of replicating the conjugation of a Spanish verb in the English wiki, e.g. http://en.wiktionary.org/wiki/matar for http://en.wiktionary.org/wiki/kill, the translation needs to link to the Spanish wiki http://es.wiktionary.org/wiki/matar, which is bound to be better checked and linked to the rest of the Spanish language.

If this is properly done as a database then the link ought to be bidirectional. This will dramatically increase the connectivity and usefulness of this resource. Otherwise this effort will only come to fruition through strenuous manual work and until then will be shaky at best.

thank you jvdp

  1. User:RobotGMwikt does update interwiki links (after some amount of XML dump delay, or replication lag, or something.)
  2. OmegaWiki (formerly WiktionaryZ, formerly UltimateWiktionary, formerly ...) seems to be what you are looking for: more of a universal translation engine. License differences have forced that to no longer fall directly under the WMF umbrella.
--Connel MacKenzie 10:52, 11 March 2007 (UTC)
For you, as a polyglot (;-) it may be perfectly reasonable to use the es.wikt for Spanish entries. For someone who, say, speaks Swahili, with English as a second language, and uses the English wikt as a reference for Spanish, these entries are critical. The task of the English wikt is to define everything in English. Sure it seems redundant to you, but to others it is utterly essential. And note that es:matar, which you say is "bound to be better checked and linked to the rest of the Spanish language", lacks an entry for Spanish itself, let alone the conjugation. Our entry is far more complete. Robert Ullmann 15:45, 11 March 2007 (UTC)
Regarding 2: the new convention is to include a link to the foreign language wikt in translation sections. This is being done for all entries. See e.g. frequency, under Dutch translations. The links will be inserted for all languages, on the long run. H. (talk) 20:40, 17 March 2007 (UTC)

Middle English

I have been wanting to bring this up for ages but have been dreading it a bit.

Is there any point in using the term "Middle English" as a language header? I dislike it and I think it's unhelpful and misleading. This is why:

  1. It's not very well defined. Whereas Old English ends (in the written record at least) very markedly at the Norman Conquest, ME by contrast blurs considerably with modern English. A lot of words which I've seen entered here as "Middle English" survived well into the seventeenth or eighteenth centuries and I think they're better off being labelled as =English= with an {{obsolete}} tag.
  2. Most words in ME are identical to their modern English counterparts, which means we'd need a lot of duplication. Again, the difference with Old English is worth pointing out: that also has lots of familiar vocabulary, but OE words need their own entries to provide grammatical gender and other information which does not exist in Middle English.
  3. Calling it Middle English makes it seem like a totally different language to modern English, which is not necessarily desirable; better, to my mind, to just include ME senses of words as obsolete under the =English= heading, which is a better representation of the "continuum" of the language.

For an example, I was looking at siege recently. In the past this has been spelled in the following ways: sege, cege, seche, segh, seghe, seeg, seege, seage, saige, sige, siege, syege, seige, sedche, sedge, syedge, seidge, sidge, segge. Now which of these are we going to call Middle English? To be sure, some forms were only spoken during the ME period, but most either outlived it or in some cases were not connected with it at all. So it makes more sense to me to call them all obsolete spellings of the modern form of the word.

The solution I've played with on some pages is to have a =Spellings= header which would include all known spellings of a word, including ME forms, with the obsolete forms marked as obsolete. Some entries (e.g. colour/color) would also have non-obsolete (i.e. "Alternate") forms under the Spellings header.

Any thoughts on all this rambling? Widsith 15:40, 12 March 2007 (UTC)

Thumbs-up. :-) —RuakhTALK 16:24, 12 March 2007 (UTC)
This is one place where some form of a "Word History" section would be appropriate - for showing when various spellings existed. --EncycloPetey 05:39, 13 March 2007 (UTC)

I strongly believe that Middle English should be a separate heading because Middle English (ME) is without a doubt a separate language. Very few English speakers can read it and even fewer can understand it if they hear it.[1] If we include ME forms under ==English== headings, then some affected users might actually start using them. I also object to the example. Many of those spellings are unique to ME. They can go under the ==Middle English== heading. Other spellings are unique to certain senses. I know sedge, for example, is used in ornithology. If we start treating ME as Mod.E, then we'd have to start including ME pronunciations under Mod.E. Pronunciation in ME is different from Mod.E. in every case. ME at times even used different letters from our own language.

I don't think people realize just how much the language changed between 1100 and 1500. Two branches of the IE family--Italic (French) and Germanic--were merged together. Spaniards can read Portuguese easier than English speakers can read Middle English. Middle English is about as easy for us as Italian is for them. From what I've read, you can say the same for Norwegian and Swedish, as well as Russian and Belorusian.--Νικα 07:58, 13 March 2007 (UTC)

Well, first of all ME is not "without a doubt" a different language - that is why it is called Middle English and not Mediaeval Germanic or something. Your objection to my example is also misplaced. sedge as an ornithological term is a completely separate word from sedge as a spelling of siege, which is what I am concerned with. The point about pronunciation is a good one, but again there are huge problems since there is really no "standard" ME pronunciation. Chaucer's London dialect sounded wildly different from the Northern dialect of Sir Gawain and the Green Knight for example. And it opens up the question of other obsolete pronunciations - should we be including Shakespearean pronunciation? What about Restoration pronunciation or Victorian pronunciation? As for different letters, again I don't see the problem with that - modern English regularly used æ until about fifty years ago and we have no problem including encyclopædia etc. Widsith 09:03, 13 March 2007 (UTC)
I hope you won't make the same argument for Old English. That language is closer to German than to Mod. E. (I prefer to call it Anglo-Saxon.) Also, sedge is a variant spelling of a specific sense of siege. As for the dialectal issue, my impression has been that ME varied more in spelling than Mod. E. As for the date issue, I prefer to use the introduction of printing (ca. 1475). That marked a milestone in the standardization of spelling. That also, incidentally, is one reason why you see so many different ME forms of the same word. I wouldn't mind giving pronunciations from Shakespeare's time under Modern English so long as they are marked as obsolete, though.--Νικα 09:43, 13 March 2007 (UTC)
No, sedge was used for all senses of siege throughout the sixteenth century. Old English was very different from modern English, that's exactly why I think it should be treated differently, as I explained above. The problem with picking 1475 as a cut-off date is that it's completely arbitrary - no one woke up on New Year's Day 1476 speaking modern English. It was a very gradual change, which is not reflected if we relegate older forms to a Middle English heading. ME spellings and vocab did not die out in 1475. Most continued for centuries and many are still with us today. Widsith 10:39, 13 March 2007 (UTC)

I think that it would be tidier to have separate Middle English language sections because of the volume of content. It may also get a little confusing with a multitude of ME spellings on a modern English page, and what about all those words that did not make it to modern English? --Williamsayers79 13:42, 13 March 2007 (UTC)

I must say though that I agree with Widsith that Middle English is readable - and is closer than you think to modern English. This becomes apparent when you read something in ME in your own dialect... By the way Old English is not closer to German than it is to modern English, this is a POV pushed by Latin/Romance based-English fans who seek to deny the English language's Germanic and Norse roots.--Williamsayers79 13:42, 13 March 2007 (UTC)

If the header were "Modern English", then it would indeed be wrong to include words and senses that died out before the Modern English period; but as the header is "English", I think it's quite reasonable to include words and senses from both the Middle English period and the Modern English period, assuming the former are appropriately labeled so no one assumes they're current. —RuakhTALK 07:00, 14 March 2007 (UTC)
It is my impression that the distinction between Old, Middle, and Modern English is sort of the standard convention in linguistic academia. I have to imagine that there is some reasoning behind this (although I'll be the first to admit that I know nothing about Middle English myself). I think Connel has a good point in that it's probably best to simply follow the standard conventions as laid out by SIL. We have enough decisions to make as it is, without recategorizing languages. Certainly, the point is well made that doing so presents a type of clean separation where none existed, but I don't know if Wiktionary is at a point yet where it can handle time spans that well. Certainly the OED can sort of lump everything together, but they have millions of cites, and thus can reasonably show roughly when a word came into usage and when it disappeared. It's my opinion that Wiktionary is not at a point where it is feasible to do something similar. It seems a bit misleading if a word hasn't been around for 500 years and we simply label it as obsolete. As EncycloPetey said, word histories would be the goal for something like this, but I think we have a hard enough time simply getting definitions and proper format up for the words we need. I believe that Wiktionary will, in time, evolve to the point where it can dispense with such rough approximations and deal with word histories individually, but I just don't think we're there yet, and so Middle English should be kept as a category, at least for the time being. Perhaps it should be noted again that I know nothing about Middle English, and so my opinions on the matter should be taken with a grain of salt. Sorry to shit on your parade Widsith, I can understand why you were dreading bringing this up. Atelaes 08:27, 14 March 2007 (UTC)
I'm not an expert on this, but it's my understanding that the line between Middle English and Modern English has to do with the Great Vowel Shift, which was a large-scale shift in pronunciations but not spellings. Now by my understanding of what kind of information is freely available, we can't really give accurate pronunciations for Middle English words other than simply assuming the spelling is representative, which strikes me as less than useful; so I don't see that we gain anything by drawing this same distinction. In terms of vocabulary and usage, there's really no way to distinguish the two.
BTW, there seems to be some sense in this discussion that labeling centuries-obsolete senses as "obsolete" is understatement; but by that argument, we should also have a separate category for Early Modern English, as there are plenty of words and usages that haven't been seen since the 1550s, which is technically the Modern English period but is well beyond obsolete.
RuakhTALK 16:28, 15 March 2007 (UTC)
Connel: it wouldn't be wrong. Yes, it is sometimes useful (especially in literature studies) to treat ME as a different language, but that's just a convention. As Wikipedia puts it: "Middle English is the name given by historical linguistics to the diverse forms of the English language spoken between the Norman invasion of 1066 and the mid-to-late 15th century" (my emphasis), a quote which also reinforces how vague the cut-off point is. This all came about because I read the word sege in Gawain and wanted to enter it. I cannot be convinced to put it under a =Middle English= heading when I know damn well it was still being used four centuries later. The same can be said for many - most - ME words. Marking it out as a separate language just irritates those of us who are familiar with it, and misleads people who aren't by giving the mistaken impression that it's a wholly separate entity from modern English. I agree with Atelaes that we don't yet have the number of citations to make this evident, but after all there are very many areas in which we are still lacking material – surely that's our aim though. In the meantime, for myself I think I'll just avoid entering such words altogether, since I can't bring myself to use =Middle English= and I don't want to annoy others by using =English=. Widsith 10:00, 14 March 2007 (UTC)
Rather than having nothing at all, what about a hard-coded Index to Middle English, providing a place to list a form (and its modern spelling) and provide a citation? --EncycloPetey 16:49, 14 March 2007 (UTC)
I don't know if this is worth anything Widsith, but if you'd be willing to take the time to give each "Middle English" entry a couple of cites which give an approximate range (to the nearest century) of the word, I would have absolutely no problem with it being put under =English=. I can't speak for Connel, however. Atelaes 20:42, 14 March 2007 (UTC)
For my part, if ISO or Ethnologue considers it a language it's a language. And if print dictionaries exist for it we should cover it. If Middle English specialists want a free online dictionary that covers their field, it should be right here. If there is overlap with either English or Old English then so be it. More is good, less is bad. — Hippietrail 20:52, 14 March 2007 (UTC)
I agree, we should have separate language sections in articles for Middle English, and its own category. The free online dictionary - yes that is us!--Williamsayers79 21:39, 14 March 2007 (UTC)
Totally agree. It is a long-term project, but isn’t almost anything here? H. (talk) 21:15, 17 March 2007 (UTC)
I wonder if some of the most recent comments might be missing the point here. Widsith is not proposing that we not do Middle English words. Rather, he's proposing an alternate method for categorizing them. Atelaes 21:43, 17 March 2007 (UTC)


This is an experiment I am working on. The impetus is that we have a fairly detailed preferred format, and new users in particular find the details a lot to learn. (Why should it be "Usage notes" when there is only one?) It would be easier in a lot of cases to just fix things, but rather than do a lot of fiddly editing, just tell something automatic to do it.

So the experiment is User:AutoFormat, it picks up entries tagged with {{rfc-auto}}, and can be taught to do various things. Right now it sorts languages into order, and adds the ---- dividers where needed, as well as some spacing. See [2] for example.
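The two fixes described here, sorting language sections and adding ---- dividers, can be illustrated with a minimal Python sketch. It is not AutoFormat's actual code, which is not shown in this discussion. It assumes the usual Wiktionary ordering (Translingual first, then English, then other languages alphabetically) and that language sections start with level-2 headers such as ==Spanish==; the function names split_sections and autoformat are hypothetical.

```python
import re

# Assumed ordering convention: Translingual, then English, then the
# remaining languages alphabetically by header name.
PRIORITY = {"Translingual": 0, "English": 1}

# A level-2 language header such as "==Spanish==" on a line of its own.
# The [^=] stops it from matching level-3+ headers like "===Noun===".
LANG_HEADER = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)

# A trailing "----" divider left over from the page's original layout.
TRAILING_DIVIDER = re.compile(r"\n-{4,}$")


def split_sections(text):
    """Return (preamble, [(language, body), ...]) for a page's wikitext."""
    matches = list(LANG_HEADER.finditer(text))
    if not matches:
        return text, []
    preamble = text[:matches[0].start()]
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Strip surrounding blank lines and any old divider from the body.
        body = TRAILING_DIVIDER.sub("", text[m.end():end].rstrip()).strip()
        sections.append((m.group(1), body))
    return preamble, sections


def autoformat(text):
    """Sort language sections and put exactly one ---- divider between them."""
    preamble, sections = split_sections(text)
    if not sections:
        return text
    sections.sort(key=lambda s: (PRIORITY.get(s[0], 2), s[0]))
    parts = [f"=={lang}==\n{body}" for lang, body in sections]
    return preamble + "\n\n----\n\n".join(parts) + "\n"
```

Old dividers are removed before new ones are added, so a page that has already been formatted comes back unchanged; that property matters for a bot that may pick up the same tagged entry more than once.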

It is set up as a 'bot, but runs under my direct supervision, an entry or three at a time. So when you see it in RC, take a look if you like, but you don't need to worry about it. If you want it to try an entry that has a problem in a class it fixes, add the tag, but no guarantee when I will run it.

If you'd like to make any suggestions, etc: User Talk:AutoFormat, if you have more than one idea, add separate sections (they're cheap ;-). Robert Ullmann 18:12, 17 March 2007 (UTC)

Looks very nice! If you make a bot of this, I am going to use this a lot, since these are things I otherwise do manually. H. (talk) 21:37, 17 March 2007 (UTC)
I think your question 'Why should it be "Usage notes" when there is only one?' is meant to be a new user's, rather than your own, but it's because it makes for consistency to use "Usage notes" everywhere, and also just because there is only one usage note now does not mean that more won't be added later, and the person adding the new note might forget to change the title of the section. The same goes for "Synonyms", etc. I'm sure you knew that already, but it doesn't hurt to have it stated again in case anyone reading your posting is wondering.
Anyhow, I like this idea very much... how much will it be able to do automatically? — Paul G 09:59, 18 March 2007 (UTC)
Yes, the question was certainly meant to be a new user's, not moi. Should have thought that was obvious? ;-) I'm not at all sure how much it can do; and that is why it isn't clear how generally useful it might be. Part of the reason for doing this you can see by looking at User:Robert Ullmann/Han/Problems which lists all the remaining problems with the Han entries that couldn't be done with AWB (or with the code to fix the Korean Yale and Mandarin Pinyin). In particular, language sorting isn't so easy with just regex replacements. If you look at the list starting at 6434, you'll see where I had messed up the section flipping regex for a few minutes, and couldn't ID all the entries to fix at the time; now I can tag those and they will get fixed. Also quite a few with no ---- section dividers.
In general my thought is that when doing other edits and cleanup, users can drop the tag in if there is something they know the bot can fix. And even if it isn't run for a while, it will eventually. Meanwhile the tag is intentionally invisible, except for the cat at the bottom of the page. Robert Ullmann 19:13, 18 March 2007 (UTC)
Oh, and just a random sort of list of things it could do: fix header spellings, subst language names for codes in headers and translations lines, sort translations lines (careful with the multiples!), fix ''f'' to {{f}} (only in translations), unlink "top 40" language names in translations, move categories to the corresponding language sections, subst PAGENAME, wikilink one word definitions (and some variants, I did this in the Han entries), etc. And a number of things that could be tagged for attention if we wanted. Any given idea can be a new talk page section. Robert Ullmann 19:31, 18 March 2007 (UTC)

I have a fairly reasonable first version, would encourage anyone to add the {{rfc-auto}} tag to whatever they are editing that they think would benefit. I will be checking every edit. Robert Ullmann 22:17, 18 March 2007 (UTC)

"alternative spelling of" template

Connel MacKenzie requested an extra parameter for this template to show the region(s) where a term is used, and I think this would be useful too. Something like this, for the article encyclopaedia, for example:

{{alternative spelling of|encyclopedia|mainly UK}}

which would produce this:

An alternative spelling (mainly UK) of encyclopedia

or maybe

A mainly UK alternative spelling of encyclopedia

(The first is probably better because the second would need to change "A" to "An", and would not be clever enough to know that it's "A US..." and not "An US...".)

I don't know how to add this, otherwise I would do it myself. Could someone add it, please? — Paul G 10:14, 18 March 2007 (UTC)

Come to think of it, we can already write (mainly UK) or similar before the template, and it formats fine, so I'm not entirely sure this is necessary. What do you think, Connel? — Paul G 10:17, 18 March 2007 (UTC)
Context labels can be used. That's how I've handled this in the past.--Williamsayers79 10:43, 18 March 2007 (UTC)
I'd like to see the parameter option in the template. I'd also like to see a standard set of regional abbreviations used, both as parameters in this template as well as in the {{context}} template and in a template for marking pronunciations in the pronunciation section. The {{alternative spelling of}} template could then simply have the parameter set as:
  • {{alternative spelling of|encyclopedia|UK}}
To produce the output:
I'd rather have the option of inserting a pipe and two letters at the end, instead of packing the whole shebang inside of a context template. --EncycloPetey 16:54, 18 March 2007 (UTC)
We have a perfectly ordinary way to do this, like everything else. The alt of template is just a definition line. Use the context label, like any other definition line:
# {{context|mainly|UK}} {{alternative spelling of|encyclopedia}}
1. (mainly UK) Alternative spelling of encyclopedia.
This is already standard format. Robert Ullmann 18:50, 18 March 2007 (UTC)

Connel's essay in response to Dmh's statements on deletion

OK, now you've ruffled my feathers. So, since I intend to start writing the book I've outlined, I need the practice at being verbose. So I'll practice my prose with a chapter here. For good measure, I cleared out/archived 185 KB of text so I'll have room for all these ones and zeros here.

What is Wiktionary?

That actually is a very good question. No one really knows the answer, though. The answer seems to be a dictionary that represents the minds of the collective contributors at any given point in time. If only vandals are here, then yes, it will quickly become another Urbandictionary. To be fair, Urbandictionary today bears little resemblance to Urbandictionary a year ago. But I'm sure you can understand that, if left unchecked, Wiktionary would rapidly decline.

But what keeps real contributors here? Is it their notion that Wiktionary will grow to become the dictionary that they each want to be able to tell their grandkids they helped write? Does it purport to be some massive force for social change?

This is the left arm of Wikipedia. An encyclopedia simply cannot, and should not, cover the needs that a dictionary does. But for a universal reference, a dictionary component is needed. Most people looking something up can't grok the idea of an encyclopedia containing dictionary definitions - that stuff, their brains automatically want to see lumped into a dictionary.

Back to the social change stuff for a moment...Wikipedia, in and of itself, is becoming a force of some sort. More and more people are turning to it first when they'd like to understand something. So what is changing? The role of publishers? I can't imagine they are particularly pleased with that prospect.

So, off on a slight tangent, have you checked how much it costs to access the OED online these days? $382.75. In North America, buying an annual subscription through your local library is only $195.00. You don't get a copy of it or anything - you just get the ability to search their references and read individual items. Granted, that is a lot less than it was a year ago, but still...that is not chump change. Limited (i.e. incomplete) editions are available free of charge through most libraries.

Oddly, m-w.com still doesn't charge for general access, but does charge $29.95 for annual access to the unabridged version. Nor does dictionary.com (although e-reference can be downloaded for $34.95, AHD for $26.00, Cult lit. for $29.95, etc.; Cambridge is free, or £21.00 to buy; Bartleby is free to access, or $60.00 for hardcover). Instead, both bombard you with advertisements, many of which make it past the various filters made to combat such nonsense. But how long can even that last?

So, um, wait a second. Why are we here again? I think it is the realization of all contributors that such free access is extraordinarily precarious now. Any day, "they" (you know, them - "Them" - the big meanies) may decide it is time to start charging for access. All of the serious dictionary publishers have already made their attempt at going online. From their perspective, there is no more "adaptation" to new environments that they can do.

You and I, we know better. Free content is not just free, it is liberating.

Do I want to see a replacement for other dictionaries? No. I do want Wiktionary to be equivalent (or better.) I get the feeling that most contributors here feel the same way.

Now, does being a "multilingual dictionary" make us better? Absolutely not. Lookups are astronomically harder, glosses are much more susceptible to splits in deference to other languages, translations clog up the works to no end. Technical aspects, like simply rendering the alphabet, are no longer simple. But who am I to say? The decision was made by this community long before I ever heard of wikiAnything. And although I only listed the glaring defects, there are also benefits.

Being a multilingual dictionary gives tremendous insight into etymology and cognates. Having everything as a single search means the information you are looking for, about an obscure Greek term, is right at your fingertips. The lack of language separation at the software level has had direct benefits on the English side as well. It has forced us into listing all forms of a word, which really is a good thing.

Such a technical marvel has been inconceivable since dictionaries first existed. Look at the other online dictionaries...they still don't get it. Instead of listing the entire word, as spelled, they list the suffixes for a given headword. They could spell them out, but they are so set in their ways, they refuse to.

But back on topic, our immediate international reach has allowed other miracles, like the list of French Wiktionnaire's English terms that aren't yet in the English Wiktionary. The German and Dutch word distinctions have forced us to explain many entries in ways we would never imagine, as native speakers. And words like uncle which have only a single meaning, are suddenly clarified beyond imagination. (Yes, that is both good and bad.)

So why is being a multilingual dictionary so bad? It doesn't meet the expectations of our readers. As calcified as the dictionary publishers seem to be (to me), the readership is an order of magnitude more guilty. So our software here has to accommodate both the lay reader and the hard-core linguist. (The recent PIE vote would be a good example of that.)

Of all things, I think our readers are of the most concern. No one wants to open a dictionary and see goatse. No one wants to look up a term, and find an obscure, deranged S&M re-definition of a normal word. No one wants to be redirected to a made up "phobia" that describes a fear of the word they are looking up. No one wants to find out that a Pokémon character used the thing they are looking up in episode 827.

And no one wants to be told the wrong way to spell a word (especially on a close match lookup). People do want to know that they are using the right word, spelled the right way. Some authors want to know that the obscure word they are using is acceptable. Authors, in particular, know perfectly well when and how they can go outside the bounds of strict, formal, correct usage.

Are we currently building a usable dictionary?

Now, let's look for a moment at the contributors. Who do we have? Not so many people are desperately interested in the grunt-work of composing accurate and consistent definitions. Instead, we tend to get a lot of people stopping in, with a strong desire for world recognition.

Turning en.wiktionary.org into an intellectual pissing contest (scenarii) is not exactly productive.

Other contributors wish to provide the terms relevant to their "vertical segment" in an increasingly popular dictionary. Others are here just because they are baffled that we don't have entries for their favorite terms. Many others seem to unconsciously think this is, or should be, a slang dictionary.

None of those groups are interested in the day-to-day grunt-work of building a usable dictionary. Occasionally, an individual in one of those groups is, but by and large, those stereotypes are nearly the opposite of what en.wiktionary.org needs.

So, back to what is Wiktionary. Or rather, what should it be.

I, for one, am embarrassed by the enormous number of words we have that do not appear in any other general-purpose dictionary. I, for one, am embarrassed by the enormous number of words that appear here that are universally thrown out as spelling errors elsewhere. I am astonished that, for the most part, except a tiny handful that I've marked, those errors have all the appearance of "valid" words. Such short-sightedness makes a task such as building a spellchecker from Wiktionary nigh impossible.

Should we throw up our hands, as Dmh suggests, and allow a free-for-all? Or should we get serious about building a real, usable dictionary that can be looked at as an historic achievement? We have an opportunity here to provide the World with a copyleft usable dictionary.

Now, before I start on chapter two, (How to get there via a "multi-level Wiktionary") I'd like to take a quick straw poll. --Connel MacKenzie 00:42, 19 January 2007 (UTC)


  • Wiktionary should be a usable, "real" dictionary with nonsense and slang kept out of the main namespace.
  1. --Connel MacKenzie 00:42, 19 January 2007 (UTC)
  2. --Versageek 01:17, 19 January 2007 (UTC)
  3. --Cynewulf 02:18, 19 January 2007 (UTC)
  4. --DAVilla 04:56, 19 January 2007 (UTC) except that I'd challenge your notion of a real dictionary, since even respectable dictionaries have slang; and as long as "nonsense" is defined objectively.
    I was intentionally vague, so that I'd have something to say for chapter two. :-) But then, that will just be a rewrite of the "multi-level Wiktionary" thing, that I've gone on about before, elsewhere. --Connel MacKenzie 05:55, 19 January 2007 (UTC)
  5. --Jonathan Webley 07:26, 19 January 2007 (UTC). Delete the nonsense, but keep the slang.
    Perhaps you should move your vote down then. I am suggesting slang be eradicated from namespace zero, but remain searchable as full entries, e.g. "Slang:bitchin." --Connel MacKenzie 17:20, 19 January 2007 (UTC)
    As you can see from my delete history, I'm not in the free-for-all camp. To be honest, I'll need to see the slang namespace in action before I can be certain whether I agree with it or not. Jonathan Webley 11:44, 20 January 2007 (UTC)
  6. --Enginear 15:06, 19 January 2007 (UTC)
  7. --Jeffqyzt 16:58, 19 January 2007 (UTC) Agreeing to the bold text, assuming that the non-bold is merely commentary stating Connel's POV. Otherwise, the two options are equally disagreeable.
    Perhaps you should move your vote down then. Yes, it is an attempt to clarify my POV, for my little straw poll here. --17:20, 19 January 2007 (UTC)
  8. --Cerealkiller13 21:01, 19 January 2007 (UTC) (Let me be quite clear that I think this is the best option of the two, but I am not advocating it as a Wiktionary policy).
  9. —Stephen 23:20, 19 January 2007 (UTC)
  10. I am a dictionary editor and this is my manifesto? - [The]DaveRoss 17:21, 23 February 2007 (UTC)
  • Wiktionary should be a free-for-all with no [delete] or [move] buttons for anyone, whatsoever.
Please do not add other choices. Neither goal is likely; I'd simply like to know what the general desire actually is.
OK, then I can't answer yes to either of the above. I like a lot of the points you make above (though I'm still more with Hippietrail on making better use of available technology). I also completely believe you offer the choice in good faith, but it's still a false dichotomy. Wiktionary should be a usable, "real" dictionary. But real dictionaries include slang, and "nonsense" is like obscenity — you know it when you see it. Which is why ...
To be usable, Wiktionary needs consistent rules and they need to be consistently applied. It can't be a free-for-all. Back in the day we'd argue over whether a particular made-up word, and I mean something that the contributor said they'd made up out of whole cloth, should be in the main namespace or not. Eventually we decided on LOP, but then we had to explain why someone's baby ended up there instead of in the main space. It was on their web page after all.
It was about that time that CFI got larger — at least one person said too large — and a hell of a lot less vague. Now when someone introduces a made-up word, we can say "Nope, sorry, fails independence and attestation. LOP." Game over. Done. Next customer, please. This is progress.
I share your concern, to some extent, about filtering out garbage. I'm not concerned about (IMHO) silliness like scenarii, ingenuitive and the "I don't like it" sense of illiteracy. I'm not greatly concerned about vulgarity, profanity and internet-flavor-of-the-month. As a word geek, I'm much less bothered than most where a citation comes from, as long as it's durably archived and it's clear the speaker is using the term in question in earnest.
I am, however, concerned about two things that I believe you're also concerned about:
  • Preserving the information that (however silly the reasons) some terms will give people the impression you're stupid. Similarly, some (for generally more valid reasons) will offend. Also, some are only used formally and some are never used formally and some are in between. We need to say this, but without taking a POV as to whether any of this is justified.
  • Noting which spellings are commonly accepted and where, which ones are used infrequently but not considered outright wrong, and, within limits, which ones are commonly used but commonly considered wrong. We want to do this without letting in a full entry for every conceivable spelling of every conceivable word.
I think the solution to all but the very last item is, "Include, but mark." We can have (and have had) a lot of fun arguing over just how to mark things, but they need to be marked. The entry for scenarii needs to say that it's only used in particular technical contexts and scenarios is overwhelmingly common elsewhere. The entry for ingenuitive might say that the very similar ingenious is much more common. The entry for your favorite obscenity should be marked as such and given as prudent a definition as will convey the meaning concisely.
All of these should be codified as general rules to be applied in such cases, not just done ad-hoc. They should be codified for the same reason that we codified handling of made-up words. It will make our life simpler and Wiktionary better.
The last item is harder. Plenty of bogus spellings will pass the current CFI, which is more aimed at filtering out made-up words. Actually, I think I have a proposal, but I'll give it separately. Even if that doesn't pan out, we need some sort of well-defined rule.
If I've been expressing this well at all, it should be clear that I'm in no way advocating a free-for-all. If you re-read most of my complaints over time, you may find that they suddenly make more sense if viewed as "this is not following any consistent rule" and not as "we need to let in anything, from anyone, anywhere, any time" or "I'm just rattling cages" (I'm very seldom just rattling cages :-).
You can't just fiat "no nonsense shall appear in the main namespace of Wiktionary." You have to give clear, objective criteria. Otherwise you do get a free-for-all. -dmh 04:00, 19 January 2007 (UTC)
Having taken the trouble to look it up now, I'm no longer bothered by ingenuitive or of the opinion that ingenious should be offered as an alternative. They're two different words. I'm not sure what a better example would be above, so please pretend that ingenuitive is just an odd variant on ingenious -dmh 05:00, 19 January 2007 (UTC)

A dictionary which excludes slang is neither "usable" nor "real". As for "nonsense", well everyone agrees that that should be kept out, but the point is that everyone has a different idea of what constitutes nonsense. Widsith 17:12, 20 January 2007 (UTC)

I agree. That is why I said "kept out of the main namespace" (ambiguously.) To be less ambiguous, I am suggesting a "Slang:" namespace (fully searchable, but marked as slang by entry title.) --Connel MacKenzie 22:24, 20 January 2007 (UTC)

We can still describe slang words in Wiktionary without any of the usual crap found on Urban Dictionary. I'm inclined to agree with Widsith on this, but no free-for-all though. --Williamsayers79 18:13, 20 January 2007 (UTC)

Yes, in the longer proposal, I'd shunt such entries to "Vulgar:" or "Obscene:" namespaces. Since this is all so hypothetical, no real discussion of what the exact namespaces will be has even started. But with positive feedback, I think I will propose a bunch. --Connel MacKenzie 22:24, 20 January 2007 (UTC)
I'm fine with "include, but mark" as a general approach. Namespaces may or may not work as a practical means of marking. I'm much less interested in the mechanism or even the categories than the rules for categorizing. Once again, rules. I don't think anyone has seriously proposed a free-for-all. -dmh 02:02, 21 January 2007 (UTC)
I regularly comment that there should be some area or areas (probably not in the main namespace) where misspellings, misprints, typos and scannos can be placed, subject to something similar to our present CFI, so they can be searched for. This apparent serious request for a definition shows exactly why. I have yet to hear a good argument why such entries are less useful than "correct" entries, although I accept that we need to find a way to discourage mirroring before we add them. Personally, I think an entry for niany would be more useful than, say, metropoleis. The latter should, IMHO, normally only be used to an audience who understand Greek inflections, since a more commonly understood plural is readily available. Few people would therefore need to look it up in a dictionary, indeed perhaps none yet have. However, many with limited knowledge of English might be expected to look up the scanno niany, or indeed the scanno bum, as repeated in one of the cites. --Enginear 20:38, 7 February 2007 (UTC)

(It should be clarified that nothing in this paragraph was intended with meanness) OMG thought police OMG! Your absolutist logic makes no sense to me. No reasonable person wants either option at all. Then your points - "No one wants to see goatse" - no one will if you watch for vandalism. However, a wiki is a wiki, and you can't change that. "No one wants to see an S&M term" why not? I think your "deranged" descriptor is frightening, to be honest, because S&M is a notable subculture, and there's no reason its definitions should not be included. Obviously, a completely bogus phobia should be deleted, and if they're looking up a real word it shouldn't be a redirect to a completely bogus phobia. I think that a Pokémon's name is not a definition, and therefore belongs at Wikipedia, which would of course get a transwiki link. I don't think including misspellings is a bad thing either. I think one of the most confusing parts of your argument is that you suggest that somehow including peripheral information makes it so readers can't find what they're looking for. I'm not a regular contributor, but I think the wiktionary needs no general policy revamp, and I'm very concerned that you think so. 09:55, 9 March 2007 (UTC) (Also Atropos.)

We can have our cake, and eat the bits we like too

Connel loves black-and-white discussions, doesn't he just. But the world is not black and white. It is more complex. My view is that

  • the Wiktionary database should contain everything possible
  • the reader, on registering, sets their preferences of what they want to see.

You set which languages you want to see. (Maybe you just want an English dictionary, or maybe an English/Hindustani dictionary, or "All Languages" is your interest. In future, maybe a Hindustani/Japanese dictionary.) Maybe you are offended by vulgar slang, so check the right exclusion box. But then you might be reading a text with a word you suspect is a slang word and you want to know its meaning. So check the right box to allow all slang. Do you want to see misspellings or not? Maybe you really don't want to see the etymology. I certainly don't, most of the time. Check if you want to see obsolete terms or not. Check if you want to see protologisms or not. And then change your preferences if your usage changes.

By having this kind of universal-database, many-views approach, we could conceivably keep everyone happy. It is, to me at least, an obvious compromise between universality and personal usability. Certainly, seriously considering this has to beat playing Connel's simplistic black-or-white debate.--Richardb 10:30, 19 March 2007 (UTC)

Actually, I think I devoted a couple thousand words above describing just how much of the gray areas are gray. We should retain all vandalism, people's phone numbers, "JOSH IS GAY" entries? My, that is a new one, even for you. We don't have the technical ability to accomplish what you propose, anyhow. WM lookups are WM lookups...so if the garbage has an entry, it will always count as a direct hit. So the "misspelling vandalism" ends up being very effective. Great. --Connel MacKenzie 23:34, 24 March 2007 (UTC)

Structure of Wiktionary

One of the things I don't understand about en.wiktionary.org - which according to the main page is an English language dictionary - is why the entries define the word for multiple languages. For example, if I look up "hut" then as well as the English word I get translations for the Czech, Dutch and Old High German words "hut". In an English language dictionary it would be useful and interesting to have comparisons to related words in other languages (of the "cf. Dutch, hut" type), but surely each language pair should have its own dictionary (in these cases Dutch-to-English, Czech-to-English, and so on)? What is the point of listing potentially completely unrelated words in the same article just because they accidentally happen to have the same spelling? To cope with the occasional event that I might have a word and not know which language it is in, it would be much more sensible to have a "master lookup" feature across all dictionaries. Thus, I type in "hut" and it tells me the word is in the English dictionary, plus the Czech-to-English dictionary, etc. with links. The way it's organised at the moment seems bizarre to me. Perhaps I am missing some fundamental point. Matt 20:52, 16 February 2007 (UTC).

The point you may be missing is that this is a multilingual dictionary. We're striving to contain every word in every language (we still have a very long way to go on that). What distinguishes this as the English Wiktionary is that all of the definitions and explanations are in English. The beauty of this system is that if I, as an English speaker, want to know what the Dutch word "hut" means, I can look it up here and find out. If I look up hut in the Dutch Wiktionary, everything (the definitions, usage notes, etymology, etc.) is in Dutch, and I'll be completely lost. I hope that answers your question. Feel free to leave further clarifications if it didn't. Atelaes 21:23, 16 February 2007 (UTC)
I understand what you are saying, but if I, as an English speaker, wanted to know what the Dutch word "hut" means I would never consider looking it up in the same place as I looked up the definitions of English words. I would be looking somewhere for a Dutch-English dictionary. Perhaps this is just because every other dictionary that I've ever seen works like that. I've never come across anything with the structure of Wiktionary, which is probably why to me it seems so bizarre! Matt 22:00, 16 February 2007 (UTC).
Wiktionary is a Dutch-English dictionary. And a Czech-English dictionary. And an everything-else-English dictionary as well, and also a regular old English dictionary. We're just not constrained by space like those paper dictionaries you're used to, or constrained by lack of imagination like those online dictionaries that only mimic the paper dictionaries. bd2412 T 22:04, 16 February 2007 (UTC)
I remain unconvinced, but I appreciate that others take a different view. Matt 00:51, 17 February 2007 (UTC).
Matt, you are not alone. Hippietrail is putting finishing touches on a "Multi-lingual Wiktionary" extension to the MediaWiki software. I completely agree that lookups (and therefore, also edits) should be restricted to languages a user prefers. A simple note that the word's definition exists in other languages should be more than sufficient. Also of note: the Latin Wiktionary, I believe, used "Wikipedia-style disambiguation" to separate the languages, instead of level two headings. It's the same, only different. --Connel MacKenzie 01:07, 17 February 2007 (UTC)
A further argument could be made against (by default) having translation sections and entries for foreign words. One or the other would lead to much greater consistency. --Connel MacKenzie 01:11, 17 February 2007 (UTC)
How would that help someone who either a) wants to know how to say "foot" in Spanish, and has no idea, or b) comes across the word "pie" in a Spanish essay and wants to know what it means? Would you suggest they look up "pie" in Spanish Wiktionary and hope to find the English translation? bd2412 T 01:24, 17 February 2007 (UTC)
What, what, what? No, click on "Show all languages" or "Show Spanish entries" before pressing [search] for "pie". For multi-lingual Wiktionary, many things assumed here (currently) about the [Go] button would not be/should not be valid. The way we've done it here, so far, is not scalable, nor flexible. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
I will be bold and say that nobody (including me) is quite sure exactly how Wiktionary should be organized. From what I've seen, we really have three types of contributors at Wiktionary:
  1. people who create definitions for words
  2. people who obsess over the format of individual entries and/or the organization of Wiktionary as a whole
  3. people who write software to speed up mundane editing tasks
Those are the three big ones. Most contributors tend to specialize in one of the three. I personally believe that we will not be able to know how Wiktionary should be organized until we have created entries for a lot more words.
A-cai 12:23, 17 February 2007 (UTC)
Having gone through phases of each of the three stereotypes listed above, I'm not sure what you're trying to say. I will say that at this point, I do not wish to sit by, idly, while Wiktionary is turned into an un-parsable (programmatically unusable) mess. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
I'm not sure what I'm trying to say either :) The only thing that truly seems to unite the long term contributors to Wiktionary is the belief that Wiktionary could become something truly ground-breaking. Having now worked on Wiktionary for over a year, I'm no longer under the illusion that it will happen any time soon. The biggest problem that I see is that we have a lot of people worrying about the form of Wiktionary, but not enough who worry about the content. I believe that the reason for this, based on the bilingual people that I have talked to, is that the process for creating entries is still too cumbersome. This is something that we old-timers tend to forget. Wiktionary needs to become much more user friendly if we are ever going to have a chance of attracting a large number of language enthusiasts (many of whom are not computer savvy).

A-cai 07:50, 19 February 2007 (UTC)

I fully agree with Connel that it would be very nice to make Wiktionary a customizable experience. Most of the users really only want to see a simple definition of an English word, but I'll be damned to let Wiktionary be limited to that. The OED online has some hints of what I would like to see Wiktionary one day become. It has buttons at the top which allow the user to determine which portions of their entry they want to see. If you want to see the etymology, you click the "etymology" button at the top. Same goes for pronunciation, quotations, etc. I think it would be nice if there was something similar to WT:PREFS at the homepage, where people could set up the default views. Here they could determine if they want to see etymologies, if they want to see translations (and if so, from which language(s)), etc. This would certainly require the imposition of a lot of rigid formatting rules, but I think it would make Wiktionary appeal to a broader audience, especially as our articles continue to (hopefully) grow. It certainly is rather intimidating to go to an entry, looking for the definition, only to find five pages of text that you have to sort through. But at the same time, I don't want to give up an iota of those five pages. Atelaes 07:36, 19 February 2007 (UTC)
I too like the OED buttons available on the front end accessed from [3]. Unfortunately, it seems that most US libraries have gone for the Oxford Reference front end [4] which does not use those buttons, so people like Connel can't easily see what we're talking about. However, your description seems pretty clear. The only things I can think to add to the description are that on OED the preferences last only for the session, which is a pity, and that the etc in your "same goes for" represents date chart.
Obviously, another useful button for us would be translations; and we need, as Connel has just hinted somewhere, a means of choosing what languages we want to see entries for. It would be an advantage to have the buttons visible on every page (as OED does) rather than having them on the home page, since I find I sometimes alter them a few times during a session, depending what I'm looking for.
I do think it makes sense to have a relatively high proportion of editors caring about setting the style while the number of entries is still in the 100k's rather than the 10M's. I agree that, once we have a format that seems scaleable to the "all words in all languages" goal, we should make the entering of words easier. However, I think it may actually be good that the present imperfect system throttles the number of edits and leads to a preponderance of nerds, at this stage when the structure clearly needs attention.
And to anyone who hasn't noticed Hippietrail's latest (at WT:BP#Wiktionary structure awareness extension prototype live for testing) then do look (though I haven't yet worked out myself how to vary it from the default). --Enginear 16:33, 19 February 2007 (UTC)

I agree with some of the sentiments expressed above. As a newcomer, the more Wiktionary pages I look at, the more of a mess it seems to be. Someone needs to sit down and properly design the structure, and then create an interface that *enforces* that structure, so that individual editors can't just go off and do their own quirky things. (I should emphasise that my comments are not in any way intended as a criticism of the people who have obviously put in a lot of effort to get Wiktionary to where it is. It's just the way the thing's grown I guess.) Matt 14:51, 20 February 2007 (UTC).

Some of the perception of messiness comes from having a large number of articles which are fine, but could be more complete, combined with articles that are fairly complete. E.g. if every article consistently had pronunciation, it would look less "messy".
But: it is very fortunate that we started out with the 'pedia s/w, and that no-one "sat down and properly design[ed] the structure"! Most of what we have done in the last two years would have been difficult or nearly impossible if we had had s/w that enforced the structure. For example, if the several people working on Greek/Ancient Greek right now had to make code changes and get them committed to the running s/w base, instead of playing with a few templates and formatting pages as they like, that work would almost certainly not be happening. They would just be forcing the information into the pre-conceived format, with inferior results. Sure, I could write s/w today that would look really good, but couldn't have done it 6 months or two years ago; we didn't know enough. And note that the previous sentence will still be true 6 months or 2 years from now ...
The WiktionaryZ/Omega project is trying to write such software, but it "freezes" some level of understanding (and when they did the first version, they didn't even know that Japanese could be written in 4 different scripts, and that entries were not 1-1). Even if they do it over, they just freeze at another point.
We are still at a fairly early point, still learning enormous amounts about what a dictionary can be when freed of the constraints of paper. (Why have lots of lang-x to English dictionaries, when one can have an Any to English dictionary? I dreamed of compiling one of these in the 1970s, and figured out it would run to many dozens of volumes, so it would not be terribly useful ...) Right now we have about 300K entries; in less than a year we will have a million+, in two years probably 5-10 million as we move toward comprehensive coverage of 40-50 major languages. Anyone think they can predict what is going to be needed in the structure? All we can do is work on it and learn. Robert Ullmann 12:39, 23 February 2007 (UTC)
(Interposed comment; sorry if this disrupts the flow... not quite sure where to put it). The reason why the "any-to-English" format does not work as currently implemented in Wiktionary is that 99% of (English language) users, 99% of the time, either want an English definition of an English word or want a translation of an English word into a known, specified language, or want a translation of a word from a known, specified language into English. Mixing everything together on the same page just makes it more difficult for people to find what they are looking for, while adding no value. The one unusual circumstance where someone wants a translation of a word, and they don't know what language it is in, should be handled by some sort of "global lookup" feature. Matt 14:36, 25 February 2007 (UTC).
As I noted below, the number 1 hit on the English wiktionary (after the drunk-college-student obscenities ;-) is Category:Japanese language. I have a suspicion that contrary to your assertion, the vast majority of our users are in fact looking for English definitions of words in other not-necessarily-known languages. And if you are looking up something written in Han characters, which language/English dictionary are you going to look in? No value? The translingual/common and related languages (e.g. Mandarin/Min Nan) add a lot. A "global lookup" as you say? That's just what we provide. If it offends you (;-) that you get additional information: we are working on a filter, see below. Robert Ullmann 14:50, 25 February 2007 (UTC)
I do find it very hard to believe that most users are looking for an English definition of a word in an unknown language. If this is true then the community of Wiktionary users must be a very atypical bunch compared to your average dictionary user, I would say. Matt 15:13, 25 February 2007 (UTC).
I'm not suggesting that once designed the structure can never change; that would be daft. I am suggesting that a structure be devised and enforced to cope with the content that exists now, that can then be extended/revised as people have new ideas and want to do new things. What I'm talking about is tidying up the sort of mess that we have, for example, at note (just to pick at random one of countless examples), which has a list of definitions of the noun, an out-of-sync list of translations which if extended to "all" languages would be about 100 pages long, followed by the definition of the verb, followed by more translations etc. This is the sort of very unfriendly "ad hoc" layout that could be avoided if, for example, a sensible structure for handling translations were designed. Matt 23:08, 24 February 2007 (UTC).
I agree that what headings can be used should be restricted, but if and only if that list of restricted headings can easily be extended by sysops. For example, I'd love to see ===Usage note=== never be allowed (indicating instead that only ===Usage notes=== is a valid heading.) There are now four different "flavors" of related cleanup lists...mine are at User:Connel MacKenzie/todo, todo2, 3, 4, 5 etc. No one has been eager to attack {{rfc-trans}} recently...it does seem to be a growing problem.
The "Preload" templates have gone a long way towards helping newbies enter English new words. Many of the preload templates still need expanded "-intro" fillers, like template:new_en_noun_intro. And other languages...ahhh. Big time.
The biggest roadblock to making it "easier" to edit is that Wiktionary serves a lot more pages to readers than contributors. Right now, it is still fairly easy for a newcomer to make a minor correction to an existing entry. And although cluttered, I think the entries are somewhat comprehensible for newcomers to read. But certainly starting a new entry is daunting, for newcomers. Unfortunately, I don't see many ways to simplify that. --Connel MacKenzie 08:23, 25 February 2007 (UTC)
The minor variations to headers aren't so difficult; we just need to periodically run something to fix them. (E.g. (^={3,6})\s*[Uu]sage\s*[Nn]otes?\s*={3,6} (to) \1Usage notes\1 or such ;-) I have something that would fix all of them, but right now it "fixes" a bit too much ... as to the readers: 200+ hits a day on MILF and choad? No wonder we have to protect those pages. It is interesting that Category:Japanese language is in the top 100.
Some kind of entry method for new users/new entries (much better than the preload templates) would be a fine idea. Robert Ullmann 12:09, 25 February 2007 (UTC)
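The header clean-up described above can be sketched in Python as a single regular-expression substitution. This is a minimal illustration under stated assumptions, not the script Robert mentions: the function name `normalize_usage_notes` is hypothetical, and it assumes each header sits on its own line of wikitext. It differs from the quoted regex in two deliberate ways: the closing run of `=` signs must match the opening run (via the backreference `\1`), so unbalanced headers are left alone, and it uses `[ \t]*` rather than `\s*` so a match never swallows the newline after a header.

```python
import re

# Hypothetical sketch: match level-3 to level-6 "Usage note(s)" headers in any
# capitalization, with optional spaces, where both sides use the same number of "=".
USAGE_NOTES_HEADER = re.compile(
    r"^(={3,6})[ \t]*[Uu]sage[ \t]*[Nn]otes?[ \t]*\1[ \t]*$",
    re.MULTILINE,  # let ^ and $ match at every line boundary
)


def normalize_usage_notes(wikitext: str) -> str:
    """Rewrite variant 'Usage note(s)' headers to 'Usage notes', keeping the header level."""
    return USAGE_NOTES_HEADER.sub(r"\1Usage notes\1", wikitext)


if __name__ == "__main__":
    page = "==English==\n=== usage note ===\nFoo.\n\n====Usage Notes====\nBar.\n"
    print(normalize_usage_notes(page))
```

Level-2 headers such as `==Usage note==` are not touched, since the pattern requires at least three `=` signs, and a header with mismatched sides is reported by simply not changing, which leaves it for a human to review.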
Some kind of "model page" might also be useful: a page that is fully populated, with all sections present, translations into all languages present, etc. Has anyone done this? Initially it might be good for someone very familiar with Wiktionary to do this and invite comments so that a consensus view on how all the elements should be laid out is arrived at (for example, how to avoid breaking up English definitions with acres of translations). Then the page could sit as a useful reference for newcomers like me. Matt 15:13, 25 February 2007 (UTC).
A single model page would never satisfy this need; there are too many possible variations. There is no single word in the English language that can function as every part of speech and every subcategory of every part of speech. Examples would be needed for each part of speech, and also for handling words that serve in multiple categories. There are also relatively few words in English that have directly precise translations in all languages, never mind the fact that we don't have editors who speak all the various languages of the world. For example, there are more than 1600 languages spoken in India alone, and only a handful of those languages have any entries on Wiktionary.
That said, there are a small number of pages that show a high proportion of basic layout information. I started a project (which I work on only occasionally) to accumulate some pages to serve as models. One such page is listen, and I am working to make Central Europe, transparent, and round into model pages as well. You can see the starting outline of my efforts at User:EncycloPetey/Model pages. --EncycloPetey 18:54, 25 February 2007 (UTC)
It could just as well be a made-up word with made-up definitions and translations. The purpose is to illustrate the structure and layout, not the actual content. Matt 21:06, 26 February 2007 (UTC).
WT:ELE originally was exactly that - the made up word "Hrunk" formatted a la Wiktionary. It has evolved a little bit, over the past couple years. --Connel MacKenzie 00:45, 25 March 2007 (UTC)
IIRC, it was moved to its current name in late 2004, and gained initial acceptance in early 2005. Not sure how long after that WikiMedia added log entries for moves, deletions and protections. --Connel MacKenzie 19:03, 17 April 2007 (UTC)

Plurals and translations.

I was recently browsing around and came across the entry for geese, the plural of goose.  I noticed there were three translations for that entry.  I know I've seen it elsewhere, but haven't noted it.  My first inclination was to delete the translations, but wondered if there is a policy for that.

I personally think entries marked as English plurals should not have translations sections.  The non-English plurals should be in their own pages. — V-ball 12:47, 28 February 2007 (UTC)

I think that is the policy. I certainly delete translation sections on plurals when I see them. Widsith 12:53, 28 February 2007 (UTC)
That is insane. For completely irregular plurals you'd delete useful content? No, that is not policy. The general case (which is wrong) is for regular inflections not to require translations. --Connel MacKenzie 07:10, 1 March 2007 (UTC)
I am in total agreement with Connel on this - there is no sound basis for deleting translation sections from plurals. Consider the end user who wants to know how to say, for example, friends in French. They might well go first to friends, and seeing no translation section there, may give up, or may go to friend (which will have ami and amie, but not amis or amies). Here's another place where our user may give up, or if they are intrepid they may go on to look up ami and find what they sought (after first suffering two unnecessary disappointments). bd2412 T 07:20, 1 March 2007 (UTC)
How about etymologies? Atelaes 07:14, 1 March 2007 (UTC)
I have no problem with etymologies (or pronunciations, citations, 'pedia links, etc). An entry for a plural is an entry that happens to be for a plural; we may define plurals by reference to the singular, but that doesn't mean they have to be stripped barren but for that information. bd2412 T 07:22, 1 March 2007 (UTC)
Please try not to call me or my actions insane. I was under the impression that we had discussed this before and concluded that translation sections were only attached to singular forms? As for a sound basis, I find it hard to sympathise with BD's hypothetical user, since I can't believe anyone wanting to know the French for friends would not look up friend. That is the way all other dictionaries work. Anyway, I'll do whatever the community decides, but as I say I thought we'd been over all this in the past. Widsith 09:51, 1 March 2007 (UTC)
To take some odd examples, what would you do with news, data, or peoples, where I expect the translations are often quite different from the "singulars"? More generally, I agree with the others that it is normally inappropriate to delete any content which might be useful. --Enginear 20:33, 1 March 2007 (UTC)
I too had always understood that we made a fundamental distinction for certain information between the lemma form and non-lemmata. In particular, that when the only "definition" of an English word is "form of foob", that the translations will be given on the lemma page foob. Otherwise, we open ourselves up for incredibly bad headaches of maintenance. I for one don't want to try to correlate and verify all the translations of English present participles, to ensure that the Latin translation is the present active participle. I don't want to have to be sure that the correct gerund form is given under the English gerund form, even though the part of speech will not match between the entries. And which verb forms should we give then in the translation of English verb lemma, if we're going to open it up like this? The first person singular present active indicative? The present active infinitive? The passive preterite infinitive? Latin verbs have six infinitive forms (unless they're defective). Translations are not one-to-one. I say any non-lemma entry should point to the lemma for translations. --EncycloPetey 03:51, 2 March 2007 (UTC)
I agree with this proposal. Perhaps there should be exceptions made for words like news, data, and peoples, and all other forms where a plural is the only form, or has a separate meaning. But, in general, if a word is simply plural form of foob (which, humorously enough, is a Hmong verb), it should simply refer back to the lemma, where all the pertinent information will be held. I think most users will be intelligent enough to figure this out, especially if we are consistent in this, and the non-lemma is simply a soft redirect with nothing else. Atelaes 04:10, 2 March 2007 (UTC)
I'm not seeing a "why" - paper dictionaries limit the information they provide in accordance with their corresponding limits on available space. We have no such barrier. bd2412 T 04:34, 2 March 2007 (UTC)
We may not have limits on space, but we most certainly have limits on manpower. Having all the information in both places adds little in terms of the user's experience, it is a simple matter to follow the redirect to the lemma form. However, it adds a great deal of workload in maintaining the entries, as well as figuring out which form to use (this becomes more relevant with verbs, as EncycloPetey mentioned earlier). And, if we decide not to maintain the entries, we are then presenting a low quality product, which no one wants to do. Atelaes 04:57, 2 March 2007 (UTC)
This is an all-volunteer project. Our manpower is whoever is interested in doing whatever they are interested in doing - so long as additional information is merely permissible, but not mandatory, I see no manpower problem. As for maintaining the entries, do you mean policing edits? I don't think having additional information in legitimate entries will increase the number of vandals. It really doesn't even give them additional targets, as we already have plurals as entries. bd2412 T 05:50, 2 March 2007 (UTC)
I'm not talking about vandalism, no. What I'm talking about is when some anon adds the Turkish translation of foob, but doesn't do it on foobs. Then we're offering a sub-par version of foobs, which is lacking in the Turkish translation. On the other hand, if we offer nothing but "plural of foob", then we're giving them the same high-quality product, as they're, in essence, forced to go to foob and see the Turkish translation. You can certainly say, "Well I'll just add the Turkish translation to foobs," but will you? I won't. I don't have time for tedious stuff like that. And while our manpower is theoretically limitless, in reality, it does have a very distinct limit. We have, what, maybe a few dozen solid contributors? I think it unwise to add work for ourselves which adds little to the overall project. Atelaes 06:18, 2 March 2007 (UTC)
Okay, but if some anon does go and add the Turkish translation to "foobs" do you think one of us solid contributors should then be tasked with taking the time to delete this information from the entry? bd2412 T 13:44, 2 March 2007 (UTC)
I suppose so, yes. Atelaes 13:58, 2 March 2007 (UTC)
I have to agree with BD2412 in this case. For example, the word marines refers to the marine corps, but marine does not refer to the marine corps! Depending on context, it can mean a member of the marine corps, or it can mean a variety of other things that the plural marines does not mean (you can't have a plural adjective, can you?). This could be true of other languages as well. To use Atelaes' example, foob does not necessarily equal foobs in every way. Take a look at the following (we'll call our language Fooblese for the sake of argument):
  • Fooblese: foobe
  • Fooblese: foober
This is especially true for a language like English, with its wacky plurals such as cactus/cacti, city/cities etc. I also disagree with deleting a valid translation just because it was placed under the plural form and not the singular. Even if Atelaes is correct about the plural/singular thing, the translation should be moved to the singular form, not just deleted out of hand!

A-cai 14:41, 2 March 2007 (UTC)

If foobs is not simply the plural of foob, then obviously it should have translations for those senses which are not simply plurals for senses of foob; I don't think anyone's claiming otherwise. Also, the existence of "wacky plurals" is not an argument one way or the other: the pages for cactus, city, etc. state what the plurals are. If you want to add the plural of a non-English noun, the place to do so is at the entry on the singular, under an "inflection" heading. —RuakhTALK 16:01, 2 March 2007 (UTC)
I wholeheartedly agree with the fact that a translation put in the plural should be moved to the singular form (if not present), instead of unceremoniously deleted. I apologize for not being more clear on that. As for the marines, as I mentioned earlier, there will certainly be exceptions. Anytime a word cannot be defined simply as "plural of foob", then all bets are off as far as what I'm talking about. As for cacti, certainly it's a goofy plural, but we'll have a succinct explanation of it: "plural of cactus". Perhaps it might also be wise to have a usage note (follows Latin declension) to explain it, but otherwise, I still think it should simply be a soft redirect. Atelaes 16:05, 2 March 2007 (UTC)
I don't know if "wacky" plurals like cacti are anything special. To me, it seems weird to have translations sections on plurals. For example, the page cactus will have a translation section, and in that one can see (after they make an unnecessary extra click to show the translations (my pet peeve since I can't seem to get my preferences to work)) how to say cactus in various languages. Most likely, you will see the Russian word кактус. If I really want to know what the plural of кактус is in Russian, I will click on it because its entry should have a paradigm showing the plural, and I would not expect the entry for cacti to have the Russian nominative plural, кактусы, listed. The plurals of foreign words should be listed the same way English words are, meaning кактусы is mentioned on the кактус page as a plural, and кактусы has its own entry saying, "Nominative plural of кактус." — V-ball 16:20, 2 March 2007 (UTC)
Since I am not usually very interested in translations, I suppose my view -- about 100 lines up -- should not be given undue weight. But the general issue of how to deal with "incomplete" entries for inflections, that is, skeleton entries or entries which are less complete than what we normally call a full entry, is of wider interest. Perhaps the standard "definition" of an inflection should be along the lines of Plural of foob, where further information can be found. --Enginear 18:35, 2 March 2007 (UTC)
Take a look at friend and friends. Please tell me that we are not going to put the translation for the TV show under the singular entry! Also, note the difference between translations in Mandarin and Min Nan. Which information should be left in, and what should go into the entries for the Chinese words?

A-cai 18:42, 2 March 2007 (UTC)

Translations of the TV show belong on the capitalized Friends page and are a separate issue altogether. Details of how 朋友 is inflected in different Chinese languages belong on the 朋友 page. Widsith 18:47, 2 March 2007 (UTC)
Aha! But did you notice what happens when you click on Friends? It redirects you to friends! I'm not sure any more what Wiktionary policy is for that, although what you say makes sense.

A-cai 19:05, 2 March 2007 (UTC)

Policy is that the proper noun should be at Friends [though I can't remember if Proper noun is still used, or whether we now call them all Nouns] and the noun at friends. I've now removed the redirect and split the entry. --Enginear 20:25, 2 March 2007 (UTC)

Whoa, cool down guys. There is a very good reason why we do not give translations (or synonyms, etc) for inflected forms. It is that words often have multiple meanings and the translations usually do not apply to all of them.

Taking "friend" as an example, that currently has seven meanings. There are, correspondingly, seven translation tables. Examining these shows that translations differ with sense. For example, French has "ami(e)" for the first sense, "petit ami(e)", "copain"/"copine" for second, and so on. "Friends" however gives "amis", suggesting that this is a suitable translation for all senses of the word. This is utterly false.

A user can easily find the translation they require (and, more importantly, the correct translation) by following the link to the uninflected form, and then clicking on the link in the translation for the sense they require. A well-formatted entry will include the plural (and any other inflected forms) there.

In the case of plurals that have special meanings, such as "marines", translations can, of course, be given for these. Otherwise, entries for English plurals and other inflections of English words must not include translations.

Now, I understand Connel's point about removing useful information. The thing is, this information is in the wrong place and is unhelpful or misleading as it stands. The appropriate action to take is to move these translations to pages for the foreign-language singular forms (and plural forms, if required, especially if these are irregular) and then to delete them from "friends". Anyone willing to help me with this?

By the way, Friends should be deleted. It is encyclopedic. — Paul G 11:33, 3 March 2007 (UTC)

I'm for collecting information at one place in cases where it isn't controversial. There are a number of things besides translations that need not appear on "stub" pages, that is, pages where the only definitions are those that refer to other pages. These types of entries include alternative spellings and inflections, but not synonyms such as Allen wrench and Allen key.

There is no rule of thumb, I think, so much as an outcome of process. It is never acceptable to delete correct information that does not violate an accepted standard. Especially when in question, such as with word histories, a deletion should be noted, of course. Over deletion, it is much preferable to consolidate information such as synonyms and etymology (e.g. a full etymology becomes root plus inflection). By consolidation I do not mean "move" so much as "merge", although the original example of translations for plurals is minor and acceptable as per Paul G. Consolidating differing information should be allowed even if it leaves a stub page, provided the information does not contradict (as with color/colour, program/programme) and there is no controversy over the "correct" form, i.e. no clear principal spelling of e.g. irregular plurals. On the latter point, changing a principal page into a stub page without consolidation as a clean-up measure is not allowed. Expanding a stub page into a full page is already permissible if the contributor has good reason to believe that it should be a principal page. DAVilla 20:32, 7 March 2007 (UTC)

Widsith, I apologize for calling your idea insane. To clarify what I meant, the removal of beneficial information is much worse than standardizing the Translations section layout (in this manner.) Please note that BD2412's hypothetical user (in his example somewhere above) usually is not going to be a human being; rather, it will be that human being's software performing the lookup.
The notion that all software out there properly knows how to truncate a word form to a lemma (I assert) is insane - most software can't even tell (accurately) what language a given word is in. Looking at the Wikipedia pages on corpus linguistics, I'm stunned that my trivial frequency analysis of 1.6 billion words from Project Gutenberg wildly overshadows the ANC.
My expectation is that there will be an order of magnitude more software components written over the years. Some will get better, but all new ones are very likely to start from the same starting point. If Wiktionary provides information directly for all forms of a word, the programmatic mistakes are not only eliminated (before they happen) but subtle mistakes are avoided entirely. This comes about by human contributors here verifying the word forms individually, and noting exceptions accordingly.
My point of view (admittedly, my own) is that first hits to Wiktionary pages should contain as much information as possible. Every web-based extension of Wiktionary I've seen so far has tremendous difficulty linking back to anything other than the "direct hit." As those components become more elaborate, the navigation to what you call "the correct" lemma form will become more difficult, if not impossible. (E.g. try browsing Wiktionary on your cell phone - GOOD LUCK!)
With all that said, from my perspective, it is "insane" to remove translations from plural entries, especially as a matter of procedure. (Again, I think I'm using "insane" as an intensifier, not as an insult...that may be why you were offended by my wording initially?)
--Connel MacKenzie 14:14, 26 March 2007 (UTC)
I think that if a plural entry definition were to say "Plural of foob, where further information can be found." then many of the objections to adding incomplete sets of translations, etc, would be countered. --Enginear 19:58, 26 March 2007 (UTC)

Trademark names

We need a policy for trademarks. If we have one, I can't find it (and it should be at Wiktionary:Trademarks). I think that widely known trademark names should be included if they can be used metaphorically (e.g. Cadillac), descriptively (Mark wears a Rolex and drives a Lexus, whereas Joe wears a Timex and drives a Honda), if the mark is approaching genericism (Kleenex, Xerox), or if the mark is a specific use of word that would otherwise be in the dictionary anyway (Bounty for paper towels, Crest for toothpaste, Janus for mutual funds). I'm putting together a listing of the most widely known brand names at User:BD2412/brand names.

Also, I've noticed that from time to time folks need to look up trademark registrations here, so I'm going to provide some quick tips on how to do this.

1. Go to the United States Patent and Trademark Office trademark main page.
2. Near the top of the right-hand column, click [Search].
3. I recommend the Free Form Search. Type in the word you're looking for followed by [comb] and you'll get a combination of searches for the word alone, with punctuation, or as part of a phrase. Also, it often helps to add "and live[ld]" to a search, as this will limit it to live marks and filter out marks no longer registered.

Cheers! bd2412 T 23:18, 2 March 2007 (UTC)

I agree with most of this, but disagree with your view that the Bounty and Crest and Janus trademarks should be included just because bounty and crest are words and Janus is a dictionary-worthy proper noun. (Maybe they should be included anyway, but if we develop criteria for inclusion of trademarks, I think they should apply regardless of whether the trademark represents its own entry, or simply an additional sense in an existing entry.) —RuakhTALK 03:40, 3 March 2007 (UTC)
I think that trademarks that incorporate a common word for an uncommon purpose (e.g. Apple) merit a one-line entry because the word is already in the dictionary, and the trademark definition is a legitimate alternative definition of a word for which we are trying to give complete information. That said, I think such instances should be limited to trademarks that can be demonstrated by reference to a source such as a trade journal to be very widely known and very strong. bd2412 T 19:25, 3 March 2007 (UTC)
I strongly disagree with this (sorry bd2412, I just seem to keep picking fights with you, nothing personal :-)). Certainly we should have entries for bandaid (or is it band-aid?) and xerox, because they're used in an idiomatic sense, not necessarily related to the brand itself, perhaps rolex as well. But, my opinion is that they should not merit entries until they can be put in non-capitals, as xerox and bandaid can. Otherwise this opens us up to including every brand name in existence, which is not dictionary material. Unless someone can show exactly where else the line should be drawn, I say we draw it here. Atelaes 04:50, 4 March 2007 (UTC)
I think there are resources that would make it fairly easy to draw sensible lines. I've been tossing some ideas back and forth in my mind and would say, for example, that we can easily agree to exclude company names that are just collections of surnames (e.g. Morgan Stanley Dean Witter, Bristol-Myers Squibb, and Ethan Allen). I do, however, think that we should make every effort to list all brand names for medications (Tylenol, Dexatrim, Motrin, Prozac) because I can see a particular utility to such listings, in part because the drug makers tend to come up with fanciful words, and in part because most such drugs can be described by reference to their key ingredient (i.e. acetaminophen, ibuprofen). I'd also be rather inclined to include fanciful car names (Integra, Montero, Prius). There are hardly so many that it would cause a fuss. With respect to other corporate or brand names, I'd set a higher criterion than the CFI to show that the brand name is used in some descriptive or attributive sense, but that should be easy for truly mega-brands such as Coke and Pepsi, McDonalds, Microsoft, etc. bd2412 T 03:30, 6 March 2007 (UTC)
That might be useful, but the utility argument is a very well documented logical fallacy. We aren't aiming to be useful, we're aiming to be a dictionary (which is useful, of course, but not just). Usefulness includes TV listings, atlases, currency converters, whatever you can think of; there are lots of useful things. I fail to see why we would give any class of words immunity from CFI, and I especially fail to see why, if we did, we would want them to be medications and car models. Those aren't within the purview of a dictionary, but are more appropriate at an encyclopedia. Trademarked names or brand names still need to pass attestation with independent use (and not just mention). I would suggest a vote to make the point clear, but I'm already satisfied with CFI's wording: "To be included, the use of a trademark or company name other than its use as a trademark (i.e., a use as a common word) has to be attested." Dmcdevit 06:49, 25 March 2007 (UTC)

Use of ® and ™ in entries

Greetings! As a professional intellectual property attorney, I can assure you that there is no requirement whatsoever that we should use the ® and ™ symbols adjacent to the headword of names that are trademarks (registered or not). First, we're an educational organization making a purely nominative use of the terms (i.e. we're not selling hamburgers, so we don't even have to acknowledge that McDonald's is a trademark). Second, even so, we do indicate in the entry and often in a usage note that the word is a trademark or is a registered mark. Third, the ® or ™ symbol is not a part of the actual word. Finally, trademark registrations are neither eternal nor certain. Registrations lapse, get cancelled, or become abandoned all the time (I have personally seen some very big companies errantly allow the lapse of registrations for some very famous trademarks).

Frequently multiple parties claim ownership of a particular mark and spend years litigating who has the right to use the mark, or whether both parties can use the mark for different products (e.g. Ritz crackers and Ritz hotels); parties may have rights to a mark in limited geographic areas; and parties often claim to own generic or descriptive marks that cannot actually be "owned" by anyone. In short, even information on the best known marks can become obsolete, and there are few people here with the technical background to determine the status of a mark, particularly one that is contested.

In short, we should get rid of those symbols. Cheers! bd2412 T 06:44, 9 March 2007 (UTC)

Any idea why all other dictionaries seem to use the marks, then? I don't understand what is bad about including the mark on a term that has had technical problems with renewals. OTOH, retaining the marks alerts our readers that they probably should use the symbol as well. To me, it seems that removing the marks would be inconsistent and unhelpful. --Connel MacKenzie 10:47, 11 March 2007 (UTC)
It seems the cleanest way of marking trademarks and brand names to me. The symbols are universal and concise. --EncycloPetey 03:46, 12 March 2007 (UTC)
To Connel, I'm looking at the Webster's Collegiate Dictionary, Tenth Edition (which is the one I have handy at the moment) and it has a listing for Xerox without the symbol, but with trademark at the beginning of the definition line. I do not believe we need to 'alert our readers' in the manner that you suggest, as there is no reason for anyone other than the owner of the mark to actually use such a symbol. A Google book search for Coca Cola, Absolut, Tylenol, shows that such symbols are absent not only in works of fiction, but even in non-fiction works examining these specific industries.
To EncycloPetey, I think the cleanest way of marking trademarks and brand names is the same way we mark medical terms, slang, vulgarities, sports terms, etc., with a notation in the definition line. This is particularly evident where we have entries such as Cadillac, Hartford, Lincoln, Mercedes, Nike, Quaker, and Saturn each of which is a famous trademark, but each of which has additional meanings for which a capitalized entry is necessary (place names, given names, surname, mythological figures, etc.). bd2412 T 16:46, 12 March 2007 (UTC)

I agree with BD; I have always found it a little weird that we include the symbols, when they virtually never appear in actual usage. We give the impression that such symbols have to be used with the word, which is not the case. Widsith 17:11, 12 March 2007 (UTC)

  • Sorry guys, but these symbols, while concise, are certainly not universal. In Spanish neither is used; instead MR (marca registrada) takes the place of both.
  • As for the symbols being used in the headword section but not in actual use, this is just silly. We also put m or f in the headword next to nouns but these are also never seen in actual use. — Hippietrail 14:34, 13 March 2007 (UTC)
I agree with bd2412, having a tag on the definition and not in the headword seems like a good solution. Regarding Hippietrail's comment: so, if we did decide to continue using TM in headwords, do we restrict this to ==English== parts of speech, and use MR in the ==Spanish== headers? -- Beobach972 15:44, 14 March 2007 (UTC)
Hippietrail, m and f indicate how the word must be used in speech and writing. ® and ™ do no such thing; rather they indicate how one party, the owner of the trademark, would like you to use the word, but not any way in which you are required to use the mark. And how about the many words (including some listed above) that have one sense that is a trademark and others that are generic? Also, how does Spanish account for unregistered marks? A word or phrase used in a Spanish-speaking country as a trademark but not registered gets no recognition, and virtually no protection whatsoever. bd2412 T 16:17, 14 March 2007 (UTC)
bd2412, your original argument made no mention of "how a word must be used" but merely "the symbol is not a part of the actual word". Since you are content to change your argument as we go, would you not concede that some writers do care whether a term might be a trademark or not? It could be a matter of style or policy in the departments of certain companies. I think it is never an error to include too much information on any word. Those who do not care about certain parts of the information can ignore them. If we erase the information, however, people who do care have no way to recover it. Now the idea of putting the warning that a term may be a trademark into the sense sounds like one worth exploring.
Sadly I am not aware how the various Spanish-speaking countries account for unregistered trademarks. My experience is mostly in Mexico where I am only a visitor. — Hippietrail 17:01, 14 March 2007 (UTC)
Well, let me point out that the way we are using the symbols adds to the appearance that they are, in fact, a part of the word. We are putting them in boldface right next to the word, with no space between. Our m and f and pl and so forth are in italics and set apart by a space. Now, if we were to do that with the ® and ™ symbols, it would be more appropriate, but in my opinion it would look horrible. It also seems to me that the qualifications we use in the headword line are fairly stable. We expect a masculine noun to still be a masculine noun in a hundred years. However, a word that is vulgar today may lose that connotation in some years; a term that is slang now may become so widely used as to be deemed formal; and a trademark may become generic, or may simply cease to be used.
If we're going to indicate trademark status, we should do it on the definition line for the trademark definition. After all, what do we do with Ace, a common nickname for fighter pilots, but also a famous trademark for two unrelated companies (selling hardware and bandages, respectively)? What do we do with Dove? In fact, if you go to the website of the United States Patent and Trademark Office (http://www.uspto.gov) you'll find that thousands of words in the English language are marks, including the names of almost any figure from mythology or ancient history, most city, county, state, and other place names, and most given names and surnames. Should we put a trademark symbol next to all of them? There are nine current registrations for "Dan"; should our entry read Dan®? Should we note marks that are registered trademarks for some purposes and unregistered trademarks for other purposes with both symbols, Dan®™? What if the owner of one of the 40 or so registered "Scott" marks should decide to sue us for using the ® with some marks but failing to use the ® with Scott? Or Venus? Or Rio? Or Smith? Or Taurus? You must know that if we include marks at all, we cannot do so selectively.
I have no need to change my argument, but I'll surely raise additional arguments, all of which indicate that we have no business playing with trademark symbols, especially when our use is bound to be inconsistent unless someone is willing to check every word in the dictionary against the constantly changing USPTO database every few months. bd2412 T 07:35, 15 March 2007 (UTC)
It's worse than that. Many, probably most, of us are not based in the US. There seems to be general agreement above that there is no legal requirement to use the symbols in the dictionary, so the location of the servers is irrelevant. It is where we use the words that counts. Many marks are registered in one or a few countries only. It would be necessary to check all countries' patent offices, and perhaps note in the entries where the marks were registered and where not. That is not the job of a dictionary. I agree with you that we should not use the symbols, but where known, we should gloss the entry to say it is used as a trade mark. --Enginear 12:29, 15 March 2007 (UTC)
Whatever we decide to do, one thing I think we really ought to have is a disclaimer, as some paper dictionaries do, that the inclusion or absence of a trademark symbol does not affect the trademark status of the word so indicated or not indicated (in other words, if you take our word for it but we've got it wrong and you get into trouble because of it, don't sue us). — Paul G 10:21, 18 March 2007 (UTC)
  • Now that I'm back in Mexico again I've kept an eye out and I have actually seen ® (but not ™) used here. I don't know if it's considered the same as or different from MR. Also, in at least one of my Spanish-English dictionaries, ® is used in both the English and Spanish sections for some words. Neither ™ nor MR is used in either section. — Hippietrail 18:03, 19 March 2007 (UTC)

I CALL FOR A VOTE (can someone put that together?) bd2412 T 05:05, 22 March 2007 (UTC)

Ok, I have started a vote at Wiktionary:Votes/pl-2007-02/Trademark designations. Cheers! bd2412 T 03:39, 23 March 2007 (UTC)

A proposed Vote concerning Placenames

A vote concerning the criteria for inclusion of placenames was recently started. See Wiktionary:Votes/pl-2007-02/Placenames. This was not discussed beforehand and has degenerated somewhat. I propose that it be abandoned, and replaced with a simpler vote with fewer, and less specific, options as follows.

  1. The criteria for inclusion for placenames should be exactly the same as for all other words - broadly attested, used rather than mentioned.
  2. One or more additional criteria should be applied to placenames - details to be discussed later if this option is carried.

Please make your thoughts known here and provide a better first vote if you can think of one. SemperBlotto 09:31, 13 March 2007 (UTC)

I agree that the vote should be abandoned, it's quite a mess. A (rather lengthy, I fear) discussion is needed before a vote of this nature could be restarted. I think that the current criteria are good, but I don't believe it's a simple matter of picking one or the other. Past that I have nothing useful to offer, sorry. Atelaes 10:07, 13 March 2007 (UTC)
I agree that the vote should be abandoned, it seems we didn't plan it out very well. -- Beobach972 20:56, 13 March 2007 (UTC)
  • I apologize for inadvertently spurring it into existence (in a discussion further up, on this page.) --Connel MacKenzie 06:02, 14 March 2007 (UTC)
It's alright, you're right that we need a vote; we just need to clearly plot out all the options we'll have on it. -- Beobach972 15:36, 14 March 2007 (UTC)

The criterion I like to apply to proper names (and, with little difficulty, to all words really) is whether the term can be understood out of context. For place names this is a little more lenient than I had previously been judging them. In Athens, Georgia the word "Athens" means the city in Georgia, so if this can be verified with three independent citations (e.g. newspapers or what have you) that use "Athens" out of context (as per the title of the page) instead of "Athens, Georgia", then the place Athens, Georgia is one sense of Athens that is understood regionally. Outside of that, Athens could only be assumed to be the city in Greece or taken just as a general, unspecified place name. DAVilla 18:54, 23 March 2007 (UTC)

I don't agree with (or maybe don't understand) that logic. First, if we assume Athens, Georgia did not generate 3 cites for Athens, surely the term Athens, Georgia is understood out of context and can be attested as such. For that matter, what place can't be understood out of context regionally, considering placenames belong to regions? The problem is that being understood out of context is not a criterion that establishes appropriateness for the dictionary. If I said the single word "blast", even a cytologist is not likely, in any region, to assume first that I mean an immature cell. Conversely, the cytologist in the US would understand Millard Fillmore, or even Mallard Fillmore, out of context. In general, I wouldn't say that all proper nouns should be removed (I just added Sahabi, for instance). But while English, Taoism, and Enlightenment have definable meanings with or without context, Skagway, Transamerica Pyramid, and Lü Buwei can only be described by pointing at the physical objects they reference. They don't belong, and neither, I think, does Athens, Georgia, unless it's an attributive or generic term. Dmcdevit 23:03, 26 March 2007 (UTC)
I don't think we should have an entry on Athens, Georgia, but we should most definitely have entries on Athens and Georgia! bd2412 T 23:09, 26 March 2007 (UTC)

Easter Competition 2007

This is an announcement to open the Easter Competition 2007. As with previous contests, the prize consists mainly of a warm woolly feeling inside, but the primary object is playful competition among Wiktionarians. --EncycloPetey 02:34, 15 March 2007 (UTC)

Results are now posted. --EncycloPetey 17:16, 10 April 2007 (UTC)

Interpretation of CFI

Someone else (dmh) said earlier in this forum that the CFI are too vague. I tend to agree.

A user created an entry for Friends, defining it as the US TV show. I nominated it for deletion, saying it was encyclopedic (that is, that "Friends" belongs in an encyclopedia). (Note that the entry currently does not contain this definition.) This provoked a heated discussion that comes down, in my understanding of the issue, to the interpretation of what the CFI say about inclusion of names, namely "A name should be included if it is used attributively, with a widely understood meaning".

It is being argued that, as "Friends" can be used attributively (as in "Jennifer Aniston, the former Friends star"), it should be in. Taking CFI literally, this means "Friends" is allowed in. My feeling, based on my experience of what does and does not get into Wiktionary, is that this is not the intention of that part of the CFI.

If the CFI are to be interpreted literally on names, then TV shows, movies, place names (no matter how tiny the places they refer to) and many other names are allowed in provided they have an attributive use ("Friends star David Schwimmer", "Gladiator star Russell Crowe", "Nowhereville resident Joe Bloggs").

Well, yes. If there are three such uses spanning a year in permanently recorded media. bd2412 T 04:30, 16 April 2007 (UTC)

However, if the CFI are not to be interpreted that way, and I don't think they are, then they need to be tightened up to state more precisely what can be included and what should not be. — Paul G 07:13, 15 March 2007 (UTC)

  • See the proposed vote two sections up. Should this vote be expanded to include ALL proper nouns? SemperBlotto 08:15, 15 March 2007 (UTC)
I think that might be a good idea. Proper nouns CFI seems to be a hot topic as of late. Atelaes 08:19, 15 March 2007 (UTC)
I agree that the issue at hand is whether or not to include proper nouns at all. I also agree that WT:CFI is rather vague in this area. My vote would be to include proper nouns (place names, people, companies, TV shows, movies etc) as long as the definitions are kept short, and the proper noun is widely used in spoken or written English. Some contributors have expressed reservations about this idea, because they fear that we might not know where to draw the line (i.e. include Rocky, but not Rocky II?). My take is that Wiktionary is in its infancy, and it is probably too early to be overly conservative about what to leave out. We are breaking new ground with Wiktionary, and this is definitely an area where we have a potential to surpass our traditional expectations of a dictionary.
For example, did you know that there are three different Chinese translations for the 2000 movie Gladiator (PRC: 角斗士; Hong Kong: 帝國驕雄; Taiwan: 神鬼戰士)? These three translations are, of course, not to be confused with the Chinese translations of the 1992 movie Gladiator (PRC: 终极斗士; Taiwan: 神鬼拳王)! How are these terms pronounced, what are their etymologies, which one (if any) is the literal colloquial term for gladiator in Chinese? Wikipedia does a poor job of answering such questions, but Wiktionary is ideally suited to answer them all. -- A-cai 09:11, 15 March 2007 (UTC)
I agree that we should take a liberal course towards including proper nouns, with some caveats.
  1. I think that we should include place names with abandon - from the country level down to the town, borough, or hamlet. In order to prevent ourselves from going crazy with 50 towns named Springfield, I propose a rule that if a place name is used for more than 5 places, then it gets a single line indicating that it is a commonly used place name (unless one of those places is a world city like Paris, or a capital like, well, Springfield (Illinois, that is)).
  2. I think we should include a line for any brand name, movie name, TV show title, band name, or song title for which we should otherwise have an existing Wiktionary entry (Friends, Sneakers, Pledge, Nirvana, Joy, etc.)
  3. I mentioned somewhere above that I think we should include the brand names of medications and cars (remember, we're talking about one-line entries here).
  4. With respect to people, I think we have to act on the presumption that any combination of a first and last name (Joe Smith, Marcia Clark, Reginald Denny) is simply non-idiomatic unless it means something other than merely an identification of a human being (e.g. Shirley Temple, Benedict Arnold). bd2412 T 11:10, 15 March 2007 (UTC)
That sounds generally O.K. to me, except for criterion #2 (originally written as #3). I don't think "is a capitalized version of a common noun" should be a criterion for including a proper noun, any more than "is a protologistic use of an existing word" should be a criterion for a normal word. —RuakhTALK 16:12, 15 March 2007 (UTC) and 07:20, 16 March 2007 (UTC)
Good work - this seems reasonable. I particularly like number 2, which would cover the current "Friends" debate; I have already suggested in the discussion on RFD that we should cross-refer to Wikipedia in this case, and I think we should do this for all words in category number 2 above.
Ruakh, did you mean "#2" when you said "#3"? As I understand it, the idea here is to acknowledge the existence of a capitalised form of a common noun, but not to give it any treatment here unless it falls into any of the other categories that we will have articles for. So, for example, bath should have a "See also" at the top (indicating another entry in Wiktionary) that links to Bath, the place in England; cavalier and mini would do the same (brands of cars); but friends and nirvana would include links to the Wikipedia articles on the TV show and band name respectively under the "See also" section towards the bottom of the article. We would do this not because these proper nouns deserve special treatment, but rather because users might expect to read about them here, and would be directed to Wikipedia; and because it will also go some way towards preventing contributors from adding definitions for the proper noun to the entry for the common noun. Users searching for proper nouns that aren't capitalised forms of common nouns in Wiktionary will get the "Perhaps there is an article on X in Wikipedia" link and can then find what they want in Wikipedia.
Articles on particular people, such as "Shirley Temple", are clearly encyclopedia material and would get no coverage in Wiktionary at all, that is, not even a cross-reference or link. Users searching for "Shirley Temple" will get the page that says "Perhaps there is an article on Shirley Temple in Wikipedia". Vanity articles created by users about themselves will of course continue to be deleted.
I think this is pretty much what we do already, even if it is not explicitly stated in CFI. What is therefore needed is a form of words for this that we can put into CFI to make it much more clear exactly what is to be included and what is not. — Paul G 07:00, 16 March 2007 (UTC)
Paul, I don't think you meant to use Shirley Temple as the example; it is the counter-example: it is the name of a drink, which is why we have the wikt entry. A given person who is not a drink or synonymous with traitor would just be in Wikipedia. This all seems pretty good to me; but I don't think that we should have "Gladiator" as a film title because it is named variously in other languages; that should be in the Wikipedia articles (English and others) on the film; in particular there should be interwiki links. If WP is weak in this area, well, go improve it! (We don't have the invented term "sorcerer's stone" because J. K. Rowling thought—probably correctly—that the title had to be dumbed down for the American audience who wouldn't know what a philosopher's stone was ;-) If the Chinese names for Gladiator qualify in their own right, sure they should be included as ordinary words, with the same see-also reference(s) to w:zh etc. Robert Ullmann 09:53, 16 March 2007 (UTC)
Ah, my mistake. Yes, names of cocktails are certainly to be included, and many of these are named after people (famous or otherwise); and yes, we would have no entry for "Gladiator". People wanting translations of the name of the film should look at w:Gladiator and then follow the link for the language they are after, or request it if it is not there. The Wiktionary article "gladiator" would have an interwiki link in the "See also" that just says "Gladiator in Wikipedia" or something like that, without specifying that this is a film. (Who are we to say what article Wikipedia has or will have in the future at "w:Gladiator"? Currently, it's a page for the common noun, with a link at the top to the disambiguation page which contains a whole load of links, including not one, but three films with that title. So there you go — a Wiktionary article on "the" film Gladiator would be incorrect.) You make a good point about the treatment of languages that don't have capitalisation, such as Chinese.
Once there is agreement, I'll draft some text summarising what we are discussing here and add it to CFI. Or should it be treated as draft policy first? — Paul G 10:36, 16 March 2007 (UTC)

[Back to margin.] This discussion was only started yesterday, so I suggest that either it should be left a few days for other views to be added, or it should be written up as a draft policy.

However, subject to fine tuning, I think these are excellent criteria. The page that says "Perhaps there is an article on Marcia Clark in Wikipedia" needs to be made a bit more friendly (because it doesn't actually say that at present), and IMHO, our "See also" heading, when placed at the top for cases like this, should read "For additional senses see". Pop stars, et al, who use a single name, should be treated the same as single word film titles.

Incidentally, translations of place names can be covered by checking WP, in a similar fashion to film names, but we still need them here for their etymological value.

So to consider one example, say someone wants to find out what the name Sigourney Weaver means, and for some reason comes to wikt to find out, we seem to be aiming for the following:

  • Searcher finds no entry on wikt but is given page saying "Perhaps there is an article on Sigourney Weaver in Wikipedia"
  • Searcher finds there is such an article; in the article finds that Weaver is her family name, and that she chose Sigourney as her stage name to match a character in The Great Gatsby. (The article should perhaps say that this was etymologically a singularly appropriate choice, but it doesn't.)
  • Searcher therefore decides to check the individual words; Weaver's disambiguation page should direct the searcher to wikt for the meaning and etymology of the surname Weaver, and we should have an entry saying that, as a surname, it most commonly means descended from someone who made a living from weaving. Neither of these is in place yet, but since they are fairly obvious, that is not too important.
  • Searcher, having got the message that wikt deals with surnames, decides to try looking up Sigourney. At the moment there is nothing there but let's imagine...there is an entry saying
    1. Surname adopted in US by certain Huguenot families previously called Sigournay, named after their town of origin, Igournay, France
    2. Town in US, named after Lydia Huntley Sigourney
      Those seem uncontentious, and investigation on WP could find that the Huguenots were forced out of France by religious persecution, and brought their skills as weavers with them. (Neither WP nor wikt have entries on Igournay at present.)
      But should we also have:
    3. Stage forename chosen by Susan Alexandra Weaver, after a character Mrs. Sigourney Howard in The Great Gatsby. Mrs. Sigourney Howard was herself named after Father Sigourney, a tutor of F. Scott Fitzgerald.
      Or should we just have
    4. For additional usage see w:Sigourney Weaver.
      and leave the searcher to search w:The Great Gatsby to find out (actually, it's not yet mentioned there, but then nor is it yet in wikt).
      or should we not mention it at all? My preference is for the second option. --Enginear 14:09, 16 March 2007 (UTC)
You make some good points. Yes, we should certainly carry on discussing this for a while until we have clarified what changes we are going to make. For now, I'll put a note in CFI that the policy is under review, pointing to this discussion.
Regarding given names and surnames, we already have a policy on this - given names go in; surnames go in if they are etymologically interesting. So "Sigourney" should probably go in, as, even if Ms Weaver was the first to adopt it as a first name, no doubt there are lots of baby girls who have been named after her. "Weaver" would also go in because of the etymological interest, along with Archer, Smith, Taylor and other surnames derived from occupations.
I prefer the "See also" option for the info on Sigourney Weaver. I like "For additional usage", by the way. I would prefer the link to look like this: [[w:Sigourney Weaver|the Wikipedia article on Sigourney Weaver]], which makes it clearer what the user will get on following that link.
I'm intrigued... what was singularly appropriate about the choice of "Sigourney"? Perhaps you could update the Wikipedia article if this is of interest. — Paul G 12:29, 17 March 2007 (UTC)
Not really my bag to add it, but see [5]. --Enginear 16:23, 17 March 2007 (UTC)
I don't see how a surname can fail to be etymologically interesting. Virtually all surnames have some basis of derivation, whether by place, occupation, patronymic, even assignment as a form of derision. But isn't this discussion mostly about place names, brand names, and titles of media? bd2412 T 15:10, 22 March 2007 (UTC)
  • Are we getting anywhere with this discussion? Perhaps we should set out some specific proposals, e.g. my earlier proposal that brand names of cars and drugs should be included based on their respective demonstrably large populations of interest? bd2412 T 03:46, 7 May 2007 (UTC)
    • <incredulous>As dictionary entries?</incredulous> --Connel MacKenzie 05:17, 7 May 2007 (UTC)
      • Yes, I think we can do that. While names of cars and drugs are often brand names, in general usage, these terms are not used as brand names. Compare "I own two Toyotas", in which "Toyotas" refers to vehicles, with "The trademark Toyota is owned by (whoever)", in which "Toyota" refers to the trademark itself, which does not have a plural (IANAL, but I think trademarks can't be said to have plurals [or singulars, if already plural in form]). Hence trademarks can be said to have a usage that extends beyond the trademark itself. Likewise with drug names: "to take Viagra"; "to buy some aspirin" ("Aspirin" was originally a trademark, but no decent dictionary would exclude it).
      • And yes, can we please have some proposals drawn up that we can get some agreement on. — Paul G 09:52, 16 May 2007 (UTC)
  • Paul, that simply isn't true. You cannot say "Toyotas" to refer to vehicles; you can only say "Toyotas" to refer to a particular type. "What kind of Toyota do you drive?" is not the same as "What kind of car do you drive?". How about: "We sat on the roof, watching the Toyotas go by." <-- this is not coherent, unless you were sitting next-door to a Toyota factory. The "Toyota" example you gave is just normal use of a trademark!
  • Likewise, a director of a porn shoot won't ask an actor what type of Viagra he has taken; he'll only ask if the required pill has been taken. If a spammer says "G3NERIC V1AGRA" it is still a direct reference to the trademark! Aspirin (the drug originally extracted from willow tree bark by the native American Indians) refers to a much more generic item, which is why that original "trademark" was deemed to be invalid.
  • The more I hear arguments for inclusion of trademarks, the more I am convinced they are inappropriate in a dictionary. --Connel MacKenzie 15:33, 16 May 2007 (UTC)

Categories as Wantedpages but not Wantedcategories

I suppose there's a good reason why many of the top Wantedpages are categories like Category:xx:Slang or Category:en:Slang, and yet none of these categories seem to be in the list of Wantedcategories? A quick search didn't turn up any relevant discussion about this. - dcljr 17:11, 18 March 2007 (UTC)

The reason seems to be that those categories aren't actually included — {{#ifexist:…}}s are used in the relevant templates to prevent non-existent categories from being included — so the pages aren't actually added to those categories. IMHO they shouldn't show up at Special:Wantedpages, either, but this seems to be part of a more general problem; for example, when you edit a page that includes a template in a non-active part of an {{#if:…}}, the list of templates-being-used at the bottom of the page does list that template. —RuakhTALK 18:28, 18 March 2007 (UTC)
The problems with a lot of them is an interaction between a MW bug and the nav template. Connel has fixed this one.
The problems with the ones for xx: and en:, such as xx:computing, are from the context templates, which pass {{{lang}}} as a parameter even when lang is not defined; I pointed out to DAVilla that the calls on context/label should use lang={{{lang|}}}, but she insisted this was caught lower down. As you can see, it isn't, and other code uses xx and en and it gets picked up and used as a reference. One oddity about template syntax is that the variable namespace scoping is not at all what you think it is in some cases. May also be related to the same MW bug. (These templates are way more complicated than need be, not at all sure why.) Robert Ullmann 18:36, 18 March 2007 (UTC)
What a mess. This was working, at least for a little while. 10 cents to the person that can find the MW change that caused it! --Connel MacKenzie 16:19, 23 March 2007 (UTC)

Context labelling of inflected forms of Regionalisms

Hello all, I had a small conversation with EncycloPetey about de-tagging inflected forms of Regionalisms, specifically the Geordie ones in category:Geordie. I felt that the category had become unnecessarily cluttered with plurals and verb forms, therefore I set about de-tagging the ones where the infinitives/non-inflected forms are marked already, with exception given to inflected forms of non-dialect words that are specific only to that region.

Anyway, we thought it might be polite to ask others first. But I do feel that tagging ALL inflected forms adds clutter.--Williamsayers79 00:01, 19 March 2007 (UTC)

Why not have a Category:Geordie plurals, and so on, like with English plurals? —RuakhTALK 00:34, 19 March 2007 (UTC)
It's a possibility, but I'm not sure everyone will like it since Geordie is a dialect (albeit a very substantial one) of English and does not follow the standard language + POS naming convention. I remember my very first entry radgie nearly got killed by Connel because I mistakenly had Geordie as the language header :-> --Williamsayers79 00:57, 19 March 2007 (UTC)
I agree with your view that only the singular form should be tagged, which comes from my general preference of having lemma pages as the sole information containing entries, with inflected forms simply as soft redirects (with certain exceptions, of course). However, I have received flak for this view before, so take it for what you will. Ruakh's suggestion would make an excellent compromise if people are adamant on having the inflected forms categorized (although I must admit I think it somewhat pointless myself). In any case, I strongly feel that putting lemma forms and non-lemma forms in the same category is highly effective at making those categories useless. Atelaes 00:53, 19 March 2007 (UTC)
Yes, I think you're right on that.--Williamsayers79 00:57, 19 March 2007 (UTC)
  • I'm not sure what you mean by "tagging" in this context. I strongly believe there should be a label on each sense which is regional. On the subject of putting articles in Categories I have no strong opinion other than agreeing that flooding categories with inflected forms is a bad idea. If the problem is that some template is being used that adds a regional label and a category, I'd say don't use the template on inflected forms but still use a regional label. — Hippietrail 17:50, 19 March 2007 (UTC)

Against demolishing the present Policy structure

One CM is trying, through the RFD method, to blow away the present policy structure which allows the gradual development of Policies. He is basically proposing that all discussion again return to the rowdy environment of the Beer Parlour. My experience was that the Beer Parlour was generally a never ending discussion that never reached any conclusion, that never developed a single policy in Wiktionary. I totally oppose his idea.

What I also have to question is the basically sneaky way that is being used to try to achieve this. When the change which would be brought about by these policy method deletions is advocating that all policy discussion take place in the Beer Parlour, isn't it rather odd that there is actually no mention in the Beer Parlour of this planned complete change to the very idea of policy development. In fact there seems to be no (easily found) coherent explanation of what CM is really proposing to put in place of that which he wants to demolish.

I see no merit whatsoever in what CM is proposing. It is purely destructive. It would leave no real way of developing policies. In the past, no policies were developed until this policy development structure was put in place and a concerted effort was made to develop policies beyond the discussion stage outside of the highly volatile forum of the Beer Parlour. In future, CM would have us believe that somehow we could make a leap from Beer Parlour discussion (more like a shouting match half the time) to a fully fledged, approved policy, without any intervening stages. My view is that the change is a recipe for killing off any future policy development. It is not a positive move at all.

I have to observe that it seems the proposal is quite naive about the whole need for policies, and the ways policies are developed in real world organisations. The genius of an idea may come from a Beer Parlour discussion. But to make it all the way to an Official Policy it needs to go through various development stages of serious consideration. It involves draft proposals, policy focus groups, white papers, proposals, and only at the end, a vote on the Official Policy. The Beer Parlour is not the place, nor the right tool, for this sort of back room work to take place to fully develop and refine a policy.--Richardb 10:58, 19 March 2007 (UTC)

What you refer to as the "present Policy structure" is entirely obsolete, it was not being used. All he is doing is marking the remnants for deletion.
The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template with the draft= parameter to describe the status as it is drafted and discussed here and on its talk page, then conduct a WT:VOTE and remove the draft=. It then takes a vote to make changes. That is the whole Torah, the rest is commentary. Robert Ullmann 11:24, 19 March 2007 (UTC)
Far from it. He has not made real changes. See Wiktionary:Policies and guidelines, which still has all the steps described, and is thus, according to CM's own tagging, now Policy. :-) Show me some evidence that CM ever took this Policy change to a vote. Show me any serious discussion of the change. In fact, since you say "The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template", how about you show me where that policy is? The whole point is that the discussion of any change, by either the past policy or the CM idea, should be on the talk page of Wiktionary:Policies and guidelines. But it is not. This has all the appearance of a totally unilateral move by CM, very illogical and very incomplete. Very destructive. Very much not properly discussed and voted on. Just a typical CM jackboot approach. He is a techo through and through. He should stick to techo stuff.
OK, we could use the {{policy}} template with the parameter. But, to me, this is a typical unnecessary complication much beloved of techos, hated by the ordinary user. And totally unnecessary.
You say "What you refer to as the "present Policy structure" is entirely obsolete, it was not being used." What you mean is that things were not moving in it. Perhaps some of the policy ideas captured were in fact stable, and could have been promoted. But CM has always been one of the last to do any real work on Policy. He much prefers to shout louder than anyone else and do things unilaterally. To be destructive of other peoples' work. (comment continues below)
Wait just a cotton-pickin' second! Are you talking about me or someone else? Your other comments may have had some basis, but what is this all about? Unilateral changes? Obsolete/ignored/superseded policies were stable? What on earth? No, those proposed policies were in DIRECT conflict with existing practices, and represented (in each case) one person's POV of how things should be. --Connel MacKenzie 15:31, 19 March 2007 (UTC)
Funny, the proposed policies stood for a year or so, and the fact they are stable generally means acceptance. During the process I constantly publicised what I was doing, and, contrary to your statement, quite a few other people did contribute to the ideas. Richardb.
(comment continued from above) I stand by the current policy of Wiktionary:Policies and guidelines. And it says we have the various development steps. That policy should not be changed without discussion and a vote. So it still stands. So that is the clearly stated current policy, not CM's half-baked, half implemented, half-forgotten proposal.--Richardb 11:45, 19 March 2007 (UTC)
Your abuse of Connel is completely, totally out of line! It constitutes a personal attack. Stop right now. (comment continues below)
I contributed heaps to Wiktionary for two years or more, till I got totally p'd off by CM's too destructive approach. I'll desist from calling a spade a spade when he stops trampling all over other people's rights.--Richardb 12:15, 19 March 2007 (UTC)
While some of that is true, that doesn't automatically give you the right to assume bad faith. Richardb and I have at times worked exactly towards the same goal, other times directly opposite (while both trying to achieve the same end result.) While I understand Richard's bitterness now, anyone who wasn't here for all of the fireworks probably does not understand the ins and outs of the situation. I'd like to request that no one fight for me, per se. Richard has some genuine complaints, along with some very major misconceptions about what has transpired, and those circumstances. --Connel MacKenzie 15:21, 19 March 2007 (UTC)
Thanks for the conciliatory note Connel. Now I've got your attention I'll try to be more polite. Richardb.
(comment continued from above) For everyone else's information, this has been discussed; a lot of it was on the IRC channel; Connel is in no way whatsoever acting unilaterally or improperly. There is a lot of cruft to clean up. Robert Ullmann 12:02, 19 March 2007 (UTC)
The IRC channel has absolutely no standing in deciding policies. You make no attempt to point to any evidence in the log, or the talk page, or anywhere, that this was ever discussed in writing. I can only assume that is because there is no written log of the discussion. So how does that fit into any sort of policy? And are you going to point to the "policy" you purported to quote? Or does the policy I pointed to have more standing? You are only demonstrating your own complete ignorance of the written policies of Wiktionary, and your unfounded faith in CM's good faith. Point to some real evidence, or just back off with your useless platitudes.--Richardb 12:15, 19 March 2007 (UTC) (In no mood to be polite with people who put politeness above actually following the rules. Connel is clearly way outside the written rules. If you can find any rules he has followed in this area, please point them out to me. Otherwise just - shut up! The most cruft to clean up is the useless waffle about being polite.)--Richardb 12:15, 19 March 2007 (UTC)
By the way, I checked the Beer Parlour Archives for January Wiktionary:Beer_parlour_archive/2007/January#63275614825 and found not a scrap of discussion about this issue, yet the RFD's were put up around Jan 27th. Did find a bit of discussion in the Sept06 Archives, but nothing to actually back up what Connel did.--Richardb 06:57, 20 March 2007 (UTC)
Is that a public expression of intent to wheel-war? That sounds like fun... --Connel MacKenzie 15:35, 19 March 2007 (UTC)
Wheel Wars is something that exclusionists indulge in, not Inclusionists such as me. I haven't touched a single one of your entries in regard to this debate (But couldn't resist RFDing your apparently useless, unexplained Category on "Pages with a shortcut"). But I do ask you to rethink. Inclusionists such as me just try to bury you in verbosity :-) See the note "A way to go" below.--Richardb 06:18, 20 March 2007 (UTC)

Try reading w:Wikipedia:Off-wiki policy discussion for the Wikipedia view on the standing of IRC discussions when it comes to deciding policy. To quote their highlights :-

  • "Consensus" in the Wikipedia context means consensus amongst comments posted on Wikipedia. Off-site discussions do not contribute to "consensus".
  • IRC can also be used for the purpose of consensus-building. Quite simply, Serious policy discussion should be common on IRC. When good ideas or proposals result from such a discussion, participants should publicly post a summary of the idea on Wikipedia.

So where is this summary of CMs idea posted on Wikipedia ?--Richardb 13:00, 19 March 2007 (UTC)

At present, I don't recall even which section of w:WP:AN it is archived under. I'll try to dig up links this evening. --Connel MacKenzie 15:24, 19 March 2007 (UTC)
I'm confused here, why would Connel post comments about Wiktionary policy on Wikipedia? --Versageek 21:17, 19 March 2007 (UTC)
May I ask for some clarification on specifically which of Connel's actions are being called into question here? Perhaps a link to the relevant diff, or at least the page which he deleted, or whatever it is. I'm totally lost, and would very much appreciate being brought up to speed. Thanks. Atelaes 20:44, 19 March 2007 (UTC)
start reading here it's this entry, and several that follow. --Versageek 21:04, 19 March 2007 (UTC)
The very point I'm trying to make. Connel has used the RFD process to change Policy. There is no one place (that I can find) that puts forward a proposal for change. The talk pages of the affected policy pages do not include any discussion for the change. Richardb

A way forward ?

Being optimistic, is this an indication of Connel being willing to actually work on developing policies? If so, I'm more than willing to spend a bit of time working with him, and anyone else. But, we have to do it the right way. Changes have to be proposed and publicised and slowly a consensus built. All completely transparent and in writing in Wiktionary, in the talk pages of the pages affected. No using RFD to push changes through. First signs of goodwill I'd like to see would be:-

  • Connel to withdraw the RFD for/from each of these pages.
  • Connel to put up a written proposal somewhere (probably in the talk page of Wiktionary:Policies and guidelines) as to what he proposes. If there was a considerable IRC chat about this, can we have a summary?

I would also suggest that we possibly try to align somewhat with the more mature Wikipedia. See w:Wikipedia:Policies_and_guidelines. They seem to have "Policies", "Guidelines", "Proposals", "Essays". Not so different from "Official Policy", "Semi Official Policy", "Draft Policy", "Policy Think Tank", but perhaps not so apparently rigid and "bombastic". But nevertheless a recognition that it takes stages to develop a policy.

Hope we can work together on this Connel, even though I can barely spare the time.--Richardb 05:29, 20 March 2007 (UTC)

Wikipedia needs many levels of policy development because of the number of people involved. While I agree that we need more than one level, we should be cautious of going for an over-complex and over-rigid framework more suitable for a large organisation. The result of excessive complexity is that the system falls into disrepute and is ignored...which has indeed happened here. --Enginear 12:31, 20 March 2007 (UTC)
A few notes on this. First, I agree that simply nominating these things for deletion may not have been the best approach. However, it should be noted that, at the very least, he did not delete them outright. RFD still allows the opportunity for discussion and debate (as the very fact that we are now having this debate shows). Second, I think it should be noted that many of these pages were in a state of being sidetracked and ignored when Connel nominated them. Third, I would really like to have a page that has a listing of all the policy pages (as I noticed one of the RFD'd pages was). Wiktionary has such a ridiculous amount of policy (and yet rightly so), and I find it hard to keep track of it all myself. And, while I am not a veteran like some folks here, I'm not exactly a newbie either. Yet I still find myself unaware of certain policies. However, most of the pages that were nominated for deletion do need a great deal of work if they are to remain useful, as they certainly do not reflect current practice (which is really what Wiktionary policy is, in reality). (comment continues below)
If they don't reflect current Wiktionary practice, there are at least two ways to go.
  • Update the policy to reflect the current practice.
  • Modify practice to more follow policy, thus slowly edging away from the current poor practice.
Those who generally agree with the benefits of having written policies will tend to the latter, with some of the former. Those who generally disagree with having policies will tend to the former to some extent, but actually are more likely to just ignore any policies anyway. Which they are free to do. (Indeed that in itself is a Wikipedia policy). But no need for them to try to knock down policies which are useful to newbies, and to those who do want to try to build and use them. --Richardb 06:18, 20 March 2007 (UTC)
(comment continues from above) At least some of them merit (in my opinion) such work. Finally, while I agree that it is sometimes beneficial to have policy discussions in a location other than the BP (as, for example, I found the discussions on the talk page of the About Greek to be much more focused than many BP discussions), as it largely limits the people involved to those who are interested and somewhat knowledgeable in the topics at hand. However, each and every single discussion of this type absolutely must have a note on the BP publicly announcing that the discussion is happening and where. Atelaes 05:55, 20 March 2007 (UTC)
Absolutely. The BP should always have a noticeboard of what policies are being discussed. And guess what. That is what is there right at the start of BP. But, even so, it's worth standing up in the Beer Parlour every so often and shouting "Anyone interested in seriously discussing ..... should go to ..... for the serious debate". Which, I guess, is what I've done. Was it just a bit of a tactic to also throw a couple of swings at "my mate" CM in the process, to get some extra attention ?--Richardb 06:18, 20 March 2007 (UTC)
I get enough of that from rolling back vandals, thank you very much. As Versageek was baffled earlier, let me try and assemble some of the relevant events. There was a Wikipedia blowup about Wiktionary, with a couple Wikipedia admins visiting Wiktionary and immediately running afoul, based on their assumption that this is Wikipedia.
The policies that I tagged for RFD were specifically called out as being what led those contributors astray. Each was undeniably obsolete. Each one was also long abandoned.
During this time, much of the confusion was resolved on IRC. The fallout of that, after the RFDs was my rearrangement of what the existing policies are, for visiting Wikipedians' sake. In a nutshell: WT:CFI and WT:ELE are the absolute pillars of Wiktionary. Discussions are (for better or for worse) held in the central WT:BP area. WT:VOTE is used to implement/validate new policies and practices. What I abandoned, was devoting a couple hours per week of my time to keeping them up to date...once the three Wikipedians in question got comfortable, there was no urgent incentive to pursuing the policy maze that Richardb had originally set up (for further simplification/dismantling.) NOTE: Richardb spent a lot of time and effort singlehandedly trying to implement a policy structure that he thought was appropriate...however, with the lower traffic of en.wiktionary.org, the system was enormously too complex, and overkill for the situation by several orders of magnitude. Remnants of his proposed policy structure (which everyone ignores) shouldn't be left around for Wikipedians to trip over...that was the impetus for the initial RFDs!
So, where do we go from here, indeed? --Connel MacKenzie 16:16, 23 March 2007 (UTC)

Revert first, look later

After a string of incompetencies by mister Connel Mackenzie I see everyone talks about recently (see my IP's Block Log for edifications), today I see another idi... um, fellow, reverting my changes and then his after probably actually SEEING what he has reverted. Now, I know that these people are busy, but to revert a change based on the comment, or worse, on Connel's side, for just editing a word that he knew as mostly vandalised seems to me incompetence, not to mention a violation of a certain statute, if I'm not mistaken, of this site's, that mentions ,,good will" assumed of one's modifications. That may apply to people that actually read the modification, but what do you call those incompetent idi... um, folks that just revert cause they don't like the edit summary or the word edited? 15:56, 19 March 2007 (UTC)

If you think any reversion of edits on Wiktionary is a violation of any statute, then you are indeed mistaken. Cheers! bd2412 T 15:59, 19 March 2007 (UTC)
Well, if you tolerate bans for no reason from admins or modification in the detriment of an articol and take no action against that or see no problem with it, I must congratulate you on a job well done promoting vandalism. —This unsigned comment was added by (talkcontribs) 16:12, 19 March 2007 (UTC).
I didn't say that. All I said was, there's no statute violated. But take it however you wish. Cheers! bd2412 T 16:28, 19 March 2007 (UTC)
I'm not sure what you're saying. If you mean that no reversions violate policies, then I respectfully disagree. If you mean that not all reversions violate policies, then you're correct, but I'm not sure what your point is; anon wasn't claiming otherwise. Rather, he was saying it violates the "assume good faith" policy (w:WP:AGF; I don't know if Wiktionary has a similar counterpart) to revert an edit without looking at it. —RuakhTALK 19:55, 19 March 2007 (UTC)
No, I'm just nitpicking. We have policies. We do not have statutes. Cheers! bd2412 T 22:15, 19 March 2007 (UTC)
It would help if you referred to a specific reversion or edit, rather than making vague grumbling noises. We can't fix a generic problem without addressing the specifics first. --EncycloPetey 16:16, 19 March 2007 (UTC)
Anon is referring to http://en.wiktionary.org/w/index.php?title=Special:Log&type=block&page=User: and http://en.wiktionary.org/w/index.php?title=pizda&diff=next&oldid=2149808. Frankly, I agree with him/her: Connel MacKenzie seems to have acted indefensibly in this case. Anon made a series of contributions pertaining to Romanian, all seemingly reasonable and correct (I don't speak Romanian, but that's how they seem), culminating in a contribution to pizda adding the Romanian sense of that word (which, unsurprisingly, is the same as the sense of that word in the various nearby Slavic languages). Connel MacKenzie responded by reverting the edit and blocking the user, writing "don't mess with constant vandalism targets please". This is indefensible; the edit is quite reasonable-seeming, and he seems to have made no effort to determine whether it was accurate. If he feels that anonymous editors shouldn't edit this page, he should semi-protect it rather than block any who try. —RuakhTALK 18:55, 19 March 2007 (UTC)
I looked into this case a while back (the anon had posted a friendly little note on Connel's talk page). The extra history behind this is that someone had been adding a Romanian section, when Dijan had noted (a number of times) that a Romanian section did not belong here, but rather at pizdă. A number of different anons had attempted to incorrectly implement a Romanian section at this entry, all being reverted. Perhaps Connel acted a bit hastily in this block, but it is not as though there was no reasoning behind it. This page definitely did have a history of anons doing things that shouldn't be done to the page. To the original author of this thread, as EncycloPetey notes, you would do well to cite specific grievances, and to ask for specific remedies, instead of making incoherent claims and poorly veiled attacks. I agree that Connel did in fact make a mistake, but I have yet to find an admin who hasn't. However, a quick search of your own contribution history finds you making the understandable mistake of adding a false link [6], so I think it should be admitted that no one is perfect. If you feel that some action is necessary in response to Connel's mistake, then please propose one. Otherwise, I suggest you get on with your life. Atelaes 20:29, 19 March 2007 (UTC)
I'm confused: there's currently a Romanian section at pizda, and it's been there for almost three weeks with no comment. What's changed? —RuakhTALK 21:02, 19 March 2007 (UTC)
pizdă and pizda are different words. Cynewulf 21:09, 19 March 2007 (UTC)
No, they're not … —RuakhTALK 22:07, 19 March 2007 (UTC)
Well, the pizda entry says that one is articulated and one is unarticulated. However, I don't quite understand what that means. Anyone care to curb my raging ignorance? Atelaes 06:02, 20 March 2007 (UTC)
Romanian, like Bulgarian, marks definiteness of nouns in the ending. For this particular word, pizdă is the citation form and means cunt. The spelling pizda means "the cunt" (nominative and accusative); pizde is genitive/dative, and pizdei is the definite genitive/dative (of/to the cunt). —Stephen 23:01, 20 March 2007 (UTC)
As I've stated, I don't speak Romanian; but I understand it to mean that pizda means the poontang while pizdă means simply poontang (see w:Romanian grammar#Articles). —RuakhTALK 16:58, 20 March 2007 (UTC)

On a related note, perhaps Wiktionary should have a policy specifying when users can make automated reverts. Such a policy might specify that automated reverts are to be made only in cases of clear vandalism. I know the arbitration committee on Wikipedia criticized an editor for not explaining his reversions once. Without an explanation, many editors assume the worst as to why a reversion was made. Some editors are embarrassed by them because it makes it look as if their edits were so bad that an explanation is un-necessary. Users who write detailed summaries of their edits may feel like they're being ignored. Others may feel like their new-comer status is being highlighted by the use of such powerful tools.--Νικα 22:41, 20 March 2007 (UTC)

I have to apologize, but let's bring this back down to reality :) An anonymous contributor edited a vulgar word in a language that most of us do not speak or understand. Why should we trust the edits? The anonymous editor made no attempt to establish a track record of credibility (this is especially important when editing slang words). When I make an edit, people generally trust that it's correct because I have demonstrated, over time, that I am knowledgeable in the languages that I work on. Anybody can look at my edit history and decide that for themselves. This is key, especially when the word is poorly documented. If nobody can verify that you know the language, your only other option is to include where you got the information from (not in the history comment, but in the actual article under the references section). Remember, we're all anonymous here. Nobody should believe what anybody does unless it can be verified in some way. -- A-cai 02:37, 21 March 2007 (UTC)
Shouldn't editors assume good faith, though? And if someone (a sysop, editor — anyone) isn't knowledgeable about a subject, shouldn't they simply leave the entry alone? Or, if they find it suspicious, they can RFV it, rather than simply revert edits. But I could be misunderstanding your comments. Could you elaborate on exactly what types of practices you support vis-a-vis newer editors?--Νικα 03:29, 21 March 2007 (UTC)
Frankly I think we should do what Wikipedia does and lock down our most frequently vandalized words. After all, the meaning of fuck or fag is not likely to change radically anytime soon, making it far more stable as a dictionary entry than George W. Bush or Hillary Clinton are likely to be as encyclopedia articles. I suggest that for particularly vandal-prone words, we make the entry as complete as possible and then lock it down, with an invisible note at the top of the page to tell would-be editors to take their suggestions to the talk page. Cheers! bd2412 T 03:42, 21 March 2007 (UTC)
Well, before we lock something down, we should make sure it has translations in all (order of magnitude 10000) languages, all derived terms set, all "see also"s added, all synonyms and all alt spellings. -- unsigned
They have pretty thorough coverage of major languages as is. With respect to synonyms, we have WikiSaurus. Anything else, take it to the talk page. bd2412 T 05:20, 21 March 2007 (UTC)
In most cases, the correct action would be to submit suspicious definitions to WT:RFV. However, vulgar and contentious words tend to invite vandalism, which is why sysadmins tend to be quick to revert suspicious edits. Is this the correct course of action? I guess that depends on your perspective. I'm simply saying that if you make such edits, you should be able to demonstrate in some way that the edit is legitimate. Perhaps bd2412's suggestion is correct. Maybe Wiktionary should only allow registered editors to directly edit contentious words. This would not be out of keeping with what's going on over at Wikipedia. -- A-cai 04:05, 21 March 2007 (UTC)
Guys, you're sidetracking part of the issue. My problem isn't that these guys take one look at the definition and if they see something that they don't understand, they revert it. That would be... let's say, a little arrogant, but somewhat understandable. My problem is that, by all appearances, THEY DON'T LOOK AT WHAT THEY'RE REVERTING. Simple as that. They just hit the revert button, hell if I know where they have it as to not see what they're reverting, and then look at what they did... if they actually do, that is. It wasn't Connel's actions that made me take stand, one bad weed I can more or less understand, but it was yet another blistering show of ignorance: http://en.wiktionary.org/w/index.php?title=Australia&action=history which makes me think this is a regular habit for people with some experience around here. It didn't end in a ban like the last time, but it sure got me annoyed once more.
As for discussions about track record, achieving credibility... come on! Do you think every (or any, for that matter) anonymous contributor knows about the ground rules you set up for them? For me in particular, I just edit where I see appropriate to change/add/whateva. smth, I don't care about what words I edit or how many. paused
inserted response I understand your point of view as a casual contributor with respect to assuming good faith. However, you must understand that my comments about credibility have nothing to do with "ground rules." I'm simply stating reality: life is not fair! A sysadmin is not likely to give the benefit of the doubt to an anonymous contributor who makes a questionable edit (i.e. an edit that cannot be independently verified by a sysadmin). This is because too many people (registered or not) make bogus edits to words. One glaring example of this is editing a word for a language that you do not speak!!! You can complain about the sysadmins all you want, and in some cases, you may be justified. But life is a two way street; don't do things that will get you reverted, and you won't be reverted. I've added thousands of words to wiktionary over the last year. I have only had my edits called into question on rare occasions. In every case, I resolved the matter, not by cursing at the person who did it, but by citing evidence for the validity of my edit. end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
bd2412, nice of you to try to close the issue by nitpicking (taking advantage of the fact that I said statutes instead of policies... that was quite fair-play of you, I must admit).
inserted response Perhaps, you are unaware of the fact that bd2412 is a lawyer :-) end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
And for the record, as I saw mentions of my gender, I am a male :P Signing off is here default getaway with a friendly warning: mister Ullmann was strike two, if there is ever another strike of stupidity from someone that considers himself/herself superior enough to revert stuff just 'cause they don't like the editing message, as much as I appreciate this project's ambitions, I shall feel forced to use the ,,big guns" (dear old Proxy Switcher works like a charm ;) )to thank them in a civilised (NOT) order. Toodles! 14:44, 21 March 2007 (UTC)
This is a dictionary, so I feel I have the right to be particular about the meanings of words. A "statute" implies that violation thereof is unlawful. A policy is more like a guideline to be interpreted in accordance with the dictates of the situation. bd2412 T 20:33, 23 March 2007 (UTC)
Wow. All this from a troll/vandal, who has been reentering an item that has previously failed RFV/RFD, who not only resorts immediately to personal attacks, but gets support for those attacks? Why wasn't this section immediately rolled back? WTF is going on here? --Connel MacKenzie 15:45, 23 March 2007 (UTC)
I don't think he's a troll/vandal, and you've provided no evidence that he is. All of his contributions seem well meant. The only person he seems to be attacking is you, which I think is understandable, seeing as you had blocked him for no reason and have never looked back. (That doesn't make it acceptable, mind, but eminently understandable.) —RuakhTALK 16:43, 23 March 2007 (UTC)
User:Ruakh that is bullshit, and you know it. Someone who not only expressed intent to use open proxies, but is already intimately familiar with them is not a vandal? It is a good reason to review his edits in detail, but certainly no reason to feed the troll, nor to hide in fear from threats of vandalism. If he wants to post goatse on my user talk page now, we can certainly use the exercise of blocking new/residual open proxies. As to the pizda entry, go take a look at the history. Frankly, I trust Dijan's research more than Stephen's knee-jerk assumption of good faith, but I don't have a handy method of checking either, at the moment. Did the vandal resubmit with three citations? Is it attested? Come off it. It is run-of-the-mill vandalism. --Connel MacKenzie 17:36, 23 March 2007 (UTC)
Sorry for saying so, but you're the one bullshitting; I guess you find that easier than recognizing your error and apologizing for it? I'm also familiar with open proxies; I've never used one, but to be honest, if an administrator blocked me for no reason, I might decide to use one; does that make me a vandal, too? (Granted, I probably wouldn't try to circumvent an unjust block, as I'd more likely just say "fuck this" and give up editing entirely — but I can't say for sure one way or the other. I guess it depends whether I felt the problem was Wiktionary in general, or a single power-mad administrator.) It seems quite obvious to me that pizda is a legitimate Romanian word, whose definition should be something like {{form of|articulated (definite) nominative|pizdă}}. A Google search for google:site:ro "pizda" will show you instantly that pizda is much more common on Romanian Web sites than pizdă is (whether because people don't bother typing the breve, or because the definite nominative form is more common than the indefinite nominative, or what). Also, your attack of Stephen strikes me as just this side of crazy; if anyone here has made a "knee-jerk assumption", it's you. You blocked a user for a good-faith (albeit slightly misguided) edit, and now you're acting like his angry response justifies your having blocked him. —RuakhTALK 19:34, 23 March 2007 (UTC)
O.K., having said that, I see that now that you've re-blocked him, he's started to genuinely and blatantly vandalize under a different IP ( So, congratulations; you've been outdone in the bad-guy department. *is done defending the anonymous-editor-turned-vandal* —RuakhTALK 19:51, 23 March 2007 (UTC)
Again (as if you didn't already) see comments below. He always was a vandal, from the very start. I suggest you redact your "outdone" comment. --Connel MacKenzie 20:27, 23 March 2007 (UTC)
Okay, here's a better example, untainted by all the vulgarities and slang nonsense. This reversion violates Assume Good Faith in my opinion. Deletions like this should be commented at the very least, and probably noted on the talk page as well. DAVilla 19:20, 23 March 2007 (UTC)
You're suggesting that isn't nonsense? DAVilla, that edit is nonsense - you have now (half a month later) dredged up a student's (now assistant professor or something) web-page as "evidence" that all astronomers make the same mistake as this one former student? If astronomers use the jargon term metallicity, with a similar meaning, then that term might merit an entry here - but such a blatantly bogus redefinition of metal? Without references? What is going on around here? --Connel MacKenzie 01:00, 24 March 2007 (UTC)
As this vandal likes to point to: User talk:Connel MacKenzie/archive-2007-3#Thanks for the ban. Note clearly that any "good" contribs this guy has ever made are far outweighed by his initial, and constant, vandalism, interspersed throughout. This is not some guy who is "slightly misguided;" rather, he is an insipid troll. Frankly, the more obvious vandalism he's doing now is much easier to deal with than the subtle mistakes he was intent on inserting into Wiktionary. --Connel MacKenzie 19:48, 23 March 2007 (UTC)
Sorry for butting in here as a relative newcomer, but may I suggest that Connel MacKenzie might do best to take a small step back here, and admit to a slight glitch. Glitches happen! I see good faith in general (although admittedly, cannot back up this faith with "evidence"). It is clear that Connel cares a lot for the Wiktionary project. --Keene 01:17, 24 March 2007 (UTC)
Nope. As evidenced, it was no glitch to block the vandal. --Connel MacKenzie 01:55, 24 March 2007 (UTC)
You seem to call users vandals all the time Connel. --Keene 02:16, 24 March 2007 (UTC)
I never tried to cite that definition as it was never posted on RFV. What I was doing was taking 5 seconds to give a reason for my reversion. If anyone does even a half-ass job of looking they'll see it's actually quite common in astronomy. The point is that there's really too much information out there for any single person to claim to know what kinds of entries are bogus or not. That's why you have to assume good faith. DAVilla 17:27, 24 March 2007 (UTC)
I have to partially disagree with DAVilla. It's not about what any one person knows or good faith etc. It's about the integrity of the information in Wiktionary. "Take my word for it" is not a viable solution for any of the wikis. I will grant that there are plenty of words, often not found in mainstream dictionaries, that should be included in Wiktionary. However, someone saw the word somewhere, or it shouldn't be on our site. Here is my approach to this (mind you, I'm not stating Wiktionary policy, just my own opinion). For example, if I create an entry for a basic Chinese word such as 杯子 = cup, anyone can verify the definition from any number of on-line resources[7]. Technically, I should provide proof of the entry's validity, but I didn't in this case (laziness), because it's so easy to verify. Now take a look at 缓冲器. This word is poorly documented in other dictionaries. As a result, I was questioned about it by a contributor (see: User_talk:A-cai#bumper). My solution was not to say, "Trust me, I speak the language and you don't!" Why should that person have to take my word for it? He doesn't even know me. I say, good for him! My solution was to find proof, and then include that information in the article. -- A-cai 02:53, 25 March 2007 (UTC)
That's not what this discussion is about. I don't think DAVilla is arguing for assuming that someone is right or wrong, but for assuming good faith. No one should ever assume that someone is right or wrong. If you have a strong suspicion that something is wrong, you might request verification. If you strongly believe that something is wrong, you might undo an edit. But under no circumstances (in my opinion) should you assume that someone is wrong based solely on the age of their account or on whether an addition is sourced. Perhaps it has to do with my world view, but I believe that humans, in general, act in good faith. I think that they are good at heart. Look at the recent changes for this wiki and 99.9% of the changes you will see are made in good faith and are factually valid. Most of the additions are also unsourced. Assuming that these edits are by their unsourced nature incorrect would be a logical fallacy and also tragic. We saw with Essjay on Wikipedia that having an old screen name on the internet means nothing. The best way to ascertain accuracy is to discuss the content and not the editor.
Also, to get back to my other point: Under no circumstance should anyone be reluctant to explain why they have done something. If you have carefully examined an edit and you are reverting in good faith, then you should have no trouble explaining yourself. In fact, you should be eager to tell everyone. On the other hand, if you do not have a valid reason for making an edit, then you will be reluctant to explain yourself.--Νικα 05:44, 25 March 2007 (UTC)
Unless you're talking strictly in the abstract, you seem to be misunderstanding something here. No one asked the anon to justify his edit; rather, an admin reverted it and blocked him for having made it — and didn't even leave a note at his talk-page explaining why. It's fine to be a bit cautious while assuming good faith — we have to be, especially at oft-vandalized entries — but the admin made a strong assumption of bad faith without any support for that assumption so far as I can discern (though he maintains that there is support for it). —RuakhTALK 05:13, 25 March 2007 (UTC)
Let me clarify my position: if good faith is so important, then the anon must also assume good faith on the part of the sysadmin. The proof that you have cited of Connel's unfairness is not a slam dunk case in my opinion. It appears to me that Connel was making a good faith attempt to stop what he believed to be a vandal. Remember, the anon still has not demonstrated a proficiency in Romanian, nor has he offered any proof from a credible source of his definition (an example sentence might be nice. For example, see 上穷碧落下黄泉). Had one of those things happened, I might have been more likely to side with the anon. What he has done instead, and you can read it above, is threaten to make edits via some kind of voodoo proxy in the future, so that he can't be blocked as easily (rather than making an attempt to support his edits with evidence). With respect to the original potty mouth word that started this whole thing, until a fluent Romanian speaker comes along and sets us all straight, I'm not sure what else we can do at this point. -- A-cai 09:08, 25 March 2007 (UTC)
Someone is lucky that the anon was not civil after what, aside from the history of the page, could look like an unjustified slam. Had the anon not been the same contributor, had he acted civilly and been able to credibly source the definition, there would have been no justification for blocking without communication, regardless of the history of the page, and I might have recommended disciplinary action against the admin for violation of AGF. That the anon did not act civilly means the history of the page backed the admin. So I think the point is that Dijan knew what he was talking about, and that Connel is either just lucky or he really knows what he's doing, which I hope doesn't mean that suspecting someone means singling people out and driving them to acts of vandalism like this. DAVilla 11:45, 1 April 2007 (UTC)
"Unjustified slam"? I suppose it could look like that, if you are blind, perhaps. --Connel MacKenzie 04:09, 2 April 2007 (UTC)
Yes, that's almost exactly what I said. If you're blind to the history of the page, then it could look like (not equal to "is") an unjustified slam. The history of the page makes the story turn into one of an ambitious contributor not listening to reason. The incivility of the anon makes the story into one of an ambitious contributor not listening to reason. If the history were debunked with proof of the word, and if the anon had acted civilly, then the story would be completely different. If the history were debunked with proof of the word, and if the anon had acted civilly, then the block would have been inexcusable. But that is not the case. You blocked the right guy this time. Is that because you're lucky or you really know what you're doing? For a successful stockbroker, they say that's impossible to tell, the difference between luck and skill. And so it might be here. And so why bring up the question? To point out its irrelevance. You blocked the right guy this time, and everything else we can say, and pretty much everything I said, is nothing more than speculation on the difference between luck and skill. DAVilla 19:04, 2 April 2007 (UTC)
Well well. Just how many people are to be totally p'd off by CM, and as a result quit contributing, before someone decides it is time to rein Connel in? I've been around on Wiktionary for a few years, and was active at the time Connel started and helped him for a couple of days. Only a few months later he was already into his stride of using the jackboot approach, and has been doing so for a couple of years now. He rarely takes a backward step, or looks backward at the damage he does. Whilst being aware of how much stuff he does, I still feel he is a real danger for the future of Wiktionary. It is still being run as a private club with a few arrogant people prepared to ignore all the rules and just smack users down. Which is probably why it is not taken seriously by many people. Connel - for a while, just take a back seat on being the policeman, and let's see if the world collapses around us, or if it becomes a friendlier place to contribute (even if it might be a bit smuttier).--Richardb 10:05, 14 May 2007 (UTC)
There are some things as a CU I must not discuss, particularly in a public forum. However, your assumptions that my block of a vandal was unjustified are misplaced. I assure you, that if I were to unblock every OP I've ever blocked, yes, en.wikt: would be a smuttier place. I can safely guarantee that it wouldn't be friendlier.
I'm almost curious as to how your misguided mode of thinking arrives at the conclusion that WT:VOTE is "run as a private club." Please spend some time constructively, rather than feeding trolls with baiting personal attacks. --Connel MacKenzie 10:55, 14 May 2007 (UTC)
Connel still manages to offend people? Absolutely. The latest victim: User:Keffy. His absence since then has left me, frankly, rather dismayed about the whole ordeal.
Connel doesn't back down? I'm not sure I completely agree with that. I've seen what I speculate to be a slight modification in his very personality. Not a gigantic change in behavior. Not an unwillingness to be the only person arguing his side, as yet. But an acknowledgement of error in a few incidents that, for lack of fanfare, may not have blipped on your radar.
Likewise I have noticed that SB has re-evaluated his position on what constitutes a worthy entry. Again, not a radical change, but definite in my view, and pragmatic in contrast to the idealism that they, and we all, try to uphold.
Those are two of the longest-standing members with whom I have ever had any real gripe. They also contribute a ton more than I do. My politeness requires humility on my part for their consideration of especially my own opinions. A more cynical view says that if they were made of concrete and not oak then they would have already fallen in the strong changing winds. But if you find knots, then they are not the concrete towers we might imagine them to be. (Apologies for the poor literary attempt.)
There is no one here for whom this equally applies, that simultaneously he more deserves CU status, and that it is more difficult to defend that fact, than Connel. Connel himself has said that he is looking for a replacement. Wiktionary is maturing, it is evident, and when the day comes, I expect that Connel will give up CU voluntarily. That will be the end of headaches for himself far more than for you. For you will discover that even then, apart from the natural maturing of Wiktionary, of which I should mention you have already been instrumental, not much else will change. DAVilla 16:48, 14 May 2007 (UTC)


Do we have guidelines anywhere for what it takes for a word to get this tag? It seems to me like it is a last-ditch plan for prescriptivists to defame words which thwart rfv/rfd :-) Several of the tagged words were not neologisms, so I removed the tag from them; many others would probably be better off deleted. The very nature of this tag just seems contradictory: either something passes RfV/RfD, or it does not :-) Though, I don't want it to sound like I have anything against DaVilla (who created the template), in fact I think DaVilla is a fantastic contributor and we can all learn much from their contributions :-) Anyway, if we are going to use this template, we could make it link to a page describing the specific, objective criteria used to make the classification. That is more in line with Wikimedia philosophy in general, and would no doubt stir joy and happiness in the hearts of all our readers!!!! :D -Signed, Language Lover

Word. I think we're much better served by appropriate use of {{context}}. —RuakhTALK 16:46, 20 March 2007 (UTC)
To clarify: the template's talk-page does say how it's to be used ("Use this template […] on pages that have passed the RFV process or are otherwise well sourced, but which do not appear in any of the six major English dictionaries […]"), but I'm not sure it's actually being used that way, and the current wording is grossly misleading. —RuakhTALK 16:53, 20 March 2007 (UTC)
I fail to see why inclusion in the "six major English dictionaries" is relevant. For one thing, Wiktionary is itself a major dictionary :-) For another thing, such a philosophy reeks of copycatting, I mean if we're just mirroring those dictionaries, how are we better than dictionary.com? For yet another thing, the classification of dictionaries as "major" or "non-major" is mostly arbitrary (the arbitrariness is of course obscured by lots of appealing to authority and such)-- why is OED "major" and Urban Dictionary not? Does the fact UD's contributors don't all have college degrees, mean that the words they speak aren't words? The way I see it, the "six major dictionaries" can look to us for inspiration/confirmation, not the other way around (and if it's not like that now, it ought to be our goal anyway) :) Especially with all the wonderful work all of you guys like Ruakh do :-) Language Lover 17:19, 20 March 2007 (UTC)
Although I have commented that the neologism template in its current state is useful, Language Lover's comments are so perfect, I can't help but second them.  What Language Lover said is the essence of "wiki is not paper" IMHO.  If a word exists somewhere out there, it should be here and others should be able to find definitions and usage notes on it here. — V-ball 17:24, 20 March 2007 (UTC)
A few comments. First, in my opinion, there is a glaring distinction between the OED and the Urban Dictionary. Yes, one of them is that the editors of the OED have degrees, and, in general, the editors of the UD do not. But more importantly, the OED is consistent, extremely well researched, and more representative of the language as a whole. UD includes definitions used by small communities, or sometimes solely of individuals, whereas the OED's definitions generally represent the semantic understanding of millions. That being said, there is, nonetheless, some merit in your comments, Language Lover. There definitely is a point where we should strive to carve our own niche in the dictionary world, and not simply try to imitate the OED. However, at the same time, we have to deal with the ambiguous line between descriptivism and prescriptivism. I think that most of the editors here are quite in favour of going with the descriptivist school, meaning that we are striving to describe language as it is actually being used, not trying to tell people how they "should" use their language. However, many of our readers are not in on that frame of mind. Many of them look to a dictionary to find the "correct" spelling of a word, or the "correct" context in which to use a certain word (I must admit that I do from time to time). If we do not make some distinction between correct and incorrect (in certain situations), then we are misleading our readers. Whether that is our fault or theirs is irrelevant. That being said, Ruakh makes an excellent point that the context tags might often be more appropriate and more useful for many places where the neologism tag is currently in use. Finally, thank you very much, Language Lover, for properly signing your comment. Atelaes 19:51, 20 March 2007 (UTC)

Good discussion from everyone :-) It is an unignorable fact that one purpose of a dictionary is to help elementary school students check whether words should be put in their book reports. The prescriptivist response is to single out "bad" words and "condemn" them. The progressive response is to take a stance of, we are the ones in the forefront, it's not a matter of the words being bad, but of the English teachers being out of touch. I think for the most part everyone agrees that tags like {{slang}} are an excellent compromise :) I doubt many readers will say to themselves, "I don't mind putting slang in this report, but I can't put neologisms in it!" :D

For the sake of being constructive, here's a possible guideline for neologism status. I'm just putting this up with little thought, hopefully others will expand on it.

  • To be considered a neologism, three out of the following four conditions are required:
    • The word (or sense) is known to have been coined within the past five years, by a single individual, for the sole purpose of coining it. (See santorum)
    • The word is not a straightforward construction by agglutination (ruling out common-sense words like windshieldlike, podiumward, etc., which could easily be "accidentally" "coined" by an author without even realizing it; also rules out tsunameter)
    • In a 3/4+ majority of the word's citations, the author talks about the word itself, or defines it as soon as it is brought up; as opposed to the author naturally, seamlessly slipping it in among other words (this rules out things like lolicon)
    • The word is not an eponym, Latin/Greek construction, or other such construction, defined in peer-reviewed academic literature, government literature, etc.
In addition, the word cannot have more than twenty-five independent citations in preserved mediums (as described in CFI).

This is just a rough proposal, hopefully it'll inspire some terrific discussion :-D Language Lover 00:47, 21 March 2007 (UTC)

Defining foreign-language verb forms

This is something that has been on my mind for a while. It seems to me that our current convention for defining non-English verb forms is not to give the translation, as with all other (as far as I know) non-English entries (and as recommended at WT:ELE#Variations for languages other than English), but to give a definition. What I mean is that just as, for instance, the entry for hola is "#hello, hi" which tells me what that word means in English, I would expect a verb form that I've looked up, like comido, abro, getroffen, or karju to tell me "eaten," "(I) open," "found," or "(you) shout," with the appropriate glosses to designate which senses are meant. Instead, the definitions given there, and commonly at all non-English verb forms, are in the form of "The past participle form of the verb comer." "The first-person singular of abrir in the present indicative." "past participle of treffen," and "Second-person singular imperative of karjua." These definitions are confusing and not very helpful for the reader. The meanings of first-, second-, and third-person might be common knowledge, but it's not readily obvious to the reader what we mean by conditional, subjunctive, imperfect, past participle, etc., or how they are translated into English verb forms. Imagine going to a dictionary to look up fuesen and finding "familiar second-person plural imperfect subjunctive form of ir".

I can't make out any good reasons for them except for the ease of automation. I think we should make it clear that while bots might mass-generate easily created definitions like these, the ideal one should give a proper translation to an English word for the reader's reference, and should include a gloss if it is necessary for the reader's comprehension, as with other non-English entries. I'm thinking about a change like this one, which I think improves the meaning considerably: [8]. If I go fill in yéndose, should I write "leaving" as I've done for animado, or "present participle of irse"? Any thoughts on this? Dmcdevit 04:26, 21 March 2007 (UTC)

This is an excellent point you raise, one I've pondered a bit myself. One thing to keep in mind is that "translations" are really only approximations, there is no one-to-one correspondence between most languages and English. I think there's a balancing act going on. One type of reader uses the dictionary directly to translate text. For them, your suggestion would simplify things. The other type of reader uses the dictionary as a companion for learning a language. For them, the suggestion might make things seem overly complicated. See Ruakh's excellent examples below. Hmm, it is a subtle and interesting thing which you have brought up!!!! :-D Language Lover 04:40, 21 March 2007 (UTC)
While you make a good point, I think that the technical information should not be removed. It is highly useful to many people. In addition, I think the "soft redirect" nature of these entries needs to remain. People need to know that they are not seeing the whole picture. I propose a format similar to that which is in place at λύῃς. Atelaes 04:55, 21 March 2007 (UTC)
I don't mind including that information, certainly, but I'm still wary of putting it in the translation, when it isn't. Could we move information like "Present active subjunctive 2nd singular form of λύω" to a "Usage notes" or "Etymology" (or "Verb form"?) section, or do something with it to differentiate? Dmcdevit 05:12, 21 March 2007 (UTC)
Oooh, I like λύῃς :-) You did a fantastic job with that word, Atelaes! :-) The only thing I'll add to your comment is this "best of both cakes" approach should be optional, to allow the quicker, more mechanical old method as well. One other thing, I wonder how one would directly translate the te-form of Japanese verbs? Is that possible? It seems to me it is impossible. Language Lover 05:00, 21 March 2007 (UTC)
Translation is surely impossible, as it's used in so many ways. An explanation (meaning by meaning, like in any entry) can be put in (and maybe at te or -te, which is the relevant suffix). In the grammar taught in Japanese mandatory education there is no "te-form" (conjunctive form? as far as I know there's no Japanese term). It's analyzed as the ren'yookei (continuative form?) of a verb followed by the auxiliary "te". Similarly "masu", "rareru", "ta" (of the perfective(?) form), and many others, are handled as auxiliary verbs.
In the case of the conjunctive form and some others (such as the perfective form) linking to the English term explaining the form may be useful, as they're commonly considered separate forms in Western explanations of Japanese grammar. Perhaps perfective and conjunctive, with added meanings for how they are used in teaching of Japanese and link to Wikipedia article on Japanese grammar (which, unfortunately, doesn't even say what approach it uses, but looks like a mixture of various explanations). With the stem forms one can link like this: continuative form and have something useful there, such as a list of suffixes which attach to that form. -- Coffee2theorems 13:51, 11 May 2007 (UTC)
I think we should define the form in terms of the lemma, as is currently common practice here, for a few reasons:
  1. If the lemma has a number of different senses, then it makes sense for all information to be contained on the lemma page, rather than listing all the different senses for each inflected form (a maintenance nightmare).
  2. Forms correlate poorly across languages. When you define yéndose as "leaving", you ignore the fact that Spanish gerundios behave quite differently from English gerunds and present participles; yéndose often means "in leaving"/"while leaving"/"by leaving" rather than simply "leaving", and conversely, leaving often means "yéndome"/"yéndote"/etc. or "irme"/"irte"/etc. By using a standard template for gerundios, we leave for ourselves the possibility of having and linking to a useful Appendix:Spanish gerundios or whatnot, rather than giving a largely unhelpful "translation". (Note: I use the term gerundio because there doesn't seem to be a good English word for this form. Are we really referring to them in entries as "present participles"? That's grossly misleading, because Spanish gerundios are quite different from present participles in languages that have true present participles.)
  3. I think you underestimate people. I think most people looking up a Spanish word in an English dictionary have some basic familiarity with the terminology; and if any don't, it's at least helpful to know that yéndose is a form of irse, even if they won't get what form without actually knowing a bit of Spanish.
RuakhTALK 05:03, 21 March 2007 (UTC)
Yes, I was being too brief with yéndose, but your good point about poor correlation is true with all translations between languages, including gerundio, it seems, not just verb forms. That's why we suggest glosses to convey the proper sense. In any case, it may have been unclear of me to say that the current definitions are not obvious. I don't actually think they are definitions. Even in English, it would be like defining cats as "The plural form of cat". But that's more akin to a part of speech, not the real meaning of cats, which would be "more than one cat". It's not that people won't get it, but that it just doesn't convey meaning, except by being a degree removed from the actual usage of the word. Ideally, I would think it is best to put the translated meaning in the definition space, and then include the technical terminology specifying the precise tense/person/etc. in a section of its own. Dmcdevit 05:27, 21 March 2007 (UTC)
Your English comparison is a good one; but I think the solution is still to stick to an explanation of what the word is (plural form of cat, adverbial participle of irse, etc.), but to use italics so it's clear it's not actually a definition. By the way, I don't just do this for inflected forms; a while back, I rewrote English adjective sense #6 of gay in a way that gave no definition, only italicized explanation. In that case, it's because there doesn't seem to be an actual definition, whereas in the case of inflected forms, it's because I think an explanation of the word is much clearer than an attempt at translation, but it's the same result either way. —RuakhTALK 06:11, 21 March 2007 (UTC)
In your example edit, inspired is also the simple past tense of inspire -- what is the simple past tense of animar? (Assuming it has one, the French simple past isn't so common) Inspired is also an adjective! What would you propose for 言われた, passive past tense of 言う say? "was said" and nothing more? How about 言いました, past polite form? How would it differ from 言った plain past? Cynewulf 05:07, 21 March 2007 (UTC)
I have been thinking about the formatting of λύῃς and similar words for some time now. This word illustrates an excellent example of what Ruakh is saying. The subjunctive sense does not always mean "might" (although it certainly sometimes does), but has a whole array of nuances. Thus, the translation given is not entirely accurate, or at least not comprehensive. I think this illustrates an important conflict which goes on throughout all considerations on Wiktionary. Do we make it user-friendly or comprehensive? Often times we can do both, and so it is not an issue. However, sometimes we cannot. For example, a word may have a subtlety in meaning which is not adequately covered in less than a decent sized paragraph. But most users simply want a quick and dirty definition and are not concerned with nuances of meaning. In my opinion, we should always strive for both, if at all possible. In this situation, I feel that providing both serves the casual user who simply wants a quick definition of yéndose or λύῃς and then wants to get the hell out of here, as well as the linguist who wants to know what those words "really" mean. It seems that the two do not detract from each other. In addition, including both takes care of Cynewulf's excellent critiques. Atelaes 05:19, 21 March 2007 (UTC)
I guess this would have been a better way to do it? It's important to me that the usage notes are not an actual translation of the word though, and belong separate (as long as there is true translation, that is), and that we should encourage adding true translations and glosses to words that have only tense terminology. Dmcdevit 05:40, 21 March 2007 (UTC)
I think that's a reasonable way to go. However, as we fill out these pages more (which I see as a good thing), we should come up with a way to show that these are "stub" pages, in essence, and that there is (hopefully) a whole lot more info waiting at the lemma page. Any thoughts on that? Atelaes 05:48, 21 March 2007 (UTC)
I would have the bot move its additions to the usage notes section like in animado and leave a note in its place, such as a "{translation needed}" sign (a template with a category?) after the # in the definition, with a link to some explanatory page. Perhaps not the most aesthetically pleasing, but it seems like the most correct option. Most regular verbs in English with clear translations will be easy to add without needing glosses. Dmcdevit 05:56, 21 March 2007 (UTC)
I have to say I disagree with that. Perhaps it could put it on the definition line and then also put it in a cat. as you say. However, there are a lot of inflections (do YOU want to go through the 90,000 Spanish inflected forms?), and I think we should simply admit that that's a project which shall be waiting for some time. Atelaes 06:04, 21 March 2007 (UTC)
Well, I do, just not personally. :-) My thinking was that either format without clear translations is less than ideal, but moving the current content to a usage notes section (a bot could do that, I'm assuming) at least clarifies the entry. It's a work in progress either way. It's not a big deal though. I would like to at least update WT:ELE (or wherever it should go) with the preferred format, because it appears to me (notice that the two non-Spanish entries, German and Finnish, in my original post were not created by bots) real, live editors are now seeing the inadequate bot-processed creations as the conventional format. Dmcdevit 06:15, 21 March 2007 (UTC)
I think usage notes is a little less than optimal. Usage notes is supposed to be a place for pointing out quirks and such. We could instead make a new section header called "Grammar". Or we could put the grammar data in the line where the word itself appears in bold. For example:
kreota Future participle of krei (plural kreotaj, accusative singular kreotan, accusative plural kreotajn)
  1. which will be created
I guess that could cause trouble with words which have tons of conjugations on that line. But maybe such words should be dealt with like Japanese, with a separate conjugation table below? Incidentally, I think someone already pointed out the maintenance problem. If we do this, then any time we significantly change an unconjugated word, we'll have to make appropriate changes to all its forms... yikes :-) Language Lover 14:29, 21 March 2007 (UTC)

(Coming back to the margin.) I'd like to reiterate the point already made here (and which I made a long time ago, in another long-since-archived discussion) that giving translations of inflected forms, rather than grammatical information and a cross-reference to the uninflected form, is a bad idea. One reason for this is that the English translations may have many senses. The French word poser can be translated as "to set", but the English verb has dozens of meanings (take a look at it in the OED). So if I edit the page for posé (the past participle of poser) and just give the translation as "set", then — leaving aside the fact that the past participle is identical to the infinitive in English and so "set" is ambiguous (it needs a gloss) — it is unclear which of the senses of "to set" I am referring to.

Of course, you could (and should) include a gloss, and this would be one solution. However, suppose the word being translated has many translations, and someone adds, edits or deletes one for the uninflected form. Then, in theory, they would also need to update all of the pages for the inflections. If they make a mistake, that means a lot of pages to roll back, especially for languages like French, in which verbs conjugate to give very many different forms. If they didn't do the updates (which is very likely) then the pages end up giving different information, or, in the worst case, contradicting each other.

If there is just a cross-reference, none of this extra donkey work is needed, and users can still find all the information they need. Note that we already do this with English inflections: if the noun "foo" has the meanings "foo: 1. an X. 2. a Y. 3. a Z.", we don't give three meanings at the entry for its plural: "foos: 1. Xs. 2. Ys. 3. Zs"; we just say "plural of foo".

The grammatical information is useful for those who understand it, and those who don't can find out what it means by looking it up in Wiktionary or elsewhere. — Paul G 15:43, 21 March 2007 (UTC)

To cover the issue raised by Ruakh about differences in usage in different languages (such as "yéndose"/"irse" for "leaving") then this can be covered by giving usage notes and examples in the entry for the inflected form. — Paul G 15:45, 21 March 2007 (UTC)
What I'm not understanding here with the concerns about ambiguities is that I don't see why the inflected form posé is any more ambiguous in translation than the infinitive poser. However the word is translated at the infinitive, it should simply have the same translation in the inflected form, except the English should be inflected to the proper tense as well. Ambiguities are a concern for all non-English words in translation; how does pointing back to the infinitive (which is then translated) with a tense specification change that problem? What makes these verb form problems different from normal translation issues with ambiguous English equivalents, which we just have to deal with and try to clarify as best we can with glosses or notes or whatever else the situation requires?
(If it's mostly about the extra work (editing the uninflected form requires changes to all its children), well, that's true, but it doesn't strike me as a very compelling argument. Endless work is the nature of the project.) Dmcdevit 17:30, 21 March 2007 (UTC)
Oh, now I full on disagree with you. The point of having all the information at the lemma is so it doesn't have to be repeated. We can do a rather thorough job at the lemma, add twenty different English translations, an etymology, whatever we need to try and get it adequately covered. That in itself is difficult, but doable. Having all that at all its inflected forms is not doable. Not at all. This is the beauty of having non-lemmata as soft redirects. Once we state which part of speech and what their lemma is, we're done. We can work on the lemma for years, trying to get just the right translation, and it's no problem. However, if we include all the same info on all forms, inflected languages become a nightmare. I absolutely refuse to change all 200 or so forms of φιλέω every time someone adds a slightly better translation. And someone will; it's a pretty simplistic translation right now that I'm sure does not cover everything. Having full entries at inflected forms is simply not practical at all. Atelaes 18:32, 21 March 2007 (UTC)
There was recently a similar discussion re Translations of inflected forms of English words at WT:BP#Plurals_and_translations. To go back to your earlier example, I should like to see something like:
  1. (present active subjunctive 2nd-person singular of λύω) often You might loosen (see λύω for further information).
(but preferably less verbose). While I should like to see as much info as practicable at the inflected entries, eg I feel cites using that form should be included there rather than at the lemma entry (and this in itself will help clarify meanings), it will not be possible, for the foreseeable future, to give all the detailed info on meanings and (for English words) translations, that are at the lemma entry. However, this need not prohibit giving the most common meanings and translations of inflected forms, provided it is clear to the reader where they can find further info if they want more than a quick and dirty answer. --Enginear 19:43, 21 March 2007 (UTC)
Re: "[…] differences in usage in different languages (such as 'yéndose'/'irse' for 'leaving') […] can be covered by giving usage notes and examples in the entry for the inflected form": I strongly disagree. The solution is for irse (the lemma entry) to explain everything that's specific to the verb irse, and for Appendix:Spanish gerundios or the like to explain everything that's specific to Spanish gerundios (though this might actually be better as part of an Appendix:Spanish conjugation or something, rather than as its own appendix). It seems crazy to re-explain the function of gerundios at the entry for every single gerundio. —RuakhTALK 19:34, 22 March 2007 (UTC)

I've thought of another problem: non-analogous lemmata. In Hebrew, for example, the verb "הלך (halákh)" means "to go", but the actual verb form "הלך (halákh)" is the third-person masculine singular past tense (suffix conjugation); the infinitive is ללכת (lalékhet) (well, or לכת (lékhet)), or הלוך (halókh) (different linguists apply the term slightly differently to Hebrew). The take-home point is that no one uses any of these infinitives as the lemma. The current system handles this well: "הלך (halákh)" is translated as "to go", per universal tradition, and "ללכת (lalékhet)" is explained as the infinitive of "הלך (halákh)", and so on. How would your system handle this? —RuakhTALK 21:21, 24 March 2007 (UTC)

A similar problem occurs in Latin. Verbs in Latin have five infinitives: present active infinitive, future active infinitive (with three sub-forms), present passive infinitive, and so on. I can't even begin to imagine trying to translate correctly the sense of each infinitive form on every one of the verb form pages. I'd much rather say "present active infinitive of verb X" and put the grammatical explanation into an Appendix. --EncycloPetey 18:52, 25 March 2007 (UTC)
I think definitions should only be given in the main article. If a definition is given for 10-50 inflections (some languages have a lot) and someone finds that the definition could be explained better, then he has to change the definition in all 50 articles. If not, we will get tons of more or less thought-through definitions for all the inflections, all saying different things. It will be impossible to find the right discussion page among all of these. Adding inflections by bot will take 100 times more work. And as a user, you will not be sure where to look for the best-quality information. Focusing on one main article will make the quality much better than spreading it out over 50 different articles. Finding an inflected form should just be a way for the user to reach the main article, the article with all the information, including special information about certain inflections as well as perhaps a grammar table of the different forms. Creating "definitions" by bot that only state the grammatical form is the best way to keep it clean and simple, and adopting this standard will considerably speed up the work of including these forms and presenting them in a standardized, easily understandable way. As many of you have already pointed out, inflections in one language often lack direct equivalents in other languages, so giving the grammatical info is also a way to give exact and correct information efficiently. Grammar tables in the main article will then serve the user better, since he can see each inflection in context, alongside the other inflections. ~ Dodde 01:07, 27 March 2007 (UTC)

Format of abbreviations

I've added a section to WT:ELE on how to format abbreviations. In particular, I mention that expanded forms should be in their usual forms ("not" deleted before "be": obvious error corrected --Enginear 20:01, 21 March 2007 (UTC)) and not capitalised just because the corresponding abbreviation is made up of capital letters (eg, the expansion of AI should be given as "artificial intelligence", not "Artificial Intelligence"), and that expanded forms should link to Wiktionary or Wikipedia articles, as appropriate.

It looks sound to me, but please make any necessary revisions. — Paul G 13:12, 21 March 2007 (UTC)

I mention SNAFU there - it needs a gloss, as someone has already pointed out in RFC for that word. — Paul G 13:14, 21 March 2007 (UTC)
Thanks for doing this. It looks good overall, but I think I disagree on one point. If the expanded form doesn't have and doesn't warrant a Wiktionary entry, then I think the components should be wikified as links within Wiktionary. Whether or not the expanded form warrants a Wiktionary entry, the {{wikipedia}} template should be used to link to relevant Wikipedia articles, which can include any Wikipedia articles on the abbreviation itself (e.g. w:SNAFU) as well as any Wikipedia articles on the expansions (e.g. w:Recreational vehicle). —RuakhTALK 13:44, 21 March 2007 (UTC)
I see what you mean. The reason for linking to a Wikipedia article rather than the Wiktionary articles for the component words is that the user is likely to want to know what the whole expanded form means rather than its component words. But if we can do both, then great. Could you perhaps give an example to illustrate how this would work, and then we can update WT:ELE accordingly if people agree with your idea? — Paul G 15:21, 21 March 2007 (UTC)
The disadvantage of the {{wikipedia}} approach is that it doesn't make clear, for those abbreviations with several meanings, which senses can be found there. I am tempted to specifically write (see Wikipedia article) by the appropriate senses. However, that would lead to an error if the 'pedia article were modified. --Enginear 20:01, 21 March 2007 (UTC)
The problem you mention with {{wikipedia}} is not specific to abbreviations; it's a problem with any noun that has multiple distinct senses, and we need to formulate a general solution rather than a hackaround. (Actually, I think this is much less of a problem for the typical abbreviation, since Wikipedia tends to name articles after the expanded form, so the link text will make clear which sense is being referred to.) One option is to give that template a way to specify which sense is intended. —RuakhTALK 19:29, 22 March 2007 (UTC)
The {{wikipedia}} template allows the entry of the direct disambiguated link. --Connel MacKenzie 15:57, 23 March 2007 (UTC)
Don't forget that it's possible to use a directed {{pedialite}} template in-line or at the end of the entry under "See also". The specific article name may be entered as a parameter. --EncycloPetey 18:38, 25 March 2007 (UTC)
I honestly think the {{wikipedia}} template needs to be completely rethought. Intended to appear once on a page, it could not do more than link to the primary definition of a word. But in many cases, not just abbreviations, there are a good number of relevant Wikipedia articles that need linking to. Proper names such as Disney are an example, but also any word that has a more specific technical sense, or several common meanings such as trunk, etc. DAVilla 18:25, 27 March 2007 (UTC)
Part of the original problem was that Wikipedia links are not fixed; articles move around and are renamed. This happens less often today, than a couple years ago, but still enough to be a valid concern. IIRC, that is why we link to "disambiguation" pages whenever possible, rather than specific senses. --Connel MacKenzie 15:46, 16 May 2007 (UTC)

Our deletion logs are being harvested

It appears that any deletion with a deletion summary that contains "content was: 'text here'" gets harvested for the following site: http://www.in-vacua.com/interdiction.html

Now would be a good time for all admins to sign up for the "Replace text in deletion log comment." of WT:PREFS, so we don't accidentally expose personal info posted by vandals in the deletion log. fwiw, --Versageek 07:09, 22 March 2007 (UTC)

Note: This has been de-Connel-ized to the Wiktionary namespace now. Please be bold rewording it. --Connel MacKenzie 21:54, 24 March 2007 (UTC)

The guy behind the site has posted a response, here. It might be good to send him a note explaining why we obscure that information. JesseW 23:43, 5 May 2007 (UTC)

And why is that, anyways? DAVilla 03:54, 6 May 2007 (UTC)
Erm, no, it might be better to ignore him. --Connel MacKenzie 05:53, 6 May 2007 (UTC)

X form headers

The question of not using X form headers (Verb form, Adjective form, Noun form) was never quite formally resolved; WT:POS says at one point it is under discussion, but the tables say that X form is deprecated.

Any objection to just treating this as settled and routinely correcting X form to X? (Which a number of people have been doing for a long time? ;-) Robert Ullmann 14:00, 23 March 2007 (UTC)

(Oh, the reason I ask is that AutoFormat is finding these with some frequency, should it be fixing them? Robert Ullmann 14:09, 23 March 2007 (UTC)

I think that would be non-contentious for ==English== entries. But I recall some respected contributors claiming that X form was almost essential for some highly inflected languages. --Enginear 14:43, 23 March 2007 (UTC)
I don't recall that being the conclusion at all. As I recall, it was that making the "form-of" distinction is even worse for foreign languages, than for English. This is supposed to be targeted to English readers after all. --Connel MacKenzie 15:55, 23 March 2007 (UTC)
On the other hand, bot edits are supposed to focus on non-contentious edits, so this is probably outside the purview of AutoFormatBot. I don't recall seeing a proposal for it, by the way. Looks good so far, but should have more community input. --Connel MacKenzie 21:59, 24 March 2007 (UTC)
Indeed, it is out of scope (User:AutoFormat#Principles ;-) if it is not long resolved. Probably should be voted on and the resolution added to WT:POS. If you look at the control table (User:AutoFormat/Headers) "Verb form" is listed as POS, and non-standard. That means "Verb Form" will get changed to "Verb form", but not to "Verb". And the section will be treated as a POS section. Robert Ullmann 22:29, 24 March 2007 (UTC)
To be clear, I recall (and agree with) Connel's viewpoint on this, but I do not recall a clear consensus re highly-inflected languages, even though there was for English (there may have been a consensus, but I don't recall it). (But this is irrelevant if the change is only to regularise the capitalisation.) --Enginear 15:04, 25 March 2007 (UTC)

Wiktionary:Things to do, Category:Wiktionary

Wiktionary:Things to do and the sysop pages in Category:Wiktionary need some attention. They are out of date. Thanks --Keene 01:01, 24 March 2007 (UTC)

Homophones as a L4 header

I formally propose that we modify WT:ELE to recommend Homophones as a Level-4 header under Pronunciation, just as we have L4 headers for Synonyms and Antonyms following the definitions. Homophones are important enough to warrant their own header, particularly as they may confuse English learners. Such words should not simply be listed in-line within the Pronunciation section, since they are separate words, and not aspects of the entry under which they appear. Unless there is mass opposition, I'll start a VOTE on the matter in the next week or two. --EncycloPetey 19:14, 24 March 2007 (UTC)

I think you mean level four (?). Pronunciation is L3 (unless under Etymology n). Not a bad idea. I've seen a number of them. Robert Ullmann 19:44, 24 March 2007 (UTC)
Yes, you're absolutely right. I've modified the text above (and section header) accordingly. --EncycloPetey 21:08, 24 March 2007 (UTC)
I think that's a good idea. Before it goes to a vote, though, we should probably have some discussion on how to make clear that homophones depend on dialect and speaker (Mary/marry/merry, witch/which, etc.). BTW, would this be used at words in all languages (or at least, all languages with non-phonemic writing systems), or only at words in English? —RuakhTALK 21:39, 24 March 2007 (UTC)
I'm not sure how much would be needed. Each entry should have its own pronunciation(s) marked by region. Are you thinking about cases in which the homophones exist only in a limited range of dialect? I could see that as an important issue, and would like to hear suggestions. I seem to recall having seen some odd examples marked before, but can't recall which words they were.
Yes, this would be used in all languages, BUT each would be specific to the language section in which it appears. There would not be any reason to link a German word as a homophone in an English section, just as we wouldn't cross-link Related terms between languages. --EncycloPetey 22:36, 24 March 2007 (UTC)
Before it goes to a vote, I'd rather see someone creatively come up with a new/better scheme for the L3 Pronunciation sections. The "look" of the Pronunciation sections currently is awful. Adding subsections to that would only make it worse. --Connel MacKenzie 22:02, 24 March 2007 (UTC)
Something should be detailing the format of the Pronunciation section. Perhaps at Wiktionary:Pronunciation? (or is there another page already?) Then referred to from ELE. As Connel says, it needs some style ;-) Robert Ullmann 22:07, 24 March 2007 (UTC)
It is my ultimate intent to have a fully-fleshed out style guide for the Pronunciation section at Wiktionary:Pronunciation along with a thorough summary at WT:ELE, but there are many, many issues to be resolved in the Pronunciation section and I am trying to attack them in small steps. Otherwise, we would have too many discussions going on simultaneously and none of them would be fully resolved. I started with the AHD --> enPR proposal, and am now tackling the issue of Homophones. I have a laundry list of other concerns too :) My thought is that the homophone issue makes a nice self-contained sub-issue that could be then formatted independently of the rest of the pronunciation section. We could deal with formatting the rest of the Pronunciation section next. --EncycloPetey 22:36, 24 March 2007 (UTC)
Well then, my vote is for having a bulleted, indented "Homophones" tag within the pronunciation section. I don't want to have to rewrite Dvortybot to account for the intervening section. If you are only going to make it uglier and less consistent, I don't see the point at all. --Connel MacKenzie 22:45, 24 March 2007 (UTC)
Who said anything about ugly or inconsistent? I'm suggesting we adopt a standard, and (if it assuages your concerns) this is the only subsection I can see as being worthwhile within the pronunciation section. Everything else should be part of a bulleted list (unless we come up with a better idea).
Part of the problem I have with a bulleted tag for Homophones is that such things don't show up in the Table of Contents (yes, I use them and like them). Having the section separate also eliminates the need to decide where to put the homophones. With a subsection header, it comes at the end of the pronunciation section every time. With your proposed bulleted item, it could show up anywhere in a list of items that may or may not all be included (regional pronunciations, various audio files, rhymes, and hyphenation, at least). This is part of what is making the Pronunciation section look "ugly" right now -- we have a mish-mash of items that all look different but are all set up in a list as if they had parallel format. --EncycloPetey 22:57, 24 March 2007 (UTC)
It is hard to see how to deal with regional homophones without using bullets, each region having homophones shown after its pronunciations. But you're probably much more in touch with the ideas than I am. --Enginear 15:14, 25 March 2007 (UTC)
Having now read EP's explanation below, I understand better, and think that his examples 1 & 3 are the best, for complex and simple cases respectively. --Enginear 19:42, 26 March 2007 (UTC)
I've found some entries where that would rapidly become a mess. Within a region, there may be more than one pronunciation of a given word. Each specific pronunciation has homophones both in and out of the region, which vary with which of the regional pronunciations is compared. One recent headache is sere. There's a UK (Commonwealth?) pronunciation of /ˈsɪə/, which is a homophone of UK sear and one pronunciation of UK seer. In the US, there are two major pronunciations of sere: /siːr/ and /sɪr/, with the former having a southern US variant of /siːɚ/. Only this Southern variant is a homophone of US seer, but it is homophonic with seer as generally pronounced in the US. The second US pronunciation is also regional, and depending on region is homophonic with either sear or sir, but not seer. The first US pronunciation is homophonic with sear, but not as pronounced in the Southern US.
Frankly, I can't envision any means of communicating even a fraction of that information cleanly if the homophones are interpolated between the various regional pronunciations, and we've only considered the US and the UK so far. I think it would be much better to list the homophones (and the rhymes?) in a Homophones section that is structured first by a bulleted list of IPA pronunciations. Each IPA pronunciation would begin a line of homophones, each identifying in parentheses the region (or dialect) for which it is a homophone.
*{{IPA|/ˈsɪə/|lang=en}}: [[sear]] (UK), [[seer]] (UK)
*{{IPA|/siːr/|lang=en}}: [[sear]] (US)
*{{IPA|/ˈsiːɚ/|lang=en}}: [[sear]], [[seer]] (US)
Of course this is just one possibility. I could imagine the structure of the Homophones section paralleling the main Pronunciation section by organizing along regional lines, just as the Synonyms and Translations sections parallel the list structure of the definitions:
*{{a|UK}}: [[sear]], [[seer]]
*{{a|GenAm}}: [[sear]]
*{{a|Southern US}}: [[sear]], [[seer]]
Or we could start off less ambitiously and just use:
*[[sear]], [[seer]]
...and although that would eliminate all the dialectal information, there are some words for which that simpler form would be sufficient. Please keep in mind that this is not the best example for the potential difficulties, but it happens to be one fresh in my mind and therefore easier to find and discuss.
My feelings are rather strong on this issue because the homophones are words with distinct entries, rather than elaborations of the entry in which they appear. Just as we separate the synonyms, antonyms, and related terms into their own subsections rather than interpolating them among the definitions, so I would like to see the homophones separated into their own subsection rather than interpolated among the pronunciations. Particularly since there may be more than one pronunciation in a given region listed on the same line, and not all of them may share the same set of homophones. This wouldn't happen with our definitions, where each definition gets a separate line, but in the Pronunciation section it is a possibility and happens not infrequently. --EncycloPetey 18:27, 25 March 2007 (UTC)

I think my preference is for something like this:
Note: homophones vary by dialect and speaker. Each of the following words is a homophone of ''sere'' for at least some speakers:
only preferably less wordy. To see exactly who treats those words as homophones, they'll need to look at the various pronunciation sections, but this both lists possible homophones (useful for language learners) and makes clear that they may not be homophones for everybody. —RuakhTALK 23:40, 25 March 2007 (UTC)

I agree with EncycloPetey that homophones need their own heading, just like synonyms, antonyms, etc. It's not always easy to state clearly the regions in which a certain pronunciation is used, so I think the information in parentheses should be optional. The detailed information about the homophone's pronunciation should be in the entry for the homophone anyway, not in the entry where the homophone is listed. So I reject the idea given by example 2, where homophones are divided by region. But I think it's great to divide the list by pronunciation, as in example 1. It should also be possible to add homophones without adding the IPA pronunciation, and sometimes entries don't have many different pronunciations. Therefore example 3, with a plain list of the homophones, should also be acceptable, in my opinion. ~ Dodde 02:07, 26 March 2007 (UTC)

Please consider the value of getting as much information as possible to the user on the first screen seen. Making homophones a header "costs" about two (2) lines more than the current WT:ELE approach if there is only one line for the homophones and one additional line for each homophone if the generic approach for derived terms, related terms, and see also is used. That's a lot of prime real estate. To use it without much knowledge of the effect on average WT users in the service of conceptual consistency seems unwise. DCDuring TALK 22:56, 27 January 2008 (UTC)


See also

While we are talking about headers: this is one of the most common headers on the wikt, and not mentioned in WT:ELE. It gets used at L4 under a POS, typically after Synonyms, Translations, etc, but before the (recognized) headers External links and References. Convention seems to be that See also is for references inside the wikt and WM projects that are not Syn/Ant/Related/Derived terms. (It also shows up at L3 when it shouldn't, and sometimes when it maybe should, and also shows up at L2, which it clearly shouldn't!)

I'd think it ought to be listed in WT:ELE in that place in the sequence, as references to other words/indexes/related bits that don't fit in the preceding headers, but aren't external links, which follow. (Is all that clear as mud?) Robert Ullmann 19:44, 24 March 2007 (UTC)

I can see uses both as a L3 and L4 header. When phonemics has a See also listing phonetics, that usage could certainly fall under the POS as a level-4 header. However, when that Afar entry links to the Afar edition of Wiktionary, that usage of See also should be at level-3. I can't justify including such interwiki links in a subcategory of the Noun part of speech. --EncycloPetey 21:13, 24 March 2007 (UTC)
That seems just about right. Something has to also say that the L3 use of "See also" has to be at the end of the language section, not intermingled with POS sections. Robert Ullmann 22:03, 24 March 2007 (UTC)
If anything, WT:ELE should specify it as an L3 heading. The L4 headings are inappropriate, and should be "disambiguated" at the L3 level instead. --Connel MacKenzie 22:05, 24 March 2007 (UTC)
I'd be fine with putting them all at level 3, or with using a combination of L3 and L4. --EncycloPetey 22:38, 24 March 2007 (UTC)
I would be wary that "See also" sections are just an invitation for random trivia and spam to accumulate. Anything that we actually want readers to also see can fit under an existing heading, and if not, a new heading could be considered. The phonetics in phonemics above is a related term (or derived term), and "See also" shouldn't be used. It might not be worth going through all the instances of "See also" and changing them, but I don't see any reason to codify that type of header into policy. Dmcdevit 03:20, 25 March 2007 (UTC)
No, not everything can be coded under an existing header, which is why the regular sysops use "See also" so often. For example, Semper uses it for taxonomic entries to link to subtaxa (e.g. to link Oleaceae to Fraxinus). It's how I chose to link to . These are just two cases where none of the existing headers are really appropriate, and there are many more similar situations besides. Although "See also" isn't officially sanctioned in the ELE, it's used all over the place and has been for a long time. I rarely see unwanted detritus accumulating there, though it does happen from time to time. I don't see that as a new problem, though, since we have the same rate of additions of duplicate definitions. --EncycloPetey 03:50, 25 March 2007 (UTC)
I can't think of any cases where a "see also" section is necessary — lists of subtaxa fit more comfortably at Wikipedia or Wikispecies (though if there are just a few top-level subtaxa, or a few particularly important ones, then those should be mentioned in the definition line), and the astrological symbols are conveniently grouped into the interestingly named Category:Astronomical symbols, and phonemics can link to phonetics either in its definition line (in a "contrasted with" phrase, like at white-collar) or in the usage notes (in a "not to be confused with" note, like at affect#Verb), or both — but seeing as "see also" sections aren't going away anytime soon, it would be nice for WT:ELE to mention them and give guidelines on how to use them (where they should go relative to other sections, what kinds of links they should contain, how to format each link, how to order the links, whether and how to group the links, etc.). —RuakhTALK 05:35, 25 March 2007 (UTC)
To be clear, what I mean is that if there is a case where none of the existing headers work, I would much rather that the editor add a descriptive one than a "See also". So, I'd rather see someone using "Subtaxa" (or whatever) than "See also". Dmcdevit 17:12, 25 March 2007 (UTC)
The flip side of that is that it would proliferate the number of various headers, which we definitely don't want to happen. A limited set of headers at L3 and deeper means that (1) we can more easily search for and remedy spelling problems, and (2) we can have a short list for new users to learn and grow comfortable with. Too many extra headers makes it harder to control the structure of the data as we've been trying to do. The See also remains a much more flexible option, particularly when the user is directed to one of the Appendices. --EncycloPetey 18:04, 25 March 2007 (UTC)

The above discussion (appropriately) ignores the other use of "See also" at the beginning of an article to link to alternate Capitalised/non-capitalised spellings, and sometimes spellings with diacritics. Such usage needs separate consideration. --Enginear 19:47, 26 March 2007 (UTC)


What would be a good term or phrase to define a situation, or just a word, that becomes used so excessively that it begins to annoy people? Something other than redundancy. "Basically" comes to mind as an example. From what I see on TV, law enforcement and military personnel are the main offenders. It becomes a mindless usage used in every other sentence. It ends up clouding conversation and not complementing the talker. Another example could be the word "absolutely". Where is the spelling checker on this thing? —This unsigned comment was added by Gord 6789 (talkcontribs) 04:03, 25 March 2007 (UTC).

I'd say a cliché, catchword, or buzzword, depending on the details; but you might want to take your question to Wiktionary:Tea Room, which is more suited to that kind of question. —RuakhTALK 05:16, 25 March 2007 (UTC)
Yes, cliché is the appropriate term here. You can also use "hackneyed phrase".
Wiktionary has no spellchecker. You can always spellcheck content in a text editor or word processor and then copy and paste it here. — Paul G 09:51, 25 March 2007 (UTC)
You can turn on Wiktionary's (primitive) spellchecker at WT:PREFS. Note that Firefox 2's spellchecker is quite superior. --Connel MacKenzie 06:00, 6 May 2007 (UTC)

Use of anchors in {{t}}

The template {{t}}, used for linking translations to other wiktionaries, is great, but it doesn't allow, as far as I can see, for the use of anchors. I've just been working on "vine", of which one sense is translated as "vite" in Italian. As this is also a word in French and probably several other languages too, I wanted to link the translation to the Italian section, thus, [[vite#Italian|vite]], but this won't work if the translation is given using {{t}}.

I see that this was discussed when the template was created. Was it ever resolved? Couldn't the template be parsed to recognise an optional template following one containing a hash? — Paul G 10:04, 25 March 2007 (UTC)

It would be so very, very useful if our language code templates didn't contain wikilinks; then we could translate any code to the canonical language name in another template, and this case would be trivial: {{t}} could just always generate the anchors (#{{{{{1}}}}}). And it is easy to link the result of a template call anyway, so someone wanting (say) Scottish Gaelic linked could just use [[{{subst:gd}}]]. But it is impossible to unlink the result of a template call. Still, there is resistance to just unlinking all the code templates, even though it would be incredibly useful. Robert Ullmann 14:52, 25 March 2007 (UTC)
What about another series of templates for "unwikified" language names? {{n-en}}, {{n-es}}, {{n-gd}} etc. You are trying to use existing templates for something they were not intended for. (Actually, doing that might be a bit crushing to the WMF servers - checking multiple cascading templates on each translation, in 21,000+ entries?) --Connel MacKenzie 15:06, 26 March 2007 (UTC)

Move to WT:GP??? --Connel MacKenzie 15:06, 26 March 2007 (UTC)

@Paul G: right now, the template adds the link automatically, but due to this, it only works with languages in the WT:TOP40. It is this technicality that is discussed above. So go ahead and use {{t}}. You can see in the preview that it gives the correct link. H. (talk) 16:46, 26 March 2007 (UTC)

I go with Connel and suggest we use two sets of templates containing the language name: one containing the language names with wikilinks (or whatever the rule for the TOP40 and such is right now), and one for use with the {{t}} template containing the language name without anything else whatsoever. I am not sure why Connel suggests the naming convention "n-". I think "t-" would be more suitable since it will be used with the t-template in translation lists. ~ Dodde 00:25, 27 March 2007 (UTC)
Thanks. I picked "n-" thinking "name" but really, any prefix will do. What it really should be, is a list of Wiktionaries that exist. So, at the template level, if the language code doesn't have a language name, the template would know not to link to the non-existent foreign language Wiktionary. The "top 40" list is great, for what it does, but this really is a separate problem/function. --Connel MacKenzie 16:12, 28 March 2007 (UTC)
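As a rough illustration of the idea above (written in Python rather than template syntax), the snippet maps a language code to a plain, unlinked language name and then builds the anchored link [[word#Name|word]]; when a code has no known name, it falls back to an unanchored link. The table LANGUAGE_NAMES and the function translation_link are hypothetical stand-ins for the proposed "t-"/"n-" templates, not existing Wiktionary templates.

```python
# Hypothetical table of unlinked language names, i.e. what the proposed
# "t-xx" / "n-xx" templates would return. Not a real Wiktionary list.
LANGUAGE_NAMES = {
    "it": "Italian",
    "fr": "French",
    "gd": "Scottish Gaelic",
}


def translation_link(code: str, word: str) -> str:
    """Build a wikilink pointing at the language section of a translation.

    If the code has no known plain name, fall back to an unanchored link.
    """
    name = LANGUAGE_NAMES.get(code)
    if name is None:
        return f"[[{word}]]"
    return f"[[{word}#{name}|{word}]]"


print(translation_link("it", "vite"))  # [[vite#Italian|vite]]
```

The design choice here is the one argued for in the thread: once plain names are available, adding the anchor is a single string concatenation, whereas stripping a link back out of an already-linked name is not possible.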

Medical Eponyms

I believe the preferred form for medical eponyms in the AMA style book is to omit the possessive 's. However, there is some debate on this - http://www.medtrad.org/panacea/IndiceGeneral/n5_dirckx.pdf

Wikipedia reports:

In 1975, the US National Institutes of Health held a conference where the naming of diseases and conditions was discussed. This was reported in The Lancet (1975;i:513) where the conclusion was that "The possessive use of an eponym should be discontinued, since the author neither had nor owned the disorder." Medical journals, dictionaries and style guides remain divided on this issue. - http://en.wikipedia.org/wiki/List_of_eponymous_diseases#Punctuation

Should our von Willebrand's disease be Von Willebrand disease as in Wikipedia? (For now, let's ignore Wikipedia's unfortunate use of the capital "V"!)

Ben 12:42, 25 March 2007 (UTC)

So you are suggesting we move the attested form? That's not quite right. We should have entries for both forms, with Usage notes describing the AMA's/The Lancet's prescription. Funny that they would use that logic, that the person named in the attribution "owns" the disorder. Seems like a pretty weak excuse for trying to change lots of common disease names (which are used primarily by newspapers, not medical journals.)
If/when each new form is attested, we can add each "disorder" entry here. (The Lancet, itself, certainly counts as a "reviewed journal" - that is quite likely the publication for which that clause was added to WT:CFI.) --Connel MacKenzie 15:00, 26 March 2007 (UTC)

The underlying reason for omitting the apostrophe, I suspect, was to simplify spelling, especially when the name ends in "s." The Lancet is a very fine journal, but The Annals of Internal Medicine omits the possessive (Neil A. Goldenberg, Linda Jacobson, and Marilyn J. Manco-Johnson. Brief Communication: Duration of Platelet Dysfunction after a 7-Day Course of Ibuprofen. Ann Intern Med, Apr 2005; 142: 506 - 509: "......concern given the high prevalence of von Willebrand disease (1 in 100 individuals)...."). So does the Journal of the American Medical Association (at least since 1982 or so). The New England Journal of Medicine uses the possessive for the disease, but omits it for "von Willebrand factor."

At any rate, how should we proceed? Would it be necessary to find a published example of each form and set up a new page for each? Abels test and Abels' test? Osler's nodes and Osler nodes? Or, do we just put a usage note on one form that indicates the other is sometimes used? Ben 12:05, 27 March 2007 (UTC)

For a word/phrase to pass WT:CFI it should normally be possible to find three durably archived cites (or one in a refereed academic journal). However, if a word is categorised as a "misspelling" (or perhaps "misuse" though this is more contentious) a higher bar is set (not AFAIK defined). So is von Willebrand's disease a misspelling or misuse? I don't know, but I suggest that there has been a change of scientific fashion which is broader than medical usage.
Previously, those who discovered (or improved knowledge of) scientific entities were often linked to their discovery, as in Halley's comet or Weil's disease. But nowadays this is considered a flawed description -- Edmund Halley did not own "his" comet, and Adolf Weil did not suffer from "his" disease, as is perhaps implied by the descriptions.
So now, scientists refer to Comet Hale-Bopp (and indeed Comet Halley) and von Willebrand disease. For older phrases, I suggest that both are valid, perhaps with a note on the 's version that the usage is now deprecated within the scientific community. For newer discoveries, perhaps they should be treated the same, or perhaps the 's version is a misuse. Of course, if less than three (or one refereed) cites exist for a spelling, then the issue does not arise as it cannot even meet normal CFI.
To answer the specific question, I believe there should preferably be separate entries for each spelling which meets CFI, with cites for each. However, this doesn't mean that it is essential for a contributor to add more than one entry or add any cites at all. It is better to add a single entry for a term believed to meet CFI than add none at all; it is even better to add a "soft redirect" entry for "the other" spelling; having one or both entries cited is better still (some say this is best, while others of us would prefer two "full" cited entries). This is a wiki. Once a basic entry is in place, others can build on it (and usually will if its appropriateness is later challenged). --Enginear 15:45, 27 March 2007 (UTC)

I think we should have, as Enginear suggested, separate pages for each spelling and include relevant context labels, etymology or usage notes as appropriate.--Williamsayers79 13:31, 28 March 2007 (UTC)

Enginear, thank you for restating what I said more clearly. --Connel MacKenzie 16:14, 28 March 2007 (UTC)

OK, I like this solution, and I think I understand it, too, except: What is a "soft redirect?" Thanks --01:37, 29 March 2007 (UTC)

A "soft redirect" is what would be considered a "stub" entry on Wikipedia. The minimal Level two language heading, the minimal level three part-of-speech heading, and a "#" definition line using one of the form-of templates, such as {{alternative spelling of}}. --Connel MacKenzie 18:17, 29 March 2007 (UTC)
Basically, a "soft redirect" is an annotated link: hametz is an example of a simple soft redirect to a full cited entry at chametz, cat-flaps is a cited soft redirect, while cat-flap and cat flap are full entries each noting the existence of the alternative spelling. --Enginear 18:26, 29 March 2007 (UTC)

Language sort order

In WT:ELE, we specify that languages (after English) are to be sorted in alphabetical order by the English name in L2 sections, likewise in Translations sections. Strictly, that means that Classical Nahuatl should sort under C, and Old English under O, etc.

I think it would be better if we sorted on the base name in these cases (which is what people often do anyway), so that Old English sorts as "English, Old" while remaining "Old English" in the header. It would then group with English on the page. "English, Middle" and "English, Old" conveniently sort into reverse chronological order following English.

Prefixes treated this way would be Old, Middle, Middle High, Ancient, Classical etc. Or we could treat any language name that ends with a recognized language name as something to be "inverted" for sorting?

Or do we just stick to strict alphabeticity? (is that a word?)

See wine and vino Robert Ullmann 16:30, 25 March 2007 (UTC)

I'm not sure how I feel about that, but it adds another layer of complexity to the translations section. We already group some languages rather strangely, varying by editor. Should the various forms of Chinese (which are not called "Chinese") be grouped together (look at the entry for birthday)? Should all eight or more flavors of Sami (look at Monday)? In short, your question is part of a larger sorting issue for languages in the Translations section. For instance, would you want to group Scottish Gaelic with Irish (Gaelic) because the end of their name is "Gaelic", or separate them because we have arbitrarily decided to call Irish Gaelic simply "Irish"? Do we group Tosk Albanian together with Gheg Albanian because they both contain the word "Albanian", or do we separate them because they're not mutually intelligible anyway? And if we decide on a case by case basis, just how long a list of little sorting rules would be too long?
I absolutely do not agree with placing language families together in the translations section. We should be consistent, applying simple rules. Even doing it for Chinese, this opens a can of worms. Are we then to do the same for other language families? No, each language or dialect that is identified should be alphabetized independently. DAVilla 20:44, 26 March 2007 (UTC)
Your preliminary list merely scratches the surface of possible prefixing words; consider Western Apache versus Plains Apache, Moroccan Arabic versus Egyptian Arabic, Upper Sorbian versus Lower Sorbian, Inari Sami versus Lule Sami, and note that Tok Pisin is etymologically a compound as well (though I doubt the average user would guess that).
That said, I think it would be good to alphabetize while ignoring words like "Old", "Middle", and "Ancient", primarily because these describe a specific period of development in a language. I think it would be good to group the various forms of Arabic, and possibly the various forms of Chinese. However, this is a very tricky issue with many angles and I don't think I've gotten them all sorted out in my own head yet. --EncycloPetey 17:56, 25 March 2007 (UTC)
Sort of what I was thinking: the "age" qualifiers should be a secondary key (not ignored entirely). I think this will make a lot of intuitive sense to people, as well as being fairly simple to code where needed (if it starts with a word in the set, move it to the end, then sort). I don't want to get into groupings; it is endless, and they overlap in various ways. This is (one of the things) that the alpha order was intended to avoid. Robert Ullmann 15:09, 26 March 2007 (UTC)
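A minimal sketch of the rule described above, assuming the qualifier set is the one floated earlier in this thread (Old, Middle, Middle High, Ancient, Classical); the names QUALIFIERS and sort_key are illustrative only. Each name is split into (base name, qualifier), so the base name is the primary key and the period qualifier is the secondary one. "Middle High" is checked before "Middle" so the longer prefix wins.

```python
# Period qualifiers taken from the discussion; the exact set is an assumption.
# Longer prefixes must come before their shorter forms ("Middle High" before "Middle").
QUALIFIERS = ("Middle High", "Old", "Middle", "Ancient", "Classical")


def sort_key(language: str) -> tuple[str, str]:
    """Sort on the base name first, with any leading qualifier as a secondary key."""
    for qualifier in QUALIFIERS:
        if language.startswith(qualifier + " "):
            # "Old English" -> ("English", "Old")
            return (language[len(qualifier) + 1:], qualifier)
    # Unqualified names get an empty secondary key, so they sort first.
    return (language, "")


languages = ["Spanish", "Old English", "English", "Classical Nahuatl",
             "Middle English", "Latin", "Ancient Greek", "Greek"]
print(sorted(languages, key=sort_key))
# ['English', 'Middle English', 'Old English', 'Greek', 'Ancient Greek',
#  'Latin', 'Classical Nahuatl', 'Spanish']
```

The headers themselves are unchanged; only the sort order differs. As the replies note, any such prefix list is open-ended, so a real rule set would still need agreement on which words count as qualifiers.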
I don't think this would be very transparent to contributors, so it really makes sense to keep the rules very simple. If you really believe this sort order is desirable, then you'd have to be willing to allow for the naming convention to be "English, Old" etc. But in reality this is of minimal benefit. Olde English could just as easily be called Anglo-Saxon, and there are other languages where the "old" language is only known by an entirely other name. DAVilla 20:44, 26 March 2007 (UTC)

Does the ISO language definition code sort in an intelligible order? Ben

Not really. Some of the codes are similar to the English names, but that isn't the objective of the coding. For example, Mandarin, German, French, Dutch, Cantonese are in alphabetical order ... (cmn, de, fr, nl, yue ;-). There are wikts that use the code templates all the time, and sort on them (which produces a consistent, but often apparently random order). Robert Ullmann 15:09, 26 March 2007 (UTC)
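To make the point above concrete, here is a small illustration using only the five languages from the comment: sorting by ISO code gives one order, and sorting by English name gives a completely different one. The dictionary below is just those five code/name pairs, not a complete mapping.

```python
# The five code/name pairs from the comment above (not a complete list).
LANGUAGES = {
    "cmn": "Mandarin",
    "de": "German",
    "fr": "French",
    "nl": "Dutch",
    "yue": "Cantonese",
}

# Order produced by sorting on the ISO code.
by_code = [LANGUAGES[code] for code in sorted(LANGUAGES)]
# Order produced by sorting on the English name.
by_name = sorted(LANGUAGES.values())

print(by_code)  # ['Mandarin', 'German', 'French', 'Dutch', 'Cantonese']
print(by_name)  # ['Cantonese', 'Dutch', 'French', 'German', 'Mandarin']
```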
  • I think Hippietrail's work on the MediaWiki extension to group language names in a sane manner is the much better approach. http://wiktionarydev.leuksman.com/ We really should be looking at improvements to his methods there, and adoption of his extensions here. --Connel MacKenzie 16:10, 28 March 2007 (UTC)
But I haven't seen anyone try to implement this within a section of an entry. How would this work apply to the various translations sections, and would such a format make it difficult for visiting translators to add or check translations? --EncycloPetey 21:59, 30 March 2007 (UTC)
Have you given Hippietrail that suggestion on http://wiktionarydev.leuksman.com/ yet? He may already have something up his sleeve... --Connel MacKenzie 16:00, 31 March 2007 (UTC)
Good idea. I've thought about this but not when I've been editing my todo list on WiktionaryDev. I'll add it now. — Hippietrail 17:19, 31 March 2007 (UTC)
I'm sorry that I don't understand what the extension does yet. Hippietrail, if you could automatically pass a {{{languagecode}}} and {{{languagename}}} parameter to every template included within any section, it would be useful to the utter extreme. DAVilla 12:36, 3 April 2007 (UTC)

Constructed languages

Hey, I know we've brushed over this topic before, but I really think it would be best to finally come to some sort of decision on the matter. Do we include constructed languages, or more specifically which ones? It seems rather clear that we do include Esperanto; I don't think there is much debate about that. But what about Quenya? It's an Elvish language constructed by J. R. R. Tolkien for his Lord of the Rings series. We currently have an anon cleaning up the section, and well, I guess I'd feel sort of shitty if a few months down the road we decide to squash all their hard work. We should either put a stop to it right now, or decide to allow this language. I must admit I don't have any strong convictions about it one way or the other. If any dictionary is ever going to include such things, we are certainly the perfect format for such a venture, not being limited by paper. However, this admittedly opens the doors to all sorts of nonsense. If I was forced to make a decision right now, I would say allow Quenya, but disallow certain other languages, such as Brithenig, simply because I like one language more than the other. But it seems that perhaps Wiktionary ought to have some more rigid standards than that. Any thoughts, anyone? Atelaes 23:12, 26 March 2007 (UTC)

I would prefer to put lexicons for minor constructed languages in the Appendix namespace in a single page rather than in the mainspace, but I'm not sure of a good metric for differentiating major constructed languages (Esperanto, Interlingua, Ido, Lojban, etc.) from minor ones (Quenya, Klingon, etc.). Words, and by extension languages, whose use is restricted to a single literary work like Quenya would seem to fail CFI in my opinion, but that doesn't settle the matter completely, considering other languages like Toki Pona. Dmcdevit 23:54, 26 March 2007 (UTC)
Does Quenya have an ISO language code? Wasn't that part of the standard (or at least rule of thumb) we were using? RJFJR 16:04, 27 March 2007 (UTC)
Yes, qya. But the relevant section of WT:CFI#Constructed languages says that uncoded languages are not acceptable, but coded constructed languages may or may not be; and gives a specific list. The current list (and policy) seems pretty good. I would think if someone wants to change the status of any given (coded) language, it just goes to a vote. At present Quenya does not meet CFI; it is explicitly listed as not approved (all of the constructed languages coded in 639 are explicitly listed as in or out).
So the question that presents is: do we want to change CFI to permit Quenya? Robert Ullmann 16:15, 27 March 2007 (UTC)
I think we should stick with what CFI states until given good reason to do otherwise. It's just that I've never heard anyone interpret that particular CFI paragraph so simply. Last time I brought up this issue, it was a whole lot of "ummmmm"'s and "I don't know"'s. Well, that certainly answers the question to my satisfaction. All that remains to be said is this: If anyone disagrees with this, speak now. If I don't hear a community uproar in about a week, I'm going to start going through that list and cleaning out all the Quenya, Brithenig, etc. However, the question also remains of what to do with all these entries. My instinct is to go with Dmcdevit's excellent suggestion of putting them all in their own indices. Atelaes 16:31, 27 March 2007 (UTC)
What, precisely, do you mean, "cleaning out" that list? You'd need a separate vote on each one, would you not? --Connel MacKenzie 16:17, 28 March 2007 (UTC)
By cleaning out the list I simply mean moving all the mainspace entries which do not meet current CFI (because they are part of a non-CFI language) to appendices. It does not mean that I'll be changing the list. I don't think that requires a vote. If you think it does, please say so. Also, does anyone know of a good example appendix which I can model the Quenya appendix after? Another question, should I leave a redirect (to the appropriate appendix) in place of the article, or just delete it entirely (after all the info has been moved)? Atelaes 05:38, 29 March 2007 (UTC)
Whew. Thanks for the clarification; I'm glad I merely misinterpreted it the first time. --Connel MacKenzie 18:15, 29 March 2007 (UTC)
I'm confused by that list. It says Interlingue is accepted, while Occidental is not; but according to our and Wikipedia's articles on them (Interlingue, Occidental, w:Occidental language), they're the same language, Occidental being an older name and Interlingue a newer one. Am I missing something? —RuakhTALK 17:34, 27 March 2007 (UTC)
That's correct. It appears Occidental isn't used at all, and is not very notable except as Interlingue's predecessor. Dmcdevit 07:14, 29 March 2007 (UTC)
My thoughts are that if these oddities do not meet the current CFI then they should be removed from the main namespace. There is no harm in having them in an index or appendix, like the proto-languages.--Williamsayers79 13:06, 28 March 2007 (UTC)
BTW, Quenya is stretching it a bit anyway, but Brithenig really takes the biscuit!--Williamsayers79 13:06, 28 March 2007 (UTC)

The problem with this though, is that while I can say why Quenya is forbidden, as someone who isn't familiar with these languages, I can't tell why Novial, for instance, is included. I can't even find any indication it's noticeably more well-known or used than the others. Dmcdevit 07:14, 29 March 2007 (UTC)

Novial has some active speakers/writers/users. See, for example w:nov:Chefi pagine ;-) Quenya is just a vocabulary in a literary work (albeit a very notable one). Robert Ullmann 18:21, 29 March 2007 (UTC)
I suspected as much, though I was hoping for a more quantitative measure to differentiate between the non-literary conlangs. Dmcdevit 22:06, 31 March 2007 (UTC)

This isn't a keep/kill vote really, but I actually benefited from our Quenya entries just yesterday, when I read this, leading me to look up tengwar. If not for wiktionary, I probably never would have figured it out. In that sense, it is nice to have entries for Quenya words, and I'm tempted to say, "what harm can it cause?" On the other hand, obviously Brithenig words shouldn't be put in unless they see a massive increase in usage. I think Quenya really does fall right smack in a gray area, and it really is a tough decision. I think it would be good if, assuming we move the Quenya stuff to appendices, when people search for a word which we don't have, search results might include the Quenya appendix (or other language appendices) if applicable. Although our appendices are awesome, I don't imagine many of our casual readers have discovered the many joys of appendices yet :-) Language Lover 23:36, 29 March 2007 (UTC)

One thing though, which I just realized, is that all our Quenya entries are morphologically transliterated into the Roman alphabet! If Tolkien's Elves really did exist, wiktionary would be next to useless to them, since we wouldn't have the words in their rune forms, even if those runes could somehow be transmitted into the search box! :) Language Lover 23:39, 29 March 2007 (UTC)
As it turns out, there actually is a Unicode range reserved for Tengwar. How many people can see this:  ? Somehow I can. But, I admit that many people probably can't. In any case, I think it might be nice to have both Latin and Tengwar scripts. Atelaes 07:00, 30 March 2007 (UTC)
"Reserved" is not the word I'd use. The ConScript Unicode Registry attempts to coordinate the use of the Private Use Area for artificial scripts, and recommends the use of a certain part of the Private Use Area for Tengwar use; but so far, according to w:Tengwar, only one font supports it, and given the nature of the Private Use Area, this can never become standard. —RuakhTALK 21:11, 30 March 2007 (UTC)
Moving constructed languages that are used in one or more major works, but do not meet CFI as living languages, to appendices seems like a brilliant idea to me. -- Beobach972 21:32, 31 March 2007 (UTC)

Amending WT:CFI

I do think we should codify this better in WT:CFI though. If we agree that constructed languages whose primary use is restricted to a (series of) literary work and its fans do not meet WT:CFI, may be allowed in lexicons in the Appendix: namespace, but are not appropriate in the main namespace, shall we put that to a vote? Currently, CFI seems to imply that there is no agreement either way.

I can't think of a good metric for other ISO 639-3 languages. It has to do with how well used it is, but a measure of that would be nice, if anyone can think of one. Dmcdevit 22:06, 31 March 2007 (UTC)

Specifically, WT:CFI#Constructed_languages implies that there is no agreement; I would like to change the section to add a fourth bullet stating "There is consensus that languages whose origin and use are restricted to one or more related literary works and their fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the Appendix namespace." Dmcdevit 00:51, 4 April 2007 (UTC)

From the current list of languages in WT:CFI#Languages to include, which would be moved there? --Connel MacKenzie 04:45, 4 April 2007 (UTC)

Quenya, Sindarin, Klingon, and Orcish I think are the applicable ones mentioned in other categories. Dmcdevit 05:20, 4 April 2007 (UTC)
Sounds worthy of a vote, to me. Thank you. --Connel MacKenzie 05:26, 4 April 2007 (UTC)
I created the subpage: Wiktionary:Votes/pl-2007-04/Fictional languages. Any last suggestions about the wording, or should I make it live? Dmcdevit 06:01, 4 April 2007 (UTC)

Archiving of WT:RFV

I notice that for the last year Wiktionary:Requests_for_verification/archive, linked from the header on WT:RFV, no longer has a list of words which have failed RFV (and the Jan & Feb 07 archives don't exist at all). Is the list of failed words available somewhere else? If not, how are we meant to check if a word has failed in the past? Previously a search for it on the /archive page was sufficient. I vaguely remember this being discussed, but can't remember the resolution. --Enginear 11:57, 28 March 2007 (UTC)

Someone had volunteered to manually maintain that list. I've taken stabs at automating it, but have not had time to devote to completing my preliminary efforts on that task. --Connel MacKenzie 16:06, 28 March 2007 (UTC)
I have a suggestion for how to maintain an archive of RFV-failed terms without maintaining the terms themselves anywhere on wiktionary where search engines could find them and archive them and cloud future verification requests (which seems to be a community concern and reason against keeping them on Wikt): remove them from the RFV page, then put a link to the diff in the archive. What do you think? -- Beobach972 18:38, 28 March 2007 (UTC)
Thanks. That is an excellent suggestion, actually. I forget why it fell into disfavor in the past. As we move towards an automated solution, I think that merits another look - as it is a superior method. At any rate, the problem is implementing the solution - the day to day drudgery of someone actually doing it. --Connel MacKenzie 18:13, 29 March 2007 (UTC)

WordWeb 5 Freeware Dictionary

Anyone have experience with this yet? They seem to be using Wiktionary (Yay!) so it might be worth checking into, in detail, at some point. (I stumbled across it, here.) --Connel MacKenzie 18:22, 29 March 2007 (UTC)

Hmmm, they don't seem to be complying with the GFDL very well though. --Connel MacKenzie 18:29, 29 March 2007 (UTC)

RFVing of words with generous b.g.c. hits

Kind contributors, I wonder if anyone here would agree with the following proposal. I propose that if a word has at least 10 immediate citations right at the front of books.google.com upon a simple search, and they are independent, then the burden of proof should be upon the person who wants the word deleted, not upon those who want the word kept. With the current system, a person could go RFV cat, dog, the, and pencil, and the burden of proof would be on those of us who like those words, to spend some of our time writing citations for these obviously good words. For example, someone recently RFV'd usurpress, even though providing cites for this word is just a tedious task of entering it in b.g.c. and choosing some cites from there and typing them in here. We could better use our time than that, unless the person doing the tagging offers an actual reason why the word should be deleted :-) The oversight of words is an important part of the dictionary, but in cases where a 1 minute search will immediately make it clear a word passes CFI (without even resorting to controversial cites like usenet), I think such words' presence is definitely a good part of our dictionary :-) Language Lover 19:06, 29 March 2007 (UTC)

That misrepresents the current practice quite wildly. If something is "clearly in widespread use" there is no reason to issue an "rfvfailed" for it. But in theory, all entries should have references, so an RFV isn't the sinister thing you are making it out to be. Re: "oversight:" Wikipedia has a special meaning for this term; in general, I mean the common meaning of the word, not the Wikipedia special meaning. --Connel MacKenzie 19:14, 29 March 2007 (UTC)
Wow, thanks Mr. MacKenzie, this is an interesting aspect I didn't realize. :-) You're definitely not considered one of the leading Wiktionary contributors without good reason!! :D See below (my response to Enginear) for more... Language Lover 20:28, 29 March 2007 (UTC)
I agree with Connel on this one. It does not seem to be the case that people actually are RFVing common words, and then just forcing someone else to work on them. From what I can tell, people are generally only RFVing rather obscure words, which are the ones which most desperately need cites. Ultimately, I think it must simply be admitted that part of the drudgery of entering obscure words is fighting for their existence. If people start RFVing dog, cat, and the, then perhaps the policy might need some revisiting. However, for now, I think it works well as it stands. Atelaes 19:26, 29 March 2007 (UTC)
And also, this is a wiki. We don't all need to do everything ourselves. If someone refers a word they don't recognise, without checking adequately, then as you say, it is quick for someone else to correct them (as I've just done for WT:RFV#lose one's rag). OK, it then takes time to add the cites to the page, but I've yet to see a case where a link to plenty of clearly appropriate cites had been added to the RFV page and yet the entry still failed. Either someone copies the cites into the entry, or someone maintaining RFV uses their discretion to strike it from the RFV page, or to leave it in place for another month until one of us has time. --Enginear 19:51, 29 March 2007 (UTC)
Wow, thanks for all this great discussion :-) Alright, it looks like people generally are cool with the current system, at least as long as no one starts indeed RFVing cat and dog. :-) Altering the subject, what would you guys think of a new {{rfcites}} tag for words which one does not want deleted, but simply for which one feels some in-article citations would make our readers happier? :-) Mr. MacKenzie brought up the great point that sometimes an article would improve with more cites, but is not one we want to delete outright. A tag similar to rfphoto would be entirely appropriate :-D Thanks Connel, you are a great innovator!!! :-D Language Lover 20:27, 29 March 2007 (UTC)
I think that an {{rfcites}} is an excellent idea for words which are obviously in use, but could use some cites simply as an effort at improvement. However, one thing which would need to be considered is some general guidelines as to what sort of entries could specifically use cites. Because, ultimately, all entries which don't have cites could be improved by them, but I imagine that {{rfcites}} would be used on entries that, for some reason, would especially benefit from them. Atelaes 20:50, 29 March 2007 (UTC)
Also of concern with such a template, is Eclecticology's idea of separating {{rfv}} and {{rfv-sense}} onto separate pages. That same distinction (for {{rfvcites}}) seems advisable, so this sort of confusion is (perhaps?) less likely to arise. Maybe. I don't feel like creating a separate scheme for them though, nor am I particularly inclined to monitor yet-another-maintenance-page. --Connel MacKenzie 15:54, 31 March 2007 (UTC)
While I have struck words for being clearly widespread use (and I probably should have done so for attender, but what the heck), I could not do so for usurpress as much as it is evident to me that it belongs. DAVilla 20:10, 29 March 2007 (UTC)

Wiktionary:Contact us and OTRS

You might notice our brand new "Contact us" link in the sidebar. It goes to the new Wiktionary:Contact us page. This is a feature of all Wikipedias, and it leads to various help pages as well as a link to the Foundation email address (OTRS). We decided to start answering Wiktionary-related emails at OTRS, and that's one of the primary reasons for the new sidebar link. I think we'll see what the volume is in the next few days and then determine what to do about volunteers, if anything. However, currently that page is mostly a copy of Wikipedia's page, with the content substituted for Wiktionary equivalents by me. Please spruce it up, make it better, and more Wiktionary-like. Dmcdevit 21:00, 29 March 2007 (UTC)


A few months back, default http://www.dictionary.com/ lookups started displaying translations.

Dictionary.com has always been the number one source for inadvertent copyright violations, on en.wiktionary.org. Particularly, from visiting Wikipedians unfamiliar with our rules and our particular copyright concerns.

To me, there seems to be a direct correspondence between the change at dictionary.com, and the increase in contributors entering questionable translations. It did not occur to me that d.com was the source of the translations, particularly for sockpuppets of people who did not speak those languages.

Could all sysops, when checking translations, please remember to check dictionary.com, to see if any patterns evolve? To me, edits like this are particularly disconcerting. Do we automatically block indef for stuff like this?

Thanks in advance, --Connel MacKenzie 15:33, 30 March 2007 (UTC)

I think block indef would be a large overreaction. However, you have a good point that this is something that should be watched for and stopped. Atelaes 18:08, 30 March 2007 (UTC)
Block them for a while and let them explain themselves; if they can't, block them indefinitely. The edit you picked out highlights the kind of arrogance some of these people display, and a good blocking always brings them down a peg or two.--Williamsayers79 18:27, 30 March 2007 (UTC)
Well, deal with it as you like, but please don't block the user who's editing jungle. I'm working with them. Atelaes 18:30, 30 March 2007 (UTC)
I agree. I don't see what's so disconcerting about that edit, Connel. Essentially, the only changes are (1) changing the outdated language name "Hindustani" to the more usual "Hindi", and (2) changing the POV so that it leaves open whether the word came into English through Urdu or Hindi (instead of definitely from one or the other). I don't see either of these changes as potentially a copyvio. --EncycloPetey 21:44, 30 March 2007 (UTC)
I wouldn't say that the term Hindustani is outdated. The term encompasses Hindi and Urdu, and it especially refers to the colloquial versions (mutually intelligible) of both standards (which have become so distant from each other, due to political reasons, that they're almost mutually unintelligible). --Dijan 20:33, 1 April 2007 (UTC)

Math words with many equivalent definitions

In math, there are some terms for which there are many definitions, such that the definitions are all actually the same, but that fact is not at all obvious. For example, computability theory is famous for having tons of definitions of computable functions, which seem utterly different, but turn out to be identical (but proving that takes a lot of work). I wonder how we should define such words. How about if we gave a broad handwavey definition, together with a link to a subpage which lists the most common formal definitions? What do you guys think? :) I'd like to make a page for semisimple, and I'd also like to add computability theory senses to recursive and computable. :-) Thanks, y'all!!! :D Language Lover 06:21, 31 March 2007 (UTC)

Hopefully, in such cases there will be a comprehensive article on Wikipedia that may be linked. In such cases, a general definition and link to Wikipedia should suffice, since Wikipedia allows for a lengthier discussion. --EncycloPetey 08:00, 31 March 2007 (UTC)
With the current software limitations, I am pretty strongly opposed to "subpage-for-everything" concepts. The "Citations" tab was enough of a nightmare, but at least it has direct lexical relevance to dictionary-making. --Connel MacKenzie 15:56, 31 March 2007 (UTC)
It's not only in maths that precise definitions are required: some of the terms used in linguistics, for example, seem equally complex to me, and certainly in physics/engineering, work is defined in several ways which all end up with the same result (though I accept this is a lot simpler than the words you are talking of). It's just that the math ones stand out more in a dictionary. I think there's a general consensus that "technical" definitions should be included, provided that they are not too distracting to the majority of users who only want the everyday meaning.
In the long run, once someone works out how to do it, I like the idea of collapsible sub-sections for precise definitions, and again for cites <onto soapbox>(to me, the idea of Citations sections/sub-pages reliant on glosses is wrong: citations help to draw out the exact meanings of definitions so should be adjacent to them) <off soapbox>. But meanwhile, here's "one I prepared earlier". I did it a few months ago, and I don't really like it. But I've offered it for criticism before, and so far no one's improved the layout.
Obviously, it would look better, to most, if the (more precisely) sections were collapsed, and only opened up to those who wanted them...Now I've remembered it, I'll add some cites soon. --Enginear 18:14, 31 March 2007 (UTC)
I'm sorry, but most of that information strikes me as encyclopedic rather than as definitional. For one thing, what's with the list of common heat sources? Is this to imply that the use of the term "boiler" is dependent on what heat source is used? If a new heat source were discovered today and people started using it to build boiler-like devices and started referring to these devices as "boilers", would you consider that to be an extension of the existing sense? —RuakhTALK 21:58, 31 March 2007 (UTC)
I largely agree, which is why I said "I don't really like" the article. I did it 10 months ago, when I was far less aware of what should/should not be here. I am fairly sure that the short list of prohibited heat sources (hot water and steam) in the (more precisely) sections is definitional and never likely to change. Some of the rest should probably be in Usage notes (since it makes clear circumstances where the word should/should not be used, at least in the building industry), and some should be junked. Improving it is still on my "to do" list, but I won't be offended if you rewrite before I get to it (which will be a few days at least). --Enginear 20:00, 1 April 2007 (UTC)

Much ado about Graphemes

Since I see that the last edit to the Beer Parlour was on the "Connel is an asshole" topic (and I know we've all heard enough of that), I thought I'd try to start a discussion on something a bit more constructive: the formatting of letter entries. Seeing as we currently have about five letters in RFC, and, in my opinion, most of our letter entries are kind of messy and unstandardized, I thought a BP discussion on the topic was in order. In my opinion, the first topic which needs to be covered (and has been much discussed on the RFC page, without any conclusion) is what header a letter goes under. The precedent appears to be Translingual, and perhaps it should stay that way. However, Stephen has made the excellent point that, while most Latin letters are used in a slew of languages, most other letters are somewhat more restricted. Take the letter β, for example. As far as I can tell, it is used only in Greek. Now, bear in mind that I'm talking specifically about β as a Unicode character. Coptic also uses beta, but it is a different beta: ⲃ. So, in that respect, it's not terribly translingual. How about the character 𒊕 (don't worry, I can't see it either), used in Sumerian and Akkadian? By the way, there is an insightful discussion about this topic centering around that particular character at Wiktionary:Requests for cleanup#𒊕. Another question which arises is what information should be included at the entry for a letter. Should it have a pronunciation for every language which uses it? Should these pronunciations all be within a single L2 header, or should each language receive its own L2 header with a pronunciation section within it? If they're each within a distinct language header, what's the part of speech? And where would we include IPA in all this mess? Does a letter get an etymology? It certainly descends from something (in the case of β, it comes from the Phoenician letter 𐤁). 
I imagine there is a whole slew of other issues that could be raised, but I figure that's enough to start. To facilitate this discussion, I've created the entry β/test, which is an identical copy of the β page. I figure we can use this page as a testing ground for different ideas without worrying about presenting the users with some half-baked crap. I picked a Greek letter because it's a bit less complicated than a Roman letter (fewer languages), and thus seemed rather more appropriate for testing. It even comes with its own special template: {{greek letter-temp}}, which is only used on this page. Hamaryns has asked that this template be cleaned up anyway, so I figure people can fiddle around with it a bit without screwing up all the Greek letters in the process. So, any takers? Atelaes 12:38, 2 April 2007 (UTC)

I’m not having any luck with the encoding [[&#55304;&#56981;]]. I wonder if you mean 𒊕 Wiktionary:Requests for cleanup#𒊕. The original works for me, as does the last, but not &#55304;&#56981;, it looks like <|=| H. (talk) 09:51, 4 April 2007 (UTC)
Yes, besides that fact that some scripts such as the Roman alphabet are used for many languages, other scripts are used only for one or two languages (e.g., Lao, Burmese, Thai, Korean, Oriya, Khmer, Tamil, Malayalam, Cherokee, and so on). Also, there are some letters in common scripts such as Cyrillic that are used for only a single language (e.g., Cyrillic is used by a number of languages, but the Cyrillic letter Template:RUchar is limited to Chuvash). And while some scripts are used by multiple languages, there are some languages that use multiple scripts (e.g., Serbian).
In my opinion, the way I set up the Cyrillic and Arabic scripts takes everything into account and works well. See for example ж and Template:ARchar. The principal header is the name of the script (==Cyrillic alphabet==, ==Arabic alphabet==) and the next level heading is ===Letter=== (in the case of alphabets), ===Syllable=== (for syllabaries), ===Logogram=== (such as Sumerian).
As you see in ж and Template:ARchar, the "definition" lines indicate the position in the alphabet and the pronunciation in the different languages that use that letter in that script, in alphabetic order.
After the script section that describes the letter, if the letter in question is also a word or abbreviation of some languages, these sections follow with second-level headers (==Russian==, ==Urdu==, etc.).
I always thought that the word translingual was very odd for this purpose. A few scripts such as Roman are used by many languages (translingual), but most scripts are not. And there are some letters in the Roman and Cyrillic alphabets that are restricted to a single language. To me, symbols such as !@#$%&*()-+=/., are translingual, because they are used not only by virtually every language that uses the Roman alphabet, but also by languages that use several other alphabets, including Cyrillic and Greek (although the meanings of specific symbols vary from language to language even if the typographic symbol does not).
We have had short discussions about this several times over the last couple of years, but nothing has ever been decided. I did the Cyrillic and Arabic alphabets and could do the same for many other scripts, but I don’t want to do it while this is still all up in the air the way it is. —Stephen 18:25, 3 April 2007 (UTC)
I think you set those up very nicely, but am unhappy with the L2 headers X alphabet. That’s why I would propose to just put them one level lower, with, indeed, Translingual as the L2 header. As I suggested in the page about the cuneiform above, we probably want to think of a better word than translingual. Maybe ==Symbol==, with L3 ===Cyrillic letter===, ===Roman letter===, ===Cuneiform logogram===, ===Diacritic===, ===Ligature===, ===Reading mark=== (for !;:.$, but there is probably a better English word for that), ===Mathematical sign=== (for +-%#∃∄∃∈∉...), ===IPA symbol===, ...
For the other use of Translingual, for things that are more than symbols such as µg, ff, mW etc., it may be kept, or we think of a second alternative. H. (talk) 09:22, 4 April 2007 (UTC)
Thanks for bringing this up. As some might have noticed, I’ve been spending a lot of time on the first few letters of the Greek alphabet. β sort of represents what I think it should look like now. But if you browse through the recent history, you see that I’ve come a long way to get to that form. I would recommend looking at that history (and perhaps also at α, γ, ...) before commenting here. H. (talk) 13:08, 2 April 2007 (UTC)
Wait, I thought Wiktionary treated Modern Greek and Ancient Greek as separate languages? β is used in both.
At any rate, I've been thinking about this for Hebrew letters, and what I was thinking was:
  • letters are translingual; even if the letter is only used in one language's writing system, actual graphic references to the letter are the same no matter what language the referring text is in.
    • they don't have a pronunciation per se, though we could have e.g. an Appendix:Greek alphabet that gives that kind of information (though insofar as this borders on a discussion of the phonological history of Greek, it might be more appropriate on Wikipedia).
    • uppercase and lowercase letters are separate, for a few reasons, most notably that case mappings depend on the language (e.g., in Turkish "i" and "I" are different letters, with "i" having a dotted uppercase counterpart and "I" having a dotless lowercase counterpart), and that the Greek lowercase letters should have their uppercase counterparts as their etymologies.
  • names of letters are language-specific; for example, "beta" is an English word that refers to both β and Β.
  • the definition of a letter in an ordered alphabet should link to its predecessor and successor. (The reason I say this should go in the definition is that the same letter may appear in multiple alphabets — especially common with Latin-based and Cyrillic-based alphabets — in which case the letter's predecessor and successor may vary. For example, n should have a separate definition for the Spanish alphabet, giving ñ as its successor.)
  • if a letter is a specific form of an abstract letter (like β is of beta, and a Japanese katakana character is of a kana character), then it should link to the other forms.
  • So, for example, I think β should be something like "==Translingual== ===Letter=== β # Lowercase beta, the second letter of the Greek alphabet (uppercase form Β), coming between α and γ." (Plus the other definitions at that page, obviously.)
Is that reasonable at all?
RuakhTALK 16:47, 2 April 2007 (UTC)
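A minimal Python sketch of the per-alphabet predecessor/successor idea, assuming simplified, hypothetical letter orders (the `ALPHABETS` table and the `neighbours` helper are illustrative only, not an existing template or tool). It shows how the same letter n gets a different successor depending on which alphabet the definition line is about:

```python
# Hypothetical, simplified letter orders used only for illustration.
ALPHABETS = {
    "English": "abcdefghijklmnopqrstuvwxyz",
    "Spanish": "abcdefghijklmnñopqrstuvwxyz",
}

def neighbours(letter, alphabet):
    """Return (predecessor, successor) of letter in the named alphabet; None at either end."""
    order = ALPHABETS[alphabet]
    i = order.index(letter)
    prev = order[i - 1] if i > 0 else None
    nxt = order[i + 1] if i + 1 < len(order) else None
    return prev, nxt

for name in ALPHABETS:
    print(name, neighbours("n", name))
```

Keeping the order per alphabet, rather than per character, is what lets one entry carry several "between X and Y" definitions.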
Excellent conclusions. I especially like the distinctions you make between the symbols and their relationship to the alphabets, in case and in order.
Does any translingual word have a pronunciation? How many different expanded forms are there for №? Apparently it is translated as знак номера in Russian! How many ways are there to say TAXI? In Nigeria, /dag'zi/... DAVilla 19:20, 2 April 2007 (UTC)
Actually, I think this is a case where translingual is misapplied. As far as I know, languages that use the Roman alphabet do not use the symbol №. It is very familiar and perfectly readable, but it is quite unusual to actually use it in a text. Cyrillic is a better description of №, because languages that use Cyrillic do not have the letter N readily available (at least in pre-Unicode days), and therefore that symbol is specifically provided on Cyrillic keyboards (the uppercase of the number 3 or sometimes 4). Japanese uses various things for this, including , etc., and as far as I know, the symbol №, although it appears to be a Roman symbol, is Cyrillic only. —Stephen 18:57, 3 April 2007 (UTC)
Once again: if we simply think of another word which does not have the connotation that it is used in more than one language, this could be solved. I am all for one header which applies to everything which is not a word in a language and thus does not fit under a ==language== header. H. (talk) 09:22, 4 April 2007 (UTC)
Indeed. Grumble, grumble, now I’ll have to redo a lot of my greek letter work. And indeed you’re right that the lower case forms were derived from the upper case ones, I should have thought of that. Hm more input still welcome. H. (talk) 10:14, 3 April 2007 (UTC)
I don't think your template is useless. Just put it under a language-specific header. The example above would be placed in Greek. There's no reason not to define it in both the Translingual header and in the languages from which the letter actually derives. But realize that there could be more than one instance of the template on a page, and work towards making a more concise format when it is shared by many languages, e.g. on n. DAVilla 12:16, 3 April 2007 (UTC)
No. I don’t like that at all. The template is some sort of extra thingie; it would be ugly if there were more than one on a page. But it can be extended to allow for more than one previous/next letter, by using named params or something. I’ll give it a try, since the problem just arose for ζ (sixth in Modern Greek, seventh in Ancient Greek).
I think n is a bad example, it really needs some cleaning up (which I’d be happy to do, once this discussion has settled, and I finished the Greek alphabet, and perhaps some other ones :-) ) H. (talk) 09:22, 4 April 2007 (UTC)
I had a go at {{greek letter-temp}}, to accommodate more than one previous letter, and put it into use in β/test. Have a look. H. (talk) 09:51, 4 April 2007 (UTC)

And by the way, do we also have to make distinctions between different forms of letters that are conflated in English? The two lower-case a's have different meanings in IPA. This is handled by unicode, as are, strangely, a number of other very similarly looking characters, but what about cases that are not? The number 7 can have a stroke through it in some parts of the world, two strokes in others. In Taiwan the left bar of the 5 extends upwards vertically. (I have even had my handwriting "corrected" by a local.) Print, in block letters, the words "island", "glands", and "sliding" and then compare them. You might be surprised! What about symbols that don't have a unicode equivalent, such as the happy face, many of the more obscure and antiquated astrological symbols, and some of the symbols used in print by various magazines, journals, etc. to indicate the end of an article? DAVilla 19:41, 2 April 2007 (UTC)

I think every Unicode symbol deserves its own page. No redirects at all. The fact that different languages use different orders in the alphabet makes templates like {{greek letter}} difficult to use. That’s a pity, though, since they are nice. Does anybody have an idea how to combine the two? One such table per language using the letter is absurd, but something similar would be nice. H. (talk) 10:14, 3 April 2007 (UTC)
What if they're exactly the same symbol, just with different uses? Why bow down to Unicode? DAVilla 12:20, 3 April 2007 (UTC)
It's not a question of "bowing down" to anyone or anything. What else could anyone possibly use to look up terms here? Since the distinction by spelling has already been made, it seems only reasonable to extend that same distinction by spelling (of the headword) to individual symbols. --Connel MacKenzie 03:36, 4 April 2007 (UTC)
Good point. Unicode gives a short description of each symbol; maybe for starters it is possible to import that with a script? And even non-Unicode symbols are welcome, but do they still exist? H. (talk) 09:22, 4 April 2007 (UTC)
Does the happy face have a Unicode character? Might seem silly, but remember we're talking about a noncommercial symbol that's instantly recognizable internationally and used in writing today. DAVilla 16:16, 4 April 2007 (UTC)
Indeed, it has two: ☺ (U+263A, WHITE SMILING FACE, = have a nice day!) and ☻ (U+263B, BLACK SMILING FACE). —RuakhTALK 16:31, 4 April 2007 (UTC)
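As a rough illustration of the suggestion to import Unicode's short descriptions with a script, Python's standard `unicodedata` module exposes the official character names. The `describe` helper is hypothetical; it only formats a code point and its name, which could seed a stub entry:

```python
import unicodedata

def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode character database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

for ch in "\u263a\u263b\u03b2":
    print(describe(ch))
```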
Wow! Unicode is so complete that counter-examples are clearly difficult to come by. Really compelling ones, that is. The handicap sign and boy/girl stick figures just aren't used in running text. Not that I'm aware of, anyways. For the more contemporary ones, I'm sure I've seen a little symbol for a TV here and there, as a fancy bullet or what have you. Nah, maybe just an icon. What are we down to, the ancient Chinese only coded in BIG-5? DAVilla 00:06, 5 April 2007 (UTC)
I just read up about Chinese in Unicode (due to the decomposing suggestion below): there are 70000+ Chinese ideographs in Unicode, so you’ll have to search far to find some which aren’t, but indeed, they do exist (there are some examples in the document referenced in the below discussion). And I’m pretty sure there are some obscure mathematical symbols which aren’t, yet. But eventually they all will be, I suppose. Hell, even the most abstruse cuneiform symbols are in there. H. (talk) 10:38, 5 April 2007 (UTC)
By indices, similar to Chinese characters. The symbol (that is, one of the symbols) for Pluto uses a combination of P and L. Going from the planet to the symbol is easy. In the other direction, if you found it online and wanted to look it up you could copy and paste it. But if you saw it in a book and you didn't know what it meant, there would be no other way of telling the computer "look at this and tell me what it is" than to decompose it.
I have no objection to making a separate page for each Unicode character. It's not certain that it's the ideal solution, but it's certainly the clearest one. I would just hope that some of them are very closely linked, even more tightly than with a simple "see also" at the top. DAVilla 16:09, 4 April 2007 (UTC)
We might make exceptions for symbols that are only present for backwards compatibility purposes, though, such as CJK Compatibility Supplement: U+2F800–U+2FA1D. H. (talk) 10:38, 5 April 2007 (UTC)
You don't have any choice but to make an exception in this case ;-) the WM software (correctly) maps to the standard character, so you can't make an entry at the compatibility code point. FYI: User:Robert Ullmann/Han is a complete map of the CJKV/Han characters we have. Robert Ullmann 11:53, 5 April 2007 (UTC)
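The mapping described above is Unicode canonical normalization: each character in the CJK Compatibility Ideographs Supplement (U+2F800 to U+2FA1D) has a singleton canonical decomposition to a unified ideograph, so NFC, which MediaWiki applies to submitted text, replaces it. A small check with Python's standard `unicodedata` module, using U+2F800 as the example:

```python
import unicodedata

compat = "\U0002F800"  # CJK COMPATIBILITY IDEOGRAPH-2F800
normalized = unicodedata.normalize("NFC", compat)

# The compatibility code point does not survive NFC; it becomes its canonical equivalent.
print(f"U+{ord(compat):X} -> U+{ord(normalized):04X} {unicodedata.name(normalized)}")
print("canonical decomposition:", unicodedata.decomposition(compat))
```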
I've altered β/test to conform to my vision of what the proper formatting should look like, which can be seen here. I suggest that others might consider doing the same, as it's much easier to see the stuff in practice than in theory. I've put my name at the top as an L1 header, in case others put their own versions, just so it'll be easier to keep track of whose version is whose. A few notes: First of all, it should be remembered that Ancient Greek did not actually use this letter, which is kind of interesting. We use minuscules in our Ancient Greek words because that is the general standard in other Ancient Greek works. I think it best to simply get the Wiktionary Ancient Greek section up to the standards of other lexicons before trying to outdo them. It's something that will certainly come up in the future, but it is not really germane to this particular conversation, so I'll drop it for now. I've dropped a lot of the stuff which should really be on the majuscule version's page. All of the information which I feel is specific to the character (outside of the context of any specific language) I've put under the translingual header. Everything which depends on the context of a specific language, I've put under the headers of the languages. As for the template, I think that, with a bit more tweaking, it could be general enough to be used for most languages, and would be best used in the language sections on the letter entries. Atelaes 21:54, 4 April 2007 (UTC)
Good idea; I put my version in its own section below it: [9]. I borrowed some of your ideas, and interspersed mine with small comments where suggestions are welcome. Most importantly, I find that I use the template only once, with the accommodations I made to it to have multiple previous/next letters for different languages. I am not enough of a historian to decide on some points, which I put in the comments. H. (talk) 10:38, 5 April 2007 (UTC)
That is an excellent idea (I was thinking people would each just have a version, but your idea is much better). The facts are, ultimately, unimportant at this point, only the format. Atelaes 15:32, 5 April 2007 (UTC)

Some input please

It seems that only Atelaes and I are interested in this any more. What do others think of my suggestion to use ==Symbol== instead of ==Translingual==? Who else wants to experiment with β/test? Stephen, you at least should have a go. I want this settled, so I can continue with the Greek alphabet. H. (talk) 15:25, 6 April 2007 (UTC)

I can accept ==Translingual== for symbols that are used by numerous languages and even in different scripts (Roman, Greek, Cyrillic, etc.), such as !@#$%*()[]/:;,.?, but it strikes me as silly if the symbol in question is only used by one language and in only one script, such as க (a Tamil "ka"). There is nothing "translingual" about it. So Symbol would be a better choice, although still a problem in some cases, since the alphabets used by some languages include digraphs, trigraphs and tetragraphs (e.g., Dutch IJ, ij). If a tetragraph can be considered a "symbol", then it wouldn’t be too bad.
However, if we use Symbol, then some "symbols" will be letters of alphabets, some symbols will be punctuation, some symbols will be numerals, and some symbols will be symbols (e.g., @#$%*)). That means that there would be cases where the L2 heading was ==Symbol== and the L3 heading was also ===Symbol===.
Besides ===Letter===, ===Symbol===, and ===Punctuation===, there will also be ===Logogram=== (e.g., Sumerian, where a glyph has both syllabic and semantic value), and ===Syllable=== (e.g., the syllabaries of Amharic, Oriya, Gujarati, Bengali, Thai, Khmer, Lao, and so on). Also, there are some true alphabets that only write "letters" that have been composed into complete syllables (e.g., Korean, Phags-pa).
So I still hold that the names of the scripts (Roman alphabet, Cyrillic alphabet, Greek alphabet, Cuneiform script, and so on) are the best choice for L2 headers, keeping the type of symbol (punctuation, symbol, letter, syllable, logogram) for L3 headers. But if it comes down to "translingual" vs. "symbol", I much prefer "symbol". —Stephen 05:10, 15 April 2007 (UTC)
There is a serious problem with using things other than languages at L2: there are hundreds of bots and programs that read the en.wikt, to add entries to other wikts, to extract various kinds of info, etc. Level 3/4/5 headers (if valid) are in a smallish set, 50 or so; a program can have a table of what it is interested in, and treat others as unknown/errors. But at level 2, the program cannot reasonably have a "complete" table of the languages (7000+ coded now), so the only way it can parse the heading is to recognize "Translingual" as not a language, and treat all of the others as language names. And that is what they do. If there is another open-ended set of headers at L2, with no syntactical indicator that they are not a language, the parsing is irretrievably broken. And we don't have any syntactical indicator. (If we were using XML or something, we'd use L2-lang and L2-thing or whatever.)
More abstractly, to maintain the ability to abstract the semantic meaning from the entry syntax, L2 must always be a language name.
The other point is that "Translingual" is exactly the right header for the Cyrillic and Arabic alphabets, each is used in dozens of languages. (And the letters aren't "symbols".) Things like the Tamil "ka" can just be under Tamil (as all of the Hiragana entries are under Japanese.) Robert Ullmann 12:00, 15 April 2007 (UTC)
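A minimal Python sketch of the kind of L2 parsing described above, assuming a bot that knows exactly one non-language L2 header. `NON_LANGUAGE_L2` and `split_l2_sections` are hypothetical names, not taken from any existing bot. The last header in the example shows the problem: a script-name header such as ==Cyrillic alphabet== is silently reported as a language:

```python
import re

# A level-2 header such as "==English==" (the third character must not be "=",
# so "===Letter===" is not matched).
L2_HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$")

# The only L2 header this hypothetical bot knows is not a language name.
NON_LANGUAGE_L2 = {"Translingual"}

def split_l2_sections(wikitext):
    """Yield (header, is_language) for every level-2 header, in page order."""
    for line in wikitext.splitlines():
        m = L2_HEADER.match(line)
        if m:
            header = m.group(1)
            yield header, header not in NON_LANGUAGE_L2

page = "==Translingual==\n===Letter===\n...\n==Greek==\n===Letter===\n==Cyrillic alphabet==\n"
print(list(split_l2_sections(page)))
```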

English to Arabic wordlist relicensed to GFDL

Arabeyes.org is proud to announce that its GPL English to Arabic wordlist was relicensed to GFDL to meet the Wiktionary needs. The source PO files can be found here. It already has a web interface named Qamoose. It can be a valuable addition to the Wiktionary. --Chahibi 01:23, 3 April 2007 (UTC)

I'm quite limited on Wiki-time right now, myself. Please (everyone?) see Help:Bots / WT:BOTS etc. (The help page is obviously my first draft - please be bold rewriting it.) I think if 20-30 of our current admins take an hour to install the bot framework, we'd have a respectable pool of bot operators to draw from (and much greater understanding of the advantages and limitations, all around.) --Connel MacKenzie 04:54, 3 April 2007 (UTC)

Words that are the same in a language other than English

What to do with words that are the same word as in English, in some language other than English, and with largely the same definitions? I.e. most of the time words that come from Latin or Greek, such as epsilon: in Dutch it means about the same as in English (of course), except for the computer science meaning. The question is: what to put in the Dutch definition line:

# [[epsilon#English]] (letter, mathematics, phonetics) 

i.e. a short gloss (but not so nice, and can get long if a lot of definitions coincide) or

# The name for the fifth letter of the [[Greek alphabet]].
# {{context|phonetics|lang=nl}} The [[IPA]] symbol that represents the [[w:open-mid front unrounded vowel|]].
# {{context|mathematics|lang=nl}} An [[arbitrarily]] small [[quantity]].

i.e. a repetition of the English definitions? H. (talk) 10:35, 3 April 2007 (UTC)

Other languages use a translation, not a definition, where possible, which means that the first option is better. However, there should still be a separate definition for each foreign sense of the word. That might mean making three definitions which all translate to the same English word, with three different glosses. DAVilla 12:08, 3 April 2007 (UTC)

{{trans-top}} and AutoFormat

At Connel's request, I added code to AutoFormat to convert top/mid/bottom only within Translations sections to trans-top/etc.

If you add {{rfc-auto}} to an entry when editing, it will find the entry, even if it has not been run for a while.

The gloss is correctly folded into the template if it is ;... or '''...'''; a few variant cases won't work (see name), and these show up in Category:Translation table header lacks gloss. This is only done in the Translations section; top isn't supposed to be used elsewhere, but often is. Robert Ullmann 11:06, 3 April 2007 (UTC)
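This is not AutoFormat's actual code, only a rough Python sketch of the folding described above. It assumes the input is already the body of a Translations section, and `fold_glosses` is a hypothetical name. Unlike the behaviour described, this sketch simply leaves tables without a recognizable gloss unchanged rather than tagging them:

```python
import re

# A gloss line (";gloss" or "'''gloss'''") directly followed by a {{top}} ... {{bottom}} table.
GLOSSED_TABLE = re.compile(
    r"^(?:;[ \t]*(?P<semi>[^\n]+?)|'''(?P<bold>[^\n]+?)''')[ \t]*\n"
    r"\{\{top\}\}\n(?P<body>.*?)\{\{bottom\}\}",
    re.MULTILINE | re.DOTALL,
)

def fold_glosses(section):
    """Rewrite glossed top/mid/bottom tables as trans-top/trans-mid/trans-bottom."""
    def repl(m):
        gloss = (m.group("semi") or m.group("bold")).strip()
        body = m.group("body").replace("{{mid}}", "{{trans-mid}}")
        return "{{trans-top|" + gloss + "}}\n" + body + "{{trans-bottom}}"
    return GLOSSED_TABLE.sub(repl, section)

src = ";cold season\n{{top}}\n*French: [[hiver]]\n{{mid}}\n*German: [[Winter]]\n{{bottom}}"
print(fold_glosses(src))
```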

The bot probably shouldn't touch anything under {rfc-trans} or {checktrans} either, or if it does then it should treat those cases specially, with the "gloss" being 'Translations to be checked' or similar. DAVilla 12:05, 3 April 2007 (UTC)
If the "Translations to be checked" header is there it won't. (You might be surprised at how often it changes "Translations to be categori{sz}ed" to the correct header ;-) Stopping at either of those two templates is a good idea; will do; it will just leave the rest alone. Robert Ullmann 12:11, 3 April 2007 (UTC)
Thank you. Shall I change all "{{top}}"s to "{{rfc-auto}}{{top}}"s? :-) --Connel MacKenzie 03:39, 4 April 2007 (UTC)
Please don't. I've cleaned out a number of the table-header-lacking-glosses entries in that category and found the work to be tedious and mundane. In a few cases I actually had to write a gloss, or used one the bot missed, but on most pages it was unclear and all of those translations had to be ttbc'd, and adding ttbc tags is a repetitive chore. On the other hand, in a few cases like summer I was able to do some research to discover when the second sense was added, and wound up being able to write a gloss after all, one that applied to translations in several dozen languages. I think we should strive for that kind of solution, not overburdening the translation work any more than it already is, and I feel that there's a lot of clutter that we really don't need to be digging up until there's a more automated solution. In other words, marking those where a gloss does not exist does not solve any problems. It floods the more interesting work with trivial tasks that really only pass the buck on to the translators. I don't have an immediate solution, although I think some day about half of the checktrans traffic could be eliminated by a history-aware bot. Maybe someone else could clear out part of the category and get a feeling for what sort of things need to be done. DAVilla 15:45, 4 April 2007 (UTC)
Please note my "smiley"! --Connel MacKenzie 20:19, 5 April 2007 (UTC)
By the way, you'll find that the fewer the number of definitions, the easier it is to salvage the table. But the majority were ttbc'd as I said. DAVilla 15:48, 4 April 2007 (UTC)
If we were to do this, there is a much easier way (add the cat to {top}!); but we shouldn't do that yet. I've changed the code for now to not convert the templates where it can't find the gloss. (So as to avoid flooding that cat for now.) If you wanted to tag entries that have ''' or ; at the start of one line and {{top}} on the next, that might be useful. Then we can see where we are. I wonder how many instances of top outside of translations sections we still have? Robert Ullmann 12:06, 5 April 2007 (UTC)
Answers to my own questions: top is used about 24 thousand times, in just over 15 thousand entries; about 12 thousand do not have glosses. It is used about 700 times outside of translations/ttbc, where it shouldn't be used; mostly in derived and related terms. Robert Ullmann 15:57, 10 April 2007 (UTC)
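The kind of tally reported above could be approximated with a sketch like the following. It is hypothetical: it assumes one template per line, treats a line starting with ''' or ; directly above {{top}} as a gloss candidate (the pattern suggested for tagging), and counts any {{top}} under a header other than Translations or Translations to be checked as misplaced.

```python
import re

HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
TOP_RE = re.compile(r"^\{\{top(?:\|[^}]*)?\}\}")
# Sections where {{top}} is expected; anything else (mostly Derived or
# Related terms) is counted as "outside_translations".
TRANSLATION_HEADERS = {"Translations", "Translations to be checked"}


def classify_tops(wikitext: str) -> dict[str, int]:
    """Count {{top}} lines on one page by where they occur."""
    counts = {"total": 0, "gloss_line_above": 0, "outside_translations": 0}
    header = None
    previous = ""
    for line in wikitext.splitlines():
        match = HEADER_RE.match(line)
        if match:
            header = match.group(2)
        elif TOP_RE.match(line):
            counts["total"] += 1
            # A bold line or ;-line right above the table may hold the gloss.
            if previous.startswith(("'''", ";")):
                counts["gloss_line_above"] += 1
            if header not in TRANSLATION_HEADERS:
                counts["outside_translations"] += 1
        previous = line
    return counts
```

Summing these per-page counts over a dump would give totals comparable in kind (not necessarily in exact value) to the figures quoted above.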

Components of Chinese characters

I'm not sure if this has been discussed before (the discussion archives are a bit difficult to search), but the Chinese character entries are missing a decomposition into components, as described in wikipedia:Radical (Chinese character), subsection "Character decomposition".

The decompositions could be given as Unicode ideographic description sequences (see [10], figure 11-8) and if necessary also in some other format. It would also be useful to have indices based on them, as most dictionary programs have a way of doing component search and Wiktionary should too. Multicomponent search and other such complicated things could be left to external software which could just get the indices from Wiktionary. The ultimate wiktionary project could also provide the extended search functionality if/when it materializes.

Of course there are many characters that are hard to produce good decompositions for, but most are easy, and there's no need to fret over the details. Simple graphical decompositions provide good enough indices for searching. Actual radicals and etymologies etc. are also a separate matter. If there's some kind of decision on this then one could start adding the decompositions right away, just like stroke order diagrams are being added incrementally. -- 11:32, 3 April 2007 (UTC)

Thank you for your suggestion. A character decomposition section may indeed prove useful to someone wishing to know more about a particular character. I would anticipate that the most challenging aspect of such an undertaking would be the sheer amount of time and effort involved in inputting such information. Unless a non-copyrighted database containing this information is already in existence, we would have to type this information by hand, one character at a time, into Wiktionary. My hope is that some day, we will have enough Chinese speakers to tackle such tasks in a short amount of time. For the time being, there are only a handful of contributors that work on Chinese entries. Of these, I'm the only one fluent in Chinese that regularly contributes Chinese words (Mandarin and Min Nan). My main activities to date have been focused on two areas:
  1. creating entries for useful Chinese words and phrases that are not found in other Chinese-English dictionaries
  2. creating entries for words found in the Appendix:HSK list of Mandarin words
I also recently finished the Appendix:Amoy Min Nan Swadesh list, and completely revised the Appendix:Mandarin Swadesh list that originally came from Wikipedia. If you are interested in working on character decompositions yourself, there are several of us here who could offer formatting suggestions, proofreading etc. If this sounds like something you would like to work on, I would suggest that you create an account for yourself. Once you have done that, you should read WT:ELE and WT:AC. -- A-cai 12:18, 3 April 2007 (UTC)
It is something I'm interested in, but I don't tend to contribute much on wikis. I would contribute decompositions now and then if there was an accepted format for them. I don't know any Chinese though, only Japanese.
There's no public domain database of decompositions that I'm aware of, but there is a GPL one at [11]. GPL is unfortunately incompatible with GFDL, even though both are GNU licenses. You can do searches on the aforementioned database at [12]. E.g. if you enter 糸車口 it gives you a list containing 轡, and with 肉退 a list containing 腿 (because the 月 is 肉月 you have to enter 肉; I think it would be more useful to allow 月退 too as that's what it looks like graphically). It allows both the actual radicals and their meanings, e.g. both ⺅中 and 人中 give you 仲.
Anyway, as there's an existing (free, even if incompatible with GFDL) implementation, it's both possible and useful. I don't think there's need to do this in a short amount of time - it's not like this information will become obsolete any time soon. It will eventually be complete even if done little by little. If there were a few examples and maybe a category of "Character decomposition needed" like there is "Cantonese definitions needed" etc, a casual visitor like me might add a few when they see they're needed. I've added some entries from time to time for Japanese words and would do that for character decompositions if there were an accepted format for it. -- 12:59, 3 April 2007 (UTC)
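To illustrate the component search described above, here is a small Python sketch that treats each character as a multiset of components and returns the characters that contain every component of the query. The sample data and the alias table (mapping ⺅ and 人 to 亻, mirroring the example where both ⺅中 and 人中 give 仲) are assumptions made up for illustration; they are not taken from the GPL database mentioned.

```python
from collections import Counter

# Hypothetical sample data: plain component lists based on examples in this thread.
COMPONENTS = {
    "字": "宀子",
    "轡": "糸車糸口",
    "疑": "匕矢龴疋",
    "仲": "亻中",
}

# Hypothetical aliases so a radical variant or its "meaning" character
# finds the same entries.
ALIASES = {"⺅": "亻", "人": "亻"}


def normalise(chars: str) -> Counter:
    """Turn a component string into a multiset, folding aliases."""
    return Counter(ALIASES.get(c, c) for c in chars)


def search(query: str) -> list[str]:
    """Return characters whose components include every query component."""
    wanted = normalise(query)
    # Counter subtraction keeps only positive counts, so an empty result
    # means every wanted component (with multiplicity) is present.
    return sorted(
        char for char, parts in COMPONENTS.items()
        if not (wanted - normalise(parts))
    )
```

Using multisets rather than sets means a query like 糸糸 only matches characters that really contain 糸 twice.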
How about you just go ahead, create one or two entries as you see fit, post them here, and then others can comment on it and make suggestions. Someone has to be the first... H. (talk) 10:09, 4 April 2007 (UTC)
Robert, I'm thinking that this is something that should be in your Template:Han char template under the translingual section. Do you think it would be a problem to add a variable to the template? If we use 字 as our model character, then the character decomposition would look like: 宀子. We would put this information under a variable called comp or something. For example:
{{Han char|rad=子|rn=39|as=03|sn=6|four=3040<sub>7</sub>|canj=十弓木 (JND)|comp=宀子}}
would produce:
字 (radical 39 子+03, 6 strokes, cangjie input 十弓木 (JND), four-corner 30407, composition 宀子)
That should do the trick I think. -- A-cai 23:05, 3 April 2007 (UTC)
I'd also like to see, for example, 字 listed on both 宀 and 子 or the proper indices. The radical is more important of course, but this dictionary is not limited by paper constraints. DAVilla 15:54, 4 April 2007 (UTC)
I think it would be nicer to use IDS descriptions instead of a plain list of components. E.g. 字 would be "⿱宀子", 轡 "⿱⿲糸車糸口" and 疑 "⿰⿱匕矢⿱龴疋". This way the layout and the count of each component are also present. IDS is originally meant to describe characters missing from Unicode to the reader, so having such descriptions would also be useful if the user's font is lacking some rare characters that are in Wiktionary. Having a list of these would also facilitate advanced searching in external software (such as browser plugins or free dictionary software). Simple indices should of course ignore this extra information, as that would get too complicated. Simple component lists (i.e. "宀子", "糸車口" and "匕矢龴疋" for the above) are not bad either, but I think the extra information with IDS is useful, too. -- 17:46, 4 April 2007 (UTC)
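Because an IDS is written in prefix notation (each ideographic description character is followed by its operands), the plain component lists mentioned above can be derived from it mechanically. A minimal sketch, assuming only the twelve original description characters U+2FF0 to U+2FFB, of which ⿲ and ⿳ take three operands and the rest two; the function names are hypothetical.

```python
TERNARY = {"\u2ff2", "\u2ff3"}  # ⿲ ⿳ take three operands
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY  # the other ten take two


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into a nested tuple tree."""
    def parse(pos: int):
        if pos >= len(ids):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = ids[pos]
        if ch in BINARY or ch in TERNARY:
            arity = 3 if ch in TERNARY else 2
            children = []
            pos += 1
            for _ in range(arity):
                child, pos = parse(pos)
                children.append(child)
            return (ch, *children), pos
        return ch, pos + 1  # a component character is a leaf

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def components(ids: str) -> str:
    """Flatten an IDS to the plain component list, dropping the layout."""
    def leaves(node):
        if isinstance(node, tuple):
            for child in node[1:]:
                yield from leaves(child)
        else:
            yield node
    return "".join(leaves(parse_ids(ids)))
```

The parse step also doubles as a validity check, since a malformed sequence (too few or too many operands) raises an error instead of producing a misleading component list.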
This is probably obvious, but... The component list should be restricted to characters that have entries in Wiktionary and be linked there. Index:Chinese_radical lists the radicals. There are some compatibility characters in Unicode that look the same but don't have Wiktionary entries, e.g. ⼥(U+2F25) vs. 女(U+5973). As a result some differences will have to be ignored, e.g. instead of using the compatibility characters ⻌⻍ one would always use . In the same vein characters like would be decomposed as and . Using instead of is better because that's how it looks; similarly is better as and than and . -- Coffee2theorems 13:31, 7 April 2007 (UTC)

As it looks like the discussion has died, here's a concrete proposal (much the same as A-cai's above): Add parameter "comp" to Template:Han char, e.g. comp=⿱, and display it as composition. Indexing by these may be done later. At least for now such sequences should be limited to elements that have Wiktionary articles (or at least redirects) and all the elements should be linked. If such a parameter is added I'm interested in adding decompositions from time to time.

Examples for 10 random characters: 付=⿰, 鳴=⿰, 鬩=⿵, 蛾=⿰, 掴=⿰, 職=⿰, 潔=nothing for now, 核=⿰, 巾=nothing because it's atomic, 余=nothing for now. The "nothing for now" characters didn't have decompositions into Wiktionary characters that I'd consider obvious (although they can be decomposed), so I let them be. I believe most characters can be described this way. These are somewhat useful even without indexing (knowing the components helps in learning the characters for instance) and there's always the "what links here" page. -- Coffee2theorems 05:34, 30 April 2007 (UTC)

I added ids= (as being more specifically IDS than, say, "comp="). You should think about whether you really want to link them; if you do, you break the Unicode IDS sequence, and a browser or extension that would render them cannot do so. Without linking, they are an IDS sequence both in wikitext and in HTML. (Note that we can always automatically link or unlink all of them later.) And you are correct above, we never use the compatibility characters, only the standard ones in Han Unified + Ext A + Ext B. Robert Ullmann 12:59, 2 May 2007 (UTC)
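As noted above, linking can be added or removed automatically later. One hypothetical way to do it, assuming each non-description character in the ids= value should become a plain [[...]] wikilink and that unlinking only needs to strip such simple links (no piped links):

```python
import re

# The twelve ideographic description characters ⿰ ... ⿻ (U+2FF0 to U+2FFB).
IDC = "".join(chr(cp) for cp in range(0x2FF0, 0x2FFC))


def link_ids(ids: str) -> str:
    """Wrap every non-IDC character of an IDS in [[...]] wikilink brackets."""
    return "".join(ch if ch in IDC else f"[[{ch}]]" for ch in ids)


_LINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]")


def unlink_ids(text: str) -> str:
    """Strip simple [[...]] links again, restoring a plain IDS string."""
    return _LINK_RE.sub(r"\1", text)
```

Since the two functions are inverses on well-formed input, switching between the linked and unlinked forms later would be a lossless bot run.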
Thanks! I tested it on , looks like it works. Good point about breaking the IDS sequence, I didn't think of that. I still prefer linking for the following reasons:
  • links are helpful to the reader
  • visually (from the reader's, not software's point of view) the IDS sequence is correct, and the description is meant for the reader's consumption
  • there's no widely used IDS renderer as far as I know, and special rendering is not required by the IDS specification
  • such rendering may not work at all correctly if later someone wants to use less obvious sequences (e.g. of the kind ⿰水十 instead of ⿰氵十 to represent 汁 for cases where a 氵-like alternative form character does not exist)
  • all the characters this is used for already exist in Unicode, and if e.g. ⿰氵十 were rendered as 汁, there would be no purpose in using the description at all (unless one could still copy/paste the parts, but still linking is better)
  • non-standard characters are easy to spot because they become broken links
  • as you say, this can easily be changed later automatically
Basically I suggested IDS because it contains slightly more information than a pure component list and may in the future be useful for indexing. Many (all?) electronic kanji dictionaries allow you to search by components (or do a search such as "kanjis which contain a component with this reading") and in SKIP codes there's precedent for indexing by structure (e.g. stroke counts of left and right parts). A full description is the most general way possible and there's little extra cost in it. -- Coffee2theorems 16:18, 2 May 2007 (UTC)
I added these for the easy cases of Grade 1 kanjis (though I may have missed some). I described as ⿱ despite its etymology, as that is what it looks like. I didn't add ⿴ for or ⿱ for yet. A common similar case outside grade 1 would be e.g. ⿰ for . As etymology is not such a simple thing and there's already a section for it, perhaps it would be best to use the IDS field for a graphical decomposition and leave the etymology to its own section. The other choice would be to use the etymological decomposition (e.g. ⿰ for ). One could also give both. Thoughts? -- Coffee2theorems 17:46, 2 May 2007 (UTC)
The etymology should (must) always be how the character originated. IDS is purely descriptive, as defined by Unicode. So they are definitely different in some cases. Robert Ullmann 17:51, 2 May 2007 (UTC)
I will consistently use a graphical description for the IDS field then. -- Coffee2theorems 11:53, 4 May 2007 (UTC)
Would be good to also add the etymology when you know it is different. Can be very simple: "From (flesh) + (hide/hidden)." Someone else can go into more detail if they have the reference information. Robert Ullmann 12:04, 4 May 2007 (UTC)

Getting backing before making drastic changes.

I've been reprimanded recently for making changes to some of Wikt's pages. Sorry for that. I'm still quite new tho'. To make my point, where does one go to get support for changes here? One example is my recent creation Template:Keene-un. This is a template which I figure is used to save time, and isn't a 'bot, so is it ok to use? Do I have to get backing to use it? Also I edited WT:ELE recently, making only minor changes to improve the flow of the page, but got blocked for it. Is Wikt so stringent as to worry about things like this? --Keene 23:14, 4 April 2007 (UTC)

I don't know what the policy is for having personal templates in the common space. I guess there should be some recommendation about it since it's easier to type {{subst:Keene-un}} or the like than {{subst:User:Keene/un}}. These kinds of templates are useful, and could be developed into a Go-failed button. Do make sure you do substitute it though, including the 5 pages listed under "What links here". DAVilla 00:14, 5 April 2007 (UTC)
I have done this with {{xhan}}; of course I can just delete it myself when I'm done. I don't think it is a problem if you make sure the name doesn't conflict with various reserved spaces (2- and 3-letter templates, and things starting with 2 or 3 letters and a hyphen). "keene-un" seems reasonable. Make sure it says in noinclude tags that it is yours, and can be deleted if left around, and do tag it with {{delete}} when done.
As to the WT:ELE edit, you did more than "improve the flow"; you deleted important text, explaining that they should not be entered manually. (IMHO, the section could be reduced to just that sentence; it is the only thing most users need to know: don't add or modify iwikis!) Robert Ullmann 12:16, 5 April 2007 (UTC)
WTF? Why aren't you just using the preload templates? Is there a bug in one of them? --Connel MacKenzie 21:33, 5 April 2007 (UTC)
But why was he blocked for this? The edits don't appear all that radical. Granted, he deleted the last sentence, which was perhaps a mistake. But, it does not appear to be a malicious act on his part. As for the preload templates, maybe as a new contributor, he did not know about them. Am I missing something? -- A-cai 05:52, 6 April 2007 (UTC)
The 3-day block seems a little harsh. Aside from the last paragraph, the edits did not change the substance. But that issue is completely unrelated to this. DAVilla 21:38, 7 April 2007 (UTC)
Hehe... what are preload templates? *Language Lover deftly dodges all the thrown tomatoes and eggs* Language Lover 14:01, 6 April 2007 (UTC)
Is there no process by which contributors can go about making new tools? This template clearly had a more specific purpose than any of the preload templates provided. DAVilla 21:38, 7 April 2007 (UTC)

Thesaurus resource


--Connel MacKenzie 20:17, 5 April 2007 (UTC)

Before I start the pagefromfile.py to populate Wikisaurus with some real entries, does anyone have comments on this? --Connel MacKenzie 05:57, 7 April 2007 (UTC)
As thesaurus entries are generally interesting, I plan on not requesting the bot flag for these, to increase visibility, and throttle them to one entry per 20 minutes so people can fiddle with them. --05:59, 7 April 2007 (UTC)
I thought the argument against using a bot for the Thesaurus was that entries were too complex and required close scrutiny of the precision of a given term for a given definition. I would be interested to see what pagefromfile pages looked like, but I have to imagine that very few meaningful pages would emerge from them. - [The]DaveRoss 01:39, 11 April 2007 (UTC)
Wow I wish I had your mad skills at programming, Connel :-) A programming master like you is a great boon to the wiktionary. Let's turn Wikisaurus into a Wikisaurus REX!! :-) Language Lover 02:16, 11 April 2007 (UTC)

Time to whittle

Original by dcljr

The entry for time has become our longest regular definition page, at over 40K, thanks to hundreds of "Derived terms" added by User:Paul G in February. I wanted to bring people's attention to this because it seems to me that many of the added terms are unnecessary, being either technical terms that probably don't warrant their own entry here (such as acquisition feeding time or clot retraction time), terms that are [arguably] easily understood by considering their constituent words (such as at what time or closing time), or alternate forms of other derived terms (such as about time too, when about time is already listed). (Note: I've notified Paul G about this comment, in case he wants to respond.) - dcljr 22:06, 5 April 2007 (UTC)

I would be quite happy to keep them all (Wiktionary is not paper). They are nicely hidden, and we might even get around to defining some of them one day. I am a bit miffed that he has beaten my list of defined terms at poly- (definitions in progress). SemperBlotto 22:23, 5 April 2007 (UTC)
What I don't like about this is that it obscures the more critical words like timely in this huge list. I have suggested before another section called Compound terms which would take phrases and compound words, those formed by simple concatenation of words with or without spaces, and leave Derived terms for the remaining words, those being words formed as blends and in particular with affixes. However, I'd like to hear what User:AutoFormat has to say about this since he or she likes to revert my edits and is clearly more knowledgeable on what would be best for Wiktionary with regard to this matter. DAVilla 07:38, 6 April 2007 (UTC)
Sounds like a WT:VOTE is needed for "===Compound terms===" then? --Connel MacKenzie 06:07, 7 April 2007 (UTC)
Does anyone have a better suggestion for what to name them or how to define the differences? A good test case might be vineyard. Should I bundle into the proposal that their priority placement is much lower than Related terms, even lower than Translations? If it's a 3-level header then it isn't dependent on part of speech. Is it dependent on etymology? DAVilla 21:29, 7 April 2007 (UTC)
Compounds like at what time are completely transparent to fluent English speakers, but if you've ever studied other languages, you know that these are actually very idiomatic. The prepositions are mostly arbitrary. For someone learning English as a 2nd language, such constructs are not transparent at all. Now as for the bigger issue... I seem to be in the minority for being in favor of making lots of "/" subpages. If I were a supreme arbiter, I'd make a list of the most "important" derived terms, and below that, have a link to a subpage with the complete list of derived terms. :-) What does everyone think of that idea? Language Lover 13:56, 6 April 2007 (UTC)
Subpages are NOT supported for this stuff, by the WM software. Don't use subpages for anything other than "Citations" (which has only rudimentary SW support.) --Connel MacKenzie 06:02, 7 April 2007 (UTC)
Long pages are not bad, in and of themselves. --Connel MacKenzie 06:02, 7 April 2007 (UTC)
Derived and compound terms should be dependent on etymology, yes; rush hour is certainly unrelated to the Old English rysc. -- Beobach972 19:41, 9 April 2007 (UTC)
Well, yes, but currently, as derived terms, they depend on more than the etymology. Being level-four headers they would depend on the POS. This is deliberate and supported by Paul G. But I'm not the only one who has had difficulty in classifying them. At the same time, for those that are classified correctly, do we want to toss that differentiation out? I need to look at time again... DAVilla 20:38, 9 April 2007 (UTC)
Yes, this can be confusing, e.g. timer is derived from the verb, not the noun. But at least it's clear where that one comes from, and that's a bad example because it really should be a derived term anyway. Paul G had brought up two examples with seal, I think, that even he wasn't sure of, but those cases are rare.
There are also some terms that include "time" but are not derived from it, such as counter-time. So I'm not entirely certain that Compound terms, even at level four, is appropriate as a header unless we were to clarify that they are also derived terms, or unless we can accept that they may not be. I do think being able to extract timely from that list would help a lot. DAVilla 23:52, 9 April 2007 (UTC)
While long pages aren't necessarily bad, they are usually bad anyway. Even though we aren't paper and we technically have the capacity for gigantic pages, they aren't generally easy to navigate or particularly useful beyond a certain size. 40k of non-prose text is HUGE, and I think that if anyone were to do a study on the readability of pages like time et al. it would be right down there with technical documents for lay persons...bad. We want to balance the inclusion of as much relevant information as we can stuff in there with cleanliness and readability. If we have everything anyone could ever want to know about a given term on a page, that is wonderful, but if no one is actually able to sift through the stuff that they could care less about to get to what they actually need, then what good have we done? I agree that that list should be cut down; we don't need every collocation and phrase ever written that includes "time" to be listed there, probably just idiomatic and other "interesting" terms belong. - [The]DaveRoss 20:50, 9 April 2007 (UTC)
The problem is that they are all idiomatic, or they shouldn't be listed. DAVilla 21:03, 9 April 2007 (UTC)
"Achilles tendon reflex time", "French Revolutionary Time", "QuickTime", "Hawaii-Aleutian time"...there is plenty in this list that doesn't belong: timezone names, random phrases containing the word time which aren't idiomatic. They are certainly not _all_ idiomatic. There are plenty there which should be on the page, but I guess what we are getting down to is that it is time for stricter criteria for "derived terms", "related terms" etc. sections, especially for the exceedingly large pages. - [The]DaveRoss 22:05, 9 April 2007 (UTC)
Hmm... part of the problem is that it's impossible to tell from the list what deserves an entry and what should be removed. "Achilles tendon reflex time" = Achilles tendon + reflex time as far as I can tell, but the expression "a stitch in time saves nine" was removed! Plus it's difficult enough keeping the list alphabetized. Someone decided to list old as time itself under "A" with as old as time itself. What is this, a topical list??
I'm moving the red links to Wiktionary:Requested articles:English/time so that if anyone wants to argue their inclusion they can simply create the page. DAVilla 23:52, 9 April 2007 (UTC)
Sounds like a good cleanup for this page, but I think a general discussion is called for regarding treatment of these sections. It is obvious that some delineation needs to be made, but where to draw the line? - [The]DaveRoss 00:01, 10 April 2007 (UTC)
Long pages aren't bad, you say? I just spent over half an hour, probably more than an hour actually, going through the derived terms at time. All I was doing was correctly alphabetizing the list (per below), removing extra words like "the" and trailing <!- comments -> (per below) many of which I intended to move in creating the actual page later, and standardizing other comments like <!- a stitch [in time] saves nine ->. I pushed the wrong button at some point and the browser paged back, which 50% of the time means I lose all of my work. I lost all of my work. So if you want the list to be managed, have fun managing it yourself. I've already rolled back my move to WT:RA, and it's not my fucking problem any more. DAVilla 15:36, 20 April 2007 (UTC)

Policy proposal

This policy is narrowly intended for pages with a great number of derived terms. However, it hashes out some specifics with regard to the Derived terms section in general, and may have implications on other such sections.

  1. The section is to be listed alphabetically. That means closely related words with different spellings—such as old-time and old times, or tact time and takt time—must be listed separately.
    Rationale: An ordering that is alphabetical does not necessarily coincide with one that is topical, even weakly so. Consider Taiwan time and old-timer, which would separate the above examples. Of the two incompatible orderings, only the first can be clearly defined in formal specification. It also has the advantage of being manageable by bot.
    Point of contention: It may be permissible to list on the same line terms that use the same letters but have different spacing or hyphenation, or use ligatures like æ=ae and ï=i which are conflated. However, note that these are not always synonymous, e.g. some time and sometime. Likewise summer time and mean time, as spaced, are systems of measuring time, while summertime and meantime are not.
  2. Only blue links are to be shown, with the exception of closely related words such as alternative spellings (which would be shown in the see-also at the top of a page and/or as alternative spellings in the language section) or inflected forms (where there are additional definitions, as would be shown in the see-also at top of a language section).
    Rationale: Red links are fine for giving an indication of what needs to be done, but an overwhelming number of red links are impossible to manage. To avoid removal, red links need comments if they do not appear idiomatic, such as short time and to time, or legal/medical terms. The term just-in-time is an example of one removed from time (by Paul G no less), perhaps because, lacking a comment, it did not appear idiomatic. On the other hand, these partial definitions, information that really belongs on the pages themselves, may not be removed after the page has been established. Furthermore, there is no process for determining if the comments are correct, or if certain words are in fact idiomatic, other than the RFV process for entries themselves.
    While a sea of unverifiable red links does an injustice to the page, and in my opinion more closely resembles requests for articles than a useful compilation, at the same time we cannot push requests off to another space when a closely related term exists in the Wiktionary. Doing so would be asking for a good number of pages that could be soft redirects, or very brief at the least, to be recreated from ground zero. This ties up the time of knowledgeable contributors in wikifying the page, finding the existing alternatives perhaps much later, and then having to coalesce the information. At the same time, not allowing these red links to remain on the page might suggest that there is one principal spelling and no alternatives to a term. While that may certainly be the case for many spellings, for spacing and hyphenation in particular there is a good variety even among the major English dictionaries.
  3. Terms that are added in the derived terms of a derived term (especially one that is a string prefix; see below) should then be omitted from the page. For instance, space-time and time series are derived terms of time which themselves have a number of derived terms. Otherwise any blue link is acceptable.
    Rationale: Either this system or a more complete one is feasible, but this is more elegant since anyone looking for e.g. time series analysis, time series data, time series database server, time series model, or time series prediction (assuming those are all idiomatic) would be just as inclined to follow a link to time series. At the same time, terms that are not derived terms of the derived term time series in this example, such as time series animation, should not and would not be excluded from the listing at time. Another such example is space-time trade-off.
    Point of contention: Since words that are not string prefixes of the derived term, such as anti de Sitter spacetime, are alphabetized differently, could they also be included as a derived term of e.g. de Sitter, Sitter, space, and time? Presumably not of de?
    Point of contention: Are blue links unquestionable if they are redirects to other pages? One example is take time to smell the roses, which redirects to stop and smell the roses. In that case it is not possible to link to the primary title as a derived term. There are other cases where both could be listed, e.g. have a whale of a time and its redirect whale of a time. Should they both be?

DAVilla 05:38, 10 April 2007 (UTC)
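Rules 1 and 3 of the proposal are mechanical enough for a bot. The sketch below is one hypothetical reading, not an agreed algorithm: it assumes letter-by-letter ordering that ignores case, spaces, hyphens and diacritics and folds æ/œ to ae/oe, puts terms whose keys coincide on one line (the first point of contention), and drops terms already listed on a derived term's own page, supplied here as a plain dictionary rather than fetched from the wiki.

```python
import unicodedata
from itertools import groupby

# Hypothetical folding table; the proposal itself only names æ=ae and ï=i.
LIGATURES = {"æ": "ae", "œ": "oe"}


def sort_key(term: str) -> str:
    """Letter-by-letter key: case, diacritics, spaces and hyphens ignored."""
    folded = term.casefold()
    for lig, repl in LIGATURES.items():
        folded = folded.replace(lig, repl)
    # NFKD splits e.g. ï into i + combining diaeresis; drop the combining mark.
    decomposed = unicodedata.normalize("NFKD", folded)
    letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    return letters.replace(" ", "").replace("-", "")


def derived_terms(terms, sub_derived):
    """Order a Derived terms list under rules 1 and 3.

    sub_derived maps a listed derived term (e.g. "time series") to the
    derived terms shown on its own page; those are omitted here (rule 3).
    Returns one list per output line; terms sharing a key share a line.
    """
    listed = set(terms)
    covered = {t for parent in listed for t in sub_derived.get(parent, ())}
    kept = sorted((t for t in listed if t not in covered),
                  key=lambda t: (sort_key(t), t))
    return [list(group) for _, group in groupby(kept, key=sort_key)]
```

With this key, Taiwan time falls between tact time and takt time, and old-timer between old-time and old times, which is exactly the separation the rationale for rule 1 anticipates.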

Stupid question: when you say "may not", do you mean "might not", or "must not"? ("these partial definitions […] may not be removed") Ordinarily they're distinguishable from context, but that sentence is kind of confusing me. —RuakhTALK 05:45, 10 April 2007 (UTC)
Might. Not dumb, thanks for pointing it out. DAVilla 13:25, 10 April 2007 (UTC)
I disagree strongly with the removal of red links. They are our friend, and tell us what terms are still to be defined. (Some of us actually define words here.) SemperBlotto 07:35, 10 April 2007 (UTC)
I want to agree with you, but if we are to find another solution then please acknowledge that not all red links imply the term is needed. Some of them should simply never be defined. The more questionable include "former + times", "Old + Father Time", "time-and-motion + expert", and "waste of time", and then there are the musical meters (now there's a can of worms). Longer phrases like "at the present time" and "this is no time for" might be better as shorter ones like present time and be no time for. And you can't know whether all of them are idiomatic, shortest remaining time and worst-case execution time for instance, until you look them up. I wouldn't have known man time was tosh™ until I saw the definition "a man's bowel movement". What would you say if I added rotation time as a derived term? Considering you've deleted the page before, I would hope that means you would be willing to remove it from the list. You've also deleted preposition of time, stoppage time, and even time limit, presumably for content, one would hope? I suppose "at" is too succinct a definition of the first.
The hedges that grow on time are possibly some of the most laboriously trimmed. Do note that I added a number myself, of already existing entries, but I also seem to be the only one using clippers (May, Sept 2006). If you want to keep all of those links, please propose a system for keeping track of what is or is not worthy of inclusion. DAVilla 13:25, 10 April 2007 (UTC)
I also disagree with the removal of red links. A cleanup of the section in which those appear is the proper way to go, laborious though it may be. Red links show us what needs to be done, but at the same time, if you see a red link that you think should not be defined, and the page has no comment regarding some idiomatic meaning, removing it is probably less time-consuming than actually creating the page and defining the term. H. (talk) 15:19, 10 April 2007 (UTC)
I agree that we shouldn't be basing any sort of content decision on the "redness" of a link; whether or not we currently define a term doesn't hold water when deciding its relevance in this case. That still leaves us with the decision of how to choose what does and does not merit listing in a given headword's "derivatives" section, which is not an obvious set of rules.
I like the idea of second-tier derivations being pushed onto the first-tier derivation's pages (space-time continuum on space-time but not on space or time). How we should organize them... well, I suppose that comes down to what we think they are actually used for. I am not exactly sure what the purpose of these sections is, but the purpose should define the form. - [The]DaveRoss 20:19, 10 April 2007 (UTC)
2B. Alternate proposal to #2. Derived terms are not to be <!-- commented --> with definitions, context, or any other information specific to a term. Any red link can be removed by any contributor to the requested articles page indefinitely if he or she has any reasonable (if uninformed) doubt of the term's idiomatic status. If any of the terms are recent additions by non-regulars, the edit should be so commented, e.g. "indefinite removal to RA per DT policy".
Conduct: This provision shall not be abused. Contributors are advised to perform a simple search of any terms that appear to be jargon before deciding on them. Deletions can be rolled back if the contributor is not familiar with the RFD process or does not make a good-faith effort to abide by existing standards, as would likely be indicated by a removal of red links en masse. However, deletions cannot be rolled back simply because the contributor was wrong. Subjective opinion is allowed, and individual removals are not to be questioned. If the term has idiomatic status, the page can simply be created before a term in the list is reinstated.
At the same time, other contributors are not required to check the history before adding derived terms. While they are instructed not to reinstate terms they feel were removed incorrectly until that page exists, they are not liable for accidentally reinstating derived terms that have been previously removed, for instance one formerly added by another contributor and included in a long list of new additions.
Summary: En masse additions are okay. En masse deletions in the general case are not. Individual deletions are okay, and should not be reviewed unless the contributor intends to turn the links blue. Essentially this gives all contributors veto power on any term. However, this is a weak power since any link can be reinstated by simply creating the page.
Rationale: This proposed policy allows for a large number of red links and at the same time avoids vilifying the targeting of red links by those who are willing to tidy a page, to remove links that could never be blue. More importantly it avoids the need for commenting Derived terms. Comments are not visible to the outside world and are a waste of our time. DAVilla 10:32, 11 April 2007 (UTC)
While I like the spirit of this option, I question the functionality. One of the more annoying things about editing lists of red and blue links is that while you are editing you can't tell what is what. If we have large lists of variously commented terms in these sections they will quickly become difficult to edit and control. Is there some way we can prevent that from happening? - [The]DaveRoss 20:40, 11 April 2007 (UTC)
I don't understand. Why do you think the terms would be "variously commented"? DAVilla 23:33, 11 April 2007 (UTC)

Inclusion of derived terms

Wow, I'm surprised that my contributions to time have provoked so much comment. I'd like to add some of my own.

"Time" is, apparently, the commonest (clean) four-letter word in the English language, according to a question on The Weakest Link (they gave a source - I don't remember what it was, though). A large number of the uses of this word are, no doubt, in idiomatic phrases, and so, necessarily, the list is long.

The derived terms I have been adding to "time" and elsewhere are compiled from various print and online dictionaries (onelook.com is very useful in this regard, given that it allows for the use of regular expressions in searches). Many (or most) of the terms that I find I am unfamiliar with. Some are obscure or dubious. I prefer to err on the side of inclusion, figuring that if the terms linked to are not idiomatic or do not exist, they will be removed, but if I leave them out and they are worthy of inclusion, no one else might ever enter them. That is not to say I have entered everything I could find - there is plenty that was, to my mind, not idiomatic or too obscure that I therefore left out.

The derived terms for "time" took a very long time to compile and verify, needless to say, but they are there to be edited, so by all means whittle away any terms that fail CFI. However, note that many of these terms are in the OED with citations, or in reputable online sources. Terms that appear to be unidiomatic might in fact be idiomatic. I suggest checking the OED, other print dictionaries and onelook to confirm one way or the other before entries are deleted from the list. (Inclusion in any of these sources doesn't necessarily mean that a term passes Wiktionary's CFI, of course.)

All the terms for time zones that I found (mainly in Wikipedia) have been included. It's debatable whether these should be in. Some print dictionaries give "Greenwich Mean Time", so why not the others? The list of these is finite and fairly short. Again, delete if these don't pass CFI, but my thinking is that they do (all or most have Wikipedia entries).

Technical (including medical) terms certainly do belong in Wiktionary if they pass CFI. In fact, they are more likely to do so, as they often appear in print in journals and other scientific publications.

I have tended to list terms B derived from terms A derived from "time" under A rather than under "time" itself. For example, "a stitch in time saves nine" comes under "in time", I believe, with a comment to that effect. I think the "derived terms of derived terms" system is cleaner, but this might make it harder for users to find terms or make them think that terms have been overlooked. (Incidentally, this is why "just-in-time" has been removed from the derived terms: you'll find it under just in time, which is the phrase from which "just-in-time" is derived.) If there are inconsistencies (such as "Achilles tendon reflex time"), then please fix these.

In short, the list of terms derived from "time" is not set in stone. None of us are infallible experts on everything, so please edit anything that I have not got right, and if I might be so bold as to ask, possibly be grateful that I researched and entered these hundreds of terms? — Paul G 09:23, 11 April 2007 (UTC)

On the whole it's a good list, yes. I have no doubt that most of the terms, nearly all in fact, should be included. I will revert my change shortly. DAVilla 10:37, 11 April 2007 (UTC)

Wiktionary:About Ancient Greek

I realize this has been a while coming, but I feel that I've finally gotten this page to a point where it's ready to be accepted as official Wiktionary policy. Will everyone who has any interest in the state of Ancient Greek on Wiktionary please take a look. I've recently made a few minor changes to the page, in preparation for this. In particular, the Pronunciation & Romanization section has been updated. Unless something major comes up, it is my intention to start a vote in a week or so to make it official policy. Please, if anyone has any problems with the page (or is considering having problems), please bring them up now, before the vote. Thanks very much. Atelaes 04:12, 6 April 2007 (UTC)

I think the policies/guidelines there are great, but much of the page seems intended to inform the reader about Ancient Greek (especially the "Diacritics & Accentuation" section); I think that that information is fascinating and should be kept somewhere, but probably not at Wiktionary:About Ancient Greek. (Maybe it could be put at an Appendix:Ancient Greek or the like?) To a lesser extent, I don't think that Wiktionary:About Ancient Greek should duplicate as much of WT:ELE as it currently does; I really think Wiktionary:About Ancient Greek should simply tell people-who-understand-Ancient-Greek-and-have-read-WT:ELE the Wiktionary policies that are specific to Ancient Greek, which is to say the specific things they'll need to know in order to contribute to entries on Ancient Greek words.
That said, I do have one minor policy/guideline quibble; I think primary-source attestations should go in unordered lists after the senses they correspond to, or in "Quotations" sections, or in /Citations subpages, like at entries for words in other languages. (My personal preference is for unordered lists in each sense, but WT:ELE says that there's no consensus yet.) I don't see what benefit there is in giving these in the "References" section.
RuakhTALK 05:09, 6 April 2007 (UTC)
Concerning the excessive information in the diacritics & accentuation section, I tend to agree. However, I was ordered to write that section (at gunpoint, I might add). Perhaps it should be trimmed down somewhat. As for the primary sources in the references section, I feel that to be somewhat of a shortcut, for the time-being. Writing citations for the Ancient Greek entries is incredibly time-consuming, and I don't think it will happen much in the immediate future, although ultimately they should all get some. For an example of what all goes into them, take a look at θεῖον. I really don't like the convention (that a few people have tried) of simply scattering the sources throughout the definitions, as I think it's rather unhelpful and makes the entry look messy. Putting these sources in the references section provides a quick and easy (and temporary) way to reference the words. Atelaes 05:38, 6 April 2007 (UTC)
What's the difference between a gloss and a translation? Am I to understand that the gloss is in an original somewhere? If so it isn't cited as to which version it comes from, and it needs to be to give credit. If you like your translation better then why have the gloss at all? By the way, does the translation belong in italics or not? DAVilla 07:46, 6 April 2007 (UTC)
The difference between the gloss and the translation is that the gloss retains more of the original language, at the expense of English. It doesn't come out terribly well in these two passages, admittedly. A gloss is not an authoritative version, by any means. Rather, it is an attempt at as simplistic a translation as possible, which follows the word order, grammatical structures, etc. of the original. The translation is meant to feel like real English, but this often requires a bit more freedom with the language of the original. Its main benefit is to allow people who actually have some handle on the language to see an intermediate step between the original and the translation. Atelaes 07:55, 6 April 2007 (UTC)
By the way, I'm not sure how to reconcile "The normal standard for modern languages is three independent attestations. However, Ancient Greek, as a dead language, requires only one attestation." with WT:CFI. I didn't think language considerations pages could override CFI? Or is the thinking here simply that all surviving Ancient Greek manuscripts can be considered "well-known works"? —RuakhTALK 07:44, 6 April 2007 (UTC)
Yeah, I was expecting to get more flak on that when I first proposed it, but no one said anything. It's certainly open to debate, but I think that one citation should be the norm for all dead languages because they're not subject to the same flux that living languages are. And, yes, I would say that all Ancient Greek works would count as well-known works, at least within a certain context. Atelaes 07:55, 6 April 2007 (UTC)
I think it is OK for an "About Language" page to differ from both the ELE and the CFI. However, those differences should be clearly spelled out, and each About Language page must be voted in as policy. There are enough oddities and special cases in various languages that we can never hope to have a concise ELE or CFI document if we try to incorporate them all into those two primary documents. --EncycloPetey 22:27, 6 April 2007 (UTC)

I thought I'd explain my motivations for the most recent changes to θεῖον. First, I really hate ELchar. I have no idea why it does this, but on my browser it puts all the characters into this weird loopy font that just looks ridiculous. Polytonic does not do this for me. My hope is that polytonic is allowing people to see just as many characters as ELchar is. Any feedback on this? Are people seeing more or fewer characters with the template switch? I see them all completely regardless of font templates. A second comment: I changed the indentation because I think it rather important that the words in the three lines (most especially the original and the gloss) are in line with each other as much as possible. Responses? Atelaes 08:19, 8 April 2007 (UTC)

Either template looks fine on my screen. In fact ELchar is a little straighter and the present one more curvy, but not "loopy" or anything. But it needs to be one or the other, or I can't read it... rather, it doesn't show; I can't read it regardless.
I really have to say, Ruakh, that I don't like the new look. "Original" and "translation" are just unnecessary, and the word "gloss" is confusing. The only reason I knew it wasn't an annotation in the original text is the source itself. You know, the Bible is rather ancient and all. But in a modern work that's what "gloss" would mean to me. As to indentation, will there ever be a need to preserve a translation that was in the original work? I would think placing them at the same indentation should be preserved for that. Or maybe it would be enough to put our own words in italics. Or maybe we really ought to do both. I don't know if this has ever been discussed. DAVilla 00:26, 12 April 2007 (UTC)
That's O.K., I don't like it that much either. My preferred versions are the first two I did (http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2284749 and http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2290015 — they differ only in indentation levels, with one putting the translation on par with the original, the other indenting it less than the original and more than the gloss); the last one was just an attempt to line them up nicely, as Atelaes prefers. (Seeing as he actually understands Ancient Greek, I think it makes sense to trust his instincts.)
It's actually pretty standard to use the term gloss to refer to a pseudo-translation that maps each word in the original text to a word or phrase in the target language, sometimes with annotations like "-DATIVE" and whatnot. If you can suggest an alternative word (or short phrase) to use, though, I'd be O.K. with that.
You know, rather than give a separate gloss, we could do something like this:
γένος οὖν ὑπάρχοντες τοῦ θεοῦ οὐκ ὀφείλομεν νομίζειν […]
(Most of those are probably wrong, but you get the idea.)
It would probably be a lot of effort, though. :-/
RuakhTALK 01:44, 12 April 2007 (UTC)
That is an interesting idea, but would indeed be a lot of work to implement on a regular basis. Also, I don't know how many users would get the idea, unless we had a little box saying, "scroll over text to see gloss" or something. I've fixed it, by the way. And yes, Davilla, this is virgin territory which I've never seen a discussion on, nor have I seen anything of the sort anywhere else on Wiktionary. We just might want to start a new discussion just on this, as we might benefit from the opinions and technical expertise of others. Unless I'm sorely mistaken, this will be setting a precedent for all other languages, as I have to imagine that we (eventually) want to have citations for our foreign language entries as well as our English entries. Atelaes 04:14, 12 April 2007 (UTC)

Placenames redux

Since the discussion earlier up petered out without a resolution, I want to bring this up again. We have a considerable number of placenames that seem to be in contravention to WT:CFI#Names of actual people, places, and things, which gives "A name should be included if it is used attributively, with a widely understood meaning." and "A name should be included if it has become a generic term." In essence, a placename still needs to meet the same attestation standards as any other term, since this is still a dictionary, not a placename database or encyclopedia. We should only include placenames with some significance towards our goal of defining words, not collecting geographical data. However, I seem to be able to find a large number of placenames that cannot be attested as generic or attributive, in my opinion. Consider Alagoas, Maceió, Abilene, Afula, Aeolian Islands, Lipari Islands, Ahmedabad, Aegadian Islands, Adyghe Autonomous Oblast, Adigoppula, (yes, these are just grabbed at random from the first page of the proper noun category) Bursledon, Titchfield, Tula, Thousand Oaks, etc. As you can imagine, there are a lot more. Not just the 50 in Category:English_counties and 100 in Category:Towns, but the many hundreds more in Category:Place_names. Clearly I can't just go on a deletion spree, can I? (It would take forever!) The main problem is that even if any of these have generic or attributive senses, and some, though not most, do, almost all of them are "defined" in the form of "A town in Oaxaca, Mexico." (Juchitán) That, to me, is an encyclopedia article (if a stubbish one), not about the word. So, what to do about these? I don't think they are adding much to the dictionary; this isn't Wikipedia. Frankly, I'd like to see most of them gone: all the ones that cannot be attested according to the standards currently in WT:CFI. Is there an efficient way to do this that doesn't involve hundreds of RFD listings? Or people violently disagreeing, ideally? 
Dmcdevit 09:03, 6 April 2007 (UTC)

I, for one, am fully supportive of such a deleting spree. Although, I imagine others might disagree. I think the CFI paragraph you quoted is quite clear on this, and should be followed. I suppose somewhat major placenames (at the deleting admin's discretion) should be placed under RFV, so that people are allowed the chance to save such words, if they care to try. But, Juchitán should just go, as far as I'm concerned. Atelaes 09:08, 6 April 2007 (UTC)
On the other hand, what place name entries can do here that they can't do at Wikipedia is provide translations. If I want to know, or inform others, what the Aeolian Islands are called in Yiddish, or Turkish, or Swahili, the place for that is Wiktionary. Wikipedia is willing to provide the name in the local language (in this case Italian), and the interlanguage links work for those languages that have an article, but not all other languages do have an article. Wikipedia does have lists of translations of place names, to be sure, but most of them have been nominated for deletion at one time or another on the grounds that the information there is more appropriate for Wiktionary than for Wikipedia. Angr 09:31, 6 April 2007 (UTC)
That's Wikipedia's problem though, not ours. Which is to say, that is fallacious logic: just because we can do something that another project doesn't, does not make it dictionary-appropriate. Wikipedia doesn't give translations (transliterations) of all of its Latin-script people either, which is another tens of thousands of entries we could add (or phonebook entries or restaurant reviews, for that matter). But if Juchitán doesn't belong in a dictionary, neither does Juchitán in Hawaiian, if there were such a word. You might get away with sticking a compendium of placename translations in an appendix, but I still don't think they belong as articles. Dmcdevit 22:19, 6 April 2007 (UTC)
Juchitán gets 675 Google Books hits (http://books.google.com/books?hl=en&q=Juchit%C3%A1n&btnG=Google+Search&ie=UTF-8&oe=UTF-8&um=1&sa=N&tab=wp). I'll bet you a thousand dollars that at least three of those are uses that I would consider "attributive" (but don't hold me to it right this moment, I'm going to be incommunicado on vacation until mid-next week). Cheers! bd2412 T 06:15, 7 April 2007 (UTC)
I was about to put it on RFV and say "prove it," but I'll just wait then. :) Part of the problem is that "attributive", as noted in a discussion somewhere earlier on this page, is a bit ambiguous. I think it's clear it's intended to mean having a specific meaning that describes something other than being simply in or from the place. So we would need three cites of Juchitán used in the same sense, as in (totally made up) "a Juchitán pizza" or "a Juchitán sandwich" meaning "a pizza with fish" or "a sandwich with fish". "A Juchitán pizza" just meaning a pizza made in Juchitán is not the spirit of the criterion, since any placename can be used to modify in that way. The problem with the current entry is that even if there is such an attributive use, it is certainly not the one defined, which gives encyclopedic data about the city's location. Dmcdevit 06:42, 7 April 2007 (UTC)
Saying "That's Wikipedia's problem though, not ours" shows precisely what Wiktionary's problem is. Wiktionary and Wikipedia are complementary sister projects, not two completely unrelated websites. Dictionaries, not encyclopedias, provide translations of individual words. Angr 11:06, 7 April 2007 (UTC)
Wikipedia does not exist in a vacuum, but Wikipedians tend to behave as if it does. They have no one but themselves to blame for their reputation. Perhaps if Wikipedians were inclined to cooperate, they'd find sister projects more willing to help where they can.
All that aside, it is a problem for Wikipedia, and not for Wiktionary. It is an encyclopedic concern; demographic statistics are the useful criteria - and that should not be in a dictionary. Including demographics has certainly met fierce resistance, in the past. --Connel MacKenzie 20:15, 7 April 2007 (UTC)
In my experience, it is Wiktionarians who behave as if Wiktionary exists in a vacuum, free from all established norms of lexicography. Translations for place names are not demographic statistics, and they are not encyclopedic. Where they belong is in a dictionary. Angr 08:45, 8 April 2007 (UTC)
The problem is not translations of place names in general, since many are validly included, but translations of place names that should not be included according to CFI. I don't see how translations of words deemed inappropriate for a dictionary are appropriate for a dictionary. Dmcdevit 22:35, 9 April 2007 (UTC)
The fact that "any placename can be used to modify in that way" does not detract from placenames being words; it enhances it. Consider it this way: a reader of a passage describing a Juchitán pizza may not be able to tell from the context whether this is a pizza from a particular place, or just a particular kind of pizza. The CFI exists, I think, not just to tell us that certain words are the kind that fit in a dictionary, but to tell us that words need to have a certain level of use before we should worry about potential readers coming across them and having the need to look them up in a dictionary. bd2412 T 23:23, 9 April 2007 (UTC)
No. Do not go on a deletion spree. Many of us think that ALL placenames deserve an entry here. By all means have a vote though, but make it a simpler vote than the one that I started. I would suggest something like - "should we allow entries for all placenames, or should there be some other criteria (which criteria would be the subject of a second vote)". SemperBlotto 11:08, 6 April 2007 (UTC)
I would instead suggest separate votes on each criterion proposed. Just not lumped together into one vote. That would give each enough time for debate, without as much crossover. --Connel MacKenzie 05:56, 7 April 2007 (UTC)
But vote on what? I don't disagree with CFI. Does anyone have a proposal to change it? And what does "should we allow entries for all placenames, or should there be some other criteria" even mean? We do allow “all words in all languages”, as long as they meet our guidelines for proof of being true words in common use. Similarly, we allow all placenames already, as long as they meet our guidelines for proof of being true words in common use. If you are suggesting that we include placenames that don't need such proof, I don't think that is tenable. All words need to pass attestation; why give carte blanche to placenames, of all categories of words? I just named the floor under my bed Pirate's Alley. It doesn't need to be attested? We need to think of some solution for this proliferation of encyclopedic entries with no attempt at giving attestable definitions. Dmcdevit 22:19, 6 April 2007 (UTC)
WT:CFI#Names of actual people, places, and things says "A name should be included if it is used attributively, with a widely understood meaning" and "A name should be included if it has become a generic term", but nowhere does it say only under those conditions. You want place names to be attested? That's easy: "This is the smallest and western-most of the inhabited Aeolian Islands and lies about 67 miles from Milazzo" is an attestation for the words Aeolian Islands and Milazzo. We just have to find two more such attestations, and both have passed the CFI. Angr 11:06, 7 April 2007 (UTC)
Yes, CFI is quite clear on this, but nobody follows it. I couldn't even get the name for Narita International Airport deleted. We should probably do a vote on this to change CFI. This sort of inconsistency is silly. Cynewulf 15:17, 6 April 2007 (UTC)
The CFI should not state that this is the policy if it is not followed. Someone needs to be bold enough to alter it to say that attribution is only one suggested criterion, and that more definitive criteria are under debate. DAVilla 16:09, 7 April 2007 (UTC)
As I write each time this is mentioned, many placenames are among the oldest and most interesting words we use (see what I have written before for examples). They can also be confusing to people who do not know a language well: if I cannot understand a sentence in German containing the noun Köln, say Heute, mehr als dreißig Jahre danach, sind viele Bauten von europäischem Rang in Köln immer noch nicht wiederhergestellt, why should my life be made more difficult because Köln is banned from the dictionary? Well, OK, I knew that Köln translated as Cologne, which is where you turn right to get to Austria, but substitute Leverkusen, Hundhausen, or another small town in the area, and I would have been stumped. I can't imagine why placenames should be subject to a tougher CFI than other words. Also, the attribution rule seems particularly odd. It seems to suggest that Cologne would get in because of Eau de Cologne but Köln would not because Kölnerwasser is merely a compound word. --Enginear 18:00, 7 April 2007 (UTC)
Yes, but how could Wiktionary help you in the above-mentioned circumstance? Say you looked up Leverkusen and discovered that it was a town in Germany. Well then... what? From the context, it is apparent that Leverkusen must be a placename; I am not certain how confirming that it was would do you much good. If you wanted to know where, specifically, it was, you should have consulted a map anyway, and not a dictionary. (What's that Wikimapia site, anyway?) -- Beobach972 00:12, 8 April 2007 (UTC)
Besides, Wikipedia would have an entry on Leverkusen, with the German and English names. Why should wiktionary have an entry for it — so you could look up the Russian translation? Well, if you're using Russian-language directions to try to navigate around Germany, you have — and I say this is all humour — some nerve coming to the English-language wiktionary and expecting it to help you! :-D -- Beobach972 00:12, 8 April 2007 (UTC)
I think Kölnerwasser would get in because it is not water from Köln, and thus Köln is used figuratively, so it would, I think, get in, too. -- Beobach972 00:14, 8 April 2007 (UTC)
I am, as I have surely stated before, in favour of deleting non-attributive placenames — or modifying our CFI to allow them (the Appendix idea is good, but in action it could become unwieldy and large). As it stands now, they should be deleted. -- Beobach972 00:12, 8 April 2007 (UTC)
CFI does not exclude them. It says place names used attributively should be included; it does not say place names not used attributively should be deleted. Angr 08:48, 8 April 2007 (UTC)
It says very clearly that they should not be included. What do you think that means?
I have said before that I do not agree with CFI on this point, but I'm not going to try to change it through re-interpretation alone. DAVilla 20:03, 9 April 2007 (UTC)


What are you doing? This isn't going to work. We define languages down to 639 code level for a reason, and treat variations/dialects within them within the section. We need the Min Nan section for everything that isn't Amoy anyway. (Chaozhou might reasonably be nan-ch.) We can't do this for the same reason we can't have "British" as a language header. Please stop. Was this discussed anywhere? Robert Ullmann 12:41, 6 April 2007 (UTC)

This was not discussed anywhere because I didn't think anyone cared. However, I did post an explanation of my reasoning at Category talk:Min Nan. In brief, it is not comparable to putting British instead of English. As I explain in my post, the prestige dialect of Min Nan is widely considered to be Amoy (Xiamen dialect). Therefore, I originally thought it no problem to label entries as Min Nan. However, this becomes problematic if we ever want to create separate entries for other Min Nan dialects, which is already happening with the case of Teochew. The most likely scenario would be Teochew, since there are a large number of Teochew speakers living in Western countries. Teochew is part of the Min Nan language family as well, but is only 50.4% mutually intelligible with Amoy.[13] Since Amoy is a well-established name for the language/dialect spoken in Quanzhou, Xiamen, Zhangzhou, Taiwan (known there as Taiwanese), and Southeast Asia (known there as Hokkien), it seems like the best choice. The language code can still remain as nan. If we ever need to create a separate language code for Teochew, we can do something like nan-CN-44 (per ISO 3166-2). I will give you some time to digest this. I realize it's all of a sudden. I honestly didn't expect anybody to know or care since I'm the only one that has ever created entries in Amoy Min Nan on Wiktionary (with the exception of maybe one or two words). I look forward to your response. -- A-cai 12:53, 6 April 2007 (UTC)
Indeed, way too sudden, and other people do care. Recall that we had some serious discussion on BP about the use of Mandarin v Mandarin Chinese, and whether Chinese should be subdivided at all. Renaming Min Nan (the ISO standard name) to Amoy without even mentioning it on BP is not good (note that we discussed "Scots Gaelic" to "Scottish Gaelic" for a while). The headers should almost certainly be Min Nan and Teochew. (And note that we are not using 3166 based code variants, they are deprecated; we, and ISO 639, code the languages, not the countries.) We have exactly one standard language header for each code. Finally, note that using the name of the "prestige dialect" might be considered serious POV. This has to be discussed on BP. Robert Ullmann 13:09, 6 April 2007 (UTC)
I have posted the above from my talk page in deference to Robert's wishes. If anyone else has opinions on the subject, please let us hear from you. -- A-cai 13:15, 6 April 2007 (UTC)
  • We have exactly one standard language header for each code
This is the crux of the problem. There are instances in which two languages/dialects are not mutually intelligible, but are assigned the same language code. This is such a case. Essentially, I'm looking for a solution. -- A-cai 13:22, 6 April 2007 (UTC)
Some background reading:
Enjoy. -- A-cai 13:29, 6 April 2007 (UTC)
Your first link doesn't really support you, as it says there are only two Min languages, North and South. (I take it the "Nan" in "Min-Nan" is the same as in "Nanjing", i.e. "South"? So would the other one be "Min-Bei"?) —RuakhTALK 14:01, 6 April 2007 (UTC)

Way over my head here since I've never studied Chinese. I read somewhere, though, that while the dialects are mutually unintelligible, the writing systems are mutually intelligible? Is that true, or is it a load of crap? If it were true, then except for specific idioms and such, we could just make "Chinese" entries but include lists of pronunciations in different dialects... You Chinese speakers totally rock, someday I will join you!!! :D Language Lover 13:48, 6 April 2007 (UTC)

Thanks for your response. Not true, this is a myth. Most Chinese are in fact bilingual, meaning they usually speak Mandarin and one other dialect. Since Mandarin is well established as the official language in most Chinese speaking countries, Mandarin has become the de facto written lingua franca. However, if one were to write one of the other Chinese languages/dialects in Chinese characters, it would generally be incomprehensible to a Mandarin speaker. For an illustration of my point, take a look at the right hand column in Appendix:Sino-Tibetan Swadesh lists, note the variety. -- A-cai 13:57, 6 April 2007 (UTC)

I don't know enough to comment on the specifics of Chinese, but on a general point I think that equating our language headers with ISO 639 isn't necessarily ideal. Its use of separate codes for Middle and modern English encourages us to separate them, which I've argued above is not helpful; whereas languages like Jèrriais and Guernésiais, which are certainly distinct languages, are lumped together (with much else) under ‘Romance: Other’. In this case, then, A-cai seems pretty persuasive to me. Widsith 14:04, 6 April 2007 (UTC)

In response to Ruakh, Min is actually a family of language families (read further down in the first article; I agree it's misleading) which includes Southern Min, Eastern Min, Central Min and Northern Min (based on their geographical location with respect to the Min river in Fujian). Southern Min contains four distinct strains: Amoy, Teochew, Qiongwen and Zhejiang Min Nan. None of these are mutually intelligible. Similarly, Eastern Min also has several mutually unintelligible dialects (the Fuzhou dialect being the prestige dialect in that case). Amoy is known as Taiwanese in Taiwan, and Hokkien in Southeast Asia. Obviously, Taiwanese is inappropriate as a language header, because it leaves out the speakers not from Taiwan. Hokkien means Fujian. Like Min Nan, it is popularly identified with Amoy. However, since other languages/dialects are also spoken in Fujian besides Amoy, it doesn't seem appropriate either. I think Amoy is appropriate because it refers to the place of origin of this form of speech (similar to how English is a reference to England). -- A-cai 14:11, 6 April 2007 (UTC)
I also don’t know enough to contribute reasonably here. Let’s hope that in future versions the ‘unintelligible dialects’ will be recognised as languages and get their own codes. In the meantime, we have to think of something, since I think A-cai’s arguments hold their ground (is that English?). Oddly enough, the reverse also occurs, although that is much less of a problem: Vlaams is recognised as a language separate from Dutch, but I certainly understand most of it if I make an effort, and almost no one from that region is going to treat his language as different from Dutch. They do have a vls: Wikipedia, though.
I think we should trust the judgement of the most knowledgeable people here. H. (talk) 16:26, 6 April 2007 (UTC)
No, Vlaams is not recognized as a separate language; it merely has an ISO code. It is very important to realize that the SIL does give codes to major dialects as well as to languages and "super languages". The existence of an ISO code should not necessarily be taken to mean that it represents a distinct language. Interestingly, the article about West-Vlaams (on the vls Wikipedia) defines West-Vlaams as "a dialect group in Dutch" ("de meest zuudwestelyke dialectgroep van et Nederlands"). So the Wikipedia in that "language" doesn't even define itself as a language. --EncycloPetey 22:24, 6 April 2007 (UTC)
I think part of the problem is that ISO 639 is fairly detailed with respect to Western languages, but falls down on the job with respect to lesser known languages, especially Chinese dialects. I think the nature of the problem is not fully understood by the average person in Asia either. This is partially a result of the promotion of Standard Mandarin as the official language. Most people here will not be able to read the following link:
However, I would like to post it so that it is part of the record. It is a discussion about what to call the language that I'm proposing we call Amoy (based on a history of usage that actually predates the use of the term Min Nan). Various people from Amoy-speaking areas (Singapore, Taiwan, Malaysia, PRC) have posted their opinions in both Mandarin (some in simplified script and some in traditional script) and Amoy (some in Chinese characters, some in Romanized script). I wish I had time to sit down and translate the whole thing for you, but it would take way too long. In short, some of the posters did feel that the term Min Nan is too broad to be useful. Min Nan is an academic term that describes a group of languages/dialects spoken by people who originally came from Southern Fujian. In that sense, it is a legitimate label. However, it is not useful as a label to describe a single language that is mutually comprehensible to all of its speakers. To put it in terms that a western audience will understand, saying that Amoy and Teochew are the same language by virtue of the fact that they both belong to Min Nan would be akin to saying that Spanish and French are the same language by virtue of the fact that they both belong to Romance. I should also mention that Chinese dialects do all have one thing in common: they don't generally distinguish between plural and singular. In other words, the Chinese word for Min Nan may be interpreted as either Min Nan language or Min Nan languages. The way you translate it would depend on the context. If you're talking about Min Nan in the context of one specific dialect such as Amoy, then it would be Min Nan language. If you are talking about all of the varieties of Min Nan, then it would be Min Nan languages.
In summary, does anyone have objections if I continue with my work. Robert, are you satisfied with the discussion? Do you still have any concerns? -- A-cai 23:26, 6 April 2007 (UTC)
Part of the reason we follow ISO-639 is so that we have someplace to defer these ridiculous "splitter" debates to. It is not within our remit to make these decisions. No ISO 639? No heading. (You'll note that I lost the "Chinese" vs. "Mandarin/Min Nan/..." debate, based only on the argument that ISO 639 gives them codes - even while those language names are not recognized by the broad majority of English speakers. If this were a reasonable Wiktionary, we'd call them all "Chinese", precisely as they are called in English.) --Connel MacKenzie 05:48, 7 April 2007 (UTC)
Connel, I can tell that you're against the idea. But what I don't know is whether your response is a knee-jerk reaction or whether you've taken the time to actually read all of the info I posted above. My responses to all of your concerns are already up there. I'm sorry that Sinitic languages are not cooperating with ISO 639 standards. You make ISO 639 sound like a well-established standard that has been around for years. In fact, ISO 639-1 was published in 2002, and the standard has continually undergone revisions since then (the most recent being February 2007 with the publishing of ISO 639-3:2007). Do we really want to put all of our eggs in that basket at this point? I'm trying to dispel myths and avoid confusion. Sometimes, a square peg just won't fit into a round hole. I realize that remarks are often taken the wrong way in BP, but I feel like I'm being ordered to comply with some arbitrary regulation! I've practically single-handedly built up our inventory of Amoy Min Nan words from scratch. Frankly, I think that entitles me to more of an opinion than the rest of you about how to format the entries. Is that wrong of me? -- A-cai 06:55, 7 April 2007 (UTC)
Ok, maybe I was a little too forceful in my last post. Let me try a different approach. Take a look at the translation section for the word child. I have reformatted the Chinese section in a way that I think makes sense, based on my experience with this. Let me make it clear. We are not talking about synonyms within one language called Chinese. As I stated before, both Teochew and Amoy belong to the Min Nan group. However, the Amoy word and the Teochew word are not interchangeable in some unified language called Min Nan. This is an inconvenient fact, despite what the ISO language codes imply. The ethnologue page for nan specifically states that Amoy and Teochew are not mutually intelligible.[14] So here is the question I pose: what exactly should we do about this situation? -- A-cai 08:35, 7 April 2007 (UTC)
Sorry for being so blunt in my last post. I was (seemingly) knee-jerking in response to Widsith's knee-jerk. Brooklyn-ese is (for the most part) mutually unintelligible with Texanglish. I hope you weren't suggesting that the same phenomenon doesn't exist even within America, let alone when considering US/UK issues. Both bum & fanny are cutesy baby-talk words in America, yet are apparently quite vulgar in UK English; that is, here, you can say "Sit your fanny down" to a three-year-old, and everyone will smile; if you say "Sit your ass down" to that same three-year-old, someone would instantly call Child Protective Services.
There is nothing knee-jerk about my thoughts on this issue. Watch your tone. Widsith 16:33, 7 April 2007 (UTC)
Perhaps you and I interpret knee-jerk differently? You said above "I don't know enough about the specifics..." then went on to reiterate your stance from a previous conversation that was essentially turned down. Or was your comment about tone a reference to the example words I picked, because they are vulgar in your dialect? I didn't mean that as a slight; it was a simple statement of fact. Your threat, on the other hand, seems rather pointed. --Connel MacKenzie 03:47, 8 April 2007 (UTC)
I see the dialect issues you raise as equal to, or lesser than, those in the US/UK debate, which has concluded (many times now) with the language heading ==English==. --Connel MacKenzie 14:03, 7 April 2007 (UTC)
It may be necessary to fall back on some other authority for our sanity, but these language-splitting debates are not ridiculous by any means. They may be political, and people might say the same thing about politics, but ridiculous is being drafted into war, deported to another country, or imprisoned for your beliefs. Issues of opinion cannot be discounted as such. They can weigh very heavily.
Many, I think most who know anything about Chinese, would think the opposite of what you said, that it is a Wiktionary that classified all Chinese languages as "Chinese" which would be unreasonable. The existence of ISO codes may have been why you gave in on the distinction, but it is not the only reason you lost. And Widsith's response was not a knee-jerk reaction. If you want to insist that the criteria be objective, that we not make distinctions for ourselves, that's fine. However, it is not only permissible but appropriate and in fact necessary to gauge how well the criteria meet our needs. DAVilla 16:44, 7 April 2007 (UTC)
I maintain that it is ridiculous for us (Wiktionary) to be taking on the role of mediating what "is" or "is not" a language, particularly when ISO 639 does exist and does have methods for amending it directly. --Connel MacKenzie 03:51, 8 April 2007 (UTC)

Min Nan (language)

  • Xiamen (dialect)
    • Amoy (subdialect)
    • Fujian
      • Fukien
      • Hokkian
      • Taiwanese
  • Leizhou
    • Lei Hua
    • Li Hua
  • Chao-Shan
    • Choushan
    • Chaozhou
    • Teochew
  • Hainan
    • Hainanese
    • Qiongwen Hua
    • Wenchang
  • Longdu
  • Zhenan Min

See why we want to resolve things on the level of ISO 639 coding? If we use "Amoy", we need at least 17 more names and codes, and we will still have nothing for Min Nan itself. (And this is just this one language; there are 12 others, and we end up with several thousand if we code sub-dialects.) We should keep Min Nan (code nan), which is primarily Amoy, but will have the other dialects noted in pronunciations, etc. The exception is Teochew (Chao-Shan), which is not mutually intelligible to any useful extent, and needs an extension code (nan-tch or whatever; in the Min Nan WP they are discussing defining an extension code). Note that this coding applies to all of WM: it is used in the domain names and prefixes. The only thing we should be doing is deciding on a nan-xx extension for Teochew. Robert Ullmann 14:17, 7 April 2007 (UTC)

Connel, US/UK English being under the same L2 header makes sense because:
  • West Germanic ⊃ Anglo-Frisian ⊃ Anglic ⊃ English (mutually intelligible: US/UK/Australian etc.)
In parallel, we have
  • Chinese ⊃ Min ⊃ Min Nan ⊃ Amoy (mutually intelligible: Quanzhou, Xiamen, Zhangzhou, Taiwanese)
However, if you were to say:
  • West Germanic ⊃ Anglo-Frisian ⊃ Anglic ⊃ Scots
therefore, Scots language and English should have the same L2 header called ==Anglic==, this would be analogous to saying:
  • Chinese ⊃ Min ⊃ Min Nan ⊃ Teochew
therefore, Teochew and Amoy should have the same L2 header called ==Min Nan==. Obscuring this issue is the fact that many people think of Amoy when they think of Min Nan, just as many people think of Standard Mandarin when you say Chinese. -- A-cai 15:00, 7 April 2007 (UTC)

Robert, are you suggesting that we keep nan for Amoy and call it Min Nan, but give Teochew nan-whatever and call it Teochew? BTW, I agree we need more codes for Chinese languages. They have been shortchanged; there's no way around it. -- A-cai 15:00, 7 April 2007 (UTC)

Also, your list implies that we give separate codes for Taiwanese and Xiamen etc. This is not what I'm saying. I'm only talking about having separate codes for groups of mutually intelligible languages/dialects. So the number wouldn't be 17, but it would be more than just one, which is simply inadequate. -- A-cai 15:05, 7 April 2007 (UTC)

If I'm reading you correctly, the translation section for child would look like:

Is this what you're proposing? Doesn't that look funny, since Teochew is also Min Nan? -- A-cai 15:09, 7 April 2007 (UTC)

I understand Robert to mean
which is very similar. Although I agree with both of you on the utility of distinguishing these, I would suggest that the Teochew entries just be labeled contextually under Min-nan until such time as the ISO codes are updated. DAVilla 16:46, 7 April 2007 (UTC)
I'm proposing that we use nan=Min Nan for Amoy (which is what most people mean, and this is the common name) and its mutually intelligible variants, and nan-tch=Teochew for Teochew, both as L2 headers and as languages in the translation tables, where "nan-tch" is whatever code the WM projects overall adopt. We can't wait for SIL/ISO, and WM already has extension codes where needed: fiu-vro, for example, has Template:fiu-vro. Robert Ullmann 12:38, 8 April 2007 (UTC)

Amoy: prestige dialect policy vs. inclusive dialect policy

As I see it, we need a decision about Wiktionary policy. Here are our three choices (if anyone has another choice, I'm open to suggestions):
Model 1 (in cases where only one ISO 639 code exists for more than one mutually incomprehensible dialect, we will label the prestige dialect according to its localized name, and label non-prestige dialects by their colloquial name, and add an extension to the code)
  • cdo = Fuzhou dialect -> ==Fuzhou==
  • cdo-extension (TBD) = Fuqing -> ==Fuqing==
  • nan = Amoy dialect -> ==Amoy==
  • nan-extension (TBD) = Teochew -> ==Teochew==
  • nan-extension (TBD) = Qiongwen (Hainanese) -> ==Qiongwen==
  • wuu = Shanghai dialect -> ==Shanghainese==
  • wuu-extension (TBD) = Southern Wu -> ==Southern Wu==
Translation section would be (child):
model 1a
model 1b
Model 2 (in cases where only one ISO 639 code exists for more than one mutually incomprehensible dialect, we will label the prestige dialect according to the ISO code, and label non-prestige dialects by their colloquial name, and add an extension to the code)
  • cdo = Fuzhou dialect -> ==Min Dong==
  • cdo-extension (TBD) = Fuqing -> ==Fuqing==
  • nan = Amoy dialect -> ==Min Nan==
  • nan-extension (TBD) = Teochew -> ==Teochew==
  • nan-extension (TBD) = Qiongwen (Hainanese) -> ==Qiongwen==
  • wuu = Shanghai dialect -> ==Wu==
  • wuu-extension (TBD) = Southern Wu -> ==Southern Wu==
Translation section would be (child):
Model 3 (only the prestige dialects would be given a separate L2 header, hope for new ISO 639 codes in the future)
  • cdo = Fuzhou dialect -> ==Min Dong== (Fuqing may be included in the pronunciation section, but only Fuzhou gets example sentences)
  • nan = Amoy dialect -> ==Min Nan== (Teochew and Qiongwen may be included in the pronunciation section, but only Amoy gets example sentences)
  • wuu = Shanghai dialect -> ==Wu== (Southern Wu may be included in the pronunciation section, but only Shanghainese gets example sentences)
Translation section would be (child):
So here's the crux of the matter; which of the above three models should we go with? Model 1 (A-cai), model 2 (Robert), or model 3 (Connel)? -- A-cai 01:27, 8 April 2007 (UTC)
I'd change "hope for updated ISO..." to "push for updated ISO..." in the above. But that "push" needs to happen there, not here. --Connel MacKenzie 03:55, 8 April 2007 (UTC)
Please avoid the POV term "prestige" in this conversation (if possible.) --Connel MacKenzie 03:56, 8 April 2007 (UTC)
What term would you prefer? -- A-cai 04:10, 8 April 2007 (UTC)
"Widespread"? "Widely recognized"? I really don't know, but prestige has serious negative connotations in this context. --Connel MacKenzie 15:39, 10 April 2007 (UTC)
Take a look at 朋友, I've added example sentences for Amoy, Teochew and Mandarin. I don't know Teochew, so I had to rely on this site. I think the Teochew sentence is correct (enough for this discussion anyway). I hate to do it this way, but if we go with Model 3, I don't know how else we could reasonably do it, and still do justice to the languages in question. Opinions? -- A-cai 07:26, 8 April 2007 (UTC)
That example is good, now change Teochew to an L2 header ...
SIL/ISO are working on more codes (4 and 5 letters ;-), but that is a long process. For WM to have a Teochew Wikipedia now (something for which there is interest), we need a code like nan-tch. I don't know the current state of the discussion there (reading Min Nan / Amoy in POJ is a bit beyond me). We need to be able to add a limited number of extension codes like this.
As to "pushing" ISO, the precise way to do it is to see what we need to code, and what definitions we want and use, and then feed that into their process. That's how it works. (Not ringing them up: "Hello SC2? We need more codes ..." ;-) Robert Ullmann 12:57, 8 April 2007 (UTC)
This being a multilingual issue, it would be good to have the support of other Wiktionaries before proposing such changes in the outside world. Not that we want to be taking official positions on this sort of thing, but if we did, we couldn't say it was Wiktionary's position, only the English-language Wiktionary's, or it would be misleading. DAVilla 18:42, 8 April 2007 (UTC)
Before we go to other projects, it would be nice if we had more of a consensus here on the English Wiktionary about what to do. I say this because I think a lot of the other Wiktionaries look to English Wiktionary as the model (rightly or wrongly). After we achieve a consensus (hopefully), where would we go to ask other Wiktionaries? Would that be on some page at Wikimedia? I know they have a lot of pages there that are sort of gathering places for multilingual issues like these. It sounds like Robert believes that model 2 is the best approach, whereas Connel was leaning toward model 3. I think model 1 is the best from a linguistic precision point of view. However, I recognize the technical standards arguments, and agree that model 2 might be the best we can do in light of the fact that the ISO standards may not be updated for quite some time. I think model 3 is like trying to force a square peg into a round hole. Obviously, we don't run into this a lot yet. We actually don't have that many Teochew words. The ones we do have are mostly in the translations section. On the other hand, our policy deficiencies in this area just might be part of the reason for this. Do we have anyone that could act as a tie breaker? -- A-cai 23:25, 8 April 2007 (UTC)
See, normally, I'd refer you to our resident expert on such matters: some guy with the username "A-cai". --Connel MacKenzie 15:39, 10 April 2007 (UTC)
I guess the deafening silence that followed the last few posts has given me my answer. Since Robert and Connel feel strongly about leaving the L2 header for Amoy as Min Nan, and nobody else has offered any passionate counterarguments, I will not push for Amoy to be an L2 header (at least for now). What to do about Teochew is another matter, probably best tabled for the time being (until we actually get a regular contributor of Teochew words). By the way, some of you may have noticed that I only recently created the w:Amoy (linguistics) article on Wikipedia. The article was actually featured in the Did you know? section on Wikipedia's main page for 10 Apr 2007. Not bad for a language/dialect (whatever) that doesn't even have its own separate ISO code ;o) -- A-cai 10:27, 10 April 2007 (UTC)
The deafening silence reiterates my point, that we are not situated to act as an authority on this matter. --Connel MacKenzie 15:39, 10 April 2007 (UTC)
Let me break the deafening silence after my long weekend: I favour option 1, but could live with option 2. Option 3 just twists linguistic reality too much. Just because those languages are unknown doesn’t mean they don’t deserve a header; the same goes for Jèrriais or Tagalog, languages I had never heard of either, but about which there is no discussion.
And indeed we are not an authority, but presumably there are just too few knowledgeable people on this topic at all in the world, so let’s do what seems most reasonable: listen to those who at least know some of it.
However, I am unsure about the romanizations: you link them all, do we want that? Are they used as words? Even those with the numbers? In child, the Teochew translations have to be cleaned up: either put wikilinks around them, or parentheses etc. Please edit Wiktionary:About Chinese and report here. H. (talk) 09:16, 11 April 2007 (UTC)
Based on the discussions so far, I have revised WT:AC in the following section: Wiktionary:About_Chinese#Min_Nan. If anyone feels the wording needs revision, or we need more added, please let me know. I think one success story of Wiktionary, so far, is that by nature of the fact that we are a multilingual dictionary, various languages/dialects get thrown together that might not otherwise have had to live in the same space, and this tends to put us face to face with the question of what exactly is a language? We thought we knew until we started playing with Wiktionary :-P
BTW, I do want to provide a more complete response to the US/UK English argument, because that has come up several times in this and other similar discussions. Has anyone noticed that we do not have a separate Swadesh list for US/UK/Australian etc English? That's because these variants of English are so closely aligned phonologically and lexically, that nobody seems to feel a pressing need for separate lists. It's why an American doesn't generally need subtitles when watching a Hugh Grant movie. US/UK/Australian etc English is what we mean when we talk about mutually intelligible languages. Now, take a look at the Appendix:Sino-Tibetan Swadesh lists. You can't even get past the third word in the list without running into significant differences among the Chinese dialects (word seven for Amoy and Teochew)! This is because you are now looking at languages/dialects which are not mutually intelligible (in other words, there is no such thing as one big happy language known as Chinese). There, now I have that off my chest. -- A-cai 12:50, 11 April 2007 (UTC)

The logo in the upper left hand corner

Hi, we often see people posting things here which belong in wikipedia. I've given some thought to why this might be and one thing I realized is our logo on the upper left corner says "a multilingual free encyclopedia". Of course that's because it's supposed to look like a snapshot of a page out of a paper dictionary, which would list Wiktionary right after Wikipedia. Personally, I love the logo and whoever made it kicks all kinds of ass :-) I wonder, though, if the way it is might contribute to the confusion some of our readers seem to suffer. What do you all think? Language Lover 22:13, 6 April 2007 (UTC)

See WT:FAQ. :-) Brion was astounded that the logo he threw together in a couple minutes (if that long) had lasted two years. When it was clear that it was still superior to the logo-vote proposals, he was shocked. Back then, the entry for Wikipedia was just before Wiktionary. --Connel MacKenzie 05:39, 7 April 2007 (UTC)
It's an interesting thought, but somehow I doubt that's really the reason, or even a contributing factor. I think it's just that (1) many or most people lack a clear sense of the difference between a dictionary entry and an encyclopedia article, and (2) many or most people who stumble upon one of the two projects don't fully appreciate that both exist and are sister projects and that a given fact is generally not appropriate for both. (Indeed, given people's propensity to add useless facts to Wikipedia articles, I think there might be a more general principle that people are happy to contribute regardless of the usefulness of their contribution. Surprisingly, this actually seems to work out pretty well.) —RuakhTALK 05:48, 7 April 2007 (UTC)


Maybe you'll be interested to know that there is a template on fr: that automatically converts IPA to X-SAMPA via JavaScript. We can choose to display one of them (or both) with a few lines on the CSS page.

Thanks to that, we got rid of the IPA/X-SAMPA distinction in the pronunciation sections, and we also use it in the inflection templates (see chat, chanter for examples).

Do you think such a template could be used here on en: ? - Dakdada 17:12, 7 April 2007 (UTC)

Wouldn't it be easier on us to convert X-SAMPA into IPA? I can't read most of the IPA characters in the edit box, and a few on the page, even with the fancy font stuff. DAVilla 18:04, 7 April 2007 (UTC)
It can be adapted like that, yes. - Dakdada 16:22, 10 April 2007 (UTC)
I'll try and look at it later; if it converts it, then displays both, then I'm fine with it. If both are displayed, I can't imagine what objections might arise. --Connel MacKenzie 19:59, 7 April 2007 (UTC)
Yes, it can display both, like « /ʃɑ̃.te/, /SA~.te/ » (or something else). The only thing is that it is done by a script, not by the software like the {{UC:}} stuff. - Dakdada 16:22, 10 April 2007 (UTC)
I really like this idea, especially because I often encounter SAMPA transcriptions that indicate a different pronunciation to the IPA on the same page. --Wytukaze 17:58, 12 April 2007 (UTC)
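A minimal sketch of the kind of IPA/X-SAMPA conversion discussed above, written in Python rather than the JavaScript used on fr:, and not a copy of that template. The symbol table is a hypothetical subset chosen for French examples such as chanter; it assumes every symbol is a single Unicode code point, so a real converter would also need the multi-character X-SAMPA symbols (for example r\ for ɹ). The reverse table covers DAVilla's case of typing X-SAMPA and displaying IPA.

```python
# Hypothetical, partial IPA -> X-SAMPA table (single code points only).
# Assumption: covers only symbols common in French transcriptions.
IPA_TO_XSAMPA = {
    "\u0283": "S",   # ʃ
    "\u0292": "Z",   # ʒ
    "\u0251": "A",   # ɑ
    "\u025B": "E",   # ɛ
    "\u0254": "O",   # ɔ
    "\u0259": "@",   # ə
    "\u014B": "N",   # ŋ
    "\u0272": "J",   # ɲ
    "\u0281": "R",   # ʁ
    "\u0153": "9",   # œ
    "\u00F8": "2",   # ø
    "\u0265": "H",   # ɥ
    "\u02C8": '"',   # ˈ primary stress
    "\u02CC": "%",   # ˌ secondary stress
    "\u02D0": ":",   # ː length mark
    "\u0303": "~",   # combining tilde (nasalization)
}

# Reverse table, valid only because every X-SAMPA value above is one character.
XSAMPA_TO_IPA = {x: i for i, x in IPA_TO_XSAMPA.items()}


def ipa_to_xsampa(ipa: str) -> str:
    """Map each IPA code point; anything not in the table (a, e, t, '.') passes through."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


def xsampa_to_ipa(xsampa: str) -> str:
    """Inverse mapping, for entering X-SAMPA and displaying IPA."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in xsampa)


def show_both(ipa: str) -> str:
    """Render both transcriptions side by side."""
    return f"/{ipa}/, /{ipa_to_xsampa(ipa)}/"


if __name__ == "__main__":
    print(show_both("\u0283\u0251\u0303.te"))  # chanter
```

For chanter, show_both returns /ʃɑ̃.te/, /SA~.te/, matching the example given above. A character-by-character table keeps the sketch short; longest-match tokenization would be the natural next step once multi-character symbols are added.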

Language or dialect

Last year I proposed changing "language" to "language or dialect" in the ELE if there were no objections, but even I forgot about it. I would like to know if there are any objections now. As I see it, we use "language" to mean anything (sans Translingual) that is acceptable as a two-level header, which could be a language or a dialect. In fact, the distinction between language and dialect is not linguistically precise.

I believe this change is very closely coupled to a change in the way we list certain languages under Translations. If Mandarin (or Mandarin Chinese) and Min-nan are languages by our definition, then they should be alphabetized under M. I know this is going to generate some controversy, and I anticipate having to bring the latter to a vote. Related issues include what to classify as a language (Amoy, Serbian), what to name the languages, and how to alphabetize them. Aside from attempts to force "Chinese Cantonese" as a name (even "Chinese Mandarin" has been turned down in the past), it is possible to keep those topics independent of the question I'm raising. DAVilla 17:16, 7 April 2007 (UTC)

What we have now is a nightmare to parse; you wish to make it an order of magnitude worse? I think I could oppose that measure. We don't have Hippietrail's extension loaded here yet (go test it on http://wiktionarydev.leuksman.com/); it groups languages together by language group, by arbitrary groupings, and possibly by arbitrary user-preference groupings (I don't know that he has that part working yet).
Without underlying software that can unify the different language names entered, I will strongly oppose "opening the floodgates." Even then, we'd need some way of describing just what a dialect is. Brooklynese? Connelese? --Connel MacKenzie 19:56, 7 April 2007 (UTC)
What are you talking about? I'm not proposing a change to what we consider to be valid 2-level headers, and as far as I know Brooklynese is not one of them. All I'm proposing is that we acknowledge that there is no distinction between "language" and "dialect", and that by "language" in our terminology we sometimes mean what most people would consider a dialect.
And I don't intend to make anything worse to parse. In fact it would be easier to parse if
* Chinese
*: Cantonese
*: Mandarin
* Japanese
were replaced by
* Cantonese
* Japanese
* Mandarin
If you misread above, that's all I'm proposing. DAVilla 20:37, 7 April 2007 (UTC)
Introducing the misleading term "dialect" into the debate, I cannot see as being helpful. --Connel MacKenzie 03:30, 8 April 2007 (UTC)
We keep talking ourselves in circles whenever we raise the issue of language or dialect. A dialect is a language, and a language can be a dialect. Let me demonstrate my meaning by using an analogy. Let's substitute the word language with fruit and the word dialect with apple. ISO 639 codes can either represent a fruit or an apple. Of course, there are many varieties (accents) of apples (Washington apple, crab apple etc.), but we don't worry about that for the purposes of an L2 header. An orange is also a fruit, but it is clearly not an apple, so it gets a separate L2 header. But what about a pear? There are some pears that look remarkably similar to apples. In trying to define which things are fruit and which things are apples, we run into a problem that an apple is a fruit, and a fruit can be an apple. In the Amoy post, I'm essentially arguing that Amoy (apple) and Teochew (pear) are two types of fruit. Connel's counter is that they are the same fruit because they both have the code nan (which contains several types of fruit, each type of fruit having several varieties). I'm trying to separate things out at the fruit level, but am not always aided in this effort by ISO 639 codes, because two or more types of fruit are sometimes covered under the same code. This is because sometimes we argue about whether an apple is a fruit or just an apple. -- A-cai 00:19, 8 April 2007 (UTC)
Okay, so it has to be handled a little more sensitively than I thought. My intent was simply to say that these two-level headers that we call "languages", whatever they might actually be, are all in the same basket, so to speak. DAVilla 03:26, 8 April 2007 (UTC)
Yes, leave the word "dialect" out of it (and out of ELE). (There are a number of linguists who eschew the term, preferring "group", "language" and "variant", precisely because the term gets misused.)
That said, by all means let's get rid of the nested construction, and just put the languages in alphabetical order. Definitely easier, and Hippietrail-like things can present language groupings however preferred by the user. Robert Ullmann 12:14, 8 April 2007 (UTC)
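A minimal Python sketch of the flattening proposed above, assuming translation lines look like the example: "* " for a top-level language and "*: " for a nested one, each starting with the language name, optionally followed by a colon and the translations. The function name is hypothetical; this is not an existing bot or Hippietrail's extension.

```python
import re

# Hypothetical format: "* Name" or "* Name: translations" at top level,
# "*: Name" or "*: Name: translations" when nested under a group.
ITEM = re.compile(r"^\*(:?)\s*(.+)$")


def flatten_translations(lines: list[str]) -> list[str]:
    """Promote nested language lines to top level and sort them by language name."""
    out = []
    for i, line in enumerate(lines):
        m = ITEM.match(line)
        if not m:
            continue  # not a list item; ignored in this sketch
        nested = m.group(1) == ":"
        body = m.group(2).strip()
        _name, _sep, rest = body.partition(":")
        has_children = i + 1 < len(lines) and lines[i + 1].startswith("*:")
        if not nested and not rest.strip() and has_children:
            continue  # pure grouping line such as "* Chinese"
        out.append("* " + body)
    # Sort on the language name, i.e. the text before the first colon.
    return sorted(out, key=lambda item: item[2:].split(":", 1)[0].strip().casefold())
```

Given the nested Chinese/Cantonese/Mandarin/Japanese example, it returns "* Cantonese", "* Japanese", "* Mandarin". A grouping line such as "* Chinese" is dropped only when it has no translations of its own and is immediately followed by nested lines, so a top-level language with real content is never lost.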

Links to Google books and groups

...should be discouraged. There is an example on Wiktionary:Quotations which could not be any better:


When I try to read page 131 of Treasure Island, I'm asked to log in (even though log-in is not a requirement for this book, as it is for some). We don't ask users or even contributors to register with us. Why would we ask them to register with Google?

Furthermore, the link includes information in a &sig field that tracks who copied the link, and which doesn't reference the correct page if removed. Until I took it out, the link even contained all of the search criteria used to locate the quotation. I would remove such links on much simpler grounds, namely that they point to a dynamic CGI page rather than a static one, when even static URLs are highly susceptible to breaking.

Anyway, the whole point is that the book is durably archived, not the website, so we should be using ISBNs. The URL is essential only when it is part of the record. For instance, when I quote websites (which isn't often, as they almost never meet CFI) I always print the domain name, e.g. secretstrom.blogspot.com [15]

DAVilla 17:58, 7 April 2007 (UTC)

It's worth pointing out that the extra data contained in the link does have a positive side: it often causes the words to be highlighted in the text, which is a very nice feature, especially if the page is large with small print. If we had a way of just linking directly to a page, there'd be no way for Google to know what words to highlight, and everyone would have to painstakingly read something on the order of the whole page to find the word. Language Lover 21:19, 7 April 2007 (UTC)
I agree. Besides, if someone wants to link to Treasure Island, they can find the full text at our sister project Wikisource. --EncycloPetey 18:30, 7 April 2007 (UTC)
The precedent was set by the proponent of the current RFV system (Muke.) He obviously would never have been given that inch, if he didn't include exact pointers so that people can check.
Until we have a more reasonable WT:CFI, and a working WT:RFV, it is only for obfuscation that one might wish to remove the direct pointers. Perhaps a week after a successful RFV, that might be reasonable. But during the verification phase, it is just a waste of everyone's time to camouflage the links. --Connel MacKenzie 19:41, 7 April 2007 (UTC)
By the way, constantly referring to our CFI in such a derogatory tone could be considered mild propaganda. Personally, I like our CFI, but that's as irrelevant as your disliking it. The fact is there are two sides to a dictionary: the readers who want to make their vocabulary sound smarter, and the readers who want to figure out what a word that living humans use means. Our CFI should be a compromise between both, and I think that combined with good context tags to accommodate the former, it does a good job at being that compromise :-) The fact is, no matter how dubious or "unwashed masses" the citations are, that doesn't mean they aren't words (1), nor that no one will ever want to know what they mean. Language Lover 21:19, 7 April 2007 (UTC)
Tee-hee, he called Connel's propaganda "mild" ><. - [The]DaveRoss 21:37, 9 April 2007 (UTC)
Also of note: the usenet archives exist only on that website now. That (as ridiculous as the notion is) is precisely what is considered to be "durably archived" (well, sorta) for the purposes of our broken WT:CFI. So not including a very exact link for those would mean the link is not truly "durably archived" after all. Therefore, all usenet "citations" should once again be removed. Is that what you are asking for? --Connel MacKenzie 19:44, 7 April 2007 (UTC)
You seem to be conflating <it is durably archived> with <we provide a link to a durable archive of it>, but the two strike me as quite different. Also, I understand DAVilla's comment as referring to citations given in entries, not to discussion at WT:RFV. (?) —RuakhTALK 20:16, 7 April 2007 (UTC)
My edits to Wiktionary:Quotations, reverted at Enginear's request, made it clear that links were acceptable during the RFV process, but I did not mention it above. When passing an RFV, which isn't a chore I take up regularly, I always check the Google book links before removing them from the page. I don't wish to over-proceduralize that aspect of the process, so I simply suggested that they be put in the talk spaces. But I didn't use a chisel to write that. DAVilla 20:59, 7 April 2007 (UTC)
I didn't go into detail on Usenet. I can accept linking those discussions, but I have suggestions for the links. When performing a search, the URL is the same garbage as with Google books:
However, messages have individual IDs that are part of the Usenet structure, such as 915eac996e0b21a9 for the link above. On Google groups, this is retained in a static URL that could be linked as alt.fan.james-bond [16]
DAVilla 20:59, 7 April 2007 (UTC)
To DAVilla: I agree wholeheartedly. I don't suppose you could amend WT:CITE to indicate how exactly ISBNs should be formatted? —RuakhTALK 20:16, 7 April 2007 (UTC)
I agree that it is the book which is durably archived, but the paper copy is often not the most accessible source, particularly when it is rare and old; even the British Library normally gives access to facsimile scans of old books, rather than to the books themselves. We give a quote which is perhaps one sentence long, while a page or two would be needed to give reasonable context. The main purpose of a quotation is to show exactly how the word was used. It is important that people who want to find this out are able to inspect the full context. The ability to facilitate this is one advantage we have over paper dictionaries and I believe we should therefore give links wherever possible. --Enginear 20:56, 7 April 2007 (UTC)
Google books links should be encouraged. Removing them is almost vandalism. They allow users to check context without physically going down to their local library and likely having to order the books from somewhere else, which requires a significantly more inconvenient registration process than signing up for google books. Kappa 13:17, 11 April 2007 (UTC)

RFP: Format of examples and quotations


This is a request for proposals on the format of examples and quotations between definition lines. Only those options which are seconded will be included in an approval vote, to be held no sooner than April 21. Proposals by new and anonymous contributors must also be sponsored by a regular (200 edits, two weeks prior). Contributors can make several proposals, but may be asked to limit the number they sponsor if there would still be five or more options overall. As always, discussion is welcome.

Given the objections in the preceding conversation, I see this "request for proposals" as completely inappropriate. --Connel MacKenzie 03:58, 8 April 2007 (UTC)
I meant for this to be completely tangential to that discussion. Since only the information provided should show in any proposal, links were not meant to be part of the discussion. I didn't think anyone would consider a link to be a requirement in adding a correctly formatted citation since there are books that do not appear anywhere online. However, there is certainly enough information to obtain a link, so for clarity it is worth declaring that, regardless of one's opinion on the propriety of external links, they should be excluded included in every case, and that doing so does not reflect in any way on one's convictions with regard thereto. DAVilla 12:26, 8 April 2007 (UTC)
I assume that you would not ban an "important" cite, say the earliest cite so far entered, because some of the info is not known to the editor (eg the publisher of some early documents is often not clear). Similarly, as you say below, an important cite should not be banned just because it is not easily verifiable, unless there is reason to believe it may be fake. Certainly, it should not be banned just because it cannot be verified online. However, if there is a choice between two otherwise similar cites, the one available online should be chosen as being the most convenient for those readers who want to research the context further. --Enginear 17:48, 10 April 2007 (UTC)
This request has no bearing on other citations, except to lead by example in the final selection. I have asked that the specific information listed below be included in any proposal, no less and no more, only to have consistency across the different options. The way this has been done in the past has concerned these abstract placeholders that you're alluding to. I'm a little fed up with that because it doesn't give me any valuable feedback on the sorts of quotations I find in the real world. I could have come up with some really wacky stuff, but these in comparison are pretty tame and still don't fit the mold very well. So here we have three real-world quotations (and one real-world example, don't forget) and a different approach to the same problem. DAVilla 21:57, 10 April 2007 (UTC)

The proposals are by example. They must show all of the following information, regardless of correctness or verifiability, unless it is deemed irrelevant, and unless it is tied into the proposal as a requirement, no more than the following:

    • Example:
      Our grandson owns a radio, but he’d like a transistor.
    • Quotation:
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • Original work:
      Author: Ruth Bondy
    • Quoted work:
      Date: 1968
      Subtitle: The people of Israel’s story in their own words: from the threat of annihilation to miraculous Victory
      Translator: I. I. Taslitt
      ISBN: 0491008392
      Page: 25
      Publisher: Sabra Books
      Location: New York
    • Quotation:
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • Original work:
      Date: 1973
      Author: Iris Murdoch
      Title: The Black Prince
      Publisher: Viking Press
    • Quoted work:
      Date: 2003
      ISBN: 0142180114
      Page: 407
      Publisher: Penguin Classics
    • Quotation:
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”
    • Original work:
      Date: July 25, 2004
      Interviewer: Liane Hansen
      Interviewee: Lon Simmons
      Title: Baseball Announcer Simmons Enters Hall
      Production: Weekend Edition Sunday
      Producer: National Public Radio

Previously (Oct - Dec 06)

Current practice, so far as I can tell, is shown immediately below. Please feel free to edit it, with comment, if you feel you know better. There are many things I am uncertain about. DAVilla 01:35, 8 April 2007 (UTC)

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968, Ruth Bondy, I. I. Taslitt (tr.), Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), →ISBN, p. 25—
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), →ISBN, p. 407—
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio, July 25—
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”
edited --Enginear 19:55, 8 April 2007 (UTC)
added pedia links, missing comma, changed some to colons as was the standard before, edit: abbr. page, (date)->date DAVilla 22:04, 8 April 2007 (UTC)
Didn't we use to use dashes before the quote? Anyway, doesn't matter. This one isn't up for consideration. DAVilla 22:07, 8 April 2007 (UTC)
This is getting a bit anal, but...changed colons after date to commas...colons predated this period, being the standard from Dec 05 - Oct 06...Dashes were used pre-Dec 05. Thanks for the other corrections. :-) --Enginear 17:58, 10 April 2007 (UTC)

The sources can be found on Google books here and here (registration required) and on NPR.org here. Edit: The last page contains an audio link but does not have the word in print, as transcripts are not available without charge. DAVilla 20:30, 7 April 2007 (UTC)

Actually no registration is required for this one: don't be misled by b.g.c. asking you to log in; usually closing the form asking for log in is sufficient (though there are a few where login is required). --Enginear 20:07, 8 April 2007 (UTC)


(as interpreted by --Enginear)

Please feel free to edit, with comment, if you feel that this does not reflect policy or the practices agreed upon. DAVilla 21:34, 8 April 2007 (UTC)

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), →ISBN, page 25,
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), →ISBN, page 407,
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio (July 25)audio,
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

I believe this is the best format. I'm changing my vote to one with a minor tweak about six sections below. --Enginear 17:13, 14 April 2007 (UTC) The date and author (or translator) are the most important features, so are put first. The page no. is put after the publisher and date, since it relates to a particular edition, not to the text in general. I assume that the quotes are chosen to illustrate different features. (The second book has been linked to its Wikipedia entry, but if it were available on Wikisource or Gutenberg, that would take precedence.) However, IMO, the first cite is imperfect because only a snippet is available on the net. Similarly, audio (the last one) is technically substandard because the spelling cannot be checked. Personally, I would try to find better ones, although for what I assume is now an archaism, and rare compared with other uses of the word, I might not succeed. --Enginear 19:55, 8 April 2007 (UTC)

I have removed your addition of Ohad Zemorah because, apart from Google books, every source I can find claims that he is an editor. If he were an author I could understand, but anyway the RFP specifically says that no additional information is to be added unless it is required. In fact if he were an author then being listed second it might still be permissible to exclude him, especially since the words quoted are primarily those of the translator. I am not certain that "et al." is necessary but I will leave it be.
It is my understanding from Wiktionary:Quotations that tr. is to be used in place of "translator". If that is incorrect then great! I do not believe we need to abbreviate anything here. However, you should update the policy page if you do not want to revert your change above. I cannot send this into a vote claiming it is the current standard if it is not in fact.
I have also edited the last quote since "editor", which I introduced, is incorrect. And the definition line is not the subject of this vote, so it should be uniform across all proposals. DAVilla 21:18, 8 April 2007 (UTC)
Does the third source need a comma at the end? DAVilla 21:38, 8 April 2007 (UTC)
Yes. Added. I interpreted the note re tr. abbreviation as meaning that it was permitted rather than required. I think there should be some limited personalisation allowed in such matters. I have modified Wiktionary:Quotations in line with this, and we'll see if it is reverted — I'm not aware of anyone here who really likes abbreviations, so I think it's uncontentious even when the subject is under discussion. ducks just in case (I also added the use of et al. which, as you politely hinted, was not previously there.) --Enginear 18:14, 10 April 2007 (UTC)

Proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), →ISBN, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), →ISBN, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio [17]audio
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As above, but use a bullet for the example, put the date all together, and leave off the trailing comma. DAVilla 18:27, 14 April 2007 (UTC)

Second proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory, Ruth Bondy (author), Sabra Books, New York, →ISBN, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), →ISBN, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, Liane Hansen (interviewer), National Public Radio
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As per first proposal, being more careful with the subtitle, experimenting with the links, leaving off "audio" (which honestly I've never seen before), and using a little less italics. Also put off people who did not use these exact words (the author in another language, the interviewer) to information about the source. DAVilla 18:33, 14 April 2007 (UTC)

Third proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory, page 25, Ruth Bondy (author), Sabra Books, New York, →ISBN °
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, page 407, Viking Press, Penguin Classics (2003), →ISBN °
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, Liane Hansen (interviewer), National Public Radio °
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

Still under experimentation. Does anyone have a better suggestion for a link symbol? Please, try it out above. DAVilla 08:59, 11 April 2007 (UTC)

Proposal by BD2412

My proposal is this. No matter what configuration we end up using for quotes, we should make a template with parameters for the parts that we use (year, author name, title of work, text of quote, whatever else), so we don't have to go changing thousands of quotes around if we decide to change up the config in the future. Cheers! bd2412 T 03:51, 11 April 2007 (UTC)

  • (unless there already is one, and I just haven't noticed it). bd2412 T 03:52, 11 April 2007 (UTC)

Word. Indeed, we can have a few different templates, one for each of the major common kinds of sources (normal-books, Usenet messages, periodicals, compilations, anything else?) —RuakhTALK 04:18, 11 April 2007 (UTC)

No. Absolutely not. This will not work. There are simply too many variable pieces of information that would (or might) have to be considered. Even for just books, one has to consider: edited books, books of collected stories, books translated into English from another language, revisions of books, books with multiple authors, books with separate section titles...and leave in the options for linking important persons to Wikipedia and appropriate works to Wikisource. This doesn't even consider plays, journal articles, DVD subtitles, scriptural texts, speeches, television programs, poetry, and the myriad other forms we regularly cite for quotations. There is too much to bind in a single template, and too many possible templates for me to want to have to figure out which (if any) existing template will do what I need it to. --EncycloPetey 04:23, 11 April 2007 (UTC)
I disagree. Most of those cases don't need to be handled separately. To wit (sticking only with the cases you mentioned — feel free to bring up others):
  • edited books ← all books are edited. The editor is only relevant if it's a compilation of some sort, in which case the individual work has an author and the whole collection has an editor. I already listed compilations as warranting their own template.
  • books of collected stories ← this is a kind of compilation, and I don't see that it needs separate treatment.
  • books translated into English from another language ← all we need are optional "translator" and "sourcelang" parameters in the various templates.
  • revisions of books ← edition information is standard. I don't see how that's a complication.
  • books with multiple authors ← no one will object if the "author" parameter lists multiple authors.
  • books with separate section titles ← this is only relevant if the different sections are written by different people, in which case this is a compilation. We don't currently note section titles anywhere, do we?
  • links to other Wikimedia projects ← nothing is preventing this. No one will object if the "author" or "title" parameter contains a link to Wikipedia or Wikisource.
  • plays ← generally we'll cite print copies, no? And there's no reason we can't support "act" and "scene" and even "line" parameters in addition to "page" parameters.
  • journal articles ← journals are periodicals. I mentioned those.
  • DVD subtitles ← sorry, I don't know how these work. If worst comes to worst, things that aren't worth templatizing can always be given a Category:Templateless citations or something so we can keep this sort of thing organized, at least.
  • Scriptural texts ← Good point; these probably warrant their own template. Don't worry, such a template would get plenty of use in entries on Ancient Greek and Hebrew words.
  • speeches ← in what form is it durably archived? In a compilation? We'll have templates.
  • television programs ← we might be able to treat these the same way as periodicals, I'm not sure; I guess it depends on the details.
  • poetry ← in what form is it published? As a book? In a compilation? In a magazine? We'll have templates for each of those.
(I can't guarantee this will work perfectly, but I don't understand how you can say "absolutely not" at this point.)
RuakhTALK 05:44, 11 April 2007 (UTC)
Well, at the very least how about a template for the bread-and-butter typical Google Books result, a quote from a book with an author, a title, a year, and a page number! bd2412 T 21:14, 15 April 2007 (UTC)
The general case is very complex. I wouldn't mind a simple case if you personally think it would be useful to at least yourself, provided it's substituted, or at least substitutable e.g. by AutoFormat. The reason is that templates can turn off new editors who just need to make a minor change, e.g. if Google Books says author but it's an editor as per this example. (Heck, even I haven't found my way around the POS templates.) That sort of flexibility isn't so keen with a context label, say, and not so common with a POS heading, but probably the majority when it comes to quotations. With subst: the format couldn't be changed instantly if needed, and anyways monkey-see monkey-do is going to win out in the end, so a substituted template is very different from your original proposal. DAVilla 06:25, 21 April 2007 (UTC)

Since this proposal does not list the example sentence and three quotations provided, it could not be included as a candidate in the upcoming vote. It doesn't appear intended for that anyway. If this proposal would be valid "no matter what configuration we end up using for quotes" then it is independent of that formatting issue. BD2412, if you agree then I would like to voice my own opinions on the above. DAVilla 08:57, 11 April 2007 (UTC)

I have no objection. Cheers! bd2412 T 21:14, 15 April 2007 (UTC)

Proposal by Ruakh

My proposal is mostly like it is now, but with periods to separate things more clearly:

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968. I. I. Taslitt et al. (translators), Ruth Bondy. Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory. Sabra Books (New York), 1968; →ISBN. Page 25.
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973. Iris Murdoch. The Black Prince. Viking Press, Penguin Classics, 2003; →ISBN. Page 407.
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25. Lon Simmons, Liane Hansen (interviewer). “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday. National Public Radio. audio link
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

Fourth proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968: I. I. Taslitt et al. (translators), Ruth Bondy (author). Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory. Sabra Books, New York, →ISBN, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973: Iris Murdoch. The Black Prince. Viking Press. Penguin Classics (2003), →ISBN, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25: Lon Simmons, Liane Hansen (interviewer). “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio [18]
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As above but with a bulleted example and a bunch of tweaks. Periods might help make the distinction between different versions more clear, so I've added one between Viking Press and Penguin Classics, but I've removed the one before NPR since Weekend Edition is wholly a part of it. Still not sure how to handle the audio. Anything you object to? DAVilla 19:00, 14 April 2007 (UTC)

Proposal by Enginear

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968 (US), I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), →ISBN, page 25,
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973 (UK), Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), →ISBN, page 407,
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 (US), Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio (July 25)audio,
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

This is identical to my interpretation of the current style, except for the addition of the country of first publication of the cite (in the current language), immediately after the date. The date, country and author (or translator) are the most important features, so are put first. The country is important for charting the development of usage of the word with the defined meaning. It is usually simple to determine, but not always. For example, the first edition of the second book (in 1973) was by Chatto & Windus, which was at the time a British company, though now owned by a US corporation. I have trialled this version at a few "real" words, eg mardi gras (French), half and half and ned.

The page no. is put after the publisher and date, since it relates to a particular edition, not to the text in general. The second book has been linked to its Wikipedia entry, but if it were available on Wikisource or Gutenberg, that would take precedence.

I assume that the quotes in the current example were chosen to illustrate different features. However, IMO, the first cite is imperfect because only a snippet is available on the net. Similarly, audio (the last one) is technically substandard because the spelling cannot be checked. Personally, I would try to find better ones, although for what I assume is now an archaism, and rare compared with other uses of the word, I might not succeed. --Enginear 17:13, 14 April 2007 (UTC)

Umm... you do realize that the {{UK}} and {{US}} tags mark words or definitions that are restricted to a particular region? They are therefore inappropriate for marking quotations. Also keep in mind that we're discussing how to format quotations that are interspersed with definitions. These will tend to be short lists, so identifying the country of publication is not especially relevant or useful. Such identification only becomes significant for longer lists, which are typically transferred to a /Citations page. I'm not sure how or where I would want to see the country (or region) information, but I'm sure it would be only an additional burden for short lists of quotes placed among the definitions. I'm also dubious about giving the nation of publication such significant standing. Suppose the earliest publication of some of Gandhi's writings is in the UK. That's a politically charged issue I'd rather not open. Suppose the earliest publication of T. S. Eliot (born American) is in the UK. Eliot spent his first 25 years of life in America, and his poetry will reflect that. How would marking quotations from his poems as "UK" tell us anything other than the place of publication? I really don't see the publication information as so important that it must be placed up front next to the date. Let's stick with more traditional layouts like all other publishers have chosen to follow. --EncycloPetey 17:59, 14 April 2007 (UTC)
{{label}} is in the works and will mirror the context labels without using categories. Although unlikely, it would be possible to redefine certain templates such as {{US}} and {{UK}} to default as labels rather than context labels, therefore requiring {{context|US}} and {{context|UK}} on definition lines. More likely, {{label|US}} or {{label|UK}} could be written in above, or {{italbrac}} could be used more directly, or simply the literal desired styling. DAVilla 19:11, 14 April 2007 (UTC)
To answer EP:
  • No, I didn't know the usage of {{UK}} etc., nor is the fact stated either on the relevant pages or their (blank) Talk pages, unless one is expected to know that they must be used in accordance with the Categories that are noted on their pages. I understood that categories usually noted that the head word is sometimes used in accordance with the category, rather than required to be used in that way; see e.g. et, which is only colloquial in one of its several languages. But I've now altered the examples above to use {{italbrac}}.
  • I disagree with you re the usefulness of regional information for interspersed cites, partly because <soapbox> I dislike the use of /Citations pages and would rather that, in the future, longer lists of interspersed cites could be collapsed, as is done for Translations. It seems wrong that long lists of cites, which demonstrate usage in detail, should be on a separate page (divided only by glosses) where the full details of the definitions are not available.</soapbox> There are many cases where a particular meaning of a word starts in one area and later spreads to others which have previously used the word in a conflicting way. Billion and trillion spring to mind. Sometimes a particular usage of a word dies out in one area while remaining in another. This information is important.
  • I use the same facts you mention to reach the opposite conclusion. The words one uses, and the meanings one intends for them, depend markedly both on one's idiolect and one's intended audience. One may use jargon or obscure language with people who will understand it. If multi-lingual, one will choose a language to suit. We generally avoid on this site suggesting that we table anything. I messed up a few weeks ago by forgetting we were international, and referring to a gas fueled camping stove, as if I was dealing only with a UK audience. Therefore both author and intended audience are important in understanding why a word is used in a particular meaning. The nearest we can easily get to the intended audience is usually to know the publisher. I believe that the country of publication, which is often not obvious from the publisher's name, and which is also an indication of the likely nationality of a book's editor, is worthy of separate mention.
  • TS Eliot is indeed a person worthy of discussion in this context. His earliest work was first published in US, and much of his later work first published in UK. In 1920, he published two books of poetry, one in US & one in UK (where he had by then lived for five years). The contents differed. I suggest that he and/or his publishers were aware of differing audiences. I suggest it is therefore reasonable to tag words only in the first as US and only in the second as UK. The 75% published in both books was presumably considered OK for both audiences, so presumably the "more important" cite would be chosen, for example if it marked a time when a word common in US was first used in UK, the UK cite would be given, and vice versa. It's the work, not the author, which is being tagged.
  • I don't see that it is particularly "politically charged" to note where an author has a particular work first published. In the case of Gandhi, it could have been in Britain during his English education or later visits, during his time in British-ruled South Africa, where he first rose to prominence as an activist (or should I say passivist), or in British-ruled India (as I think it actually was, in 1917). It does not change his nature, nor his success at improving all three countries. --Enginear 16:17, 15 April 2007 (UTC)

Proposal by

Think again

This section is really putting the cart before the horse. As BD2412 pointed out, the simplest way to address this problem is to always use templates for citations. Then if we want to change the house citation style (italicisations, commas, colons, and whatnot), we simply change the template. And yes, we do already have templates for this very purpose. See Template:cite book, for example. Use them! Uncle G 11:50, 24 April 2007 (UTC)

Template talk:cite book My God! I'd rather put the cart before the horse than tie the horse to a grounded barge. That thing is as far from being substitutable as a two ton boar is from being Kosher. And regardless of which goes first, or if there even is a horse, we're still going to have to design the cart. DAVilla 15:12, 24 April 2007 (UTC)

Well, no one has seconded any proposal, so it's dead in the water. DAVilla 20:19, 29 April 2007 (UTC)

nonstandard & illiterate

Hi, would anyone object if I reclassify words classified as "illiterate" as "nonstandard"? The two classifications appear to be identical, save that "nonstandard" is kinder and less controversial. Language Lover 23:55, 7 April 2007 (UTC)

Personally I like the category. It makes me feel better about myself to think that Arthur Conan Doyle is not "undefeatable", that I am "unexceeded" by illiterates like Clark Ashton Smith, "unexcelled" by L. Frank Baum, Benjamin Harrison, or Grover Cleveland. DAVilla 01:01, 8 April 2007 (UTC)
I'm astounded to learn that prominent and prolific authors and speakers are never given license to deviate from accepted norms. Is that what you are suggesting? --Connel MacKenzie 16:35, 10 April 2007 (UTC)
I'd object. To me, those two terms are not synonymous. Y'all is "non-standard", in that it's not part of any dialect of Standard English; but irregardless is "illiterate", in that no one who uses it can really be considered an English speaker. (O.K., that's an exaggeration; but you know what I mean.) Nonetheless, I really don't like the term "illiterate", because it makes it sound like people who use the term are illiterate, which is needlessly provocative; I'd prefer a name like Category:Malapropism or something, which describes the term instead of its users. —RuakhTALK 05:33, 8 April 2007 (UTC)
On further reflection, it seems we have (a) certain editor(s) who is/are a bit trigger-happy in using that tag. Absent any objective way of distinguishing non-standard forms from "illiterate" ones, I think we'll have to make do and let the "non-standard" tag do double duty. —RuakhTALK 20:55, 11 April 2007 (UTC)
Jesus christ. We have an "illiterate" category? And us two is in it?!? There is no hope for Wiktionary. --Ptcamn 13:40, 8 April 2007 (UTC)
"us two" is good enough for Charles Dickens and Sir Arthur Conan Doyle. Well known illiterates. Robert Ullmann 18:25, 8 April 2007 (UTC)
And I've just noticed Template:smarter. I quit. --Ptcamn 13:58, 8 April 2007 (UTC)
Yeah, someone needs to be informed that "us two" is perfectly acceptable as an object. But I guess it's just easier to classify it as a "misspelling". Want to look smarter? Try using "misconstruction" instead. ;-) Personally I would delete on the grounds that it's not idiomatic. "Us" is commonly used, incorrectly of course, as a subject whenever other words, whether they be "two" or what have you, obscure that fact.
Don't give up only because of Template:smarter, which is relatively new and not widely used. I only ran across it from this thread. It has four inclusions at the moment, only one of them grossly incorrect. On the other hand, 75% isn't a very good score, is it? DAVilla 14:10, 8 April 2007 (UTC)
I would argue that "us" as a subject is not incorrect in any objective sense. There are certainly people who regard it as incorrect, and we should note that, but there's a very big difference between frowned upon and being somehow intrinsically wrong.
Template:smarter isn't that bad, but it's the straw that broke the camel's back. There seem to be too many people here who are more interested in telling people how to speak than informing them about words. (And if you think what people need to be informed about is how to use words correctly, ever heard of covert prestige?) --Ptcamn 14:37, 8 April 2007 (UTC)
I created Template:smarter as an attempt to replace the illiterate tag and still please Connel MacKenzie. It took some prodding, but I managed to get him to articulate why he dislikes undefeatable (see "Undefeatable :)" at his talk page), and the reason is that he thinks "invincible" sounds smarter. So, I thought, we can convey that in a less derogatory manner. But he insisted on keeping the illiterate tag nevertheless. If it were up to me, we wouldn't have put any prescriptivism at undefeatable, but after the soap opera that went down on Connel's talk page over it, I think I'm not passionate enough about it to press the point :P Language Lover 15:10, 8 April 2007 (UTC)
(Linked the user page above. DAVilla 18:24, 8 April 2007 (UTC))
Wow. Having read his talk page, it seems like Connel's pot has cracked. The OED isn't a real dictionary now? Crazy.
I'm removing both Template:smarter ("invincible" is not exactly synonymous anyway) and the illiterate tag from undefeatable. If anyone doesn't like it, cite a source. Verifiability not truth and all that. --Ptcamn 15:38, 8 April 2007 (UTC)
I'm over my head in this all, having never studied the subtle differences between UK English and US English (and other dialects), but one thing Connel brought up near the bottom of that discussion, was the possibility that undefeatable might be {{US}} or {{UK}} (I'm not sure which he meant since I don't know where he's from, he just said "across the Atlantic"). Never having rigorously studied English, I defer all judgement on the word to those who have, and I look forward as always to learning awesome new things from everyone :-) Connel's intentions are good, regardless of the specifics of individual words, and since I've done some crazy things in the past myself, I'd really hesitate to call anyone crazy :) Language Lover 17:18, 8 April 2007 (UTC)
OED is a major dictionary by any means, and we all know the entry is there. Keffy claims that Webster's Third and Random House Unabridged also list it, and I take him at his word. Though I can't access it, M-W online does seem to have an entry. So the removal of "illiterate" on that page was entirely justified in my opinion. This for a word that an admin had considered deleting on sight, and thankfully didn't. DAVilla 18:24, 8 April 2007 (UTC)
I have an observation here: I grew up in the US, with quite a bit of exposure to British television (The Avengers to Monty Python to The Saint etc.) and literature (everything). I now live in Kenya, in the opposite circumstance; everyday English is Commonwealth English, and I only hear pure GenAM on television (Boston Legal ;-). I'd use "undefeatable" in UK/Commonwealth English, and it sounds right, but at the same time it sounds wrong in US English (should be "invincible" or some other.) I don't see the distinction as very strong though; but then perhaps I wouldn't. Robert Ullmann 18:51, 8 April 2007 (UTC)
It's not a very common word in UK English either. That doesn't mean it's wrong; it's just fairly rare, on either side of the Atlantic. Widsith 21:48, 8 April 2007 (UTC)
Both terms sound fine to me, though I wouldn't expect to hear them under the same circumstances. An invading army of super-villains might be described as invincible (just before the heroes show up and defeat them), while a sports team with a long string of spectacular successes in a season might be termed undefeatable (or undefeated). --EncycloPetey 23:28, 9 April 2007 (UTC)
In what region is that acceptable? "Undefeated" is very commonly used, but saying a team is "undefeatable" is wrong. --Connel MacKenzie 16:39, 10 April 2007 (UTC)
Who says it's wrong? Have you seen this in a style guide somewhere? Because as long as it's in dictionaries it doesn't seem very wrong to me. Do you have any references to back it up? Widsith 13:39, 11 April 2007 (UTC)
That is the rub; it is absent in all normal "abridged" English dictionaries. That's what started all this, in the first place. --Connel MacKenzie 03:29, 12 April 2007 (UTC)
But...abridged dictionaries by definition do not include every word, only the more common ones. No one is saying undefeatable is common, just that it's not wrong. Widsith 14:22, 12 April 2007 (UTC)
So even though it is abnormal (erroneous) in American English, it can't be marked as such, because the possibility exists that someone misusing it in British English might be understood? --Connel MacKenzie 06:22, 6 May 2007 (UTC)

Wiktionary:Alternative spellings

Language Lover put together some text on this long-debated topic, which I moved in order to begin a policy page on the matter. It's about time we set some of our ideas down as policy since the issue is raised quite regularly in various fora. The name for the new page follows the style of Wiktionary:Pronunciation and Wiktionary:Etymology in using the standard header form from the ELE. --EncycloPetey 23:32, 9 April 2007 (UTC)

I think it should be changed to "Alturnitave spelengs". Ha, joking. bd2412 T 23:38, 9 April 2007 (UTC)
I'd rather that the more common spelling always be the main entry, except in cases of regional spellings. (I'd actually be O.K. with always having the more common spelling be the main entry, even in cases of regional spellings, but I know that people from certain regions would object to that, so whatever.) —RuakhTALK 05:00, 10 April 2007 (UTC)
Good start, Language Lover. I think this really could develop into some kind of official status. I have a few nit-picks (which I hope don't detract from the fact that overall, I like the initiative you have taken) here:
  1. The comment about "saving disk space" seems simply wrong. The only rationale I've heard for the "alt" designation is to make them easier to keep in sync. That should be reworded before people start screaming "wiki is not paper."
  2. When both terms are obscure, the "alt" entry should have a gloss (on the same line.) There should perhaps be some note that the "alt" designation can be supplemented with a full definition at any time, which is generally encouraged.
  3. "One hundred years" is the limit we've used for "obsolete," not two hundred (but even that has received lots of objection from people who want it to be 50 years or 25 years.)
  4. For "leet", instead of saying "in some rare cases", I think it should say "for the five that have entered the general lexicon," so people understand that no new leet inventions are welcome.
  5. Has attestation been an issue for regional spellings? Rather than emphasize WT:CFI, it might be reworded to emphasize the different regional tags we are supposed to use (since that seems to be the more ambiguous distinction, from time to time.) As a formatting note, because the regional tags also add categories, there should be mention of when to use the tag, and when to simply list it in italics.
Nice start! --Connel MacKenzie 15:02, 10 April 2007 (UTC)
Great observations, I took the liberty of adding some of them to the entry. Thanks for correcting me about the obsolete designation. By the way, the idea for this entry comes more from HippieTrail than myself. I agree about the tags, since those should help people who are designing third party software that uses Wiktionary. Thanks for sharing your senior wisdom here Connel! Language Lover 01:58, 11 April 2007 (UTC)

Incidentally, I think there should be a fourth category, variant or alternative form or something, for when a word has multiple spellings that are pronounced differently, but that are still clearly the same word. One region-based example of this is AmE negotiation (with two [ʃ]s) -slash- BrE negociation (with one [s] and one [ʃ]); one non-region-based example is pescatarian (with a [sk]) -slash- pescetarian (with a [s]). —RuakhTALK 16:14, 11 April 2007 (UTC)

"Listed as a spelling error by..."

Given the enormous bias in the "illiterate" section above, I've simply started a new section.

Currently, Wiktionary has no clean way of identifying words that all/most/some spellcheckers list as errors. Since I am under direct attack for applying what I think are reasonable tags to such things, I'd like to take a step back, and ask this community what the best approach might be.

Some of en.wiktionary's entries have "Dictionary notes" sections. While that one section has not had any reasonable criticism offered against it and has had considerable support, the occurrences of that section often are covertly removed, inexplicably.

But even then, the lack of a dictionary listing is the sort of thing that can change from revision to revision. Yet at the same time, a "spellchecker" is not a "dictionary" and so can't/shouldn't be listed the same way, anyhow. On the other hand, perhaps regular contributors here would like to see both sections, for all terms where a spelling is an obvious mistake?

I'd like to see a section, in Wiktionary entries, that lists which spellcheck programs list a term as a misspelling. Since finding the localization may be difficult for some, it would make sense to be very explicit in what the format should look like, eventually. A secondary concern, of course, is which spellcheckers are deemed "widespread" enough.

Of important note:

Please let me be absolutely crystal clear, that I am not proposing any WT:CFI change: only the very most common misspellings meet WT:CFI currently. I am only talking about entries that clearly are garbage are overwhelmingly considered errors, yet easily pass our WT:CFI (of which I grumble about so often.)

So, would adding a ===Listed as misspelling=== section work? If so, what format would you prefer? --15:21, 10 April 2007 (UTC) This comment was typo'd as three tildes instead of four. Sorry...here are four tildes: --Connel MacKenzie 15:41, 10 April 2007 (UTC)

We want to avoid propagating new section headers when possible. Considering that, I can see two possibilities. Either we can include the information in Usage notes, or we can use a general header ===Spellings===, which would be more flexible and allow for other sorts of spelling information to be included in the section, such as Alternative spellings and Hyphenation. It would also allow for a better place to include alternative script information, such as for Serbian entries where there is a standard spelling in both Roman letters and Cyrillic letters. --EncycloPetey 15:33, 10 April 2007 (UTC)
Adding new stuff in ===Spellings===? For misspellings? I could see listing these as a sub-section of ===References===, if that's what you mean? --Connel MacKenzie 15:43, 10 April 2007 (UTC)
I'm thinking in terms of a much bigger picture than just the one issue, but I see your point. My concept could accommodate a list of spellings "listed as misspellings" with references keyed to the References section. --EncycloPetey 15:48, 10 April 2007 (UTC)
But that is addressed by the =alt spellings= header elsewhere. Which speaks not at all, to the entry itself.

The thing is, certain pinheads think every random sequence of characters put together, merits inclusion. Our WT:CFI generally supports such idiocy, despite the fact that a usable dictionary will clearly tell someone when they are misspelling, misusing or mis-constructing a term. What else is a dictionary used for, after all? Obscure linguistic research? That's what makes the OED not be a true dictionary; it isn't useful to any, but for the most bizarre "descriptivist" questions.

Many terms entered here are pure garbage. But garbage that passes WT:CFI. How should the garbage be labelled? --Connel MacKenzie 16:11, 10 April 2007 (UTC)

Please don't insult contributors with pejoratives like pinhead, since we are all judged by how we speak. Anyhow, I think you're way off saying this, can you cite even one example of a regular (non anon) editor who "thinks every random sequence of characters put together, merits inclusion"? I'm probably pretty radically far leftist when it comes to inclusion, but I still carefully research every word I submit or defend. Most are way less liberal than I. The overwhelming majority of our editors do a fantastic job of researching words :-) Language Lover 18:47, 10 April 2007 (UTC)
There are a few different kinds of garbage, and we need to take care to distinguish them. If something is genuinely a (common) misspelling — a (common) non-standard spelling of a word that does have a standard spelling — then the appropriate way to handle it is to define it as "Common misspelling of foo." If it's actually a completely non-standard word that meets CFI, and an editor encountering it would be best advised to replace it with a completely different word (or phrase), then the appropriate way to handle it is to define it normally, but to mark it with {{context|non-standard}} (or {{context|obsolete}} or whatnot, if it's an issue of a formerly standard word), and provide a usage note that explains the situation.
Of course, that's if everyone agrees it's a non-standard spelling or non-standard word. If it's arguably a common misspelling, then we should define it as "Common misspelling or alternate spelling of foo", and if it's arguably a non-standard word, then we should tag its definition with {{context|possibly non-standard}} and provide a usage note that explains the situation.
RuakhTALK 16:46, 10 April 2007 (UTC)
That is very "descriptivistic" and useless to say. If 9 out of 10 spellcheckers list something as a misspelling, to then call it "possibly non-standard" is inaccurate. That's why, rather than someone taking my word that something is "illiterate", we should simply list what resources do list it as wrong. --Connel MacKenzie 16:57, 10 April 2007 (UTC)
You realize that your argument contradicts your conclusion? If you think that consensus can never be reached on any word, then no words can be labeled non-standard; instead, all such words should be labeled possibly non-standard with usage notes explaining who says so. —RuakhTALK 18:28, 10 April 2007 (UTC)

This is some good discussion :-) I just wanted to wonder aloud, why are we fussing about spellcheckers? A spellchecker is never intended to be a flawless arbiter of English, rather it is intended to catch typos; anyone who uses one very frequently knows it's not at all uncommon to get false positives (positive meaning "misspelling") from them. Especially anyone who writes anything remotely technical or context-specific. What's more, the English language is not a fixed, crystallized thing, else anyone could just pick up the original Beowulf and start reading. Spellcheckers can be expected to usually be a few years behind the actual language as used and spoken by people with a pulse. We are not the Borg, nor is this the 10th Edition Newspeak Dictionary! :-D By the way, don't read this as an attack against you, Connel, I totally love all the sweet work you do, so please don't accuse me of attacking you! :) Language Lover 18:47, 10 April 2007 (UTC)

Do spellcheckers even list misspellings? It was my understanding that they only list the correct spellings, with misspellings determined by absence from that list. --Ptcamn 18:40, 10 April 2007 (UTC)

Some of them do, yes. In MS word, a number of common misspellings are programmed in so that spelling is checked while you type. This can be frustrating when the word segreant is unhelpfully "fixed" to become sergeant while typing. --EncycloPetey 18:43, 10 April 2007 (UTC)
Regarding segreant; why would you expect such a rare term to not be listed as a probable typo? --Connel MacKenzie 13:48, 11 April 2007 (UTC)
Absolutely it should be considered a probable typo. The problem is the *expletive* program (this is Microsoft we're talking about here) decides it knows what you really meant, and happily changes it for you. Without your consent. Without even informing you, in fact. And these are the default settings. And this is the standard editing application, on the standard operating system, and none of that is going to change any time soon. DAVilla 23:55, 11 April 2007 (UTC)
I do not use that program regularly, but am familiar with many deficiencies of that "auto-correct" feature. If you use it frequently, you have my sympathy. (Last time I checked, it still "auto-corrected" both "Connel" and "MacKenzie" to incorrect spellings.) --Connel MacKenzie 03:36, 12 April 2007 (UTC)
Yes, spellcheckers function by "stoplists"; otherwise all checks would be on the order of magnitude of N**length instead of N. From what I've looked at, most use several types of stoplists. Vlad, I know you'd like to think that English has no rules at all, but the fact remains that there are rules for word formation and spelling. Being purely descriptive, ignoring those rules, is a massive disservice to anyone that wants to actually use content created here.
That is why, almost every day, I repeat that WT:CFI is broken. With very few exceptions, all the words nominated on WT:RFV have had problems in one spellchecker or another.
So, this week, I've decided to take my "fussing about dictionaries" and rephrase the complaint in terms of "fussing about spellcheckers" to try to reach some certain pinheads that call me crazy. Pure "descriptivism" is "crazy", not pure "prescriptivism." At least the limits of prescribing spellings are essentially known. On the other hand, there is no limit to the stupidity one can find when searching usenet archives in the name of descriptivism. <insert "infinite" Einstein quote here>
Yes, I understand that it is "easier" to be lazy, and take only a "descriptive" approach. But, sorry, the English language does have rules. And again (as I always seem to have to repeat in different ways) a resource with a low signal to noise ratio cannot be useful.
The new stats have been encouraging. I honestly thought the signal to noise ratio had already fallen below 1.00. Evidently, such cries of imminent demise are still early.
--Connel MacKenzie 13:48, 11 April 2007 (UTC)
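The thread above mixes two different mechanisms: a word list, where anything absent is flagged (Ptcamn's point), and an explicit table of known misspellings, like the auto-correct behaviour EncycloPetey describes. A toy Python sketch of the difference follows, assuming simple set and dictionary lookups; `WORD_LIST`, `AUTOCORRECT` and `check` are hypothetical names with made-up sample data, and this is not how any particular spellchecker is implemented.

```python
# Hypothetical sample data for illustration only; not taken from any real product.
WORD_LIST = {"sergeant", "invincible", "undefeated"}  # spellings accepted as correct
AUTOCORRECT = {"segreant": "sergeant"}                # spellings explicitly listed as wrong


def check(word):
    """Classify a word the way a simple list-based checker might."""
    w = word.lower()
    if w in AUTOCORRECT:
        # Explicitly listed: the tool actively claims this spelling is wrong.
        return ("listed misspelling", AUTOCORRECT[w])
    if w in WORD_LIST:
        return ("ok", None)
    # Merely absent from the list: flagged, but nothing asserts it is an error.
    return ("not in word list", None)
```

Under this model, only a word that hits the `AUTOCORRECT` table is "listed as a misspelling" in the sense proposed above; a word like undefeatable would merely be absent from a small word list.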
Language rules are created based on the language people speak. If, overnight, 99% of English speakers suddenly decide to spell night as "nite", the rules will change to accommodate this. The "rules" are really more of a model, like Newtonian physics is a model of the true universe. Descriptivism isn't the lazy route at all! If we wanted a pure prescriptivist route, we could just stop all further development right now and say, "these are the English words, anything else is wrong". Incidentally, your objections to "undefeatable" go more against the rules than for the rules: very simply, there are rules which say you can add "able" to a transitive verb to get an adjective, and add "un" to any adjective where it is phonologically acceptable and doesn't have a particularly unusual etymology. You are in essence saying undefeatable is an exception to those rules. Well, maybe you're right, (I don't know myself), but I think very few editors so far have been convinced. (comment continues below)
"Stop right now?" Um, how do you reach that conclusion? We didn't start out with complete coverage of the English language, nor have we done more than covered a fraction of what most dictionaries have for "standard English." No, descriptivism is the cheap way out; to look up the prescriptive rules for correct use in each case is much harder. To properly cross reference all misuses with accurate indication of why they are wrong is much harder. --Connel MacKenzie 03:36, 12 April 2007 (UTC)
Impossible in fact: there are no reasons, it's all convention. Widsith 14:20, 12 April 2007 (UTC)
There are no rules for spelling or word formation in English? Wow! Learn something new every day. --Connel MacKenzie 06:30, 6 May 2007 (UTC)
(comment continued from above) The purpose of language rules is to simplify things so that people don't have to rigorously research every word the first time they use it. But we are in the business of doing such research, so to us the rules are just a guideline (which usually applies, but not without exceptions). Let us embrace the everchanging nature of language and rejoice in the beauty thereof :-) If we don't, it'll just change anyway and we'll be left yelling at those damn kids to get off our lawn! ;) Language Lover 16:17, 11 April 2007 (UTC)

Spellcheckers are relatively small dictionaries. If they were the basis for what we should include, we wouldn't have bothered creating Wiktionary in the first place, we'd just use an actual spellchecker. I have no problem labelling misspellings as such (though personally I'd just leave them out altogether), but I can't help feeling that Connel's idea of a misspelling is any word he's never heard of that gets a red wiggly line under it in MS Word. Sadly — not sadly! — there are many many perfectly valid words which fall under those criteria and which are neither garbage nor added by pinheads hell-bent on corrupting the rules of our language. If authors took out every word not recognised by their word processors, then Ulysses would read like The DaVinci Code. Widsith 14:02, 11 April 2007 (UTC)

Who says they are words I've never heard of?
Who says there is a GFDL spellchecker? There certainly wasn't when I discovered Wiktionary.
Please note clearly the "pinhead" comment was about the previous insult.
I've seen one person claim that one British source dictates that "-able" can be added willy-nilly. I do not believe that to be true in the general case, in en-US, nor in the specific exception case of "undefeatable".
Just how many years separate "Ulysses" and "The DaVinci Code"?
--Connel MacKenzie 16:24, 11 April 2007 (UTC)

How has Wikimedia Changed your Life?

This message is being crossposted around village pumps and mailing lists - apologies if you receive it more than once!
Have any of the Wikimedia projects had an effect on you in real life, or do you know of someone, or some group of people, who use our projects in real life? If so, we want to hear from you at m:Success Stories - How has Wikimedia Changed your Life?. The hope is that this page can become somewhere to which we can point members of the press so that they can immediately get an idea of the usefulness of our projects. Please, take a look, and add your stories! Martinp23 16:02, 10 April 2007 (UTC)

By editing wiktionary, I've developed super powers and ninja skills. I can now turn invisible, fly, and turn back time by shoving the Earth the opposite way around its axis. Everyone who wants super powers of their own, should come define some words for us!!! :D Language Lover 18:25, 10 April 2007 (UTC)
Haha - I knew there'd be at least one ninja :D Martinp23 18:31, 10 April 2007 (UTC)
Now I feel ripped off... - [The]DaveRoss 02:00, 11 April 2007 (UTC)

Proposed order for "see also" template

Some of the terms which use this template have a large number of terms stuck in there, enough so that I would like to propose the following guidelines on how to order terms in the template. I propose the following scheme:

{ {see|man|Man|MAN|man-|-man|mān|mán|mǎn|màn}} { {see|pan|Pan|PAN|pan-|Pan-|pān|pán|pǎn|pàn}} { {see|nu|NU|Nu|.nu|nú|nǔ|nù|nü|nǚ|nǜ}} { {see|ca|Ca|CA|ca.|.ca|ca'|cā|cǎ|cà|ça|çà|çã}} { {see|cu|Cu|CU|.cu|cū|cú|cǔ|cù|ĉu}}

The order of terms in the template should be:

  1. The basic uncapitalized and unadorned script.
  2. A variation that merely capitalizes the first letter of the term (e.g. Bush as a surname; Atheist as atheist in German - many German nouns being identical to English but capitalized).
  3. A variation that capitalizes all letters of the term (e.g. MOD).
  4. Variations that capitalize a letter other than the first letter (e.g. pH), or more than one but not all letters of the term.
    Note that this order of precedence in capitalization should be the same for similar variations further down the list; so pan- comes immediately before Pan-, and so forth.
  5. Variations containing punctuation:
    1. A prefix followed by a hyphen (e.g. man-).
    2. A suffix preceded by a hyphen (e.g. -man).
    3. Followed by a period (e.g. co., Sun.).
    4. Preceded by a period (e.g. .co).
    5. With a period between letters (e.g. t.ex.).
    6. Followed by an apostrophe (e.g. ca').
    7. Preceded by an apostrophe (e.g. 'kay).
    8. An apostrophe between letters (e.g. c'est).
    9. All other terms incorporating punctuation, order to be determined as cases arise.
  6. Variations containing a diacritic over a single letter, with the earliest diacritized vowel listed first.
    1. I prefer to order diacritics with Chinese tonal order first (e.g. mān, mán, mǎn, màn), followed by umlauts, breves, tildes, carets, rings, cedillas, etc., in no particular order (though I think one should be set).
  7. Variations containing diacritics over two vowels (e.g. mêlée, résumé).
  8. Variations containing diacritics over three vowels (can't think of any).
  9. Variations containing a diacritic over (or under) a single consonant (e.g. ça, ĉu).
  10. Variations containing a diacritic over (or under) a consonant and a vowel (e.g. çà, çã).
  11. Variations containing multiple diacritics over a single letter (e.g. , ).

Once an order is settled on, it should be possible to instruct a bot on that order and have it fix all see also templates accordingly. Cheers! bd2412 T 23:31, 10 April 2007 (UTC)

Looks largely OK to me, but this isn't really a Grease Pit issue. There is not a technical problem to be addressed. --EncycloPetey 23:37, 10 April 2007 (UTC)
Hmmm... Beer Parlor? bd2412 T 23:46, 10 April 2007 (UTC)
One change I'd suggest to simplify this is to streamline the bit about diacriticals at the end. Order these first by number of diacriticals (1, 2, etc.), then by appearance order (diacriticals on first letter, second letter, etc). It makes the process needlessly complex to worry about whether the diacritical is associated with a vowel or a consonant. I would also expand the idea of a diacritical to include letter variants such as Polish slashed-L (Ł, ł). When there is a tie, run the order from top to bottom (over, through, under) the character in question. --EncycloPetey 23:53, 10 April 2007 (UTC)
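EncycloPetey's simplified rule can be sketched in Python as a sort key, assuming a "diacritic" is anything that Unicode canonical decomposition (NFD) splits off as a combining mark. Each mark is counted and tagged with the index of the base letter it belongs to, so terms sort first by number of marks, then by the positions of those marks. The function name is hypothetical; letter variants such as ł, which do not decompose, are not detected, and the over/through/under tie-break is replaced here by plain code point order.

```python
import unicodedata


def diacritic_key(term: str) -> tuple:
    """Sort key for the simplified rule: fewer diacritics first, then
    diacritics on earlier letters first, then plain code point order."""
    positions = []  # base-letter index of each combining mark, in order
    letter = -1     # index of the most recent base character seen
    for ch in unicodedata.normalize("NFD", term):
        if unicodedata.combining(ch):
            positions.append(letter)  # mark attaches to the preceding letter
        else:
            letter += 1
    return (len(positions), tuple(positions), term)


variants = ["çà", "cù", "ĉu", "cu"]
print(sorted(variants, key=diacritic_key))
# ['cu', 'ĉu', 'cù', 'çà']
```

With this key, ĉu sorts before cù (one mark, on an earlier letter) and both sort before çà (two marks), matching the ordering confirmed in the replies. Ties between marks on the same letter fall back to code point order, which does not reproduce the Chinese tonal order proposed above (it yields màn, mán, mān, mǎn).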
So ĉu would come before, say, cù because the diacritic is on the first letter? But both would come before çà, which has multiple diacritics? bd2412 T 00:40, 11 April 2007 (UTC)
Yes. Though I'm not sure any of this will help the list much on pages like a. --EncycloPetey 00:44, 11 April 2007 (UTC)
Hence Appendix:Variations of 'a' (like a hammer to break the glass in case of emergency, we can trot out an appendix for terms with a truly monumental number of forms; this includes all of the vowels, so take single-letter terms out of your thinking on this). Ok, I like it. bd2412 T 01:11, 11 April 2007 (UTC)
Although I do think Wiktionary needs to establish a precise order for alphabetization, which varies from language to language, I don't think this is such a crucial issue for see also sections at the top because of the appendix option for variations. (Incidentally there are several pages that currently need such appendices.) But I do think some general guidelines should be laid out, which would establish a partial order of the strings, to use the mathematically precise term. That is to say, as far as diacritics, letter variants (including ligatures), and that sort of thing go, let contributors use their best judgements. Bots would be able to insert words wherever the owner might best try to fit them, and if people want to rearrange them then the bot would not override that. However, the bot would be able to ensure something like:
Incidentally, I'm not sure I agree even on the order of capitals and punctuation. What would be best would be to find a lot of examples, even short ones like compound words with optional spacing and hyphenation, determine which ones we agree on, and derive a partial order from that. DAVilla 08:25, 11 April 2007 (UTC)
I like the unrestricted format we have currently; I try to organize the entries in this navigation tool by order of frequency. The recent barrage of overloading the {{see}} template has been pretty silly. The purpose is to help people find the entry they actually are looking for, not to list irrelevant obscurities in an incoherent fashion. --Connel MacKenzie 03:43, 12 April 2007 (UTC)
Interesting point, but what if someone is actually looking for an obscure term (particularly one using diacritics that can't readily be typed into the search window)? If I'm looking for ĉu or or çã, I'd be inclined to type in cu or nu or ca, and would expect to either find them there or find a link to them. As for the order, I think we need some kind of normalization so that (eventually) the templates can be completely bot generated/updated. A bot should be smart enough to pick any given word (man, for example) and find every other word composed of the letter sequence "m" (cap or lowercase, with or without diacritics), "a" (same) and "n", with or without punctuation at any point, and to stick that word in the right place on the "see also" template on all other pages having that sequence. That's my thought and I'm sticking to it. Also, I think the "see also" template should handle up to twelve, and we should thereafter go to an appendix. Cheers! bd2412 T 04:20, 12 April 2007 (UTC)
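The matching described here can be sketched in Python by reducing every title to a "skeleton": decompose it with NFD, drop combining marks and punctuation, and casefold what remains. Titles that share a skeleton are candidates for the same see-also template. This is an illustrative sketch with hypothetical names (`skeleton`, `see_also_groups`), not the code of Hippietrail's extension or of any existing bot; letters such as ł that have no decomposition are left as they are.

```python
import unicodedata
from collections import defaultdict


def skeleton(term: str) -> str:
    """Reduce a term to its base letters: no diacritics, no punctuation, no case."""
    decomposed = unicodedata.normalize("NFD", term)  # e.g. á -> a + combining acute
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch.isalnum()
    ).casefold()


def see_also_groups(titles: list[str]) -> dict[str, list[str]]:
    """Bucket titles by skeleton; buckets holding two or more titles are
    candidate sets for a shared see-also template."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        groups[skeleton(title)].append(title)
    return {key: group for key, group in groups.items() if len(group) > 1}


titles = ["man", "Man", "MAN", "man-", "-man", "mán", "pan", "Pan-", "nu"]
print(see_also_groups(titles))
```

A group larger than the twelve-item limit suggested here could then be routed to an appendix instead of a template.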
On http://wiktionarydev.leuksman.com/, running Hippietrail's extension, they ARE fully automated. We need that code here. --Connel MacKenzie 06:07, 12 April 2007 (UTC)
I propose to simply have them sorted by a bot, that uses some or other built-in sorting algorithm. Easy, simple. H. (talk) 11:53, 12 April 2007 (UTC)
  • I used to put a lot of thought and work into sorting these. My ordering was much like the proposed one but I also sorted by type of diacritic as per which are most common in English words to which are most exotic to English speakers unaccustomed to other languages: á, à, ä, â, cedilla, macron, others.
  • But when I was developing the DidYouMean extension to automate the entire "disambiguation see also" process I realized this sorting would be too complex and slow to impose on mediawiki and I just settled for the natural Unicode order because that at least meant it would be fast and consistent. — Hippietrail 19:01, 13 April 2007 (UTC)
    Unicode order is fairly similar to this "more familiar first" order: the diacritics (specific combinations) common to Western European languages are coded from 00A0 to 00FE (Latin-1), and then other characters less and less familiar. So this isn't bad at all (except that it may look fairly random in some cases ;-) Robert Ullmann 19:09, 13 April 2007 (UTC)
    Yet further proof that Wiktionary is Unicode's bitch.
    What about the double-letter stuff? Shouldn't that at least be put at the tail? DAVilla 18:13, 14 April 2007 (UTC)
    You mean two letters with diacritics and/or letters with two diacritics? I suppose the rare single letter with two diacritics has its own unique unicode combo, and should be ordered accordingly if that's what we use. As for two letters with diacritics, I would put them at the end and use the first diacriticized letter as the basis for sorting against any others. Cheers! bd2412 T 05:42, 17 April 2007 (UTC)
    No, I'm not talking about æae or the like. I mean r vs. rr in pero and perro, o vs. oo in good and god, etc. These should go at the end of the list regardless of how the rest are ordered. DAVilla 06:06, 21 April 2007 (UTC)
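In Python, comparing strings already follows code point order, so the "natural Unicode order" discussed above needs no custom comparator. The sketch below adds DAVilla's suggestion as one extra rule: variants whose base letters differ from the page's own (such as perro listed on pero) go to the end. The helper names are hypothetical, and computing base letters by NFD decomposition, mark stripping and casefolding is an assumption, not a Wiktionary convention.

```python
import unicodedata


def base_letters(term: str) -> str:
    """Lowercase letters of a term with diacritics and punctuation removed."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch.isalnum()
    ).casefold()


def order_see_also(page: str, variants: list[str]) -> list[str]:
    """Sort variants in plain code point order, but move variants whose base
    letters differ from the page's (e.g. doubled letters) to the tail."""
    page_base = base_letters(page)
    # False sorts before True, so same-letter variants come first.
    return sorted(variants, key=lambda v: (base_letters(v) != page_base, v))


print(order_see_also("pero", ["perro", "péro", "Pero", "PERO"]))
# ['PERO', 'Pero', 'péro', 'perro']
```

Note that plain code point order places capital letters before lowercase ones, the reverse of the lowercase-first order proposed at the start of this thread.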

Outcome of rfd and rfv

Is there any way of marking terms as good after they have been nominated for deletion or verification and then shown to be acceptable? Clearly, adding citations is one way, but maybe we need a note on a page (or against an individual definition) saying "This term has been confirmed as being good" or something like that. I ask because apparently asdf has been nominated for deletion for a third time. — Paul G 15:00, 11 April 2007 (UTC)

It's supposed to go on the talk page (see Category:Verification templates). It's just a pain to have to do, and automation has fallen through the gaps (see {{process}}). DAVilla 23:37, 11 April 2007 (UTC)
Extended discussion (more feedback really is needed): Wiktionary:Grease pit archive/2007/April#Better, more, faster archiving. --Connel MacKenzie 06:05, 12 April 2007 (UTC)

Wiktionary:About Persian

There is a new page at Wiktionary:About Persian about issues related to entries in Persian (Farsi), where any comments or ideas are welcome :-D Pistachio 00:07, 13 April 2007 (UTC)

Italian compound words

Recently someone made a perfectly valid request for fargli in Wiktionary:Requested articles:Italian. This word is a compound of the verb fare and the pronoun gli. There must be more than a million such compounds, and adding them would be a nightmare. However, you come across them all the time, and newcomers to the language don't always recognise them for what they are. If we were to add them, it would be the job of a bot (and I am considering applying for bot-status in order to add several thousand Italian verb forms).

  1. Do such words merit inclusion (and meet our CFI)?
  2. Would "Verb" be a reasonable part of speech?
  3. Would an explanation e.g. "Compound of the verb (...) and the pronoun (...)" be acceptable, as there isn't really an easy translation?
  4. What Category should they go in (we have "Italian verb forms")?

Thoughts please. SemperBlotto 14:36, 14 April 2007 (UTC)

Seeing as, by my understanding, we're already committed to supporting polypersonal agreement in languages like Georgian, we might as well support the limited polypersonal-agreement-like compounds in Italian, Spanish, and so on — especially in those languages where it can affect spelling (fare + glifargli and not *faregli, haciendo + lohaciéndolo and not *haciendolo, etc.). —RuakhTALK 15:17, 14 April 2007 (UTC)
P.S. Yes to questions 1–3. As to the category, I think Category:Italian verb-pronoun compounds would probably be clearest. —RuakhTALK 15:46, 14 April 2007 (UTC)
  1. Yes
  2. Yes
  3. Yes
  4. "Italian verb forms" seems as good as anything. Widsith 15:31, 14 April 2007 (UTC)

OK, I have made a first attempt at fargli, but won't add any more for the moment (unless requested). SemperBlotto 17:07, 14 April 2007 (UTC)

Yes, yes, yes, etc. But please use {{form of}} or create a derivative from it, so that the correct CSS is applied. And maybe re-read WT:ELE: capitals please, #* before example lines, italicise examples but not their translations, do not bolden the translation, etc. See my last change to the page. But please make the template yourself (as an ‘advanced’ speaker, you are more aware of special needs that might occur); I am willing to help if you don’t know how. H. (talk) 21:38, 15 April 2007 (UTC)
See Wiktionary:Votes/2006-12/form-of_style. H. (talk) 21:45, 15 April 2007 (UTC)


I've corrected some errors in WT:STATS on this pass; I had previously been counting "#:" and "#*" lines as definition lines. I refined the "form of" and "slang" detection to also notice those respective stopwords even when they are not formatted properly. So "English slang" jumped from 6,000 to almost 18,000. Enjoy the new numbers! --Connel MacKenzie 16:30, 14 April 2007 (UTC)
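The corrected counting rule can be sketched in Python: a definition line starts with "#", while lines starting with "#:" or "#*" (example and quotation lines) are skipped, and the "slang" and "form of" checks look for the bare words case-insensitively so that unformatted labels are still noticed. This is a hypothetical reconstruction of the rules described above, not the actual WT:STATS code, and the loose word match will also count definitions that merely mention slang.

```python
def definition_lines(wikitext: str) -> list[str]:
    """Lines that start a definition: '#' but not '#:' or '#*'."""
    return [
        line for line in wikitext.splitlines()
        if line.startswith("#") and not line.startswith(("#:", "#*"))
    ]


def tally(wikitext: str) -> dict[str, int]:
    """Count definitions, and those mentioning 'slang' or 'form of' in any formatting."""
    defs = definition_lines(wikitext)
    lowered = [d.lower() for d in defs]
    return {
        "definitions": len(defs),
        "slang": sum("slang" in d for d in lowered),
        "form of": sum("form of" in d for d in lowered),
    }


sample = """===Noun===
'''foo'''
# (''slang'') A thing.
#: The foo was here.
# ''Slang'': another thing.
#* 1999, Someone, ''Some Book''
# Plural form of [[fooz]].
"""
print(tally(sample))
# {'definitions': 3, 'slang': 2, 'form of': 1}
```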

Drama Sucks (Wikidrama part four)


Lots of Wiktionary history reads like it would make a good soap opera.

The first year or two, no one noticed it; a dozen or two contributors tried to sketch out the basic coverage of language features. There was a bureaucrat, and then a couple sysops, who just deleted crap as it rolled in. The only bot activity in that time was NanshuBot which made quite a mess, leaving all sysops here with an astronomically negative opinion of all bot activities.

The next year or two showed a ramping up of sysops (essentially, all the regular contributors.) Concerns about format of entries were raised, as some milestones of basic English language coverage (compared to other basic dictionaries) were met. Some of what is now formalized in CFI & RFV started to take shape. Some things were discussed reasonably and adequately. Other things (for a variety of reasons) ended up on strange tangents, such as enforcing the "etymology" hierarchy we still have today. Bot activities were violently and vehemently attacked and discouraged.

As the number of regular contributors increased, the number of sysops generally did not (minor jumps here and there, only occasionally.) As (critically necessary) bot activities increased, the old school resisted more strenuously; by the time Category:English nouns came to a head, the original bureaucrat found himself in an intractable mess, unable to save face. Since that time, he has taken a background role, contributing infrequently or not at all, yet remaining active on the wiktionary-l mailing list, biding time out of the spotlight.

Over the last year, an enormous effort was also made to codify many accepted practices. Some were done faithfully, others inaccurately. Attempts at making anything an official policy failed, by and large. But a core set of principles emerged. WT:ELE and WT:CFI seem to be the undisputed pillars of Wiktionary.

But I post today, to talk about bots.


When I got to Wiktionary, I was surprised and relieved by the lack of automation. Here were people creating a resource that would be free (as in speech, and as in beer) to the world to use ever-after. Copyright violations were dealt with sternly and very quickly. Nonsense was ripped out faster than you could blink your eye. And what remained was a core group of solid contributors, all miraculously working in concert towards a common goal.

But early attempts I made at automation were astoundingly, uniformly, universally and belligerently resisted. I was baffled. (To this day, Webster's 1913 remains outside of Wiktionary.)

As I cooperated and interacted with the core group, they understood quite clearly that I was interested in the same end result; a respectable, usable free dictionary. As time went on, my off-the-beaten-path attempts at automation were gradually accepted. Better still, I made significant progress getting actual approval of certain bot tasks. But it was very painfully slow progress; parsing the original dumps (and much later, the XML dumps) was always a nightmare. It was clear then (much clearer now) that without some uniformity, the problems would only get worse. As a result, I contributed to the current inflexibility we have, sometimes in a very large way.

The shift I felt was that I became more interested in encouraging others to automate, rather than simply trying to automate more myself. This was more for garnering support when facing off against the "old school" than anything else. But in some ways, the old school was right: each task should have separate approval. Last time I checked WT:BOTS, it still did (much to my chagrin.)

Today, we have numerous, fantastic bot operators, contributing in fantastic ways. But I find myself at a certain stopping point with regard to one minor technical matter. To simply bot-war/wheel-war it into non-existence would be (ahem) trivial. But that clearly wouldn't be productive, for me, or for the excellent bot operator in question.

More history

Two things Wiktionary has always done wrong are:

  1. Overemphasizing etymology, and
  2. Overemphasizing "part of speech" headings.

In traditional dictionaries, and in colloquial expectations, dictionaries are about definitions.

The part-of-speech of a definition is fluid, or irrelevant. To native speakers, the part-of-speech is irrelevant most of the time. To translators and ESL learners, it is of obvious importance.

The etymology, likewise, is perhaps the only interesting part of an entry (since the definition, in most cases, is obvious; petty squabbles usually only arise about particular wordings.) But with etymologies, the importance to ESL learners and translators is completely misleading. The English words that have multiple etymologies always have blending back-and-forth over time, from one etymology to the other.


Because of our current inflexible strictures, we hold the "etymology" heading level to be holy, likewise the part-of-speech headings. This forces truly stupid things to occur at lower levels. It also misrepresents what naturally occurs in the English language; etymologies themselves blend and are borrowed as "second meaning" uses become more common than primary senses.

Likewise, synonyms, antonyms, related terms and derived terms blend from one meaning to another.

With the most recent automation activities, this is being further misrepresented, by forcing L3 headings to L4 whenever a particular bot thinks it is appropriate. Not only does this undermine intentional L4 to L3 movements, it obnoxiously enforces a technicality that never had widespread support in the first place (having been added to CFI only during an edit war, in direct conflict with the widespread practice and the rest of CFI, unnoticed. The rest of CFI that conflicted was conveniently removed later.)

On a more technical note, the bot currently being run is not "well-behaved", in that it re-corrupts human corrections when it is pointed out that it has done something wrong. (With regard to WT:BOTS, the bot/bot-operator is never supposed to make controversial formatting changes at all; the disregard for that point is what has me in a tizzy.)

It is my opinion that this is not only wrong, but entirely misleading, and should be stopped. Furthermore, the bot in question should have activities subdivided, with separate approval phases, now that it is (long!) past the 100 test-entries threshold. The crossing the "t"s and dotting the "i"s of the formal bot-approval phase should provide sufficient feedback to make the bot operate smoothly, instead of covertly.

End of four-part observation and complaint.

--Connel MacKenzie 05:17, 17 April 2007 (UTC)

I am stunned that we haven't imported the 1913. Let's do that. bd2412 T 05:43, 17 April 2007 (UTC)
Yes, the most annoying aspect of this is that it is particularly time-consuming and counter productive. There are lots of better things we could be doing. --Connel MacKenzie 15:37, 17 April 2007 (UTC)
I wonder why there is a need for such secrecy. The bot in question is User:AutoFormat, which is run by User:Robert Ullmann (at least, I believe that's who Connel is talking about here). Personally, I believe that a dictionary has a very pressing need for uniformity and strict formatting policies, and I believe AutoFormat is a boon to our goals. However, I agree that before it goes any further, we should codify some of our practices, so that they are the result of the community's vision, and not of a single contributor (it should be noted that most of the rules were not simply invented by Robert, but were taken off of policy pages). I believe that this should take two forms. First of all, the bot as a whole should get a vote in the normal fashion, on the general grounds of its overall purpose (i.e. making various formatting changes in an automated fashion). Then, a draft should be made of all the formatting rules that AutoFormat follows (sorry to do this to you Robert). The Wiktionary community should be able to discuss each of them separately. I suppose a page should be set up somewhere (somewhere other than the Beer Parlour). I am unsure of how exactly this should be done, but in some way the community should discuss and come to decisions on each of these rules. Some of the rules will be consented to almost immediately (such as switching "Derivated terms" to "Derived terms"). Others will involve more arguing, I imagine. I'm hesitant to require a vote for every single formatting rule that AutoFormat is allowed to apply, because that's just ridiculously lengthy. Perhaps, to start out with, we could do this: we have the bot vote, and Robert puts up his list of AutoFormat rules. Any rule which is not questioned by two users with over 100 edits gets to go unmolested, and everything that does get questioned by two such users is temporarily put on hold, until we can discuss it further. 
This would allow AutoFormat to get back up and running quickly, and still allow us to discuss formatting issues which require discussion. Any thoughts on this? Atelaes 06:42, 17 April 2007 (UTC)

While I agree with some of your reservations about Wiktionary policies (in particular, I agree that separating by etymology is a bad idea), I don't think we can be critical of AutoFormat (or any other bot) for enforcing long-codified policies. Indeed, overall it's better to have the uniformity gained by enforcing such, firstly because if it turns out we dislike its changes, then we know to fix the policy, and secondly because if we ever decide to change the policy, it's easier for bots to update the structure of a consistently-formatted Wiktionary than that of a hodge-podge. If AutoFormat undoes something a human editor does, that most likely means either that the human editor made a mistake (in which case AutoFormat is doing the right thing) or that the human editor is intentionally violating policy (in which case AutoFormat is doing the right thing). Nonetheless, if people think it's essential that it be possible for humans to violate policies, then I suppose we could create a {{nobots}} that would add a page to a Category:No bots, and require that all bots skip content between the {{nobots}} and the end of whatever section it's in (or the entire page if it's not in a section). —RuakhTALK 11:25, 17 April 2007 (UTC)
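The skip rule proposed here is simple enough to sketch. The Python below assumes a hypothetical {{nobots}} marker (no such template is defined in this discussion) and standard ==Heading== syntax, and returns the indices of lines a bot should leave untouched: everything from the marker to the next heading of the same or higher rank, or the whole page if the marker appears before any heading. It illustrates the proposal only; it is not an existing bot feature.

```python
import re

# A wikitext heading such as ===Synonyms===: the same run of '=' on both sides.
HEADING = re.compile(r"^(=+)[^=].*?\1\s*$")


def protected_lines(wikitext: str) -> set[int]:
    """Indices of lines a bot should skip under the proposed {{nobots}} rule."""
    lines = wikitext.split("\n")
    protected: set[int] = set()
    level = 0       # level of the current heading; 0 means no heading seen yet
    stop_at = None  # protection ends at a heading of this level or a higher rank
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            if stop_at is not None and level <= stop_at:
                stop_at = None  # the protected section has ended
        if "{{nobots}}" in line and stop_at is None:
            if level == 0:
                # Marker outside any section: the whole page is off limits.
                return set(range(len(lines)))
            stop_at = level
        if stop_at is not None:
            protected.add(i)
    return protected


page = "==English==\n===Noun===\nfoo\n====Synonyms====\n{{nobots}}\n* bar\n====Translations====\nbaz"
print(protected_lines(page))
```

Keeping deeper subsections inside the protected range follows from reading "the end of whatever section it's in" as including that section's subsections; that reading is itself an assumption.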

Those are excellent observations Ruakh. But the interpretation of well established policy is what is in question here. AF is taking a stricter interpretation, than WT:ELE actually says. I think the {{nobots}} idea is too tenuous to try to back-support for all existing bots. Sometimes it might work; other times it would likely be misused (accidentally or intentionally.) --Connel MacKenzie 15:45, 17 April 2007 (UTC)
(Atelaes: don't be sorry ;-) There is a list of what it does, it actually has documentation: User:AutoFormat.
While AutoFormat is written to run autonomously, it runs by itself in its own window on my laptop. But I watch it when it runs, and I check every edit, looking at the diff and the result unless I can tell for sure from the edit summary that it is okay. In this way it is more like someone running AWB. I haven't suggested getting a bot flag for it, because it turns out to be useful to have the edits in RC, where more people will look at them. (Clearly, if it is handed a large task, like fixing all of the {top} calls, it would need to be flagged.) Robert Ullmann 12:09, 17 April 2007 (UTC)
(As to "re-corrupting", look at the last straw, which was the example on the talk page: AF corrected the structure, Connel reverted it, Hippietrail immediately re-corrected it. The structure in question was clearly wrong; AF got it almost right, but needed a bit more refinement and a re-run, and it is now correct.) Robert Ullmann 12:37, 17 April 2007 (UTC)
I'm going to allow myself a snarky comment, just because I like and respect Connel: If I was running this automation under my own account, perhaps using an edit summary of "===Hdr===, fmt", no one would have ever noticed. Robert Ullmann 12:48, 17 April 2007 (UTC)
Weren't you here back then? Yes, every semi-automatic edit was critiqued, lambasted, slowcooked over a fire and served with toast. (That was what all the "history" above was all about.)
My objection isn't to the fantastic work done with AutoFormat, so far. It is against the notion that something that hasn't been codified can be enforced by bot. The L4 stuff has never had solid consensus one way or the other; Ncik's flamewars and editwarring with me was about that specific topic.
On the other hand, the etymology "splitter" mentality has had consensus, even though it is quite certainly wrong (notably, support from me, myself.)
--Connel MacKenzie 15:03, 17 April 2007 (UTC)
Re: snarky comment: Actually, that is why my talk page is archived; it was overwhelmed by certain complaints and flamewars as the result of such edits, that first year or two. --Connel MacKenzie 15:35, 17 April 2007 (UTC)

Is ELE policy?

AutoFormat stopped due to threats from Connel. User Talk:AutoFormat#This is wrong

Apparently we need a vote to either confirm that WT:ELE means what it says, and is in fact en.wikt style and policy, or to remove the {policy} tag and to treat it as (what?) I didn't realize there was any such basic doubt about our basic format. Or is there? Is Connel just off the wall? Robert Ullmann 11:15, 17 April 2007 (UTC)

(Quote from above, Connel:) With the most recent automation activities, this is being further misrepresented, by forcing L3 headings to L4 whenever a particular bot thinks it is appropriate. Not only does this undermine intentional L4 to L3 movements, it obnoxiously enforces a technicality that never had widespread support in the first place (having been added to CFI only during an edit war, in direct conflict with the widespread practice and the rest of CFI, unnoticed. The rest of CFI that conflicted was conveniently removed later.)

(I presume he means ELE) The fact is, this is the way it was resolved; and presently is working and used by the majority of entries and users (I've run stats). If you want to change it, then the correct action is to re-open ELE for consideration, and call for a WT:VOTE on your proposed change. It is not to criticize me or the bot for following current policy, pretending it is "controversial" because you are still upset that you lost that edit war, and would prefer to think that your way is still/ever was the convention. See?

By all means propose changing the current, established policy. Don't criticize someone for following it. Robert Ullmann 12:27, 17 April 2007 (UTC)

  1. Who says I "lost" that edit war?
  2. I still interpret what WT:ELE actually says quite differently than you.
  3. The convention is (and certainly was) to use L3 for those headings. If AF has skewed the results to your favored interpretation now, that doesn't mean it is correct!
--Connel MacKenzie 15:09, 17 April 2007 (UTC)
WT:ELE shows ====Synonyms==== in the example, and before Translations, which must be L4 (right?) The description of the format for the synonyms section indicates that it is part of the POS section. There is no indication anywhere that L3 is permitted (unlike Derived terms, which specifies the exceptional case in which it has to be used at L3 even though that is not the normal case.)
The stats are 4380 occurrences at L3, 11525 at L4. Taken from an XML before any AF changes. Robert Ullmann 12:27, 18 April 2007 (UTC)

"Off the course!"

A long time ago I used to ski at a little area in N.E., mostly at night after work. One night, there was a slalom course set up along the side of one slope.

I decided to try running it; it was set up fairly easily, and it turned out I could ski it fairly well. As I was doing so, I heard a yell from someone on the chairlift, inside the trees to my left: "OFF THE COURSE!"

When I got to the bottom, someone else screamed in my face that I wasn't supposed to be using the course. I watched and listened over the next few hours, and more than a few times someone would try it, or just go around a gate or two, and there would be screams from the lift: "OFF THE COURSE!"

Now, were there a sign at the top, they would seriously reduce their stress level, and not look like idiots ... but this never occurred to them.

Just remembered that for some reason. Robert Ullmann 12:27, 17 April 2007 (UTC)

Right. The applicable part of WT:ELE is the part about "flexibility." --Connel MacKenzie 15:15, 17 April 2007 (UTC)
But how can "flexibility" make it an error to follow what is claimed to be formal policy? And, more personally, how will I know that there are no further conventions "floating around", never written down, which I involuntarily break by following ELE? Is there any point in even having a policy which for the last 2.5 years has said one thing [synonyms, antonyms, quotations etc. have been marked by an H4 heading there since late August 2003], which is considered incorrect, but which is more or less the only piece of information that has never been changed (at least not for very long)? I.e., not even by those who consider it to be wrong. \Mike 12:03, 18 April 2007 (UTC)


If I understand Connel correctly, WT:ELE has certain things as policy that were not voted on, and he feels strongly that this AutoFormat bot is enforcing those things in WT:ELE which he disagrees with. Perhaps a debate about the specific offending WT:ELE items is in order. However, I agree with the above post which says that the development of AutoFormat bot should proceed with respect to the non-controversial edits.
Robert, how much of a pain would it be for you to turn off the offending features, but still run the other edits?
Connel, would such a compromise be satisfactory until the other issues are resolved? Or are you completely against the AutoFormat bot in its entirety? -- A-cai 12:46, 17 April 2007 (UTC)
Note that he's the one that tagged the current WT:ELE with {{policy}}: must not be modified without a WT:VOTE, and he set up the vote process ;-) I'd like to know if anyone else thinks there is a controversy.
It isn't hard at all: all of the header formatting is controlled by User:AutoFormat/Headers. Robert Ullmann 12:55, 17 April 2007 (UTC)
My complaint, yes, is about the L4 headings. OTOH, AF would do very well (and see many more improvements) if it went through its formal approval process! --Connel MacKenzie 15:19, 17 April 2007 (UTC)
And yes, I would like to see AF turned back on without the L4 error. Yes, I would like to see it get a bot-approval, with or without the bot flag as a result. --Connel MacKenzie 15:40, 17 April 2007 (UTC)
I agree with Ruakh & A-cai. The sooner we get to a bot-recognisable standard format the better. But if there are specific aspects where there is not clear consensus AND where changes made by AutoFormat would be difficult for it or another bot to undo, then we should hold off "regularising" those aspects until we have consensus. There are already plenty such issues which AutoFormat is set to leave alone. A few more would not hurt too much. At present, I am content with ELE as interpreted by AutoFormat, but I accept I have not thought hard about other possibilities which might be better. --Enginear 19:34, 17 April 2007 (UTC)
I've changed the code to tag some entries with {{rfc-level}} as an experiment. It will not change any header level. The entries tagged are not the same set as the entries it would have fixed; it will tag entries it can't fix as well, and not tag some simple cases it could correct. (Just using about 2 lines of code right now, to try it out.) Gives us something to look at, and we can always feed the cat to the bot later ;-) Robert Ullmann 13:05, 18 April 2007 (UTC)
Not only are synonyms dependent on the POS, they're dependent (or should be) on the definition number! This argument doesn't make any sense at all. DAVilla 14:24, 17 April 2007 (UTC)
For all words' synonyms, in all circumstances? Of the synonym headings we have, how many are "disambiguated" like translation sections? 1%? 0.1%? 0.2%? That is approximately how often such subdivisions are appropriate. Normally, subdividing them is inappropriate; the synonyms apply to figurative uses just as much as to literal uses. --Connel MacKenzie 15:14, 17 April 2007 (UTC)
Subdividing the synonyms is almost always appropriate, because there are very few English words that are strict synonyms across all definitions. There's little impetus for two words with totally identical meanings to continue to exist in any language. Most often, one will drop out of use (as Charles Darwin hypothesized, and other evolutionary theorists and historical linguists since have repeatedly shown). Synonyms and Antonyms should always be L4. --EncycloPetey 23:52, 17 April 2007 (UTC)
Haven’t read all this (and am not planning to), but I think synonyms (and the like) should come after every POS, as an L4 header, even if they are the same for both. Otherwise it is only confusing for people that are insecure about their language skills. H. (talk) 08:49, 23 April 2007 (UTC)
EP, I would love to read up on that linguistic theory. What text (or even a link) would you suggest I start with? Offhand, I disagree with that conclusion, but then, I haven't yet seen or heard that "evolutionary" argument. My observations show the opposite: that homophones and homonyms blend and share in natural language, while linguists strive to proscribe separation.
Now, we could at this point start a simple clarification vote for =Synonyms= and =Antonyms= to clear up what apparently is my misconception. I honestly don't know how I've done so many tens of thousands of edits against what seems to be the consensus above. To me, it seems like a pyrrhic victory for the splitter mentality, at the expense of correctness and ease of entering new terms. On the other hand, parsing en.wikt will gradually become easier, if they are all bot-"corrected." I guess I'll know for certain, in about a month or two, when the vote ends? --Connel MacKenzie 03:29, 3 May 2007 (UTC)


Needs some serious cleanup and archiving. Any volunteers? - [The]DaveRoss 02:58, 19 April 2007 (UTC)

If there were a clearly outlined procedure for the process, I would have begun doing some of this long ago, but there isn't. And I can never remember what formatting / templates /etc. are supposed to be used (and can't remember where to go to look for good examples of passed or failed entries either). --EncycloPetey 11:34, 19 April 2007 (UTC)
I've been cleaning out the list, removing rfvfailed entries and archiving the oldest rfvpassed entries, but it's a lot to do. -- Beobach972 20:24, 20 April 2007 (UTC)

Somebody pulled a Wonderfool/Dangherous style stunt on Wikipedia

w:User:Robdurbar (who has since been desysopped) deleted the main page on Wikipedia, blocked several bureaucrats, and deleted several important pages like w:Cheese and w:History. This reminded me of the Dangherous stunt. Could we coordinate with Wikipedia to help them learn from the Wonderfool and Dangherous stunts and to halt this kind of madness much more quickly, and possibly coordinate with its CheckUsers to determine if Robdurbar is a Wonderfool sock? Jesse Viviano 04:11, 20 April 2007 (UTC)

Please see w:Wikipedia:Administrators' noticeboard/Incidents#Robdurbar for the details on this incident. Jesse Viviano 04:13, 20 April 2007 (UTC)

Sock? I find that highly unlikely. They're two completely separate users who got fed up and went off the edge. Picaroon 04:18, 20 April 2007 (UTC)
How do you know that? --EncycloPetey 15:18, 20 April 2007 (UTC)
Think about it. I don't know much about Dangherous, but User:Robdurbar is a longtime (almost two years) Wikipedia contributor who seems to have burned out, and then returned to mess around with the sysop tools. Wonderfool/Thewayforward/Dangherous stopped editing Wikipedia under those names back in 2006, but Robdurbar was active as late as February 2007, before posting this goodbye message in early March and returning yesterday for his stunts. If he was Wonderfool, would he not have "gone rogue" earlier? Why stick around and keep on making productive contributions for another four months? Picaroon 20:01, 20 April 2007 (UTC)
Concur. Wonderfool even did it better, with the right timing. And if it were his second time, he wouldn't have wondered aloud how long he could continue doing what he was up to. DAVilla 05:56, 21 April 2007 (UTC)
Well, Wonderfool certainly had his idiosyncrasies...so who knows there. Any Steward can run CheckUsers across projects, or if a Wikipedia CU were to join #wikimedia-checkuser on freenode we could hash it out in there. - [The]DaveRoss 20:47, 20 April 2007 (UTC)
We do have someone (as of a few days now) who is a native CheckUser on both projects. Robdurbar was checked early on to see if the account was compromised. I don't have Wonderfool's IPs on hand though. If another Wiktionary CheckUser can confer with me in private, I can compare. Dmcdevit·t 21:45, 20 April 2007 (UTC)

Zip Code lists on Wikipedia facing deletion - suggested outcome

There is a large group of list-articles on Wikipedia that have been nominated for deletion: see Wikipedia:Articles for deletion/Lists of ZIP Codes in the United States by state. I am of the opinion that Category:Appendices is an appropriate home for this almanac content. I am not asking people to weigh in on the Articles-for-discussion debate on Wikipedia, but to consider whether it would be appropriate to include this information in Wiktionary (via transwiki); it is my contention that the listing is a de facto thesaurus. --Ceyockey 01:24, 21 April 2007 (UTC)

  • Whereas this content is durably archived all over the place, and it is freely available in its most up-to-date format online anyway, I personally think that the [delete] button is the best home for these lists as far as Wikimedia is concerned. - [The]DaveRoss 01:52, 21 April 2007 (UTC)
    • Have to agree with TheDaveRoss. Cheers! bd2412 T 03:30, 21 April 2007 (UTC)
  • I saw a lot of good reasons for keeping the Oregon page, and I saw a lot of bad reasons for deleting it. My favorite is, paraphrasing, "We've voted to delete all the other zip code pages, so regardless of the merits of the Oregon page, it has to go too." That's very clearly the wrong mentality for batch deletions. Personally, I would not have given so much credit to the votes that argued this information is already available elsewhere. Even if that were a valid reason, anyone who wrote that they did any investigation, apart from simply Googling USPS for the URL, concluded that it was not in as much detail. I'm not saying that I would have voted to keep the Oregon page according to encyclopedia standards, and anyways I didn't see it because it's already been zapped, but it looked like it deserved some attention at least. I don't know why quick-to-judgement comments like the above won the day.

    On the other hand, it doesn't sound like Wiktionary material, certainly not as an appendix. I would argue that the proper names don't deserve individual pages since they don't have any linguistic value, and likewise for the numbers. Probably the only exception is 90210. No, we do not discriminate against numbers here, but apart from a few hundred counting numbers they do have to be more than just that. The basic question here is whether someone might run across the term and wonder what it means. We don't even include the full names of historical people on those grounds alone, only if the name, not the person, has some significance, linguistic rather than historical significance. Anyone who ran across a zip code would know immediately from context what it was and where to look it up, and neither a dictionary nor a thesaurus would be of any use for that. Only to a postal worker, a legitimate but not sufficient exception, would a zip code be a "name", but I doubt dictionary or thesaurus would be a good name for their reference books either.

    If you've received a lot of negativity, it does sound like your project has some great value. I would suggest that you not give up your search for a suitable wiki. There is no Wikimedia map wiki that I'm aware of, but I've heard of a yellowpages wiki. As I'm not sure who would be interested in the historical data, I sincerely hope they don't turn a deaf ear either. Don't let something so useful fall through the cracks. DAVilla 05:36, 21 April 2007 (UTC)

    • You've made me wonder whether it would make a useful addition to Yellowikis. Hmmm. Uncle G 11:29, 24 April 2007 (UTC)

Thanks all for your input. The lists have indeed been deleted with the unfortunate citation of a vote count (there is much discussion on Wikipedia about "Articles for Discussion is not about voting", but acting on a count is effective nonetheless). I'm not personally on a mission to find a home for this information, but I do think that it has a home somewhere. It is useful to have thoughtful input from you on the scope of the Appendix section of Wiktionary. Regards, --Ceyockey 23:28, 21 April 2007 (UTC)

Quality expectations of other Wiktionaries

I have blogged about a request to stop including the Russian and Vietnamese Wiktionaries in the interwiki process that I run as a public service. The reason to exclude the Russian Wiktionary is that its entries are mainly empty shells. The reason for the Vietnamese exclusion is that its Russian declension and conjugation tables are wrong. Both have been asked by the Polish Wiktionary to delete the offending material, and this has not happened.

I need a discussion about this because when the Polish Wiktionary bans my bot, the whole process will stop.

Thanks, GerardM 09:31, 21 April 2007 (UTC)

Seems to me they have a legitimate complaint, although I would think they ought not to really do anything about it; it is the vi and ru.wikts' problem. But given that they insist, is it possible to filter in the 'bot so that ru and vi aren't added to the pl.wikt, without stopping anything else? (Well, I know it is possible; is it something you are willing to do?) The other alternative is just to not write to the pl.wikt until they change their minds, but still read it and write everywhere else. Does the person on the vi.wikt know that much of the Russian declension information is available here? Robert Ullmann 12:07, 21 April 2007 (UTC)
My bot does all wiktionaries. As a consequence, no languages are configured at all. So much of the information is on the English Wiktionary... Now how do you get it out? GerardM 12:30, 21 April 2007 (UTC)
I wonder if reducing the amount of traffic those "lower quality" (I have no idea, I haven't spent any time on either) Wiktionaries receive is the best way to solve this. Perhaps they aren't great now, but Russian and Vietnamese speakers will encounter them from a higher traffic Wiktionary via the interwiki links, and decide to help them out some. Hiding them might not be the best option, I think. - [The]DaveRoss 13:15, 21 April 2007 (UTC)
The complaint about the Russian Wiktionary is a legitimate concern. I can't recall the last time I followed a link there and found any content in an article. It's all content-free formatting; all the section headers are in place, but no definitions, pronunciations, translations, or other information. I have usually found information on the Vietnamese Wiktionary, even if I can't read it. --EncycloPetey 22:18, 21 April 2007 (UTC)
I find the Russian approach of a barren wasteland to be a fairly depressing sight. Given how many foreign entries they have, you would think a word like Russian should be defined. Finding a few stub pages, maybe up to half of all entries, isn't so bad. As far as the enormous goals of the project, red links are one thing, but swimming through a sea of empty pages just turns potential contributors away. It even turns users away, since all the links are the same color, content or not. That's a categorically bad system. DAVilla 07:22, 22 April 2007 (UTC)
What about checking the page for {{stub}} before linking to it? Would that be any easier? If it's a stub page, it doesn't exist, for all intents and purposes. I didn't see an example of the Vietnamese errors, but their tables probably use a template too, don't they? Could the same trick work for that? DAVilla 17:31, 21 April 2007 (UTC)
Or better yet, bot-created content-free articles should have a tag ("Attention readers: There is no content here") to be removed by human editors when they add content to the article. The interwiki bot could recognize and ignore articles with those tags. Dmcdevit·t 00:19, 22 April 2007 (UTC)
My vote would be to leave your bot exactly as is. Your bot is merely reflecting the fact that an article has been created. It should be up to other linguistic experts to evaluate the contents of those links and fix them if necessary. Can you imagine the can of worms that would be opened if your bot began to take into consideration the correctness of the contents on the other side of a link? The next step would be to not link any incorrect articles from any wiktionary. I think that is a bad idea because the bot draws attention to the articles, which increases the likelihood that they will be noticed by an expert and then fixed. Furthermore, I'm not sure how much of a right any Wiktionary (Polish or otherwise) has to block any other Wiktionary (Vietnamese, Russian or otherwise). -- A-cai 00:47, 22 April 2007 (UTC)
On the Vietnamese issue, I have to agree. Your bot shouldn't be concerned with correctness. Count {{stub}} pages as non-existent, and if meeting the Polish half way isn't enough for them, then they'll just have to block your bot. There's no stopping them from making poor decisions. DAVilla 07:12, 22 April 2007 (UTC)
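The "count {{stub}} pages as non-existent" idea (or Dmcdevit's dedicated no-content tag) amounts to a simple check on a page's wikitext before linking to it. A minimal sketch, assuming the bot has each target page's wikitext in hand; the template name no-content is hypothetical, and this is not how GerardM's bot actually works:

```python
import re

# "stub" is from the discussion; "no-content" is a hypothetical name for the
# proposed "Attention readers: There is no content here" tag.
PLACEHOLDER_TEMPLATES = ("stub", "no-content")

# Captures the template name in {{name}} or {{name|args}}.
_TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}")


def is_placeholder(wikitext: str) -> bool:
    """True if the page carries a stub or no-content template.

    Names are compared case-insensitively, a simplification of
    MediaWiki's first-letter-only case folding.
    """
    for match in _TEMPLATE_RE.finditer(wikitext):
        if match.group(1).strip().lower() in PLACEHOLDER_TEMPLATES:
            return True
    return False


def linkable_pages(pages: dict[str, str]) -> list[str]:
    """Given {language code: wikitext}, list the codes worth linking to."""
    return sorted(code for code, text in pages.items()
                  if not is_placeholder(text))
```

Human editors would remove the tag when adding content, and the page would then be linked on the next run without any change to the bot.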
I don't think the bot is looking at the content of any of the pages now, this would not be a simple thing to do. The best I can come up with is to just not add ru or vi iwikis to pl, which is a nasty special case, I understand why he wouldn't want to do it. If they do block it, they just lose getting iwikis for any others; the bot can still read pl and do everything for everyone else? (They want to damage their project, there is nought we can do.)
If vi.wikt cares, they can get better Russian declension info here; I pick it up for Russian entries in the sw.wikt. Robert Ullmann 11:14, 23 April 2007 (UTC) Can someone point me at a specific example of where the vi.wikt is wrong? Robert Ullmann 12:51, 23 April 2007 (UTC)
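The "just don't add ru or vi iwikis to pl" special case can be expressed as a per-target exclusion table consulted only when writing. A minimal sketch under that assumption; EXCLUDED_SOURCES and interwikis_for are hypothetical names, since the real bot has no per-language configuration:

```python
# Hypothetical table: target wiki -> source wikis whose links it refuses.
# The pl entry mirrors the request discussed above.
EXCLUDED_SOURCES: dict[str, frozenset[str]] = {
    "pl": frozenset({"ru", "vi"}),
}


def interwikis_for(target: str, available: set[str]) -> list[str]:
    """Interwiki codes to write on the target wiki's copy of a page.

    Only what is written to the target is filtered; the bot can still read
    the target wiki, and every other wiki still gets the full set.
    """
    excluded = EXCLUDED_SOURCES.get(target, frozenset())
    return sorted(code for code in available
                  if code != target and code not in excluded)
```

For example, interwikis_for("pl", {"en", "pl", "ru", "vi", "fr"}) gives ["en", "fr"], while en still receives pl, ru and vi links.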
I have noticed a few mistakes in the Vietnamese Wiki Russian declensions, but I think a lot of it is a confusion about the order. The Vietnamese show Russian declensions in a different order, which can make them appear incorrect to someone who does not understand the Vietnamese words for the different cases.
As for the Russian Wiki, yes, there are a lot of pages that have no content, but I still find a lot of important pages with good content. I have noticed some bad declensions in the Russian Wiki, too, but their templates are too complicated, so I don’t attempt to make corrections. —Stephen 01:24, 27 April 2007 (UTC)
I dislike the notion of trying to stifle this (harmless and useful) bot's function. If an entry exists on ru:, it should link back to en: for bilingual Russian speakers' reference (so that they can correct those entries, at some point.) Having the link to ru: from here is equally useful, as those same translators don't have to guess at where something belongs.
If the bug that kills the interwikibot can't be fixed (where it crashes and burns as a result of a pl: block,) then I suggest that the Polish Wiktionary should be dealt with very harshly. All the Wiktionaries have a very long way to go; perhaps the pl.wikt: should be closed until they come to their senses.
A long time ago, I complained about the ru.wikt:'s stub-bots, but those complaints fell on deaf ears. I agree with DAVilla that what they've done is bad/wrong. But breaking the interwiki bot (for all wiktionaries) is not the right answer. --Connel MacKenzie 15:57, 2 May 2007 (UTC)

Word of the Day mailing list

I just answered an email to OTRS from someone asking about something like a mailing list for the word of the day (to send it to subscribers' inboxes daily). This isn't a new proposal (meta:The word of the day) and I think there would certainly be many readers interested in subscribing. This is similar to daily-article-l, daily-image-l, and destacado-l. Does anyone have any thoughts about this (or willing to help work on it)? Dmcdevit·t 23:54, 21 April 2007 (UTC)

Connel made an attempt to get this started, but changes to the way wiki-software works broke the work he had done. I don't know that anyone else with the requisite understanding of the process has attempted to make this happen. I don't know who here would be able to set this up. --EncycloPetey 14:32, 22 April 2007 (UTC)
I haven't gone back and fixed it, mainly because Duesentrieb did such a fantastic job setting up the generic tool (see the WikiCommons RSS feeds.) To my right, as I sit here, I have a Google screensaver that has been displaying the Picture Of The Day (from commons:) for quite a while. I don't use his "e-mail the POTD" feature, but I understand it works well, also. --Connel MacKenzie 01:47, 27 April 2007 (UTC)
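A daily feed like the ones Connel describes is, at its core, a small RSS 2.0 document with one item per day, which a mailing-list gateway or screensaver can then consume. A minimal sketch using only the Python standard library; the channel title, description, and URLs are placeholders, and this is not how the Commons feed tool is implemented:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import quote


def wotd_feed(entries: list[tuple[datetime, str, str]]) -> str:
    """Build a minimal RSS 2.0 document from (date, word, gloss) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata, not an existing Wikimedia feed.
    ET.SubElement(channel, "title").text = "Wiktionary Word of the Day"
    ET.SubElement(channel, "link").text = "https://en.wiktionary.org/"
    ET.SubElement(channel, "description").text = "One word per day"
    for date, word, gloss in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = word
        # Wiki URLs use underscores in place of spaces.
        url = "https://en.wiktionary.org/wiki/" + quote(word.replace(" ", "_"))
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = gloss
        # RFC 822 date; pass a timezone-aware datetime so the offset is right.
        ET.SubElement(item, "pubDate").text = format_datetime(date)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    day = datetime(2007, 4, 21, tzinfo=timezone.utc)
    print(wotd_feed([(day, "beer parlour", "a place to discuss")]))
```

Keeping the feed as plain RSS means the e-mail side can be any existing RSS-to-mail service rather than custom wiki code.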

Headings for 漢語, 閩南話, 粵語 etc.

Could someone tell me what is the exact policy on headings for the different Chinese languages please? I have seen some entries under "Chinese", and some under "Mandarin" and "Min Nan", but I haven't seen any under "Cantonese" at all. I would have thought that Chinese entries should be under "Chinese: Mandarin", "Chinese: Cantonese", "Chinese: Hokkien" and so on.

Also, shouldn't "Min nan" be under "Hokkien" since that's the name of the dialect in English? No other language is under its local name; "Min nan" isn't a word in English, whereas "Hokkien" is. Pistachio 12:29, 22 April 2007 (UTC)

I refer you to an earlier Beer parlor debate (Wiktionary:Beer_parlour_archive/2007/April#Amoy) on this very subject. The current Wiktionary guidelines are to name a language/dialect in accordance with ISO 639-3 standards. This poses a problem for the language you are referring to. I explain the reasons in detail in the earlier debate. The short answer to your question is that Hokkien would not be appropriate since Hokkien (Fujian) is home to more than one language/dialect. Amoy Hokkien (Nicholas Cleaveland Bodman's Spoken Amoy Hokkien) or just Amoy would be the most linguistically correct, but since it is not recognized separately from Min Nan in ISO 639-3, there is opposition to using the term Amoy. I agree with you, but my hands are tied for the time being. Ironically, the term Amoy has been used in English to describe the language since at least 1913 (Rev. William Campbell's A Dictionary of the Amoy Vernacular, spoken throughout the prefectures of Chin-chiu, Chiang-chiu and Formosa), probably much earlier. The term Min Nan is, as you say, probably not even proper English. Southern Min would be more suitable as an English term, but Min Nan was chosen instead to be the official term for the ISO 639-3 language code (nan). Unfortunately, if you read the earlier debate, you will also find out that Min Nan actually represents a family of non-mutually intelligible languages/dialects, Amoy being the most well known of these. The guidelines for all of this, such as they now stand, are spelled out in WT:AC. -- A-cai 12:59, 22 April 2007 (UTC)
P.S. If the L2 header says Chinese, it should be changed to ==Mandarin==, ==Cantonese== etc. The {{zh-attention}} tag can be added so that a word is placed in Category:Chinese words needing attention. -- A-cai 13:03, 22 April 2007 (UTC)
Yes, highlight the last point since it's the one thing we can say with certainty. Just "Chinese" is incorrect. DAVilla 06:22, 24 April 2007 (UTC)
I see ISO 639-3 recognizing 13 individual codes of Chinese as a macrolanguage. I will pass this to Chinese Wiktionary.--Jusjih 11:34, 23 April 2007 (UTC)
We need a WM-wide extension code for Teochew (nan-tch or some such), like fiu-vro for Template:fiu-vro; I don't know who maintains such things. Robert Ullmann 12:48, 23 April 2007 (UTC)

Cleaning out English requests

There are many entries in Wiktionary:Requested articles:English that have been languishing for ages. I propose to remove all entries that do not have an entry in the online OED, and for which there are zero proper hits in Google-Books. Does anyone have any objection? SemperBlotto 11:02, 23 April 2007 (UTC)

  • I think it might be nice to have a purgatory of sorts — a place to put requests for articles on words that don't seem to be real, or that seem like they wouldn't pass CFI — just in case looks turn out to be deceiving, and someone later can come along and define the word with verifying cites. The not-in-OED,-not-on-Google-Books criterion sounds like a good one for that. —RuakhTALK 13:23, 23 April 2007 (UTC)
  • It's always worth remembering that the fact that someone has requested a word doesn't mean that it is actually a word. ☺ I recommend checking Google Groups as well as Google Books. Uncle G 11:17, 24 April 2007 (UTC)


Something has to happen with this. As it is now, it is unusable. See lah#Old English, and my last edit of make: I could not encode Common Germanic in there. Though I know nothing about those proto-languages, I suppose someone will have had his reason not to call it Proto-Germanic. So I propose:

<span class="proto-lang-ref">from conjectured [[{{{1}}}]] ''[[Appendix:{{{1}}} *{{{2}}}|{{unicode|*{{{2}}}}}]]''{{#if:{{{3|}}}|, ‘{{{3}}}’}}</span><noinclude>[[Category:Etymology templates|proto]]</noinclude>

and leave it up to the knowledgeable user to use it multiple times if more than one root is to be given. The multiple parameters are only confusing. {3} can even be left out entirely. H. (talk) 21:37, 23 April 2007 (UTC)
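To make the proposed template's behaviour concrete, here is a sketch that produces the same wikitext it would emit: parameter 1 is the language, parameter 2 the root, and the optional parameter 3 a gloss that, like {{#if:}}, is dropped when blank. This is an illustration in Python, not MediaWiki's parser; the {{unicode}} call is left unexpanded, and the arguments in the tests are illustrative only:

```python
def expand_proto(lang: str, root: str, gloss: str = "") -> str:
    """Mimic the proposed template: {{{1}}}=lang, {{{2}}}=root, {{{3}}}=gloss."""
    # Doubled braces in the f-string emit literal {{unicode|*root}}.
    inner = (f"from conjectured [[{lang}]] "
             f"''[[Appendix:{lang} *{root}|{{{{unicode|*{root}}}}}]]''")
    # {{#if:{{{3|}}}|...}} trims whitespace, so a blank gloss adds nothing.
    if gloss.strip():
        inner += f", ‘{gloss}’"
    return f'<span class="proto-lang-ref">{inner}</span>'
```

Giving two roots is then just two calls (two template uses), as proposed, instead of extra numbered parameters.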

my apologies; there was an incredible amount of CRAP to deal with on this, and I haven't gotten back to it ... the multiple parameters are just 2 optional repeats. Robert Ullmann 21:57, 23 April 2007 (UTC)
simplified, I fixed all the existing uses. (was going to do this last night, but the net went away yet again ;-). Common Germanic is just another (less common) name for Proto-Germanic, you can just change it. Robert Ullmann 13:38, 25 April 2007 (UTC)

L4 headers and WT:ELE

Some statistics:




(and to forestall the whinge, AF has changed fewer than 200 of these)

Most users do seem to actually follow WT:ELE.

In the example there, it shows these headers at L4, and then says:

"A key principle in ordering the headings and indentation levels is nesting. The order shown above accomplishes this most of the time. A heading placed at one level includes everything that follows until an equivalent level is encountered. If a word can be a noun and a verb, everything that derives from its being the first chosen part of speech should be put before the second one is started. Nesting is a key principle to the organization of Wiktionary, but the concept suffers from being difficult to describe with verbal economy. If you have problems with this, examine existing articles, or ask questions of a more senior person."

Under Derived Terms it gives an exception to this rule:

"If it is not known from which part of speech a certain derivative was formed it is necessary to have a "Derived terms" header on the same level as the part of speech headings."

So it is very clear, not ambiguous at all as far as I can see. (The one difference being that "See also" has been used very commonly at L3 at the end of an entry, this was discussed above.)

Do we follow the clearly written policy? I suppose we could have a vote, but what would it be?

Support: WT:ELE is policy, and means what it says in plain language.
Oppose: WT:ELE doesn't mean anything, policy is something you have to ask Connel about (;-)

I don't think that would really be helpful?

As noted above, Synonyms and Antonyms (and other *nyms) do not in general apply to all senses; they apply to multiple POS only by coincidence in the word formation. Most related and derived terms are clearly from one POS, and usually from one sense. Only in an exceptional case should they be at L3. If there is only one POS this exception cannot apply.
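The nesting rule (a heading includes everything up to the next heading at the same or a higher level) means *nym headers belong one level below their POS. A checker for the case described here might look like the sketch below. It assumes a single language section with no "Etymology N" headers, so POS headers sit at L3; the header sets are illustrative subsets, and this is not AutoFormat's code:

```python
import re

# Matches a whole heading line such as ===Noun=== and captures its level.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)

# Illustrative subsets only; WT:ELE lists more headers than these.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Preposition"}
NYM_HEADERS = {"Synonyms", "Antonyms", "Hypernyms", "Hyponyms"}


def headings(wikitext: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs; ==X== is L2, ===X=== is L3, and so on."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]


def check_levels(wikitext: str) -> list[tuple[str, str]]:
    """Flag *nym headers at L3 (siblings of the POS) instead of L4 (nested).

    With a single POS the L3 exception cannot apply, so the verdict is
    "fix"; with several POS it is only "review", left to a human.
    """
    found = headings(wikitext)
    pos_count = sum(1 for level, title in found
                    if level == 3 and title in POS_HEADERS)
    verdict = "fix" if pos_count <= 1 else "review"
    return [(title, verdict) for level, title in found
            if level == 3 and title in NYM_HEADERS]
```

The "review" verdict corresponds to tagging an entry for a human (as {{rfc-level}} does) rather than moving the header automatically.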

I don't know where to go: everyone except Connel seems to have no problem with ELE, and agrees with its plain language; but Connel insists that it is wrong to follow it? Robert Ullmann 14:46, 26 April 2007 (UTC)

AF and {{rfc-level}}

All of the above is completely independent of AF, although that is what "tripped" it. AF right now is putting all of the things it could be correcting into Category:Entries with level or structure problems. There are 150+ there now, and some have been fixed by various people. Would be a lot easier just to fix them.

Yes, it would be good to have a WT:VOTE on AF. But first I need the answer to the question: Is AutoFormat permitted to follow written policy? Robert Ullmann 14:46, 26 April 2007 (UTC)

I say yes. H. (talk) 23:40, 27 April 2007 (UTC)
I not only concur, I'd say that's a requirement. \Mike 10:19, 30 April 2007 (UTC)
Robert, I'm not sure if AF would be the right bot for this, but I'm wondering if you could add a couple of rules for Chinese entries:
  1. if an entry contains an L2 header such as ==Chinese==, add something like {{zh-attention|should be Mandarin, Cantonese etc.}}
  2. if Chinese entry lacks a POS template, add something like {{zh-attention|needs POS template}}
Please let me know what you think of the idea, and whether it's doable. -- A-cai 12:16, 30 April 2007 (UTC)
It is already doing that! (Well, it doesn't flag the Chinese L2 header, but does flag the Mandarin or whatever that typically follows it.) A missing inflection line is created as {{ZHchar|{{subst:PAGENAME}}}}{{zh-attention|needs inflection template}}. Note that it is only doing this for errors. Tagging all of them is something that should be run separately from the XML, I did this for Japanese. Will think on it. Robert Ullmann 14:01, 30 April 2007 (UTC)
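A-cai's two rules can be sketched as a pass over an entry's wikitext that returns the tags to add. The set of Chinese-family L2 names and the convention that any {{zh-...}} template other than zh-attention counts as a POS template are assumptions made for illustration, not AutoFormat's actual logic:

```python
import re

# Captures L2 headers only: ==Chinese== but not ===Noun===.
L2_RE = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)
# Illustrative set of L2 names treated as Chinese-family sections.
ZH_L2 = {"Chinese", "Mandarin", "Cantonese", "Min Nan"}
# Hypothetical convention: any {{zh-...}} template except zh-attention
# counts as a POS template.
POS_TEMPLATE_RE = re.compile(r"\{\{\s*zh-(?!attention)[^|}]+")


def zh_attention_tags(wikitext: str) -> list[str]:
    """Return the {{zh-attention}} tags the two proposed rules would add."""
    l2_names = set(L2_RE.findall(wikitext))
    if not l2_names & ZH_L2:
        return []  # not a Chinese entry; leave it alone
    tags = []
    if "Chinese" in l2_names:  # rule 1: bare "Chinese" L2 header
        tags.append("{{zh-attention|should be Mandarin, Cantonese etc.}}")
    if not POS_TEMPLATE_RE.search(wikitext):  # rule 2: no POS template
        tags.append("{{zh-attention|needs POS template}}")
    return tags
```

Run separately over an XML dump, as Robert suggests, this would tag every matching entry rather than only those AutoFormat happens to touch.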

Visual dictionaries

I would like to see a Wiktionary area modelled on the Facts on File Visual Dictionary (French, English, Spanish), and other similar publications. Without violating their copyrights, of course. The printed version I have has more than 3,000 line drawings of common or otherwise familiar or significant objects from shoes to galaxies, with more than 25,000 labels. I have several times wished for such a thing while studying another language.

The Facts on File Visual Dictionary Facts On File / 1986 / Hardcover ISBN: 0816015449 ISBN-13: 9780816015443

This one might be a good starting point.

My First Visual Dictionary Educational Insights Inc. / 2002 / Hardcover

There are lots of others. Here is a sampling

Visual Dictionary: English-Hebrew, Hebrew-English Eisenbrauns / Hardcover ISBN: 9652201855

Ultimate Visual Dictionary Dk Adult / 2003 / Trade Paperback ISBN: 0789499703 ISBN-13: 9780789499707

DK Visual Dictionary of Baseball James Buckley Dk Children / 2001 / Hardcover ISBN: 0789467259 ISBN-13: 9780789467256

Scholastic Visual Dictionary Jean-Claude Corbeil, Ariane Archambault Scholastic Trade / 2000 / Hardcover ISBN: 0439059402 ISBN-13: 9780439059404

Ultimate Visual Dictionary of Science Dk Adult / 1998 / Hardcover ISBN: 0789435128 ISBN-13: 9780789435125

Five-Language Visual Dictionary (English, French, German, Spanish, and Italian) Dk Adult / Hardcover ISBN: 0789484390 ISBN-13: 9780789484390

Eyewitness Visual Dictionary of Special Military Forces Dk Children / Hardcover ISBN: 1564581896 ISBN-13: 9781564581891

German English Bilingual Visual Dictionary Dk Adult / 2005 / Trade Paperback ISBN: 0756612950 ISBN-13: 9780756612955

French English Bilingual Visual Dictionary Dk Adult / 2005 / Trade Paperback ISBN: 0756612977 ISBN-13: 9780756612979

Italian English Bilingual Visual Dictionary Dk Adult / 2005 / Trade Paperback ISBN: 0756612969 ISBN-13: 9780756612962

Spanish English Bilingual Visual Dictionary Dk Adult / 2005 / Trade Paperback ISBN: 0756612985 ISBN-13: 9780756612986

The Perigee Visual Dictionary of Signing Rod Butterworth, Mickey Flodin Perigee Trade / 1995 / Trade Paperback ISBN: 0399519521 ISBN-13: 9780399519529

Horse: The Visual Dictionary Dk Children / 1994 / Hardcover ISBN: 1564585042 ISBN-13: 9781564585042

Human Body Visual Dictionary Dk Children / 1991 / Hardcover ISBN: 1879431181 ISBN-13: 9781879431188

Stitch Sampler: The Ultimate Visual Dictionary to Over 200 Classic Stitches Lucinda Ganderton ISBN: 0789446286 ISBN-13: 9780789446282

I wonder how much relevant clip art is available for unrestricted use. A quick survey says that there is good quality art for a fee, and not very good art for free. Well, we should start some sort of WikiArt project then. We will need line art (SVG, for choice) and bitmap (PNG, for choice). Mokurai 21:24, 26 April 2007 (UTC)

Here are some of examples of how Wiktionary is already doing (on a limited scale so far) what you are proposing: 清明上河图, avocado. -- A-cai 22:40, 26 April 2007 (UTC)
I think what Mokurai is referring to are the pages in such dictionaries that show a whole set of topically related items, each of which is identified in two languages. So, rather than a page like avocado, where an image of a single fruit is given with translations, a Visual Dictionary would show twenty different fruits and identify them all by name. This allows a person learning vocabulary to see, compare, and contrast a whole range of terms at once. A visual dictionary might also have a picture of an object (say a computer) and label all the major components.
There isn't anything like that currently on Wiktionary. The closest we have are the topical categories, e.g. Category:Fruits, but these categories do not (and should not) include comparative pictures. The Visual Dictionary concept is interesting, and I certainly agree it is a good idea, but it almost sounds like an entirely new project. We don't have the capability to display images with dynamically generated labels (as far as I know). Nor do we have the informational structure to place labeled diagrams in a single language into appropriate locations.
This could conceivably be done in the various Wikipedia projects, and to some degree already is. For example, I used the image of a leaf on the German Wikipedia and translated the labeled parts for the English Wikipedia article on w:leaf (and a nice person then produced a clean version of the image). However, this is an entirely manual process and only works when someone takes the time to edit the image. It is also feasible within the scope of a Wikipedia because each of them is in a single language. Here, we deal in all languages. This presents additional difficulties. --EncycloPetey 15:37, 27 April 2007 (UTC)
French Wiktionary and possibly others have a hypernym/hyponym listing that we don't. I could see turning that framework into something similar to the proposed concept in another namespace. For instance, steering wheel and windshield would link to Picture:vehicle, showing several representative vehicles, including an automobile, and listing the major components. From there it would probably be best to link to Wikisaurus considering how much these names can differ regionally. The names of the parts should definitely not be included in the images themselves, or we'd be reinventing the wheel on every Wiktionary. I'm not sure how to get links and pictures to work together though. Maybe a table of 10-pixel by 10-pixel cells where the background image is the one being labeled. Any better ideas? DAVilla 16:32, 27 April 2007 (UTC)
Wikimedia commons seems to already have a lot of the infrastructure that we would need. If we could create links to the commons categories (ex. commons:Category:Fruit) from Wiktionary such that someone would understand that the commons category can be used for this purpose, then all you need to do is put Image:Avocado.jpeg in any entry that means avocado. The reason is that if you go to information section for the Avocado.jpeg, it shows which words are linked to it. What I'm describing is a little clunky compared to the visual dictionaries in print. However, it seems like it would be the most logical place to start. Perhaps a bot could be written that could read such links, determine the language, and use that information to create a kind of translation section within the info section of the picture. -- A-cai 23:23, 27 April 2007 (UTC)
We could do this with appendices. I'm not so keen on relying on Commons, only because their categorization scheme serves a different purpose (e.g. they may have an entry for "leaf" with a dozen pictures of the same leaf). bd2412 T 23:34, 27 April 2007 (UTC)

Two new admins?

We currently have two users up for adminship, with the votes ending within a couple of days. The turnout on both of these votes has been rather pitiful, and I am hoping that most of the community was simply unaware of these votes, and not being apathetic about our admins......I hope. Go here and show your support for your fellow editors or publicly humiliate them. Either way is better than simply ignoring the vote altogether. Atelaes 08:01, 28 April 2007 (UTC)

The poor turnout is due to the fact that it’s on a new page, and therefore it does not appear in one’s "watch" list. We have lots of such votes, and since we started putting them on new pages, I rarely ever see them. —Stephen 13:03, 28 April 2007 (UTC)
Is there a way that we could update Wiktionary:Votes with a hard link to each new vote's new page each time such a vote is started? That way, changes would show up on the master page and, thus, everybody's watchlists. — Beobach972 15:44, 28 April 2007 (UTC)
Sorry if I'm misunderstanding your suggestion, but Wiktionary:Votes does get edited every time a new vote is created. (It's edited to transclude the new vote, rather than link to it, but the effect is the same.) I guess the problem is that a single edit on one's watchlist can easily go unnoticed. —RuakhTALK 16:32, 28 April 2007 (UTC)
You're right. I suppose what I meant to say is that each time an individual vote is edited, it should show up as an edit to the main votes page on watchlists. -- Beobach972 01:53, 29 April 2007 (UTC)
Well, in fairness, both Wiktionary:Votes & Wiktionary:Administrators should get edited for every new vote, and so if one has both pages on their watchlist, they should get two notices. However, even this is sometimes not enough, for people who have a lot on their watchlist (such as myself). Atelaes 18:54, 28 April 2007 (UTC)
One obvious option is to just check pages like WT:VOTE more frequently. Alternatively, I know some wikis have announcement templates for things such as this: a template which one can stick at the top of one's talk page or userpage, and which can be updated with the most recent five or so announcements (new votes, major policy changes, Wikimedia announcements), so that anyone who wishes to track such things has a convenient means to do so. - [The]DaveRoss 18:58, 28 April 2007 (UTC)
Here is a very ugly example of what that might look like:


I think that's an excellent idea, but it would have to be uglier and more obnoxious. Atelaes 20:13, 28 April 2007 (UTC)

Eh, I think it's perfectly ugly and obnoxious (read: prominent and noticeable) as it is. -- Beobach972 01:53, 29 April 2007 (UTC)
Another problem was that we'd had a slew of votes recently. I've archived the old ones, which makes it much easier to see what votes are currently happening. --EncycloPetey 22:42, 29 April 2007 (UTC)
TheDave, that is pathetic. Obnoxious. Brutal to maintain. And your example has something AFU with the dates. (Comparison date format works wonders in such situations: yyyy-mm-dd.) Do not add any such monstrosity without first ensuring that it can be turned off with a WT:PREF doohickey. --Connel MacKenzie 07:00, 30 April 2007 (UTC)
Well, the idea with this would be that anyone could transclude it into their own talk/userpage should they desire to have it there, and bother no one else in the process. We have the announcements page, the site notice, and several other means for messages that we intend for everyone, this would be more like a mailing list or news ticker for those who wanted it, I imagine those would be the same people who would try and maintain it. It would also be trivial to make the color scheme (read: level of obnoxiousness) a parameter, so people could have it less intrusive on their own pages should they desire it so. - [The]DaveRoss 22:13, 1 May 2007 (UTC)


I have only just noticed that we are suddenly headlining such pages as fuck and shit with banners telling people that these are vulgar words which polite speakers do not use. Is it just me that thinks this makes us look rather twee and ridiculous? We already give people the relevant information by marking the definition lines with {{vulgar}}, is there any need to stick a great banner over the top of the page? More importantly though, we are suggesting that (in the case of fuck for example) ‘polite speakers substitute make love’. We really cannot be telling people what kind of word substitutions to make. It's nonsensical. In the example of fuck, it doesn't work for six out of the seven listed senses. Did I miss a discussion about this somewhere? Widsith 16:07, 28 April 2007 (UTC)

I just had a look at the template and I think it's not the place of Wiktionary to dictate polite language to people. It's also true that the substitution would not always be accurate ("make love you!"). To indicate such words, some paper dictionaries put a small mark after the word, which could work here, for example: damn ⓥ, bastard ⓥⓥ, c*** ⓥⓥⓥ. Pistachio 18:50, 28 April 2007 (UTC)
Away with them. What nonsense. H. (talk) 18:31, 28 April 2007 (UTC)
Agreed. I think it useful to note vulgarity, and perhaps a display of vulgarity levels such as the one above might be useful (there is a marked difference between crap and cunt in most circles I've ever run in), but this is just obnoxious and preachy, and the "polite" replacement will, I think, vary too much from circle to circle to be practical. Atelaes 18:43, 28 April 2007 (UTC)
Word. —RuakhTALK 19:27, 28 April 2007 (UTC)
Yes, the banner should go away, it's ugly and redundant... we can indicate vulgarity with (vulgar) and perhaps (somewhat vulgar), (very vulgar) to show severity. -- Beobach972 01:58, 29 April 2007 (UTC)
Considering it was created along with {{wanker}}, I simply considered it to be a joke in poor taste. --Connel MacKenzie 02:10, 29 April 2007 (UTC)

I've mentioned before that I think the label "vulgar" is the wrong term to use. In its earlier sense, it means "as used by the common people; lacking in good manners", and that is how it has been used by some dictionaries in the past to cover all manner of words and phrases, many of which were merely informal rather than swear words.

Some modern dictionaries use "taboo slang" for the stronger words, such as "fuck", and "coarse slang" for the less strong words, such as "twat". These clearly show that the words are inappropriate in most contexts without the need for a prim usage note. "Vulgar" is simply too ambiguous and I think we should avoid it. — Paul G 08:22, 29 April 2007 (UTC)

Those labels are unusable in American English, as the meanings/connotations are opposite what you describe. --Connel MacKenzie 07:02, 30 April 2007 (UTC)
Vulgar will do for the context label. If there is any doubt about regional variations in the level of vulgarity of the offending term, this can be noted in a Usage notes section. --Williamsayers79 09:22, 30 April 2007 (UTC)

Why are Persian words not included in search results?

address formally, address informally

OK, I mentioned some time ago that I felt there was a need for these pages. I have gone ahead and added them, even though there was some objection when I first suggested the idea. I knew the translations for two languages (French, Italian) and have since discovered three more (German, Modern Greek, Spanish). No doubt there are a few other languages (probably Romance languages or other European languages) that have similar expressions. Note also that some languages (such as French) have nouns for the action of the verb. I have included these on these pages.

Those should probably be at in/formal address or something. DAVilla

Note that these pages are not intended for languages such as Japanese that have more than two forms of informal and formal address.

Why couldn't these be included with a parenthetical explanation, as we do with all other translations, e.g. you? DAVilla

While the phrases are not idiomatic in English, the existence of equivalent phrases in several languages for this concept demands that they be gathered together and listed in Wiktionary. Compare day before yesterday and day after tomorrow, neither of which is idiomatic in English, but for which single words or idiomatic phrases exist in other languages.

Even though this might not quite be the right way of doing it, I feel that there is a place for collecting these phrases together somewhere in Wiktionary. Please discuss how this information can be presented, but please do not delete the pages. — Paul G 08:16, 29 April 2007 (UTC)

Category:Phrasebook, perhaps? --Connel MacKenzie 08:19, 29 April 2007 (UTC)
Yes, this is the perfect solution. DAVilla 20:30, 29 April 2007 (UTC)
There is the equivalent practice in Northern England of using thee and thou to one's friends and family (with derived words such as sithee). The rule (in dialect) is something like "Tha thee's they that thees thou" meaning you only use it when someone has used it to you. (But I don't know how to define it) SemperBlotto 08:29, 29 April 2007 (UTC)
The terms day before yesterday, day after tomorrow, address formally, address informally are all inconsistent with WT:CFI, from what I can tell. These terms do not need to be defined, but you still want a translation section for them, correct? Perhaps, a category could be created for such a case. You could name it something like Category:non-idiomatic English equivalents of foreign words. You could then populate that category with articles like Wiktionary:address formally and Wiktionary:address informally. -- A-cai 11:56, 29 April 2007 (UTC)
I'm not sure about address formally, but address informally strikes me as completely unnecessary, as in English we have the verb thou with that sense. Also, I think all of these verbs are derived from the pronouns they reflect; vouvoyer should be listed as a derived term under vous, tutear under tú, etc. —RuakhTALK 16:04, 29 April 2007 (UTC)
We do indeed, and the last time I raised this, I suggested putting these under "thou". However, there is no formal equivalent, and I don't think there is a noun equivalent either.
""Tha thee's they that thees thee", surely? "Thou" is a subject pronoun. — Paul G 17:44, 10 May 2007 (UTC)


It strikes me that our link to uncountable from all noun templates when the user indicates lack of a plural form is not particularly helpful to the reader. First, unless the reader is already familiar with the article layout, he or she may not know that it is typically the plural forms that go in the parentheses next to the headword, in which case its meaning is unclear. Furthermore, we're linking to our own article on uncountable, not explanatory text, which results in a curious reader landing at a page whose first definition is "So many as to be incapable of being counted," not the linguistic sense, and which gives no direction to readers who are trying to find out what it means in parentheses next to the word they were really trying to look up. I suggest we make a page in the Wiktionary: namespace intended for the reader to navigate to from the noun templates which tells them the meaning in plain and undisguised language. Dmcdevit·t 08:56, 29 April 2007 (UTC)

What we need are Wiktionary:About English and a whole slew of Appendix:lang:Glossary articles to link to for "uncountable", "past participle", "enPR", "f", etc. DAVilla 20:29, 29 April 2007 (UTC)
I disagree with the article name. Our articles such as Wiktionary:About Spanish and Wiktionary:About Ancient Greek are style articles for editors, not descriptions of general grammar. Articles explaining the grammar of English should be linked from Appendix:English language, with specific articles such as Appendix:English nouns, Appendix:English adjectives, and so forth.
However, since "past participle" applies across multiple languages, we might need a general set of grammatical articles such as Appendix:Participles and Appendix:Gender. --EncycloPetey 22:47, 29 April 2007 (UTC)
Okay. Now how do we link the {{t}}ranslation template to Appendix:Gender? We already have a lang= parameter, so it might be possible to #shortcut. Or do we need a different page for each language? DAVilla 09:32, 30 April 2007 (UTC)
I don't know of any situations where we would need to link to a particular language for explaining gender. The idea is one of matching parts designated for a particular group. All the languages I can think of that have gender do it mostly the same way. Anything more detailed could be linked from the entry for the word itself rather than a translation table. For translation tables, a minimum of explanation is all that would be needed. --EncycloPetey 05:00, 1 May 2007 (UTC)
This need was why I tried to revitalise Wiktionary:Glossary a month or two ago - although I think a better answer is to have an article with a name like uncountable (glossary) for each linguistic term, each would be mutually linked to the ordinary entry for that word. —Saltmarsh 06:43, 30 April 2007 (UTC)
Would it be okay to link instead to Appendix:Glossary since Wiktionary:Glossary should be specific to wikijargon? Also there's a lot of stuff there that I don't think matters so much to the user space, like abbreviations which we avoid.
If we need individual pages it would be better to have an outright Glossary: space. But I'm not convinced of that. DAVilla 09:35, 30 April 2007 (UTC)
We could try that (linking to Appendix:Glossary), though I think it might need a slightly more descriptive name, unless your idea is to include all the terms a user would need in a single page. --EncycloPetey 05:00, 1 May 2007 (UTC)
I think it's a good idea to have separate pages for Wiktionary:Glossary (Wiktionary jargon; for editors) and Appendix:Glossary (dictionary jargon; for readers). We can split it out if it gets unwieldy, but I don't think it's time to worry about logistics yet. Anyone want to take a stab at adding some useful information? Dmcdevit·t 22:26, 3 May 2007 (UTC)
O.K., I've created Appendix:Glossary. Right now it's just a copy-and-paste of Wiktionary:Glossary, so much work is needed on it, but it's a start. —RuakhTALK 00:45, 4 May 2007 (UTC)

Off topic

Lake Magadi, 595 meters, 15:30, temperature 40C. On the lead power unit, unit train is 33 cars of soda, 1,968 metric tons. Destination is Kajado, 120 km away and 1200 meters on top of the east escarpment of the Rift Valley. Single track, Cape gauge, 2% grades, sharp turns through cuts. Speed 15-40 kph.

First two long climbs, fans on high, everything running well, outside temp to 47C. Going through Masai land, cows, goats, more cows. Baboons, dik-dik. No giraffe on this trip. Donkeys. Through a Masai market; hundreds of colorful people stop to watch. 50 klicks without crossing a road.

Last major grade, get stuck on a curve. Have to back down and manually sand the rails. Try again.

Later, after sunset, driving through the East African night, the Archer overhead, high-spot showing the narrow gauge line ahead. Two 2900 horsepower turbines in throttle-4, 600 amps to the traction motors, still climbing. Listening to the crickets.

Yes, sometimes the day job interferes with the wikitime ... Robert Ullmann 14:21, 30 April 2007 (UTC)

Now, what we need is a wikiblog. SemperBlotto 14:46, 30 April 2007 (UTC)

Category for Biblical quotes and proverbs

I was just checking out eye for an eye and turn the other cheek and realised that there is no categorisation for idioms etc that have a Biblical origin, even though there is a category for Bible. Given the high percentage of such phrases, and even plain vocabulary, would it be possible to set up a category of this nature? Algrif 16:34, 30 April 2007 (UTC)

That's a good idea. I'm not sure what it should be called, though; maybe Category:English terms of Biblical origin? —RuakhTALK 17:05, 30 April 2007 (UTC)

How about Category:English Biblicisms? --Joe Webster 21:32, 30 April 2007 (UTC)

Would there also be a Category:English Pseudo-Biblicisms or, possibly, Category:English Literaturisms for quotes like "God helps those who help themselves"? --Joe Webster 22:15, 30 April 2007 (UTC)

It feels more like an etymology category. How about Category:Biblical derivations as a subcategory of Category:Etymology. That way we can have Category:es:Biblical derivations & co. Since this is the English Wiktionary, we do not need to specify that they are in English. We need only include terms and idioms. We don't need to include quotations; that's what our sister project Wikiquote is for. --EncycloPetey 22:27, 30 April 2007 (UTC)

Good idea. "Biblical origin" would be misleading for some of these, anyway; for example, I believe "an eye for an eye" is found in Hammurabi's code, and the Bible is relevant not as the originator of the phrase, but as its vector into common English use. —RuakhTALK 04:34, 1 May 2007 (UTC)

Category:Biblical derivations as a subcategory of Category:Etymology is more or less the way I was thinking. To me that sounds good. However, I do take the point that the Bible is not necessarily the unique source. Algrif 14:06, 1 May 2007 (UTC)

Well then, let's do it! :-) —RuakhTALK 15:14, 1 May 2007 (UTC)
It would be nice to have a category of terms with biblical significance; to make such a thing "NPOV", do we have someone willing to research (or knowledgeable of) each of the other major religious texts, and their counterparts? Or is it simply a can of worms? --Connel MacKenzie 05:23, 7 May 2007 (UTC)