Wiktionary:Beer parlour/2007/April

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Lewis & Short

As some of you may have gathered from the Grease Pit, we're getting the Lewis & Short Latin dictionary. While this should help to quickly fill out our Latin section, it does pose some interesting problems. First and foremost, if anyone knows of ANY reason why this could possibly be copyrighted, this should be noted. Dumping the entire L&S onto Wiktionary and then finding out that it's copyrighted or that certain parts of it are copyrighted would certainly be a rather large mess. Wikipedia says it is copyright-free, and by every conceivable copyright law that I can find, it should be in the public domain, but I could have missed something. Secondly, the L&S has a LOT of material, much more than any Wiktionary entry I've ever seen. As an example, take a look at wikisource:A Latin Dictionary (they have the first two entries). This would probably be a good time for the WT:AL page to be expanded and clarified. Personally, I rather doubt that the L&S is being extraneously verbose, and most of the information is probably quite worthwhile, but this is something that should be discussed. A third issue is what kind of disclaimer the bot should put on the articles. I was thinking that it should put a note similar to the 1913 Webster disclaimer, and include it in a category such as Category:L&SLatin entries requiring format. No matter how much the bot is tweaked, the entries will undoubtedly require some gruntwork. However, I don't think that the part about things being out of date is as necessary. The English language has certainly changed in the past 100 years, but the Latin manuscripts, in large part, haven't. It is my understanding that many university Latin courses still treat the L&S as one of the best dictionaries out there, regardless of its age. What does everyone else think? Atelaes 19:38, 28 February 2007 (UTC)

Our understanding of Latin may not have changed much, but appropriate English translations might have; for example, a word formerly translated as "gay" might now be better translated "festive". (I don't know if such problems are likely to be sufficiently widespread as to warrant a message on every entry taken from L&S, but it's something to think about.) —RuakhTALK 06:13, 1 March 2007 (UTC)
Actually, our understanding of Latin has changed. In particular, scholarly opinion about the pronunciation (and hence the placement of macrons) has changed for more than a few words since Lewis and Short. There is also additional scholarship in the case of some words, which has altered our understanding of their use (sorry, but I'm blanking on examples right now). The translations are less likely to have undergone significant change than you might think, but that's partly because Lewis and Short seldom give a single word as the entire translation. Usually, there are several equivalent translations given, and sometimes an explanation of the meaning as well, particularly when there isn't a good single-word translation.
I've got extensive notes for the content of Wiktionary:About Latin and will try to tackle a major expansion of WT:ALA this weekend, but this has been a very busy week for me at work. --EncycloPetey 06:28, 1 March 2007 (UTC)
If the edition we are contemplating is the 1879 edition (and it appears that there is no other) then the work is in the public domain under just about any copyright regime, and definitely under those of the U.S., UK, and Australia. bd2412 T 19:18, 3 March 2007 (UTC) (your friendly neighborhood copyright attorney)


Currently the IPA template links to the Wiktionary "IPA chart for English". What is good about that is that this table covers UK, US and Australian English. However, we need to link to a chart that explains the symbols in other languages too. I thought we had that once? What happened to it?

I think that IPA and SAMPA should each link to the same page on Wiktionary that explains all the symbols of both schemes alongside each other, given that Wiktionary is a multilingual dictionary. I like the style of the Wikipedia page, which explains the sounds used in a language rather than the sound of each IPA or SAMPA symbol. We would probably want separate tables for other languages. — Paul G 12:50, 3 March 2007 (UTC)

The IPA chart for all languages is at Wiktionary:IPA pronunciation key. Both have a purpose. Personally, I use the "English..." version most of the time, since it's usually easier to understand, but check on the big one if I'm dealing with a non-English pronunciation or am unsure what is meant on the "English..." one. --Enginear 16:22, 3 March 2007 (UTC)


User:Eric_Utgerd: w:Borat, or just User:Dangherous? Kysztte 04:58, 4 March 2007 (UTC)


This IP user has been editing policy/near-policy docs, making changes to entries that have almost all been rolled back, and will not create an account and log in. Also has copied some templates from Wikipedia that don't belong here. Seems awfully aggressive for a newbie. (See Wiktionary:Modular Wiktionary, which was created in one edit.) Don't know what's up. Blocked temporarily (feel free to change that at will), while I revert some things. (Another person from the 'pedia who thinks everything there should apply here?) Robert Ullmann 13:35, 4 March 2007 (UTC)

Oh, specifically, modifying Wiktionary:Translations without a vote (or discussion) is sufficient cause for a block (at least technically). Robert Ullmann 13:37, 4 March 2007 (UTC)
User name is Mac, see User talk:Mac. Hasn't been around in a while ;-) Robert Ullmann 14:00, 4 March 2007 (UTC)
I'm going to downgrade the policy status in the template on Wiktionary:Translations, it should still be in some sort of draft status as it says. Robert Ullmann 14:06, 4 March 2007 (UTC)

Wiktionary:Requests for verification

Why do so many widely used terms get listed? Currently on the page we have "phwoar", "châteaux", "boo-ya", "confusticate", "fo’ shizzle" (sans apostrophe), "Mr. Big" (sans full stop), the phrase “Houston, we have a problem”, "F-", "decider" and "retcon."

Some of these have passed, others are still up there. What I'm wondering is: why were they listed at all, given how widely they're used? Regardless of what's in print, surely it's pedantic to list words that everyone hears on a daily basis? RobbieG 21:41, 5 March 2007 (UTC)

Of those, I've never heard phwoar, wouldn't spell châteaux that way in an English context (preferring to omit the circumflex, and possibly to use an -s instead of the -x, since it's pronounced differently anyway), wouldn't spell boo-ya that way (instead writing boo-yah), have never heard confusticate, would be torn on whether to include the apostrophe in fo shizzle, have never heard Mr Big (in either spelling), am not sure if I've ever heard F-, and am not sure I've heard retcon. The only ones I couldn't imagine RFV-ing are Houston, we have a problem and decider — and it doesn't shock me that someone with different experiences than mine could RFV them. —RuakhTALK 02:54, 6 March 2007 (UTC)
Also, remember that sometimes we'll RFV a term not because we don't believe it exists, but because we want to request some citations for it. That's what the verification process is all about. Widsith 09:36, 6 March 2007 (UTC)
Be thankful that so many of these have shown up on RFV before someone else deletes them. There have been a number of entries that I have completely cited that were deleted the same day they were listed. DAVilla 19:55, 7 March 2007 (UTC)


Connel MacKenzie blocked this user on February 13 along with the network. The network is in Los Angeles. I do not live in California. I left a message on MacKenzie's talk page, but he seems to be ignoring it. Both C.M. and TheDaveRoss are checkusers, so perhaps TheDaveRoss can verify that WilliamKF is not me. C.M. also reverted this user's edits, which seems to be inappropriate given the circumstances.--Primetime 20:46, 6 March 2007 (UTC)

I didn't see any reverted edits, or any suspicious edits for that matter. Are they just not showing up?
Perhaps the user was banned because he claimed that certain editions of the OED are out of copyright. Pretty harsh. I mean, it's got to be true, you know. DAVilla 19:53, 7 March 2007 (UTC)

This is User:WilliamKF. I just found out that my account has been locked out for being a sock puppet of User:Primetime. This is a false accusation. I have a few requests:

  1. How can I get my account unblocked?
  2. How can I get my IP address unblocked (
  3. How can I get the sock puppet accusation cleared up? Where is the evidence to support this claim? On Wikipedia, there is a formal process which I could refer to, I'm not sure how that is done over here.
  4. How was I expected to contact anyone about these issues with a blocked account and blocked ip address? I had to go to a computer on another network to post this message, rather inconvenient and probably beyond what many would be willing or able to do.


FKmailliW 03:58, 8 March 2007 (UTC)

Connel unblocked me! Thanks Connel! WilliamKF 06:32, 8 March 2007 (UTC)

Why we should abandon AHD

I know we've just had a vote on AHD, but, as someone who enters a lot of pronunciations, I feel we should be dropping it altogether.

I understand that many people prefer a "respelled" pronunciation, as it is more readable than IPA. However, there are a number of reasons why I think that AHD has no place in Wiktionary.

  1. It is only really any use for American pronunciations, since that is what it was designed for.
  2. It therefore does not represent any other variety of English. For example, ä represents the sound of the vowel in the American pronunciation of the word "ah", which is equivalent to the IPA /ɑ/, but the British pronunciation of "ah" is /ɑː/. AHD can't therefore be used accurately for any other variety of English as it simply does not represent any other variety of English.
  3. We should therefore be asking why American English deserves its own pronunciation scheme. Having a scheme for American English only is POV and unhelpful to speakers of other varieties of English.
  4. AHD is phonemic rather than phonetic (I hope I've got those round the right way). It represents sounds as used in American pronunciations, not sounds as articulated with the vocal tract (which IPA does, which makes it language-independent).
  5. It contains symbols for varieties of US English that we do not give in Wiktionary; for example, ŏ, apparently equivalent to /ɒ/, the vowel heard in "pot" in British English, is not used, because we don't give pronunciations for American accents (New England accents?) that use that sound. General American uses ä (as in "pät") rather than ŏ.
  6. Symbols can be pronounced differently when used in combination with others; for example, "s" is IPA /s/, but becomes /ʃ/ ("sh") when followed by an "h". Although a pronunciation scheme is available, this is a potential pitfall for those whose first language is not English.
  7. Similarly, diphthongs are given as single symbols (eg, ā for /eɪ/), meaning that the individual components of these sounds are not available. These are needed in some varieties of English (see my illustration below).
  8. It lacks symbols for the rarer sounds of English, especially in words adopted from other languages that are only partly naturalised. For example, the IPA /ɬ/ (Welsh "ll" - used in words adopted from Welsh, such as penillion) is absent. Similarly absent are /ç/ (the sound at the beginning of "huge", although this is usually transcribed as /h/), /ɱ/ (the sound of the "m" in "emphasis", although this is usually transcribed as /m/), /ɾ/ (the sound of "d" and "t" in accents with flapping that makes "medal" and "metal" homophones) and /ʔ/ (the glottal stop, heard in uh-oh). Although these are rarely used, we need to have them available. Most of these also have SAMPA equivalents.

To illustrate my points, see bread. The UK and US pronunciations are broadly the same. However, one Australian pronunciation of the word is /breːd/. That cannot be transcribed in AHD, as /e/ is not available (ĕ is equivalent to IPA /ɛ/, not /e/, and ā contains the sound but is a diphthong, not a monophthong). There is no equivalent to the lengthening symbol /ː/ either.

While some complain that IPA is hard to understand and difficult to learn, this is somewhat of a red herring. Clicking on the "IPA" link next to any IPA pronunciation takes the user to a pronunciation chart. The user need only look up what they need at any one time and does not have to learn the system.

If we are to continue to use a "respelled" pronunciation scheme, we need either to extend AHD, or to invent some alternative that covers the sounds of all varieties of English, as IPA already does. Any invented alternative would however still be phonemic rather than phonetic (or vice versa if I've got those round the wrong way). In the absence of a better solution, it would therefore be better to abandon AHD altogether. — Paul G 11:13, 3 March 2007 (UTC)

Hear, hear. Widsith 11:23, 3 March 2007 (UTC)
I never write AHD myself, but I would rather see an AHD pronunciation than none at all. If there are many editors who find AHD simpler to write than IPA, we might lose out overall by banning it.
And obviously, no AHD pronunciation should be removed unless the AmE equivalent in IPA is already present. --Enginear 16:35, 3 March 2007 (UTC)
Now that we've voted to call the system enPR, we need to decide exactly what symbols it will use and how. As things currently stand, there is no key to AHD pronunciation anywhere. So, before we consider abandoning the system, we ought to have a go at creating a pronunciation guide to see just how practical (or impractical) it is to set up an international version.
I think it would be very useful to have a broadly phonemic pronunciation system -- one that doesn't try to hit all the details of pronunciation, but provides a quick impression of the pronunciation by comparison with common sounds in common words. Consider that the word rob in the US is pronounced /rɑːb/, but in the UK the same word is /rɒb/. This difference is a consistent one, so that the phoneme pronounced in the US as /ɑː/ is usually pronounced /ɒ/ in the UK. A system like enPR could consistently use /ô/ for both, because the difference is a consistent one. The essential way the system works is by comparison with known words. We would simply need to ensure that we have a table of pronunciation values that is keyed to both flavors of English pronunciation. --EncycloPetey 22:52, 5 March 2007 (UTC)
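A table keyed to both flavors of English, as suggested in the post above, might look something like the following minimal sketch. The symbol ô for the vowel of rob is only the hypothetical choice floated in that post; the names ENPR_KEY and to_ipa are invented for illustration, and only the one vowel discussed here is filled in.

```python
import unicodedata

# Hypothetical enPR key: one respelling symbol per phoneme, with the
# IPA realisation listed separately for each dialect.
ENPR_KEY = {
    "ô": {"US": "ɑː", "UK": "ɒ"},  # vowel of "rob", values as given in the post above
}

def to_ipa(enpr: str, dialect: str) -> str:
    """Replace each known enPR symbol with its IPA value for the dialect.

    Symbols missing from the key (here, the consonants) pass through unchanged.
    """
    # Normalise so that "o" + combining circumflex matches the precomposed key.
    enpr = unicodedata.normalize("NFC", enpr)
    return "".join(ENPR_KEY.get(ch, {}).get(dialect, ch) for ch in enpr)

print(to_ipa("rôb", "US"))  # rɑːb
print(to_ipa("rôb", "UK"))  # rɒb
```

The point of the design is that the entry stores a single respelling, and the dialect difference lives in the key rather than in every entry.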
Just a fleeting thought, could a bot be written to scan for AHD pronunciations, and then convert them to IPA in the entry (assuming that the AHD portion is in a standard format)?

A-cai 14:28, 4 March 2007 (UTC)

No. Unlike SAMPA, AHD symbols do not pair one-to-one with IPA. Also, there has been no consistency in the past as to how stress was marked. --EncycloPetey 22:52, 5 March 2007 (UTC)
I'm no fan of AHD enPR and wouldn't mind that much if it were abandoned. However, the arguments you present, Paul G, I'm afraid I find unconvincing.
  1. If it is "only really any use for American pronunciations," it's a simple matter of extending it. We should feel free to adapt it to our purposes since it's our system not the American Heritage Dictionary's: AHD was a misnomer from the start.
  2. Perhaps it "does not represent any other variety of English" but this wouldn't mean it cannot (after suitable modification). For example, ä can be (re)defined to represent the sound of the vowel in ah regardless of dialect.
  3. American English doesn't deserve its own pronunciation scheme, no, but enPR doesn't have to be American-only.
  4. enPR is phonemic rather than phonetic (yes, you've got those round the right way) and this is not a bad thing. In fact IPA, as used in this dictionary, is also phonemic. The advantage of IPA (& similarly X-SAMPA) is that it can be used for phonetic transcriptions; however, it can also be used for phonemic ones. When giving a pronunciation of a word it is generally better to give a phonemic one; this way you don't have to concern yourself about which allophone is being used, nor do you have to worry about different realisations of a given phoneme dependent on accent.
  5. "It contains symbols for varieties of US English that we do not give in Wiktionary;" I think we should strive to include all varieties of English ... but perhaps this is just my opinion. Your example, ŏ, could be used for the vowel in pot in whatever dialect you speak. For a speaker of General American ä and ŏ could represent the same vowel; for an RP speaker these would be different.
  6. "Symbols can be pronounced differently when used in combination with others;" this is a disadvantage somewhat but could, if necessary, be fixed.
  7. "Similarly, diphthongs are given as single symbols (eg, ā for /eɪ/)," I don't see this as a problem since these are single phonemes. Indeed some of these phonemes can also be realised as monophthongs.
  8. "It lacks symbols for the rarer sounds of English," it can, where necessary, be extended. However, what we don't need are different symbols for allophones. For example, the m in emphasis may be [ɱ] but this is an allophone of /m/.
Also the Aussie vowel in bread is the same as in care. This can be transcribed in AHD (i.e. the American Heritage Dictionary's pronunciation scheme): brâd. Jimp 03:57, 9 March 2007 (UTC)

Addresses in Mexicali...?

Who can give me some addresses in Mexicali?

Perhaps I can help you if you can clarify. What do you want? The address of a particular business or theater located in Mexicali? —Stephen 18:58, 8 March 2007 (UTC)

Captions of Pictures

Recently I came across the entry for Российская Федерация, and there, the Russian flag is pictured, but captioned in Russian. I wondered if it was standard to caption pictures in foreign languages, even though this is the English version of Wiktionary. My initial inclination is that even on the entries for non-English words, the pictures should be captioned in English, but I'm open to other ideas, and what I'm really hoping for, as with everything, is a standard so I know what to edit in the future. — V-ball 18:06, 7 March 2007 (UTC)

I’ve done a fair number of these, and the intended audience is people who know a little bit about Russian, or are studying Russian, or at least are interested in the Russian language. The word Российская Федерация is clearly translated for anyone to see, and if someone is then also interested in Russian Federation, they can go there to see what it says. On the Russian pages I put translations, examples of usage, grammatical notes, etc., as I think are needed and useful to beginning students of the language. Advanced students don’t need any of it, and those who have no interest in the language can go on to the linked translation at Russian Federation. The captions explain the picture in clear but simple language, and each element of the caption has a separate entry for anyone interested. I think it might be a good idea to include a Russian flag on the Russian Federation page, but I don’t usually make such additions to English pages. —Stephen 19:08, 8 March 2007 (UTC)

About Greek

I hope that people will visit Wiktionary:About Greek and Wiktionary:About Greek/Transliteration and make comments about them on their respective talk pages. Thorough knowledge of Greek is not necessary in order to criticise the Inflection lines and most of the other suggestions made there! Thanks, Saltmarsh 15:51, 9 March 2007 (UTC)

en-noun & English plurals

I've noticed that we are just using {{en-noun}} these days and will be deprecating {{en-noun-irreg}} etc. This means that all nouns with irregular plurals will fall into category:English nouns - this I can live with.

However, recently I've noticed a general trend to place all English plurals in category:English plurals and the irregular plurals and ones that end in "-es", "-ies", "-en" etc. are all going to end up in one category making them harder to find.

Can we get a steer or at least a vote on what we are doing with the English plurals categories?--Williamsayers79 22:12, 8 March 2007 (UTC)

My first suggestion would be that if we're going to have a catch-all category of category:English plurals then we could list all the irregulars in appendices.
If we don't have a catch-all category and decide to separate the various regular and irregular English plurals into subcategories then we need some way of embellishing the {{plural of}} or {{irregular plural of}} templates to auto-categorise irregular plurals accordingly.--Williamsayers79 22:12, 8 March 2007 (UTC)
I suggest we put all English plurals in Category:English plurals; the irregular ones can then be placed also into a category for irregular plurals (by hand, I suppose, or by a robot identifying the appropriate instances). -- Beobach972 22:47, 8 March 2007 (UTC)
I agree, and I've been going through the plurals linked from that template and doing exactly that, adding both the |lang=English identifier which forces them into the plurals category, and where necessary adding Category:English plurals ending in "-es", Category:English plurals ending in "-ies", Category:English plurals ending in "-a", Category:English irregular plurals ending in "-ae", Category:English irregular plurals ending in "-i", and Category:English irregular plurals ending in "-en". I've finished plurals starting with a and I'm moving on to the b's tomorrow. bd2412 T 06:12, 9 March 2007 (UTC)
Re: The switch from {{en-noun-irreg}} to {{en-noun}} causing all nouns to be in one category: it doesn't have to be that way. {{en-noun}} is already fairly intelligent; there's no reason it can't choose its category intelligently as well. —RuakhTALK 05:19, 9 March 2007 (UTC)
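The idea of choosing categories automatically from the plural form can be illustrated with a minimal sketch. It is not the logic of {{en-noun}} or of any existing bot: the function name plural_categories and the purely suffix-based rule are assumptions made for illustration, and only the category names are taken from the posts above. A suffix test alone is naive (horses ends in "-es" but is a regular "-s" plural), so real use would need the singular form or a human check.

```python
from typing import List

# Suffixes are checked in this order so that "-ies" wins over "-es".
SUFFIX_CATEGORIES = [
    ("ies", 'Category:English plurals ending in "-ies"'),
    ("es", 'Category:English plurals ending in "-es"'),
    ("ae", 'Category:English irregular plurals ending in "-ae"'),
    ("en", 'Category:English irregular plurals ending in "-en"'),
    ("a", 'Category:English plurals ending in "-a"'),
    ("i", 'Category:English irregular plurals ending in "-i"'),
]

def plural_categories(plural: str) -> List[str]:
    """Return the catch-all category plus at most one suffix-based category."""
    categories = ["Category:English plurals"]
    for suffix, category in SUFFIX_CATEGORIES:
        if plural.endswith(suffix):
            categories.append(category)
            break  # only the first matching suffix applies
    return categories

for word in ["cats", "ponies", "oxen", "antennae"]:
    print(word, plural_categories(word))
```

Every plural lands in the catch-all category, and the special endings pick up one extra category, which matches the two-level scheme suggested above.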

A bigger problem I have is that the {{plural of}} template assumes that the entry is a plural noun. While this is fine for English, other languages have plural adjectives. --EncycloPetey 03:49, 12 March 2007 (UTC)

Faroese vs. Faeroese

Both these spellings seem current here on Wiktionary, and the categories Category:Faroese language and Category:Faeroese language both exist, with some subcategories. A standardization is needed, so that a single spelling be used in all section headers and categories for the language here. Someone seems to have had deletion of Category:Faroese language in mind and a move to Category:Faeroese language, but the spelling with ae seems less common, not only on Wiktionary and on Wikipedia (see w:Faroese language, w:Faroe Islands), but also on the rest of the Internet: Faroe Islands vs. Faeroe Islands, Faroese vs Faeroese. To me it seems that the best choice is "Faroese", although the other spelling may be more original. – Krun 23:10, 10 March 2007 (UTC)

Faroese is the only spelling we should be using, as per ISO639-3. --Connel MacKenzie 10:56, 11 March 2007 (UTC)
Thanks, I'll move everything over, and perhaps you'll help with deleting the relevant categories, since I'm not an admin. – Krun 12:04, 11 March 2007 (UTC)
Remember template {{fo}} ;-) Robert Ullmann 18:55, 11 March 2007 (UTC)
I've updated the template. What does that ever get used for, anyway? I see from WhatLinksHere that it's included in a couple translation sections... I thought we were opposed to using templates in translation sections (or was that just in headers?), weren't we? -- Beobach972 19:51, 11 March 2007 (UTC)
The template is used in other Wiktionaries, and exists here primarily so that it can be subst'ed. Because such templates are regularly subst'ed, you won't see them used. However, if the template didn't exist, we couldn't subst it. --EncycloPetey 03:44, 12 March 2007 (UTC)

Oxford English Dictionary Fascicles

I looked through the archives here and found no definitive statement on official policy for using the Oxford English Dictionary fascicles which are out of copyright (i.e. first fascicle was published in 1888). I have heard it stated that there are copyvio concerns, plus outright errors, and on that basis the OED is not to be used, but instead one should use the Webster's 1913 edition, which is out of copyright and corrects many errors in the OED. Can someone please point me to the policy? Is it the case that Webster's 1913 contains all entries from the OED? Thanks. WilliamKF 06:37, 8 March 2007 (UTC)

I say we should make use of all resources. To state that there are errors in an old OED which are not in a less old Websters does not imply that the Websters had no errors. If we are willing to make do with a Websters which doubtless contains errors we should also be willing to make use of an OED which may contain errors. We should do our best to spot errors in all our sources rather than embark on an impossible quest for an error-free out-of-copyright resource. — Hippietrail 14:54, 13 March 2007 (UTC)

Common misspelling vs alternative spelling

How do we determine if something is a common misspelling or a valid alternative spelling? RJFJR 16:36, 8 March 2007 (UTC)

  • If a spelling appears in a dictionary it's valid. If it appears in publications it is likely to be correct, especially if used multiple times in one text. — Hippietrail 14:38, 13 March 2007 (UTC)

Is it correct to assume that the English Wiktionary has no intention of implementing the new logo? -- Zanimum 18:11, 12 March 2007 (UTC)

Almost nobody likes it, it was "decided" by a process that didn't include any visible fraction of the people affected, and it is a total trademark violation that will last a New York minute when the lawyers from the trademark holder see it and call WMF General Counsel. Forget it. Robert Ullmann 19:27, 12 March 2007 (UTC)

Placeholders in article names

Did we resolve the issue of placeholders in article names after all? Here are a couple of examples that illustrate what I mean:

  • take someone to task
    Here, the pronoun "someone" is a placeholder for a noun or another pronoun. This is fairly simple to handle: just create take someone to task. But "someone" and "somebody" are interchangeable, so there must also be take somebody to task. Is this solved by duplication or cross-referencing? That has already been discussed elsewhere and is not quite the issue I am asking about. (At the time of writing, we have the latter but not the former. This is not good.)
  • to n decimal places
    Here, n is a mathematical placeholder for "any integer". The letter n is commonly used in this way in mathematics, but is it appropriate in English? Should the article be named "to n decimal places", "to X decimal places", "to ... decimal places", or something else? Of course, when n is 1, we have "to 1 decimal place" (singular). (There is a need for this article, incidentally, as it has non-idiomatic translations.)
  • do a ...
    We have discussed this one before. While some are set phrases (for example, "do a Reggie Perrin" is a well-known idiom in UK English, or at least, was a few decades ago) and so can have individual entries, just about any noun can be substituted for the ellipsis in "do a ..." as desired, with the meaning understood. "Do a ..." is therefore a useful construction and requires an entry of its own, but how do we name its article?

Perhaps the answer is to write these using the placeholders that you would expect to see, namely "someone", "n" and "...", respectively. — Paul G 12:03, 20 February 2007 (UTC)

I believe the standard method (OK, the method that I use) is to choose one version and add definitions, translations and all the rest. Then to add simple redirects for as many of the other forms as are used. It can get messy. SemperBlotto 12:30, 23 February 2007 (UTC)
I agree. One complete form, many redirects. bd2412 T 11:23, 15 March 2007 (UTC)
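The "one complete form, many redirects" approach can be sketched as follows. This is an illustrative assumption rather than any existing tool: the function name redirect_pages is invented, and the only wiki syntax relied on is the standard MediaWiki redirect line, #REDIRECT [[target]].

```python
from typing import Dict, Iterable

def redirect_pages(main_title: str, variants: Iterable[str]) -> Dict[str, str]:
    """Map each variant title to the wikitext of a redirect to the main entry.

    The main entry itself is skipped, since it holds the full definition.
    """
    return {
        variant: f"#REDIRECT [[{main_title}]]"
        for variant in variants
        if variant != main_title
    }

pages = redirect_pages(
    "take somebody to task",
    ["take somebody to task", "take someone to task"],
)
for title, wikitext in pages.items():
    print(f"{title}: {wikitext}")
```

Which form gets the full entry is an editorial choice; the sketch only shows that the remaining forms need nothing more than a one-line redirect.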

Scots Gaelic and Scottish Gaelic

We have entries in both; there is discussion in the Tea Room and in the Grease Pit request for bot fix, ready to go. Change is to standardize on "Scottish Gaelic" (ISO639/SIL is either or, literature prefers Scottish Gaelic). Comments? Robert Ullmann 15:31, 11 March 2007 (UTC)

Incidentally, let it be noted that having it as Scottish Gaelic will help avoid confusion with Scots. :) -- Beobach972 19:46, 11 March 2007 (UTC)
Agreed. When I see "Scots Gaelic", I always think someone has missed the fact that Scots and Gaelic are different languages. —RuakhTALK 23:20, 11 March 2007 (UTC)
Agreed : Scottish Gaelic is the name preferred on Wikipedia and by SIL and Ethnologue. The name Scots Gaelic (while valid) is easily confused with Scots, which is an entirely different language. --EncycloPetey 03:41, 12 March 2007 (UTC)
Agree we should use Scottish Gaelic to avoid confusion with Scots.--Williamsayers79 13:35, 13 March 2007 (UTC)
Having been confused on precisely that issue here in the past (despite the very Scottish name "MacKenzie", I do not speak any flavor of Scottish), I fully support this "confusion reduction" effort. Do you need someone to do the bot run? Or is it waiting for a vote? --Connel MacKenzie 06:59, 14 March 2007 (UTC)
I have the bot all set up; tested on a couple of dozen entries. Just was leaving it for a few days to see what comments there might be. Robert Ullmann 11:51, 14 March 2007 (UTC)
Just to check, the bot, will it change all the Scots Gaelic instances in headers, translations and categories?--Williamsayers79 13:17, 14 March 2007 (UTC)
And section references, don't forget those! Yes. Robert Ullmann 13:28, 14 March 2007 (UTC)

Done. 818 headers, 525 translations, 12 ttbc, 23 wikilinks, 231 cats, 70 section references. There are some new entries since the last XML dump that will get caught when I run the recheck on the next one. Also a few exceptions (badly formatted translations). Robert Ullmann 09:33, 16 March 2007 (UTC)
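For readers curious what such a rename involves, here is a minimal sketch of the core text substitution. It is not the bot that was actually run: the function name rename_language and the sample wikitext are invented for illustration, and a real run would also need to handle page fetching, saving, and the badly formatted exceptions mentioned above.

```python
import re
from typing import Tuple

# Match the old name as whole words, so the separate language "Scots"
# (without "Gaelic") is never touched.
OLD_NAME = re.compile(r"\bScots Gaelic\b")

def rename_language(wikitext: str) -> Tuple[str, int]:
    """Return the updated wikitext and the number of replacements made."""
    return OLD_NAME.subn("Scottish Gaelic", wikitext)

# Invented sample entry with a header, a category, and a Scots translation line.
sample = (
    "==Scots Gaelic==\n"
    "===Noun===\n"
    "[[Category:Scots Gaelic nouns]]\n"
    "* Scots: example\n"
)
new_text, count = rename_language(sample)
print(count)
print(new_text)
```

Because the pattern requires the full phrase, lines that refer only to Scots are left alone, which is the confusion the rename is meant to avoid.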

Determiner vs Determinative

We have had prior discussion about allowing the use of Determiner as a POS header. Opinion was divided, but not strongly opposed since it would be a closed set of terms. The guide for understanding English determiners recommended by proponents is the Cambridge Grammar of the English Language (hereafter CGEL), which I ordered and have begun reading (in small portions). I have discovered that, by their terminology, Determiner is not a part of speech.

What the CGEL does is to recognize several grammatical levels between "part of speech" and "clause", and discussions are careful to identify which level the discussion treats. For instance, there are separate levels of discussion for "noun" (which is a part of speech); "nominal" (which is a higher category also including a noun and associated modifiers or a pronoun and associated modifiers); and "noun phrase" (which is the meta-category for a structure including a noun or pronoun, together with modifiers and determiners). Note: By their definition of "noun phrase", it is not a part of speech and so should not be used as a POS header.

As a result, I was struck by their discussion of determiners. The CGEL defines determiner functionally, and explicitly so. That is, a "determiner" is any part of speech functioning in a certain capacity. For example, possessive nouns (e.g. Mary's) and pronouns (e.g. my) may function as determiners. Therefore, Determiner should not be used as a POS header on Wiktionary. However, there is a part of speech recognized in the CGEL called a determinative, and this is the part of speech previous discussions have centered upon. We have been using the wrong term as a POS header.

What this means for the mechanics of editing is that all instances of ===Determiner=== as a level 3 header (or lower in a few cases) ought to be changed to ===Determinative===. Words currently classified in categories such as Category:English determiners should be recategorized in Category:English determinatives. The Category:English determiners (and similar categories in other languages) should continue to exist, though. It is an important meta-category that includes Articles, Numbers/Numerals, some possessive Pronouns, indefinite Pronouns, demonstratives, as well as the Determinatives themselves.

The alternative is to continue to implement a terminology that flies in the face of the CGEL. --EncycloPetey 19:19, 25 February 2007 (UTC)

We have not been using the wrong term as a POS header. The CGEL terminology is somewhat idiosyncratic. They make an excellent case for it being a more appropriate way to describe (specifically English) grammar, but that doesn't make it even the most common way, let alone the only acceptable way. Personally, I don't think either "determiner" or "determinative" is familiar enough to a lay audience to matter which one we use, so I lean to "determiner" as simpler and more English-sounding. If the community makes a conscious decision to move wholesale toward the descriptive framework of CGEL, that would be a Good Thing, but there's no reason to do it higgledy-piggledy. -- Keffy 20:35, 25 February 2007 (UTC)
Yes, I think Keffy has it right. I did make this distinction back in January.--BrettR 13:34, 18 March 2007 (UTC)
That's good to know. The class of determiner is new enough that I haven't encountered it much before, and have little information outside of the CGEL to rely on for understanding the preferred terminology. If the linguist community norm is for determiner, then using that term should be fine. --EncycloPetey 22:48, 25 February 2007 (UTC)
Well, we should be striving for the correct term. While my objections to the heading were not well expressed, I still have reservations about it. It is something of a relief to hear that CGEL was mis-represented in those previous discussions. IIRC, it was called a POS in those discussions, with the misleading implication that CGEL said it was.
All in all, a determiner is simply a type of noun, or a type of pronoun. I can see the inflection line(s) being specialized for en-noun-determiners (etc.) I can see those templates including a Category:English determiners, but only in addition to the correct noun or pronoun categories. I am much less convinced this merits a separate third level heading, now. --Connel MacKenzie 02:38, 26 February 2007 (UTC)
No! A determiner is not a type of noun! Some pronouns do function as determiners, but the CGEL recognizes them as "pronouns functioning as determiners". However, there are other words that function as determiners. These other words include Articles, Numerals, and Demonstratives. In the noun phrase "the big red bus", the is a determiner. In the NP "one big red bus", one is a determiner. In the NP "each well-groomed little boy", each is a determiner. In the NP "that ferociously charging lion", that is a determiner. Traditionally, these words have been lumped into the adjectives when they functioned this way, but they're serving a different function from adjectives. Adjectives present attributes of the noun (or pronoun) they associate with (big, well-groomed, charging). By contrast, the Determiners "point to" a particular noun (or pronoun) instead of providing attributes. Ruakh has more information given below. As I've read the examples and discussion in the CGEL I've become convinced that this is a new POS worth recognizing on Wiktionary. My comments were intended to question the terminology. Note that while the CGEL does not call the category "Determiners", they do recognize the category; they simply call them "Determinatives". Keffy has noted (above) that this is not the usual terminology, which is why they painstakingly differentiated their peculiar choice of name. --EncycloPetey 00:25, 27 February 2007 (UTC)
My Collins English Dictionary (2005) uses "determiner" as a POS name. Thus defining "the determiner (article)". It also includes a, some, any, this, that + possessives (my, your) and numerals —Saltmarsh 06:35, 28 February 2007 (UTC)

What the CGEL calls determinatives and other modern linguists call determiners are traditionally classified doubly as adjectives and as either nouns or pronouns, with a few exceptions (notably a, an, the, and every, which always require a noun and therefore have traditionally been considered adjectives exclusively, or in the former three cases articles by sources that didn't consider articles to be adjectives). If you look up all, both, each, some, two, many, and so on in most dictionaries, it will give a part-of-speech heading along the lines of "adjective and pronoun" or "adjective and noun". Since Wiktionary seems to object on principle to such headings, feeling that each part of speech warrants its own definition, it's in our best interest to use a heading like "determiner" for determiners, rather than defining each determiner twice, once as noun/pronoun and once as adjective. —RuakhTALK 06:59, 26 February 2007 (UTC)

Back in January, I posted the following:

Differentiating determinatives (determiners) from English adjectives

   * Both adjectives and determinatives modify nouns in phrase structure.
   * Both adjectives and determinatives can participate in fused-head constructions.
   * A determiner is an obligatory part of many noun phrases; Adjectives are always optional
   * Determinatives alone can modify singular countable nouns; Adjectives alone can't.
   * Most adjectives can be used predicatively; Determinatives typically cannot.
   * Adjectives are usually gradeable, determinatives non-gradeable.
   * Determinatives identify nouns and mark them as definite or indefinite while adjectives describe properties attributed to them.
   * Core determinatives cannot co-occur with the, a, and an; adjectives can.
   * Many determinatives are licensed only for specific singular/plural countable/uncountable nouns, while adjectives are generally licensed independent of these considerations.
   * Determinatives can often function in the slot (det) of them; adjectives can't
   * Determinatives can often function in the slot so (det) (noun); adjectives can't
   * Determinatives can be modified by only a very limited set of adverbs; adjectives are less limited in this way.

Most of this information is taken from The Cambridge Grammar of the English Language (CGEL).--BrettR 13:34, 18 March 2007 (UTC)

Numerals and their categories

What is the correct category naming style for numerals -- full language name or language code only? Looking at Category:Numbers and the siblings of Category:ja:Cardinal numbers it seems to be mixed.

Also, could someone give me suggestions on how to handle an entry such as 二百五 205? Should I make a POS template that links to 二百四 204 and 二百六 206?

Cynewulf 15:37, 5 March 2007 (UTC)

If "number" is the POS heading, then I'd imagine it should use the full language name (like Category:English nouns, Category:Hebrew prepositions, etc.); but if "number" is just a description of the topic, and the POS heading is, say, "determiner", then it should use the language code (like Category:ja:Horses, Category:fr:Days of the week, etc.). According to WT:POS, it's a matter of some debate when and whether "number" should be a POS heading, so … —RuakhTALK 18:37, 5 March 2007 (UTC)
The category names for the numbers are very mixed. I initiated a move to standardize the numeral POS headers some time ago, but the move had strongly entrenched opinions that were incompatible with each other. Part of the problem lay in debate over whether the part of speech should properly be termed "Number" or "Numeral". There was a general feeling at the time that the header should simply be Number (or Numeral) instead of Cardinal Number or Ordinal Numeral and the like, but we couldn't agree on which shortened form should be used.
The additional problem is that there are names for cardinal numbers that do not function grammatically as numerals/numbers. For example, aleph-null is a cardinal number mathematically, but the word aleph-null functions only as a noun, never as a numeral. My take has therefore been to treat the cardinal numbers and ordinal numbers as topical categories, but include them within a grammatical super-category. Thus, the Japanese cardinal numbers would be in Category:ja:Cardinal numbers and the Japanese ordinal numbers would be in Category:ja:Ordinal numbers, but both of these would be subcategories of Category:Japanese numerals (or Category:Japanese numbers according to some). This way, the numerals/numbers are listed within a grammatical category, but the topical category within Category:ja:Mathematics could exist as well. See Category:Afar numerals and the contained subcategories and entries to see how I would set it up. There is a template {{cardinal}} and one for {{ordinal}} that does automatic categorization.
The idea of a template to link backwards and forwards among the cardinals (and ordinals) is a good one. I've considered the same idea myself, but I haven't figured out exactly how to include all the useful information you'd need to have in a reasonable format. It would mean inserting the template into thousands of existing pages, so it ought to be designed for easy editing and insertion. --EncycloPetey 23:04, 5 March 2007 (UTC)
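The two-level category scheme described above (topical categories named by language code, nested under a grammatical category named by full language name) can be sketched roughly as follows. This is only an illustration: the function names and the small lookup table are hypothetical, and "numerals" versus "numbers" is left as a parameter because that choice was still under debate.

```python
# Hypothetical lookup table; only the languages mentioned in the thread are listed.
LANGUAGE_NAMES = {"ja": "Japanese", "aa": "Afar"}


def topical_category(code, topic):
    """Topical categories use the language code, e.g. Category:ja:Horses."""
    return f"Category:{code}:{topic}"


def grammatical_category(code, pos_plural):
    """Grammatical (POS) categories use the full language name."""
    return f"Category:{LANGUAGE_NAMES[code]} {pos_plural}"


def numeral_categories(code, pos_plural="numerals"):
    """Map each topical number category to its grammatical parent category."""
    parent = grammatical_category(code, pos_plural)
    return {
        topical_category(code, "Cardinal numbers"): parent,
        topical_category(code, "Ordinal numbers"): parent,
    }
```

For example, numeral_categories("ja") places both Category:ja:Cardinal numbers and Category:ja:Ordinal numbers under Category:Japanese numerals.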
(not commenting on the cats for the moment ;-) A template would be very good; this is presently done in lots of ad hoc ways, which often break ordinary parsing of the page, since the links are put in odd places. (The only standard place would be under "See also", with appropriate gloss—next, previous, whatever—but people seem to do anything else but!) A fairly simple template plus various magic options for fancy things people will want to do would be good, it can be a float-right box? There are several examples, one is used for the Greek letters, but whoever did it subst'd it (arrgh!) so I don't know what it was and the entries are a mess. Robert Ullmann 12:09, 8 March 2007 (UTC)
There's one I created for the signs of the {{Zodiac}} that might serve as a model. However, I'd want to see both the word and numerical form of each number name in such a template. --EncycloPetey 03:53, 12 March 2007 (UTC)
The CGEL model has cardinal numerals as either nouns or determinatives. Ordinals are adjectives.--BrettR 13:43, 18 March 2007 (UTC)

Wiktionary Scalability usefulness et alia

Great work all of you, still a long way to go. I have a comment on scalability and usefulness and the method for adding new entries.

1. For this to be truly useful, all words in other wikis need to be automatically tied to the wiki dictionary. To do that manually will be very tedious, so it would be great if admins--whoever you are--created a 'search and link' macro that runs every time a word is added, and vice versa, i.e. each word definition is linked to all articles that use it in order of importance (articles, headers, text). A new kind of link (right click or on hover) would need to be created for this to not confuse articles.

2. As a polyglot I don't care if the article is French, Spanish, English, German, Russian or Arabic or... The Dictionary (and all wikis in general) should unify these fields. Currently there is a lot of duplication. If you want a truly scalable and maintainable wiki dictionary, translations need to work differently than they are working now. Instead of replicating the conjugation of a Spanish verb in the English wiki, e.g. http://en.wiktionary.org/wiki/matar for http://en.wiktionary.org/wiki/kill, the translation needs to link to the Spanish wiki http://es.wiktionary.org/wiki/matar which is bound to be better checked and linked to the rest of the Spanish language.

If this is properly done as a database then the link ought to be bidirectional. This will dramatically increase the connectivity and usefulness of this resource. Otherwise this effort will only come to fruition through strenuous manual work and until then will be shaky at best.

thank you jvdp

  1. User:RobotGMwikt does update interwiki links (after some amount of XML dump delay, or replication lag, or something.)
  2. OmegaWiki (formerly WiktionaryZ, formerly UltimateWiktionary, formerly ...) seems to be what you are looking for: more of a universal translation engine. License differences have forced that to no longer fall directly under the WMF umbrella.
--Connel MacKenzie 10:52, 11 March 2007 (UTC)
For you, as a polyglot (;-) it may be perfectly reasonable to use the es.wikt for Spanish entries. For someone who, say, speaks Swahili, with English as a second language, and uses the English wikt as a reference for Spanish, these entries are critical. The task of the English wikt is to define everything in English. Sure it seems redundant to you, but to others it is utterly essential. And note that es:matar which you say is "bound to be better checked and linked to the rest of the Spanish language" lacks an entry for Spanish itself, let alone the conjugation. Our entry is far more complete. Robert Ullmann 15:45, 11 March 2007 (UTC)
Regarding 2: the new convention is to include a link to the foreign language wikt in translation sections. This is being done for all entries. See e.g. frequency, under Dutch translations. The links will be inserted for all languages, in the long run. H. (talk) 20:40, 17 March 2007 (UTC)

Middle English

I have been wanting to bring this up for ages but have been dreading it a bit.

Is there any point in using the term "Middle English" as a language header? I dislike it and I think it's unhelpful and misleading. This is why:

  1. It's not very well defined. Whereas Old English ends (in the written record at least) very markedly at the Norman Conquest, ME by contrast blurs considerably with modern English. A lot of words which I've seen entered here as "Middle English" survived well in to the seventeenth or eighteenth centuries and I think they're better off being labelled as =English= with an {{obsolete}} tag.
  2. Most words in ME are identical to their modern English counterparts, which means we'd need a lot of duplication. Again, the difference with Old English is worth pointing out: that also has lots of familiar vocabulary, but OE words need their own entries to provide grammatical gender and other information which does not exist in Middle English.
  3. Calling it Middle English makes it seem like a totally different language to modern English, which is not necessarily desirable; better, to my mind, to just include ME senses of words as obsolete under the =English= heading, which is a better representation of the "continuum" of the language.

For an example, I was looking at siege recently. In the past this has been spelled in the following ways: sege, cege, seche, segh, seghe, seeg, seege, seage, saige, sige, siege, syege, seige, sedche, sedge, syedge, seidge, sidge, segge. Now which of these are we going to call Middle English? To be sure, some forms were only spoken during the ME period, but most either outlived it or in some cases were not connected with it at all. So it makes more sense to me to call them all obsolete spellings of the modern form of the word.

The solution I've played with on some pages is to have a =Spellings= header which would include all known spellings of a word, including ME forms, with the obsolete forms marked as obsolete. Some entries (e.g. colour/color) would also have non-obsolete (i.e. "Alternate") forms under the Spellings header.

Any thoughts on all this rambling? Widsith 15:40, 12 March 2007 (UTC)

Thumbs-up. :-) —RuakhTALK 16:24, 12 March 2007 (UTC)
This is one place where some form of a "Word History" section would be appropriate - for showing when various spellings existed. --EncycloPetey 05:39, 13 March 2007 (UTC)

I strongly believe that Middle English should be a separate heading because Middle English (ME) is without a doubt a separate language. Very few English speakers can read it and even fewer can understand it if they hear it.[1] If we include ME forms under ==English== headings, then some affected users might actually start using them. I also object to the example. Many of those spellings are unique to ME. They can go under the ==Middle English== heading. Other spellings are unique to certain senses. I know sedge, for example, is used in ornithology. If we start treating ME as Mod.E, then we'd have to start including ME pronunciations under Mod.E. Pronunciation in ME is different from Mod.E. in every case. ME at times even used different letters from our own language.

I don't think people realize just how much the language changed between 1100 and 1500. Two branches of the IE family--Italic (French) and Germanic--were merged together. Spaniards can read Portuguese more easily than English speakers can read Middle English. Middle English is about as easy for us as Italian is for them. From what I've read, you can say the same for Norwegian and Swedish, as well as Russian and Belarusian.--Νικα 07:58, 13 March 2007 (UTC)

Well, first of all ME is not "without a doubt" a different language - that is why it is called Middle English and not Mediaeval Germanic or something. Your objection to my example is also misplaced. sedge as an ornithological term is a completely separate word from sedge as a spelling of siege, which is what I am concerned with. The point about pronunciation is a good one, but again there are huge problems since there is really no "standard" ME pronunciation. Chaucer's London dialect sounded wildly different from the Northern dialect of Sir Gawain and the Green Knight for example. And it opens up the question of other obsolete pronunciations - should we be including Shakespearean pronunciation? What about Restoration pronunciation or Victorian pronunciation? As for different letters, again I don't see the problem with that - modern English regularly used æ until about fifty years ago and we have no problem including encyclopædia etc. Widsith 09:03, 13 March 2007 (UTC)
I hope you won't make the same argument for Old English. That language is closer to German than to Mod. E. (I prefer to call it Anglo-Saxon.) Also, sedge is a variant spelling of a specific sense of siege. As for the dialectal issue, my impression has been that ME varied more in spelling than Mod. E. As for the date issue, I prefer to use the introduction of printing (ca. 1475). That marked a milestone in the standardization of spelling. That also, incidentally, is one reason why you see so many different ME forms of the same word. I wouldn't mind giving pronunciations from Shakespeare's time under Modern English so long as they are marked as obsolete, though.--Νικα 09:43, 13 March 2007 (UTC)
No, sedge was used for all senses of siege throughout the sixteenth century. Old English was very different from modern English, that's exactly why I think it should be treated differently, as I explained above. The problem with picking 1475 as a cut-off date is that it's completely arbitrary - no one woke up on New Year's Day 1476 speaking modern English. It was a very gradual change, which is not reflected if we relegate older forms to a Middle English heading. ME spellings and vocab did not die out in 1475. Most continued for centuries and many are still with us today. Widsith 10:39, 13 March 2007 (UTC)

I think that it would be tidier to have separate Middle English language sections because of the volume of content. It may also get a bit confusing with a multitude of ME spellings on a modern English page, and what about all those words that did not make it to modern English? --Williamsayers79 13:42, 13 March 2007 (UTC)

I must say though that I agree with Widsith that Middle English is readable - and is closer than you think to modern English. This becomes apparent when you read something in ME in your own dialect... By the way, Old English is not closer to German than it is to modern English; this is a POV pushed by Latin/Romance based-English fans who seek to deny the English language's Germanic and Norse roots.--Williamsayers79 13:42, 13 March 2007 (UTC)

If the header were "Modern English", then it would indeed be wrong to include words and senses that died out before the Modern English period; but as the header is "English", I think it's quite reasonable to include words and senses from both the Middle English period and the Modern English period, assuming the former are appropriately labeled so no one assumes they're current. —RuakhTALK 07:00, 14 March 2007 (UTC)
It is my impression that the distinction between Old, Middle, and Modern English is sort of the standard convention in linguistic academia. I have to imagine that there is some reasoning behind this (although I'll be the first to admit that I know nothing about Middle English myself). I think Connel has a good point in that it's probably best to simply follow the standard conventions as laid out by SIL. We have enough decisions to make as it is, without recategorizing languages. Certainly, the point is well made that doing so presents a type of clean separation where none existed, but I don't know if Wiktionary is at a point yet where it can handle time spans that well. Certainly the OED can sort of lump everything together, but they have millions of cites, and thus can reasonably show roughly when a word came into usage and when it disappeared. It's my opinion that Wiktionary is not at a point where it is feasible to do something similar. It seems a bit misleading if a word hasn't been around for 500 years and we simply label it as obsolete. As EncycloPetey said, word histories would be the goal for something like this, but I think we have a hard enough time simply getting definitions and proper format up for the words we need. I believe that Wiktionary will, in time, evolve to the point where it can dispense with such rough approximations and deal with word histories individually, but I just don't think we're there yet, and so Middle English should be kept as a category, at least for the time being. Perhaps it should be noted again that I know nothing about Middle English, and so my opinions on the matter should be taken with a grain of salt. Sorry to shit on your parade Widsith, I can understand why you were dreading bringing this up. Atelaes 08:27, 14 March 2007 (UTC)
I'm not an expert on this, but it's my understanding that the line between Middle English and Modern English has to do with the Great Vowel Shift, which was a large-scale shift in pronunciations but not spellings. Now by my understanding of what kind of information is freely available, we can't really give accurate pronunciations for Middle English words other than simply assuming the spelling is representative, which strikes me as less than useful; so I don't see that we gain anything by drawing this same distinction. In terms of vocabulary and usage, there's really no way to distinguish the two.
BTW, there seems to be some sense in this discussion that labeling centuries-obsolete senses as "obsolete" is understatement; but by that argument, we should also have a separate category for Early Modern English, as there are plenty of words and usages that haven't been seen since the 1550s, which is technically the Modern English period but is well beyond obsolete.
RuakhTALK 16:28, 15 March 2007 (UTC)
Connel: it wouldn't be wrong. Yes, it is sometimes useful (especially in literature studies) to treat ME as a different language, but that's just a convention. As Wikipedia puts it: "Middle English is the name given by historical linguistics to the diverse forms of the English language spoken between the Norman invasion of 1066 and the mid-to-late 15th century" (my emphasis), a quote which also reinforces how vague the cut-off point is. This all came about because I read the word sege in Gawain and wanted to enter it. I cannot be convinced to put it under a =Middle English= heading when I know damn well it was still being used four centuries later. The same can be said for many - most - ME words. Marking it out as a separate language just irritates those of us who are familiar with it, and misleads people who aren't by giving the mistaken impression that it's a wholly separate entity from modern English. I agree with Atelaes that we don't yet have the number of citations to make this evident, but after all there are very many areas in which we are still lacking material – surely that's our aim though. In the meantime, for myself I think I'll just avoid entering such words altogether, since I can't bring myself to use =Middle English= and I don't want to annoy others by using =English=. Widsith 10:00, 14 March 2007 (UTC)
Rather than having nothing at all, what about a hard-coded Index to Middle English, providing a place to list a form (and its modern spelling) and provide a citation? --EncycloPetey 16:49, 14 March 2007 (UTC)
I don't know if this is worth anything Widsith, but if you'd be willing to take the time to give each "Middle English" entry a couple of cites which give an approximate range (to the nearest century) of the word, I would have absolutely no problem with it being put under =English=. I can't speak for Connel, however. Atelaes 20:42, 14 March 2007 (UTC)
For my part, if ISO or Ethnologue considers it a language it's a language. And if print dictionaries exist for it we should cover it. If Middle English specialists want a free online dictionary that covers their field, it should be right here. If there is overlap with either English or Old English then so be it. More is good, less is bad. — Hippietrail 20:52, 14 March 2007 (UTC)
I agree, we should have separate language sections in articles for Middle English, and its own category. The free online dictionary - yes that is us!--Williamsayers79 21:39, 14 March 2007 (UTC)
Totally agree. It is a long-term project, but isn’t almost anything here? H. (talk) 21:15, 17 March 2007 (UTC)
I wonder if some of the most recent comments might be missing the point here. Widsith is not proposing that we not do Middle English words. Rather, he's proposing an alternate method for categorizing them. Atelaes 21:43, 17 March 2007 (UTC)


This is an experiment I am working on. The impetus is that we have a fairly detailed preferred format, and new users in particular find the details a lot to learn. (Why should it be "Usage notes" when there is only one?) It would be easier in a lot of cases to just fix things, but rather than do a lot of fiddly editing, just tell something automatic to do it.

So the experiment is User:AutoFormat, it picks up entries tagged with {{rfc-auto}}, and can be taught to do various things. Right now it sorts languages into order, and adds the ---- dividers where needed, as well as some spacing. See [2] for example.
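As a rough illustration of the two steps just mentioned (sorting language sections and adding the ---- dividers), here is a minimal Python sketch. It is not the bot's actual code, which is not shown on this page; the function names are hypothetical, and the ordering (Translingual first, then English, then everything else alphabetically) is an assumption about the preferred layout.

```python
import re

# A level-2 header such as "==English==" on its own line (not "===Noun===").
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# An existing divider line, removed and re-added so dividers are never doubled.
DIVIDER = re.compile(r"^----[ \t]*$", re.MULTILINE)


def sort_key(language):
    # Assumed ordering: Translingual, then English, then the rest alphabetically.
    priority = {"Translingual": 0, "English": 1}
    return (priority.get(language, 2), language)


def sort_language_sections(wikitext):
    """Return wikitext with language sections sorted and separated by ----."""
    matches = list(L2_HEADER.finditer(wikitext))
    if not matches:
        return wikitext
    # Anything before the first language header is kept in place.
    preamble = wikitext[:matches[0].start()]
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = DIVIDER.sub("", wikitext[match.start():end]).strip()
        sections.append((match.group(1).strip(), body))
    sections.sort(key=lambda section: sort_key(section[0]))
    return preamble + "\n\n----\n\n".join(body for _, body in sections) + "\n"
```

A plain regex replacement cannot reorder sections on its own, which is why the sketch splits the page into sections first and only then sorts and rejoins them.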

It is set up as a 'bot, but runs under my direct supervision, an entry or three at a time. So when you see it in RC, take a look if you like, but you don't need to worry about it. If you want it to try an entry that has a problem in a class it fixes, add the tag, but no guarantee when I will run it.

If you'd like to make any suggestions, etc: User Talk:AutoFormat, if you have more than one idea, add separate sections (they're cheap ;-). Robert Ullmann 18:12, 17 March 2007 (UTC)

Looks very nice! If you make a bot of this, I am going to use this a lot, since these are things I otherwise do manually. H. (talk) 21:37, 17 March 2007 (UTC)
I think your question 'Why should it be "Usage notes" when there is only one?' is meant to be a new user's, rather than your own, but it's because it makes for consistency to use "Usage notes" everywhere, and also just because there is only one usage note now does not mean that more won't be added later, and the person adding the new note might forget to change the title of the section. The same goes for "Synonyms", etc. I'm sure you knew that already, but it doesn't hurt to have it stated again in case anyone reading your posting is wondering.
Anyhow, I like this idea very much... how much will it be able to do automatically? — Paul G 09:59, 18 March 2007 (UTC)
Yes, the question was certainly meant to be a new user's, not moi. Should have thought that was obvious? ;-) I'm not at all sure how much it can do; and that is why it isn't clear how generally useful it might be. Part of the reason for doing this you can see by looking at User:Robert Ullmann/Han/Problems which lists all the remaining problems with the Han entries that couldn't be done with AWB (or with the code to fix the Korean Yale and Mandarin Pinyin). In particular, language sorting isn't so easy with just regex replacements. If you look at the list starting at 6434, you'll see where I had messed up the section flipping regex for a few minutes, and couldn't ID all the entries to fix at the time; now I can tag those and they will get fixed. Also quite a few with no ---- section dividers.
In general my thought is that when doing other edits and cleanup, users can drop the tag in if there is something they know the bot can fix. And even if it isn't run for a while, it will eventually. Meanwhile the tag is intentionally invisible, except for the cat at the bottom of the page. Robert Ullmann 19:13, 18 March 2007 (UTC)
Oh, and just a random sort of list of things it could do: fix header spellings, subst language names for codes in headers and translations lines, sort translations lines (careful with the multiples!), fix ''f'' to {{f}} (only in translations), unlink "top 40" language names in translations, move categories to the corresponding language sections, subst PAGENAME, wikilink one word definitions (and some variants, I did this in the Han entries), etc. And a number of things that could be tagged for attention if we wanted. Any given idea can be a new talk page section. Robert Ullmann 19:31, 18 March 2007 (UTC)
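One item from the list above, fixing ''f'' to {{f}} only in translations, might look something like the following Python sketch. It is an illustration under stated assumptions, not the bot's code: the function name is hypothetical, and "translation lines" are assumed to be bulleted lines between a Translations header and the next header of any level.

```python
import re

# Any wikitext header line, e.g. "====Translations====".
HEADER = re.compile(r"^=+\s*(.*?)\s*=+\s*$")
# An italicised bare gender mark such as ''f''.
GENDER = re.compile(r"''([mfn])''")


def fix_gender_marks(wikitext):
    """Replace ''m'', ''f'', ''n'' with {{m}}, {{f}}, {{n}} in translation lines only."""
    in_translations = False
    out = []
    for line in wikitext.split("\n"):
        header = HEADER.match(line)
        if header:
            # Entering or leaving a Translations section.
            in_translations = header.group(1) == "Translations"
        elif in_translations and line.startswith("*"):
            line = GENDER.sub(r"{{\1}}", line)
        out.append(line)
    return "\n".join(out)
```

Restricting the substitution to the Translations section keeps it from touching italic letters elsewhere on the page, such as in definitions or quotations.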

I have a fairly reasonable first version, would encourage anyone to add the {{rfc-auto}} tag to whatever they are editing that they think would benefit. I will be checking every edit. Robert Ullmann 22:17, 18 March 2007 (UTC)

"alternative spelling of" template

Connel Mackenzie requested an extra parameter for this template to show the region(s) where a term is used, and I think this would be useful too. Something like this, for the article encyclopaedia, for example:

{{alternative spelling of|encyclopedia|mainly UK}}

which would produce this:

An alternative spelling (mainly UK) of encyclopedia

or maybe

A mainly UK alternative spelling of encyclopedia

(The first is probably better because the second would need to change "A" to "An", and would not be clever enough to know that it's "A US..." and not "An US...".)

I don't know how to add this, otherwise I would do it myself. Could someone add it, please? — Paul G 10:14, 18 March 2007 (UTC)

Come to think of it, we can already write (mainly UK) or similar before the template, and it formats fine, so I'm not entirely sure this is necessary. What do you think, Connel? — Paul G 10:17, 18 March 2007 (UTC)
Context labels can be used. That's how I've handled this in the past.--Williamsayers79 10:43, 18 March 2007 (UTC)
I'd like to see the parameter option in the template. I'd also like to see a standard set of regional abbreviations used, both as parameters in this template as well as in the {{context}} template and in a template for marking pronunciations in the pronunciation section. The {{alternative spelling of}} template could then simply have the parameter set as:
  • {{alternative spelling of|encyclopedia|UK}}
To produce the output:
I'd rather have the option of inserting a pipe and two letters at the end, instead of packing the whole shebang inside of a context template. --EncycloPetey 16:54, 18 March 2007 (UTC)
We have a perfectly ordinary way to do this, like everything else. The alt of template is just a definition line. Use the context label, like any other definition line:
# {{context|mainly|UK}} {{alternative spelling of|encyclopedia}}
1. (mainly UK) Alternative spelling of encyclopedia.
This is already standard format. Robert Ullmann 18:50, 18 March 2007 (UTC)

Connel's essay in response to Dmh's statements on deletion

OK, now you've ruffled my feathers. So, since I intend to start writing the book I've outlined, I need the practice at being verbose. So I'll practice my prose with a chapter here. For good measure, I cleared out/archived 185 KB of text so I'll have room for all these ones and zeros here.

What is Wiktionary?

That actually is a very good question. No one really knows the answer, though. The answer seems to be a dictionary that represents the minds of the collective contributors at any given point in time. If only vandals are here, then yes, it will quickly become another Urbandictionary. To be fair, Urbandictionary today bears little resemblance to Urbandictionary a year ago. But I'm sure you can understand that if left unchecked, Wiktionary would rapidly decline.

But what keeps real contributors here? Is it their notion, that Wiktionary will grow to become the dictionary, that they each want to be able to tell their grand-kids they helped write? Does it purport to be some massive force for social change?

This is the left arm of Wikipedia. An encyclopedia simply cannot, and should not, cover the needs that a dictionary does. But for a universal reference, a dictionary component is needed. Most people looking something up, can't grok the idea of an encyclopedia containing dictionary definitions - that stuff, their brains automatically want to see lumped into a dictionary.

Back to the social change stuff for a moment...Wikipedia in and of itself, is becoming a force of some sort. More and more people are turning to it first when they'd like to understand something. So what is changing? The role of publishers? I can't imagine they are particularly pleased with that prospect.

So, off on a slight tangent, have you checked how much it costs to access the OED online these days? $382.75. In North America, buying an annual subscription through your local library is only $195.00. You don't get a copy of it or anything - you just get the ability to search their references and read individual items. Granted, that is a lot less than it was a year ago, but still...that is not chump change. Limited (i.e. incomplete) editions are available free of charge through most libraries.

Oddly, m-w.com still doesn't charge for their general access, but does charge $29.95 for annual access to the unabridged version. Nor does dictionary.com (but e-reference can be downloaded for $34.95, AHD for $26.00, Cult lit. $29.95, etc. Cambridge is free or £21.00 to buy, Bartleby is free to access, or $60.00 for hardcover.) Instead, both bombard you with advertisements, many of which make it past the various filters made to combat such nonsense. But how long can even that last?

So, um, wait a second. Why are we here again? I think it is the realization of all contributors, that such free access is extraordinarily precarious now. Any day, "they" (you know, them - "Them" - the big meanies) may decide it is time to start charging for access. All of the serious dictionary publishers have already made their attempt at going online. From their perspective, there is no more "adaptation" to new environments that they can do.

You and I, we know better. Free content is not just free, it is liberating.

Do I want to see a replacement for other dictionaries? No. I do want Wiktionary to be equivalent (or better.) I get the feeling that most contributors here feel the same way.

Now, does being a "multilingual dictionary" make us better? Absolutely not. Lookups are astronomically harder, glosses are much more susceptible to splits in deference to other languages, translations clog up the works to no end. Technical aspects, like simply rendering the alphabet, are no longer simple. But who am I to say? The decision was made by this community long before I ever heard of wikiAnything. And although I only listed the glaring defects, there are also benefits.

Being a multilingual dictionary gives tremendous insight into etymology and cognates. Having everything as a single search means the information you are looking for, about an obscure Greek term, is right at your fingertips. The lack of language separation at the software level has had direct benefits on the English side as well. It has forced us into listing all forms of a word, which really is a good thing.

Such a technical marvel has been inconceivable since dictionaries first existed. Look at the other online dictionaries...they still don't get it. Instead of listing the entire word, as spelled, they list the suffixes for a given headword. They could spell them out, but they are so set in their ways, they refuse to.

But back on topic, our immediate international reach has allowed other miracles, like the list of French Wiktionnaire's English terms that aren't yet in the English Wiktionary. The German and Dutch word distinctions have forced us to explain many entries in ways we would never imagine, as native speakers. And words like uncle which have only a single meaning, are suddenly clarified beyond imagination. (Yes, that is both good and bad.)

So why is being a multilingual dictionary so bad? It doesn't meet the expectations of our readers. As calcified as the dictionary publishers seem to be (to me,) the readership is an order of magnitude more guilty. So our software here, has to accommodate both the lay-reader and the hard-core linguists. (The recent PIE vote would be a good example of that.)

Of all things, I think our readers are of the most concern. No one wants to open a dictionary and see goatse. No one wants to look up a term, and find an obscure, deranged S&M re-definition of a normal word. No one wants to be redirected to a made up "phobia" that describes a fear of the word they are looking up. No one wants to find out that a Pokémon character used the thing they are looking up in episode 827.

And no one wants to be told the wrong way to spell a word (especially on a close match lookup.) People do want to know that they are using the right word, spelled the right way. Some authors want to know that the obscure word they are using is acceptable. Authors, in particular, know perfectly well, when and how they can go outside the bounds of strict, formal, correct usage.

Are we currently building a usable dictionary?

Now, let's look for a moment at the contributors. Who do we have? Not so many people are desperately interested in the grunt-work of composing accurate and consistent definitions. Instead, we tend to get a lot of people stopping in, with a strong desire for world recognition.

Turning en.wiktionary.org into an intellectual pissing contest (scenarii) is not exactly productive.

Other contributors wish to provide the terms relevant to their "vertical segment" in an increasingly popular dictionary. Others are here just because they are baffled that we don't have entries for their favorite terms. Many others seem to unconsciously think this is, or should be, a slang dictionary.

None of those groups are interested in the day-to-day grunt-work of building a usable dictionary. Occasionally, an individual in one of those groups is, but by and large, those stereotypes are nearly opposite of what en.wiktionary.org needs.

So, back to what is Wiktionary. Or rather, what should it be.

I for one, am embarrassed by the enormous number of words we have that do not appear in any other general-purpose dictionary. I for one, am embarrassed by the enormous number of words that appear here, that are universally thrown out as spelling errors elsewhere. I am astonished that, for the most part, except a tiny handful that I've marked, those errors have all the appearance of "valid" words. Such short-sightedness makes a task such as building a spellchecker from Wiktionary, nigh impossible.

Should we throw up our hands, as Dmh suggests, and allow a free-for-all? Or should we get serious about building a real, usable dictionary that can be looked at as an historic achievement? We have an opportunity here to provide the World with a copyleft usable dictionary.

Now, before I start on chapter two, (How to get there via a "multi-level Wiktionary") I'd like to take a quick straw poll. --Connel MacKenzie 00:42, 19 January 2007 (UTC)


  • Wiktionary should be a usable, "real" dictionary with nonsense and slang kept out of the main namespace.
  1. --Connel MacKenzie 00:42, 19 January 2007 (UTC)
  2. --Versageek 01:17, 19 January 2007 (UTC)
  3. --Cynewulf 02:18, 19 January 2007 (UTC)
  4. --DAVilla 04:56, 19 January 2007 (UTC) except that I'd challenge your notion of a real dictionary, since even respectable dictionaries have slang; and as long as "nonsense" is defined objectively.
    I was intentionally vague, so that I'd have something to say for chapter two.  :-) But then, that will just be a rewrite of the "multi-level Wiktionary" thing, that I've gone on about before, elsewhere. --Connel MacKenzie 05:55, 19 January 2007 (UTC)
  5. --Jonathan Webley 07:26, 19 January 2007 (UTC). Delete the nonsense, but keep the slang.
    Perhaps you should move your vote down then. I am suggesting slang be eradicated from namespace zero, but remain search-able as full entries, e.g. "Slang:bitchin." --Connel MacKenzie 17:20, 19 January 2007 (UTC)
    As you can see from my delete history, I'm not in the free-for-all camp. To be honest, I'll need to see the slang namespace in action before I can be certain whether I agree with it or not. Jonathan Webley 11:44, 20 January 2007 (UTC)
  6. --Enginear 15:06, 19 January 2007 (UTC)
  7. --Jeffqyzt 16:58, 19 January 2007 (UTC) Agreeing to the bold text, assuming that the non-bold is merely commentary stating Connel's POV. Otherwise, the two options are equally disagreeable.
    Perhaps you should move your vote down then. Yes, it is an attempt to clarify my POV, for my little straw poll here. --17:20, 19 January 2007 (UTC)
  8. --Cerealkiller13 21:01, 19 January 2007 (UTC) (Let me be quite clear that I think this is the best option of the two, but I am not advocating it as a Wiktionary policy).
  9. —Stephen 23:20, 19 January 2007 (UTC)
  10. I am a dictionary editor and this is my manifesto? - [The]DaveRoss 17:21, 23 February 2007 (UTC)
  • Wiktionary should be a free-for-all with no [delete] or [move] buttons for anyone, whatsoever.
Please do not add other choices. Neither goal is likely; I'd simply like to know what the general desire actually is.
OK, then I can't answer yes to either of the above. I like a lot of the points you make above (though I'm still more with Hippietrail on making better use of available technology). I also completely believe you offer the choice in good faith, but it's still a false dichotomy. Wiktionary should be a usable, "real" dictionary. But real dictionaries include slang, and "nonsense" is like obscenity — you know it when you see it. Which is why ...
To be usable, Wiktionary needs consistent rules and they need to be consistently applied. It can't be a free-for-all. Back in the day we'd argue over whether a particular made-up word, and I mean something that the contributor said they'd made up out of whole cloth, should be in the main namespace or not. Eventually we decided on LOP, but then we had to explain why someone's baby ended up there instead of in the main space. It was on their web page after all.
It was about that time that CFI got larger — at least one person said too large — and a hell of a lot less vague. Now when someone introduces a made-up word, we can say "Nope, sorry, fails independence and attestation. LOP." Game over. Done. Next customer, please. This is progress.
I share your concern, to some extent, about filtering out garbage. I'm not concerned about (IMHO) silliness like scenarii, ingenuitive and the "I don't like it" sense of illiteracy. I'm not greatly concerned about vulgarity, profanity and internet-flavor-of-the-month. As a word geek, I'm much less bothered than most where a citation comes from, as long as it's durably archived and it's clear the speaker is using the term in question in earnest.
I am, however, concerned about two things that I believe you're also concerned about:
  • Preserving the information that (however silly the reasons) some terms will give people the impression you're stupid. Similarly, some (for generally more valid reasons) will offend. Also, some are only used formally and some are never used formally and some are in between. We need to say this, but without taking a POV as to whether any of this is justified.
  • Noting which spellings are commonly accepted and where, which ones are used infrequently but not considered outright wrong, and, within limits, which ones are commonly used but commonly considered wrong. We want to do this without letting in a full entry for every conceivable spelling of every conceivable word.
I think the solution to all but the very last item is, "Include, but mark." We can have (and have had) a lot of fun arguing over just how to mark things, but they need to be marked. The entry for scenarii needs to say that it's only used in particular technical contexts and scenarios is overwhelmingly common elsewhere. The entry for ingenuitive might say that the very similar ingenious is much more common. The entry for your favorite obscenity should be marked as such and given as prudent a definition as will convey the meaning concisely.
All of these should be codified as general rules to be applied in such cases, not just done ad-hoc. They should be codified for the same reason that we codified handling of made-up words. It will make our life simpler and Wiktionary better.
The last item is harder. Plenty of bogus spellings will pass the current CFI, which is more aimed at filtering out made-up words. Actually, I think I have a proposal, but I'll give it separately. Even if that doesn't pan out, we need some sort of well-defined rule.
If I've been expressing this well at all, it should be clear that I'm in no way advocating a free-for-all. If you re-read most of my complaints over time, you may find that they suddenly make more sense if viewed as "this is not following any consistent rule" and not as "we need to let in anything, from anyone, anywhere, any time" or "I'm just rattling cages" (I'm very seldom just rattling cages :-).
You can't just fiat "no nonsense shall appear in the main namespace of Wiktionary." You have to give clear, objective criteria. Otherwise you do get a free-for-all. -dmh 04:00, 19 January 2007 (UTC)
Having taken the trouble to look it up now, I'm no longer bothered by ingenuitive or of the opinion that ingenious should be offered as an alternative. They're two different words. I'm not sure what a better example would be above, so please pretend that ingenuitive is just an odd variant on ingenious -dmh 05:00, 19 January 2007 (UTC)

A dictionary which excludes slang is neither "useable" nor "real". As for "nonsense", well everyone agrees that that should be kept out, but the point is that everyone has a different idea of what constitutes nonsense. Widsith 17:12, 20 January 2007 (UTC)

I agree. That is why I said "kept out of the main namespace" (ambiguously.) To be less ambiguous, I am suggesting a "Slang:" namespace (fully searchable, but marked as slang by entry title.) --Connel MacKenzie 22:24, 20 January 2007 (UTC)

We can still describe slang words in Wiktionary without any of the usual crap found on Urban Dictionary. I'm inclined to agree with Widsith on this, but no free-for-all though. --Williamsayers79 18:13, 20 January 2007 (UTC)

Yes, in the longer proposal, I'd shunt such entries to "Vulgar:" or "Obscene:" namespaces. Since this is all so hypothetical, no real discussion of what the exact namespaces will be has even started. But with positive feedback, I think I will propose a bunch. --Connel MacKenzie 22:24, 20 January 2007 (UTC)
I'm fine with "include, but mark" as a general approach. Namespaces may or may not work as a practical means of marking. I'm much less interested in the mechanism or even the categories than the rules for categorizing. Once again, rules. I don't think anyone has seriously proposed a free-for-all. -dmh 02:02, 21 January 2007 (UTC)
I regularly comment that there should be some area or areas (probably not in the main namespace) where misspellings, misprints, typos and scannos can be placed, subject to something similar to our present CFI, so they can be searched for. This apparent serious request for a definition shows exactly why. I have yet to hear a good argument why such entries are less useful than "correct" entries, although I accept that we need to find a way to discourage mirroring before we add them. Personally, I think an entry for niany would be more useful than, say, metropoleis. The latter should, IMHO, normally only be used to an audience who understand Greek inflections, since a more commonly understood plural is readily available. Few people would therefore need to look it up in a dictionary, indeed perhaps none yet have. However, many with limited knowledge of English might be expected to look up the scanno niany, or indeed the scanno bum, as repeated in one of the cites. --Enginear 20:38, 7 February 2007 (UTC)

(It should be clarified that nothing in this paragraph was intended with meanness) OMG thought police OMG! Your absolutist logic makes no sense to me. No reasonable person wants either option at all. Then your points - "No one wants to see goatse" - no one will if you watch for vandalism. However, a wiki is a wiki, and you can't change that. "No one wants to see an S&M term" why not? I think your "deranged" descriptor is frightening, to be honest, because S&M is a notable subculture, and there's no reason its definitions should not be included. Obviously, a completely bogus phobia should be deleted, and if they're looking up a real word it shouldn't be a redirect to a completely bogus phobia. I think that a Pokémon's name is not a definition, and therefore belongs at Wikipedia, which would of course get a transwiki link. I don't think including misspellings is a bad thing either. I think one of the most confusing parts of your argument is that you suggest somehow including peripheral information makes it so readers can't find what they're looking for. I'm not a regular contributor, but I think the wiktionary needs no general policy revamp, and I'm very concerned that you think so. 09:55, 9 March 2007 (UTC) (Also Atropos.)

We can have our cake, and eat the bits we like too

Connel loves black & white discussions, doesn't he just. But the world is not black and white. It is more complex. My view is that

  • the Wiktionary database should contain everything possible
  • the reader, on registering, sets their preferences of what they want to see.

You set which languages you want to see. (Maybe you just want an English dictionary, or maybe an English/Hindustani dictionary, or "All Languages" is your interest. In future, maybe a Hindustani/Japanese dictionary). Maybe you are offended by Vulgar slang, so check the right exclusion box. But then you might be reading a text with a word you suspect is a slang word and you want to know its meaning. So check the right box to allow all slang. Do you want to see misspellings or not? Maybe you really don't want to see the etymology. I certainly don't, most of the time. Check if you want to see Obsolete terms or not. Check if you want to see Protologisms or not. And then change your preferences if your usage changes.

By having this kind of universal database, many views approach, we could conceivably keep everyone happy. It is, to me at least, an obvious compromise between Universality and Personal Useability. Certainly, seriously considering this has to beat playing Connel's simplistic black or white debate.--Richardb 10:30, 19 March 2007 (UTC)
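As a rough illustration of the "universal database, many views" idea, the following minimal sketch filters stored senses against a reader's preferences. Everything here is hypothetical: the `Sense` and `Preferences` classes, the label names, and the sample data are invented for illustration and do not correspond to any existing MediaWiki or Wiktionary feature.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Sense:
    language: str
    gloss: str
    labels: Set[str] = field(default_factory=set)  # e.g. {"vulgar", "slang"}


@dataclass
class Preferences:
    # None means "All Languages"; otherwise only the listed languages are shown.
    languages: Optional[Set[str]] = None
    # Labels the reader has ticked an exclusion box for.
    hidden_labels: Set[str] = field(default_factory=set)


def visible_senses(senses, prefs):
    """Return the senses a reader with these preferences would see."""
    shown = []
    for sense in senses:
        if prefs.languages is not None and sense.language not in prefs.languages:
            continue  # language not selected by the reader
        if sense.labels & prefs.hidden_labels:
            continue  # sense carries at least one excluded label
        shown.append(sense)
    return shown


# Placeholder data, not real dictionary content.
senses = [
    Sense("English", "ordinary sense"),
    Sense("English", "slang sense", {"slang"}),
    Sense("Dutch", "Dutch sense"),
    Sense("English", "misspelling entry", {"misspelling"}),
]

prefs = Preferences(languages={"English"}, hidden_labels={"slang", "misspelling"})
print([s.gloss for s in visible_senses(senses, prefs)])
```

The database keeps every sense; only the view changes when the reader edits `languages` or `hidden_labels`. Such a filter depends on every sense being marked consistently.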

Actually, I think I devoted a couple thousand words above describing just how much of the gray areas are gray. We should retain all vandalism, people's phone numbers, "JOSH IS GAY" entries? My, that is a new one, even for you. We don't have the technical ability to accomplish what you propose, anyhow. WM lookups are WM lookups...so if the garbage has an entry, it will always count as a direct hit. So the "misspelling vandalism" ends up being very effective. Great. --Connel MacKenzie 23:34, 24 March 2007 (UTC)

Structure of Wiktionary

One of the things I don't understand about en.wiktionary.org - which according to the main page is an English language dictionary - is why the entries define the word for multiple languages. For example, if I look up "hut" then as well as the English word I get translations for the Czech, Dutch and Old High German words "hut". In an English language dictionary it would be useful and interesting to have comparisons to related words in other languages (of the "c.f. Dutch, hut" type), but surely each language pair should have its own dictionary (in these cases Dutch-to-English, Czech-to-English, and so on)? What is the point of listing potentially completely unrelated words in the same article just because they accidentally happen to have the same spelling? To cope with the occasional event that I might have a word and not know which language it is in it would be much more sensible to have a "master lookup" feature across all dictionaries. Thus, I type in "hut" and it tells me the word is in the English dictionary, plus the Czech-to-English dictionary, etc. with links. The way it's organised at the moment seems bizarre to me. Perhaps I am missing some fundamental point. Matt 20:52, 16 February 2007 (UTC).

The point you may be missing is that this is a multilingual dictionary. We're striving to contain every word in every language (we still have a very long way to go on that). What distinguishes this as the English Wiktionary is that all of the definitions and explanations are in English. The beauty of this system is that if I, as an English speaker, want to know what the Dutch word "hut" means, I can look it up here and find out. If I look up hut in the Dutch Wiktionary, everything (the definitions, usage notes, etymology, etc.) is in Dutch, and I'll be completely lost. I hope that answers your question. Feel free to leave further clarifications if it didn't. Atelaes 21:23, 16 February 2007 (UTC)
I understand what you are saying, but if I, as an English speaker, wanted to know what the Dutch word "hut" means I would never consider looking it up in the same place as I looked up the definitions of English words. I would be looking somewhere for a Dutch-English dictionary. Perhaps this is just because every other dictionary that I've ever seen works like that. I've never come across anything with the structure of Wiktionary, which is probably why to me it seems so bizarre! Matt 22:00, 16 February 2007 (UTC).
Wiktionary is a Dutch-English dictionary. And a Czech-English dictionary. And an everything-else-English dictionary as well, and also a regular old English dictionary. We're just not constrained by space like those paper dictionaries you're used to, or constrained by lack of imagination like those online dictionaries that only mimic the paper dictionaries. bd2412 T 22:04, 16 February 2007 (UTC)
I remain unconvinced, but I appreciate that others take a different view. Matt 00:51, 17 February 2007 (UTC).
Matt, you are not alone. Hippietrail is putting finishing touches on a "Multi-lingual Wiktionary" extension to the MediaWiki software. I completely agree that lookups (and therefore, also edits) should be restricted to languages a user prefers. A simple note that the word's definition exists in other languages should be more than sufficient. Also of note: the Latin Wiktionary, I believe used "Wikipedia-style disambiguation" to separate the languages, instead of level two headings. It's the same, only different. --Connel MacKenzie 01:07, 17 February 2007 (UTC)
A further argument could be made against (by default) having translation sections and entries for foreign words. One or the other would lead to much greater consistency. --Connel MacKenzie 01:11, 17 February 2007 (UTC)
How would that help someone who either a) wants to know how to say "foot" in Spanish, and has no idea, or b) comes across the word "pie" in a Spanish essay and wants to know what it means? Would you suggest they look up "pie" in Spanish Wiktionary and hope to find the English translation? bd2412 T 01:24, 17 February 2007 (UTC)
What, what, what? No, click on "Show all languages" or "Show Spanish entries" before pressing [search] for "pie". For multi-lingual Wiktionary, many things assumed here (currently) about the [Go] button would not be/should not be valid. The way we've done it here, so far, is not scalable, nor flexible. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
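To make the "show only the languages a user prefers" idea concrete, here is a minimal sketch that splits an entry on its level-two language headings and returns the preferred sections plus a short list of the other languages present. It is not Hippietrail's extension and does not use any MediaWiki API; `split_by_language`, `filter_languages`, and the sample page text are hypothetical, and the glosses are placeholders rather than real definitions.

```python
import re

# A level-two heading looks like ==English==; ===Noun=== (level three) must not match.
HEADING = re.compile(r"^==([^=]+)==[ \t]*$", re.MULTILINE)


def split_by_language(wikitext):
    """Map each level-two heading (language name) to the text of its section."""
    sections = {}
    matches = list(HEADING.finditer(wikitext))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end].strip()
    return sections


def filter_languages(sections, wanted):
    """Return (sections to display, names of other languages that also have the word)."""
    shown = {lang: text for lang, text in sections.items() if lang in wanted}
    others = [lang for lang in sections if lang not in wanted]
    return shown, others


# Placeholder page text for illustration only.
PAGE = """==English==
===Noun===
# (English definition)

==Dutch==
===Noun===
# (Dutch definition)

==Czech==
===Noun===
# (Czech definition)
"""

shown, others = filter_languages(split_by_language(PAGE), {"English"})
print(list(shown))   # languages displayed in full
print(others)        # languages reduced to a one-line note
```

This only works if every entry uses the level-two heading consistently, which is part of why entry format matters for any such filter.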
I will be bold and say that nobody (including me) is quite sure exactly how Wiktionary should be organized. From what I've seen, we really have three types of contributors at Wiktionary:
  1. people who create definitions for words
  2. people who obsess over the format of individual entries and/or the organization of Wiktionary as a whole
  3. people who write software to speed up mundane editing tasks
Those are the three big ones. Most contributors tend to specialize in one of the three. I personally believe that we will not be able to know how Wiktionary should be organized until we have created entries for a lot more words.
A-cai 12:23, 17 February 2007 (UTC)
Having gone through phases of each of the three stereotypes listed above, I'm not sure what you're trying to say. I will say that at this point, I do not wish to sit by, idly, while Wiktionary is turned into an un-parsable (programmatically unusable) mess. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
I'm not sure what I'm trying to say either :) The only thing that truly seems to unite the long term contributors to Wiktionary is the belief that Wiktionary could become something truly ground-breaking. Having worked on Wiktionary for over a year now, I'm no longer under the illusion that it will happen any time soon. The biggest problem that I see is that we have a lot of people worrying about the form of Wiktionary, but not enough who worry about the content. I believe that the reason for this, based on the bilingual people that I have talked to, is that the process for creating entries is still too cumbersome. This is something that we old-timers tend to forget. Wiktionary needs to become much more user friendly if we are ever going to have a chance of attracting a large number of language enthusiasts (many of whom are not computer savvy).

A-cai 07:50, 19 February 2007 (UTC)

I fully agree with Connel that it would be very nice to make Wiktionary a customizable experience. Most of the users really only want to see a simple definition of an English word, but I'll be damned to let Wiktionary be limited to that. The OED online has some hints of what I would like to see Wiktionary one day become. It has buttons at the top which allow the user to determine which portions of their entry they want to see. If you want to see the etymology, you click the "etymology" button at the top. Same goes for pronunciation, quotations, etc. I think it would be nice if there was something similar to WT:PREFS at the homepage, where people could set up the default views. Here they could determine if they want to see etymologies, if they want to see translations (and if so, from which language(s)), etc. This would certainly require the imposition of a lot of rigid formatting rules, but I think it would make Wiktionary appeal to a broader audience, especially as our articles continue to (hopefully) grow. It certainly is rather intimidating to go to an entry, looking for the definition, only to find five pages of text that you have to sort through. But at the same time, I don't want to give up an iota of those five pages. Atelaes 07:36, 19 February 2007 (UTC)
I too like the OED buttons available on the front end accessed from [3]. Unfortunately, it seems that most US libraries have gone for the Oxford Reference front end [4] which does not use those buttons, so people like Connel can't easily see what we're talking about. However, your description seems pretty clear. The only things I can think to add to the description are that on OED the preferences last only for the session, which is a pity, and that the etc in your "same goes for" represents date chart.
Obviously, another useful button for us would be translations; and we need, as Connel has just hinted somewhere, a means of choosing what languages we want to see entries for. It would be an advantage to have the buttons visible on every page (as OED does) rather than having them on the home page, since I find I sometimes alter them a few times during a session, depending what I'm looking for.
I do think it makes sense to have a relatively high proportion of editors caring about setting the style while the number of entries is still in the 100k's rather than the 10M's. I agree that, once we have a format that seems scaleable to the "all words in all languages" goal, we should make the entering of words easier. However, I think it may actually be good that the present imperfect system throttles the number of edits and leads to a preponderance of nerds, at this stage when the structure clearly needs attention.
And to anyone who hasn't noticed Hippietrail's latest (at WT:BP#Wiktionary structure awareness extension prototype live for testing) then do look (though I haven't yet worked out myself how to vary it from the default). --Enginear 16:33, 19 February 2007 (UTC)

I agree with some of the sentiments expressed above. As a newcomer, the more Wiktionary pages I look at, the more of a mess it seems to be. Someone needs to sit down and properly design the structure, and then create an interface that *enforces* that structure, so that individual editors can't just go off and do their own quirky things. (I should emphasise that my comments are not in any way intended as a criticism of the people who have obviously put in a lot of effort to get Wiktionary to where it is. It's just the way the thing's grown I guess.) Matt 14:51, 20 February 2007 (UTC).

Some of the perception of messiness comes from having a large number of articles which are fine, but could be more complete, combined with articles that are fairly complete. E.g. if every article consistently had pronunciation, it would look less "messy".
But: it is very fortunate that we started out with the 'pedia s/w, and that no-one "sat down and properly design[ed] the structure"! Most of what we have done in the last two years would have been difficult or nearly impossible if we had had s/w that enforced the structure. For example, if the several people working on Greek/Ancient Greek right now had to make code changes and get them committed to the running s/w base, instead of playing with a few templates and formatting pages as they like, that work would almost certainly not be happening. They would just be forcing the information into the pre-conceived format, with inferior results. Sure, I could write s/w today that would look really good, but couldn't have done it 6 months or two years ago; we didn't know enough. And note that the previous sentence will still be true 6 months or 2 years from now ...
The WiktionaryZ/Omega project is trying to write such software, but it "freezes" some level of understanding (and when they did the first version, they didn't even know that Japanese could be written in 4 different scripts, and that entries were not 1-1). Even if they do it over, they just freeze at another point.
We are still at a fairly early point, still learning enormous amounts about what a dictionary can be freed of the constraints of paper. (Why have lots of lang-x to English dictionaries, when one can have an Any to English dictionary? I dreamed of compiling one of these in the 1970's, and figured out it would run to many dozens of volumes, so not be terribly useful ...) Right now we have about 300K entries, in less than a year we will have a million+, in two years probably 5-10 million as we move toward comprehensive coverage of 40-50 major languages. Anyone think they can predict what is going to be needed in the structure? All we can do is work on it and learn. Robert Ullmann 12:39, 23 February 2007 (UTC)
(Interposed comment; sorry if this disrupts the flow... not quite sure where to put it). The reason why the "any-to-English" format does not work as currently implemented in Wiktionary is that 99% of (English language) users, 99% of the time, either want an English definition of an English word or want a translation of an English word into a known, specified language, or want a translation of a word from a known, specified language into English. Mixing everything together on the same page just makes it more difficult for people to find what they are looking for, while adding no value. The one unusual circumstance where someone wants a translation of a word, and they don't know what language it is in, should be handled by some sort of "global lookup" feature. Matt 14:36, 25 February 2007 (UTC).
As I noted below, the number 1 hit on the English wiktionary (after the drunk-college-student obscenities ;-) is Category:Japanese language. I have a suspicion that contrary to your assertion, the vast majority of our users are in fact looking for English definitions of words in other not-necessarily-known languages. And if you are looking up something written in Han characters, which language/English dict are you going to look in? No value? The translingual/common and related languages (e.g. Mandarin/Min Nan) add a lot. A "global lookup" as you say? That's just what we provide. If it offends you (;-) that you get additional information: we are working on a filter, see below. Robert Ullmann 14:50, 25 February 2007 (UTC)
I do find it very hard to believe that most users are looking for an English definition of a word in an unknown language. If this is true then the community of Wiktionary users must be a very atypical bunch compared to your average dictionary user, I would say. Matt 15:13, 25 February 2007 (UTC).
I'm not suggesting that once designed the structure can never change; that would be daft. I am suggesting that a structure be devised and enforced to cope with the content that exists now, that can then be extended/revised as people have new ideas and want to do new things. What I'm talking about is tidying up the sort of mess that we have, for example, at note (just to pick at random one of countless examples), which has a list of definitions of the noun, an out-of-sync list of translations which if extended to "all" languages would be about 100 pages long, followed by the definition of the verb, followed by more translations etc. This is the sort of very unfriendly "ad hoc" layout that could be avoided if, for example, a sensible structure for handling translations were designed. Matt 23:08, 24 February 2007 (UTC).
I agree that what headings can be used should be restricted, but only (if and only if) that list of restricted headings can easily be extended by sysops. For example, I'd love to see ===Usage note=== never be allowed (indicating instead that only ===Usage notes=== is a valid heading.) There are now four different "flavors" of related cleanup lists...mine are at User:Connel MacKenzie/todo, todo2, 3, 4, 5 etc. No one has been eager to attack {{rfc-trans}} recently...it does seem to be a growing problem.
The "Preload" templates have gone a long way towards helping newbies enter English new words. Many of the preload templates still need expanded "-intro" fillers, like template:new_en_noun_intro. And other languages...ahhh. Big time.
The biggest roadblock to making it "easier" to edit is that Wiktionary serves a lot more pages to readers than contributors. Right now, it is still fairly easy for a newcomer to make a minor correction to an existing entry. And although cluttered, I think the entries are somewhat comprehensible for newcomers to read. But certainly starting a new entry is daunting, for newcomers. Unfortunately, I don't see many ways to simplify that. --Connel MacKenzie 08:23, 25 February 2007 (UTC)
The minor variations to headers aren't so difficult; we just need to periodically run something to fix them. (E.g. (^={3,6})\s*[Uu]sage\s*[Nn]otes?\s*={3,6} (to) \1Usage notes\1 or such ;-) I have something that would fix all of them, but right now it "fixes" a bit too much ... as to the readers: 200+ hits a day on MILF and choad? No wonder we have to protect those pages. It is interesting that Category:Japanese language is in the top 100.
Some kind of entry method for new users/new entries (much better than the preload templates) would be a fine idea. Robert Ullmann 12:09, 25 February 2007 (UTC)
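The periodic header clean-up suggested above could be sketched roughly as follows in Python. This is only an illustration, not the tool anyone in this discussion actually runs: the function name `normalize_usage_notes` and the constant `HEADER_RE` are made up for the example, and the pattern is a slightly stricter variant of the quoted regex (it requires the closing run of `=` to match the opening run, and it only allows spaces or tabs as padding so that line breaks are never swallowed).

```python
import re

# Hypothetical pattern adapted from the regex quoted in the discussion.
# It matches a level-3 to level-6 "Usage note(s)" heading on a line of its own,
# with either capitalisation of the U and N and optional padding.
HEADER_RE = re.compile(
    r"^(={3,6})[ \t]*[Uu]sage[ \t]*[Nn]otes?[ \t]*\1[ \t]*$",
    re.MULTILINE,
)


def normalize_usage_notes(wikitext: str) -> str:
    """Rewrite every matching heading as 'Usage notes' at the same level."""
    return HEADER_RE.sub(r"\1Usage notes\1", wikitext)
```

Restricting the pattern to one known heading at a time is a deliberate choice here: a narrower pattern is less likely to "fix" a bit too much than a single catch-all rule.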
Some kind of "model page" might also be useful: a page that is fully populated, with all sections present, translations into all languages present, etc. Has anyone done this? Initially it might be good for someone very familiar with Wiktionary to do this and invite comments so that a consensus view on how all the elements should be laid out is arrived at (for example, how to avoid breaking up English definitions with acres of translations). Then the page could sit as a useful reference for newcomers like me. Matt 15:13, 25 February 2007 (UTC).
A single model page would never satisfy this need; there are too many possible variations. There is no single word in the English language that can function as every part of speech and every subcategory of every part of speech. Examples would be needed for each part of speech, and also for handling words that serve in multiple categories. There are also relatively few words in English that have directly precise translations in all languages, never mind the fact that we don't have editors who speak all the various languages of the world. For example, there are more than 1600 languages spoken in India alone, and only a handful of those languages have any entries on Wiktionary.
That said, there are a small number of pages that show a high proportion of basic layout information. I started a project (which I work on only occasionally) to accumulate some pages to serve as models. One such page is listen, and I am working to make Central Europe, transparent, and round into model pages as well. You can see the starting outline of my efforts at User:EncycloPetey/Model pages. --EncycloPetey 18:54, 25 February 2007 (UTC)
It could just as well be a made-up word with made-up definitions and translations. The purpose is to illustrate the structure and layout, not the actual content. Matt 21:06, 26 February 2007 (UTC).
WT:ELE originally was exactly that - the made up word "Hrunk" formatted a la Wiktionary. It has evolved a little bit, over the past couple years. --Connel MacKenzie 00:45, 25 March 2007 (UTC)
IIRC, it was moved to its current name in late 2004, and gained initial acceptance in early 2005. Not sure how long after that, that WikiMedia added log entries for moves, deletions and protections. --Connel MacKenzie 19:03, 17 April 2007 (UTC)

Plurals and translations.

I was recently browsing around and came across the entry for geese, the plural of goose.  I noticed there were three translations for that entry.  I know I've seen it elsewhere, but haven't noted it.  My first inclination was to delete the translations, but wondered if there is a policy for that.

I personally think entries marked as English plurals should not have translations sections.  The non-English plurals should be in their own pages. — V-ball 12:47, 28 February 2007 (UTC)

I think that is the policy. I certainly delete translation sections on plurals when I see them. Widsith 12:53, 28 February 2007 (UTC)
That is insane. For completely irregular plurals you'd delete useful content? No, that is not policy. The General case (which is wrong) is for regular inflections, to not require translations. --Connel MacKenzie 07:10, 1 March 2007 (UTC)
I am in total agreement with Connel on this - there is no sound basis for deleting translation sections from plurals. Consider the end user who wants to know how to say, for example, friends in French. They might well go first to friends, and seeing no translation section there, may give up, or may go to friend (which will have ami and amie, but not amis or amies). Here's another place where our user may give up, or if they are intrepid they may go on to look up ami and find what they sought (after first suffering two unnecessary disappointments). bd2412 T 07:20, 1 March 2007 (UTC)
How about etymologies? Atelaes 07:14, 1 March 2007 (UTC)
I have no problem with etymologies (or pronunciations, citations, 'pedia links, etc). An entry for a plural is an entry that happens to be for a plural; we may define plurals by reference to the singular, but that doesn't mean they have to be stripped barren but for that information. bd2412 T 07:22, 1 March 2007 (UTC)
Please try not to call me or my actions insane. I was under the impression that we had discussed this before and concluded that translation sections were only attached to singular forms? As for a sound basis, I find it hard to sympathise with BD's hypothetical user, since I can't believe anyone wanting to know the French for friends would not look up friend. That is the way all other dictionaries work. Anyway, I'll do whatever the community decides, but as I say I thought we'd been over all this in the past. Widsith 09:51, 1 March 2007 (UTC)
To take some odd examples, what would you do with news, data, or peoples, where I expect the translations are often quite different from the "singulars"? More generally, I agree with the others that it is normally inappropriate to delete any content which might be useful. --Enginear 20:33, 1 March 2007 (UTC)
I too had always understood that we made a fundamental distinction for certain information between the lemma form and non-lemmata. In particular, that when the only "definition" of an English word is "form of foob", that the translations will be given on the lemma page foob. Otherwise, we open ourselves up for incredibly bad headaches of maintenance. I for one don't want to try to correlate and verify all the translations of English present participles, to ensure that the Latin translation is the present active participle. I don't want to have to be sure that the correct gerund form is given under the English gerund form, even though the part of speech will not match between the entries. And which verb forms should we give then in the translation of English verb lemma, if we're going to open it up like this? The first person singular present active indicative? The present active infinitive? The passive preterite infinitive? Latin verbs have six infinitive forms (unless they're defective). Translations are not one-to-one. I say any non-lemma entry should point to the lemma for translations. --EncycloPetey 03:51, 2 March 2007 (UTC)
I agree with this proposal. Perhaps there should be exceptions made for words like news, data, and peoples, and all other forms where a plural is the only form, or has a separate meaning. But, in general, if a word is simply plural form of foob (which, humorously enough, is a Hmong verb), it should simply refer back to the lemma, where all the pertinent information will be held. I think most users will be intelligent enough to figure this out, especially if we are consistent in this, and the non-lemma is simply a soft redirect with nothing else. Atelaes 04:10, 2 March 2007 (UTC)
I'm not seeing a "why" - paper dictionaries limit the information they provide in accordance with their corresponding limits on available space. We have no such barrier. bd2412 T 04:34, 2 March 2007 (UTC)
We may not have limits on space, but we most certainly have limits on manpower. Having all the information in both places adds little in terms of the user's experience, it is a simple matter to follow the redirect to the lemma form. However, it adds a great deal of workload in maintaining the entries, as well as figuring out which form to use (this becomes more relevant with verbs, as EncycloPetey mentioned earlier). And, if we decide not to maintain the entries, we are then presenting a low quality product, which no one wants to do. Atelaes 04:57, 2 March 2007 (UTC)
This is an all-volunteer project. Our manpower is whoever is interested in doing whatever they are interested in doing - so long as additional information is merely permissible, but not mandatory, I see no manpower problem. As for maintaining the entries, do you mean policing edits? I don't think having additional information in legitimate entries will increase the number of vandals. It really doesn't even give them additional targets, as we already have plurals as entries. bd2412 T 05:50, 2 March 2007 (UTC)
I'm not talking about vandalism, no. What I'm talking about is when some anon adds the Turkish translation of foob, but doesn't do it on foobs. Then we're offering a sub-par version of foobs, which is lacking in the Turkish translation. On the other hand, if we offer nothing but "plural of foob", then we're giving them the same high-quality product, as they're, in essence, forced to go to foob and see the Turkish translation. You can certainly say, "Well I'll just add the Turkish translation to foobs," but will you? I won't. I don't have time for tedious stuff like that. And while our manpower is theoretically limitless, in reality, it does have a very distinct limit. We have, what, maybe a few dozen solid contributors? I think it unwise to add work for ourselves which adds little to the overall project. Atelaes 06:18, 2 March 2007 (UTC)
Okay, but if some anon does go and add the Turkish translation to "foobs" do you think one of us solid contributors should then be tasked with taking the time to delete this information from the entry? bd2412 T 13:44, 2 March 2007 (UTC)
I suppose so, yes. Atelaes 13:58, 2 March 2007 (UTC)
I have to agree with BD2412 in this case. For example, the word marines refers to the marine corps, but marine does not refer to the marine corps! Depending on context, it can mean a member of the marine corps, or it can mean a variety of other things that the plural marines does not mean (you can't have a plural adjective can you?). This could be true of other languages as well. To use Atelaes' example, foob does not necessarily equal foobs in every way. Take a look at the following (We'll call our language Fooblese for the sake of argument):
  • Fooblese: foobe
  • Fooblese: foober
This is especially true for a language like English, with its wacky plurals such as cactus/cacti, city/cities etc. I also disagree with deleting a valid translation just because it was placed under the plural form and not the singular. Even if Atelaes is correct about the plural/singular thing, the translation should be moved to the singular form, not just deleted out of hand!

A-cai 14:41, 2 March 2007 (UTC)

If foobs is not simply the plural of foob, then obviously it should have translations for those senses which are not simply plurals for senses of foob; I don't think anyone's claiming otherwise. Also, the existence of "wacky plurals" is not an argument one way or the other: the pages for cactus, city, etc. state what the plurals are. If you want to add the plural of a non-English noun, the place to do so is at the entry on the singular, under an "inflection" heading. —RuakhTALK 16:01, 2 March 2007 (UTC)
I wholeheartedly agree with the fact that a translation put in the plural should be moved to the singular form (if not present), instead of unceremoniously deleted. I apologize for not being more clear on that. As for the marines, as I mentioned earlier, there will certainly be exceptions. Anytime a word cannot be defined simply as "plural of foob", then all bets are off as far as what I'm talking about. As for cacti, certainly it's a goofy plural, but we'll have a succinct explanation of it: "plural of cactus". Perhaps it might also be wise to have a usage note (follows Latin declension) to explain it, but otherwise, I still think it should simply be a soft redirect. Atelaes 16:05, 2 March 2007 (UTC)
I don't know if "wacky" plurals like cacti are anything special.  To me, it seems weird to have translations sections on plurals.  For example, the page cactus will have a translation section, and in that one can see (after they make an unnecessary extra click to show the translations (my pet peeve since I can't seem to get my preferences to work)) how to say cactus in various languages.  Most likely, you will see the Russian word кактус.  If I really want to know what the plural of кактус is in Russian, I will click on it because its entry should have a paradigm showing the plural, and I would not expect the entry for cacti to have the Russian nominative plural, кактусы, listed.  The plurals of foreign words should be listed the same ways English words are, meaning кактусы is mentioned on the кактус page as a plural, and кактусы has its own entry saying, "Nominative plural of кактус." — V-ball 16:20, 2 March 2007 (UTC)
Since I am not usually very interested in translations, I suppose my view -- about 100 lines up -- should not be given undue weight. But the general issue of how to deal with "incomplete" entries for inflections, that is, skeleton entries or entries which are less complete than what we normally call a full entry, is of wider interest. Perhaps the standard "definition" of an inflection should be along the lines of Plural of foob, where further information can be found. --Enginear 18:35, 2 March 2007 (UTC)
Take a look at friend and friends. Please tell me that we are not going to put the translation for the TV show under the singular entry! Also, note the difference between translations in Mandarin and Min Nan. Which information should be left in, and what should go into the entries for the Chinese words?

A-cai 18:42, 2 March 2007 (UTC)

Translations of the TV show belong on the capitalized Friends page and are a separate issue altogether. Details of how 朋友 is inflected in different Chinese languages belong on the 朋友 page. Widsith 18:47, 2 March 2007 (UTC)
Aha! But did you notice what happens when you click on Friends? It redirects you to friends! I'm not sure any more what Wiktionary policy is for that, although what you say makes sense.

A-cai 19:05, 2 March 2007 (UTC)

Policy is that the proper noun should be at Friends [though I can't remember if Proper noun is still used, or whether we now call them all Nouns] and the noun at friends. I've now removed the redirect and split the entry. --Enginear 20:25, 2 March 2007 (UTC)

Whoa, cool down guys. There is a very good reason why we do not give translations (or synonyms, etc) for inflected forms. It is that words often have multiple meanings and the translations usually do not apply to all of them.

Taking "friend" as an example, that currently has seven meanings. There are, correspondingly, seven translation tables. Examining these shows that translations differ with sense. For example, French has "ami(e)" for the first sense, "petit ami(e)", "copain"/"copine" for second, and so on. "Friends" however gives "amis", suggesting that this is a suitable translation for all senses of the word. This is utterly false.

A user can easily find the translation they require (and, more importantly, the correct translation) by following the link to the uninflected form, and then clicking on the link in the translation for the sense they require. A well-formatted entry will include the plural (and any other inflected forms) there.

In the case of plurals that have special meanings, such as "marines" or then, of course, translations can be given for these. Otherwise, entries for English plurals and other inflections of English words must not include translations.

Now, I understand Connel's point about removing useful information. The thing is, this information is in the wrong place and is unhelpful or misleading as it stands. The appropriate action to take is to move these translations to pages for the foreign-language singular forms (and plural forms, if required, especially if these are irregular) and then to delete them from "friends". Anyone willing to help me with this?

By the way, Friends should be deleted. It is encyclopedic. — Paul G 11:33, 3 March 2007 (UTC)

I'm for collecting information at one place in cases where it isn't controversial. There are a number of things besides translations that need not appear on "stub" pages, that is, pages where the only definitions are those that refer to other pages. These types of entries include alternative spellings and inflections, but not synonyms such as Allen wrench and Allen key.

There is no rule of thumb, I think, so much as an outcome of process. It is never acceptable to delete correct information that does not violate an accepted standard. Especially when in question, such as with word histories, a deletion should be noted, of course. Over deletion, it is much preferable to consolidate information such as synonyms and etymology (e.g. a full etymology becomes root plus inflection). By consolidation I do not mean "move" so much as "merge", although the original example of translations for plurals is minor and acceptable as per Paul G. Consolidating differing information should be allowed even if it leaves a stub page, provided the information does not contradict (as with color/colour, program/programme) and there is no controversy over the "correct" form, i.e. no clear principal spelling of e.g. irregular plurals. On the latter point, changing a principal page into a stub page without consolidation as a clean-up measure is not allowed. Expanding a stub page into a full page is already permissible if the contributor has good reason to believe that it should be a principal page. DAVilla 20:32, 7 March 2007 (UTC)

Widsith, I apologize for calling your idea insane. To clarify what I meant, the removal of beneficial information is much worse than standardizing the Translations section layout (in this manner.) Please note that BD2412's hypothetic user (in his example somewhere above) usually is not going to be a human being; rather, it will be that human being's software performing the lookup.
The notion that all software out there properly knows how to truncate a word form to a lemma (I assert) is insane - most software can't even tell (accurately) what language a given word is in. Looking at the Wikipedia pages on Corpora linguistics, I'm stunned that my trivial frequency analysis of 1.6 billion words from Project Gutenberg wildly overshadows the ANC.
My expectation is that there will be an order of magnitude more software components written over the years. Some will get better, but all new ones are very likely to start from the same starting point. If Wiktionary provides information directly for all forms of a word, the programmatic mistakes are not only eliminated (before they happen) but subtle mistakes are avoided entirely. This comes about by human contributors here verifying the word forms individually, and noting exceptions accordingly.
My point of view (admittedly, my own) is that first hits to Wiktionary pages should contain as much information as possible. Every web-based extension of Wiktionary I've seen so far has tremendous difficulty linking back to anything other than the "direct hit." As those components become more elaborate, the navigation to what you call "the correct" lemma form will become more difficult, if not impossible. (E.g. try browsing Wiktionary on your cell phone - GOOD LUCK!)
With all that said, from my perspective, it is "insane" to remove translations from plural entries, especially as a matter of procedure. (Again, I think I'm using "insane" as an intensifier, not as an insult...that may be why you were offended by my wording initially?)
--Connel MacKenzie 14:14, 26 March 2007 (UTC)
I think that if a plural entry definition were to say "Plural of foob, where further information can be found." then many of the objections to adding incomplete sets of translations, etc, would be countered. --Enginear 19:58, 26 March 2007 (UTC)

Trademark names

We need a policy for trademarks. If we have one, I can't find it (and it should be at Wiktionary:Trademarks). I think that widely known trademark names should be included if they can be used metaphorically (e.g. Cadillac), descriptively (Mark wears a Rolex and drives a Lexus, whereas Joe wears a Timex and drives a Honda), if the mark is approaching genericism (Kleenex, Xerox), or if the mark is a specific use of a word that would otherwise be in the dictionary anyway (Bounty for paper towels, Crest for toothpaste, Janus for mutual funds). I'm putting together a listing of the most widely known brand names at User:BD2412/brand names.

Also, I've noticed that from time to time folks need to look up trademark registrations here, so I'm going to provide some quick tips on how to do this.

1. Go to the United States Patent and Trademark Office trademark main page.
2. Near the top of the right-hand column, click [Search].
3. I recommend the Free Form Search. Type in the word you're looking for followed by [comb] and you'll get a combination of searches for the word alone, with punctuation, or as part of a phrase. Also, it often helps to add "and live[ld]" to a search, as this will limit it to live marks and filter out marks no longer registered.

Cheers! bd2412 T 23:18, 2 March 2007 (UTC)

I agree with most of this, but disagree with your view that the Bounty and Crest and Janus trademarks should be included just because bounty and crest are words and Janus is a dictionary-worthy proper noun. (Maybe they should be included anyway, but if we develop criteria for inclusion of trademarks, I think they should apply regardless of whether the trademark represents its own entry, or simply an additional sense in an existing entry.) —RuakhTALK 03:40, 3 March 2007 (UTC)
I think that trademarks that incorporate a common word for an uncommon purpose (e.g. Apple) merit a one-line entry because the word is already in the dictionary, and the trademark definition is a legitimate alternative definition of a word for which we are trying to give complete information. That said, I think such instances should be limited to trademarks that can be demonstrated by reference to a source such as a trade journal to be very widely known and very strong. bd2412 T 19:25, 3 March 2007 (UTC)
I strongly disagree with this (sorry bd2412, I just seem to keep picking fights with you, nothing personal :-)). Certainly we should have entries for bandaid (or is it band-aid?) and xerox, because they're used in an idiomatic sense, not necessarily related to the brand itself, perhaps rolex as well. But, my opinion is that they should not merit entries until they can be put in non-capitals, as xerox and bandaid can. Otherwise this opens us up to including every brand name in existence, which is not dictionary material. Unless someone can show exactly where else the line should be drawn, I say we draw it here. Atelaes 04:50, 4 March 2007 (UTC)
I think there are resources that would make it fairly easy to draw sensible lines. I've been tossing some ideas back and forth in my mind and would say, for example, that we can easily agree to exclude company names that are just collections of surnames (e.g. Morgan Stanley Dean Witter, Bristol-Myers Squibb, and Ethan Allen). I do, however, think that we should make every effort to list all brand names for medications (Tylenol, Dexatrim, Motrin, Prozac) because I can see a particular utility to such listings, in part because the drug makers tend to come up with fanciful words, and in part because most such drugs can be described by reference to their key ingredient (i.e. acetaminophen, ibuprofen). I'd also be rather inclined to include fanciful car names (Integra, Montero, Prius). There are hardly so many that it would cause a fuss. With respect to other corporate or brand names, I'd set a higher criterion than the CFI to show that the brand name is used in some descriptive or attributive sense, but that should be easy for truly mega-brands such as Coke and Pepsi, McDonalds, Microsoft, etc. bd2412 T 03:30, 6 March 2007 (UTC)
That might be useful, but the utility argument is a very well documented logical fallacy. We aren't aiming to be useful, we're aiming to be a dictionary (which is useful, of course, but not just). Usefulness includes TV listings, atlases, currency converters, whatever you can think of; there are lots of useful things. I fail to see why we would give any class of words immunity from CFI, and I especially fail to see why, if we did, we would want them to be medications and car models. Those aren't within the purview of a dictionary, but are more appropriate at an encyclopedia. Trademarked names or brand names still need to pass attestation with independent use (and not just mention). I would suggest a vote to make the point clear, but I'm already satisfied with CFI's wording: "To be included, the use of a trademark or company name other than its use as a trademark (i.e., a use as a common word) has to be attested." Dmcdevit 06:49, 25 March 2007 (UTC)

Use of ® and ™ in entries

Greetings! As a professional intellectual property attorney, I can assure you that there is no requirement whatsoever that we should use the ® and ™ symbols adjacent to the headword of names that are trademarks (registered or not). First, we're an educational organization making a purely nominative use of the terms (i.e. we're not selling hamburgers, so we don't even have to acknowledge that McDonald's is a trademark). Second, even so, we do indicate in the entry and often in a usage note that the word is a trademark or is a registered mark. Third, the ® or ™ symbol is not a part of the actual word. Finally, trademark registrations are neither eternal nor certain. Registrations lapse, get cancelled, or become abandoned all the time (I have personally seen some very big companies errantly allow the lapse of registrations for some very famous trademarks).

Frequently multiple parties claim ownership of a particular mark and spend years litigating who has the right to use the mark, or whether both parties can use the mark for different products (e.g. Ritz crackers and Ritz hotels); parties may have rights to a mark in limited geographic areas; and parties often claim to own generic or descriptive marks that can not actually be "owned" by anyone. In short, even information on the best known marks can become obsolete, and there are few people here with the technical background to determine the status of a mark, particularly one that is contested.

In short, we should get rid of those symbols. Cheers! bd2412 T 06:44, 9 March 2007 (UTC)

Any idea why all other dictionaries seem to use the marks, then? I don't understand what is bad about including the mark on a term that has had technical problems with renewals. OTOH, retaining the marks alerts our readers that they probably should use the symbol as well. To me, it seems that removing the marks would be inconsistent and unhelpful. --Connel MacKenzie 10:47, 11 March 2007 (UTC)
It seems the cleanest way of marking trademarks and brand names to me. The symbols are universal and concise. --EncycloPetey 03:46, 12 March 2007 (UTC)
To Connel, I'm looking at the Webster's Collegiate Dictionary, Tenth Edition (which is the one I have handy at the moment) and it has a listing for Xerox without the symbol, but with trademark at the beginning of the definition line. I do not believe we need to 'alert our readers' in the manner that you suggest, as there is no reason for anyone other than the owner of the mark to actually use such a symbol. A Google book search for Coca Cola, Absolut, Tylenol, shows that such symbols are absent not only in works of fiction, but even in non-fiction works examining these specific industries.
To EncycloPetey, I think the cleanest way of marking trademarks and brand names is the same way we mark medical terms, slang, vulgarities, sports terms, etc., with a notation in the definition line. This is particularly evident where we have entries such as Cadillac, Hartford, Lincoln, Mercedes, Nike, Quaker, and Saturn, each of which is a famous trademark, but each of which has additional meanings for which a capitalized entry is necessary (place names, given names, surname, mythological figures, etc.). bd2412 T 16:46, 12 March 2007 (UTC)

I agree with BD; I have always found it a little weird that we include the symbols, when they virtually never appear in actual usage. We give the impression that such symbols have to be used with the word, which is not the case. Widsith 17:11, 12 March 2007 (UTC)

  • Sorry guys, but these symbols, while concise, are certainly not universal. In Spanish neither is used; instead MR (marca registrada) takes the place of both.
  • As for the symbols being used in the headword section but not in actual use, this is just silly. We also put m or f in the headword next to nouns but these are also never seen in actual use. — Hippietrail 14:34, 13 March 2007 (UTC)
I agree with bd2412, having a tag on the definition and not in the headword seems like a good solution. Regarding Hippietrail's comment: so, if we did decide to continue using TM in headwords, do we restrict this to ==English== parts of speech, and use MR in the ==Spanish== headers? -- Beobach972 15:44, 14 March 2007 (UTC)
Hippietrail, m and f indicate how the word must be used in speech and writing. ® and ™ do no such thing; rather they indicate how one party, the owner of the trademark, would like you to use the word, but not any way in which you are required to use the mark. And how about the many words (including some listed above) that have one sense that is a trademark and others that are generic? Also, how does Spanish account for unregistered marks? A word or phrase used in a Spanish-speaking country that is used as a trademark but is not registered gets no recognition, virtually no protection whatsoever. bd2412 T 16:17, 14 March 2007 (UTC)
bd2412, your original argument made no mention of "how a word must be used" but merely "the symbol is not a part of the actual word". Since you are content to change your argument as we go, would you not concede that some writers do care whether a term might be a trademark or not? It could be a matter of style or policy in the departments of certain companies. I think it is never an error to include too much information on any word. Those who do not care about certain parts of the information can ignore them. If we erase information, however, then people who do care have no way to recreate it. Now the idea of putting the warning that a term may be a trademark into the sense sounds like one worth exploring.
Sadly I am not aware how the various Spanish-speaking countries account for unregistered trademarks. My experience is mostly in Mexico where I am only a visitor. — Hippietrail 17:01, 14 March 2007 (UTC)
Well let me point out that the way we are using the symbols adds to the appearance that they are, in fact, a part of the word. We are putting them in boldface right next to the word, with no space between. Our m and f and pl and so forth are in italics and set apart by a space. Now, if we were to do that with the ® and ™ symbols, it would be more appropriate, but in my opinion it would look horrible. It also seems to me that the qualifications we use in the headword line are fairly stable. We expect a masculine noun to still be a masculine noun in a hundred years. However, a word that is vulgar today may lose that connotation in some years; a term that is slang now may become so widely used as to be deemed formal; and a trademark may become generic, or may simply cease to be used.
If we're going to indicate trademark status, we should do it on the definition line for the trademark definition. After all, what do we do with Ace, a common nickname for fighter pilots, but also a famous trademark for two unrelated companies (selling hardware and bandages, respectively)? What do we do with Dove? In fact, if you go to the website of the United States Patent and Trademark Office (http://www.uspto.gov) you'll find that thousands of words in the English language are marks, including the names of most any figure from mythology or ancient history, most city, county, state, and other place names, most given names and surnames - should we put a trademark symbol next to all of them? There are nine current registrations for "Dan", should our entry read Dan®? Should we note marks that are registered trademarks for some purposes and unregistered trademarks for other purposes with both symbols, Dan®™? What if the owner of one of the 40 or so registered "Scott" marks should decide to sue us for using the ® with some marks but failing to use the ® with Scott? Or Venus? Or Rio? Or Smith? Or Taurus? You must know that if we include marks at all, we cannot do so discriminately.
I have no need to change my argument, but I'll surely raise additional arguments, all of which indicate that we have no business playing with trademark symbols, especially when our use is bound to be inconsistent unless someone is willing to check every word in the dictionary against the constantly changing USPTO database every few months. bd2412 T 07:35, 15 March 2007 (UTC)
It's worse than that. Many, probably most, of us are not based in the US. There seems to be general agreement above that there is no legal requirement to use the symbols in the dictionary, so the location of the servers is irrelevant. It is where we use the words that counts. Many marks are registered in one or a few countries only. It would be necessary to check all countries' patent offices, and perhaps note in the entries where the marks were registered and where not. That is not the job of a dictionary. I agree with you that we should not use the symbols, but where known, we should gloss the entry to say it is used as a trade mark. --Enginear 12:29, 15 March 2007 (UTC)
Whatever we decide to do, one thing I think we really ought to have is a disclaimer, as some paper dictionaries do, that the inclusion or absence of a trademark symbol does not affect the trademark status of the word so indicated or not indicated (in other words, if you take our word for it but we've got it wrong and you get into trouble because of it, don't sue us). — Paul G 10:21, 18 March 2007 (UTC)
  • Now that I'm back in Mexico again I've kept an eye out and I have actually seen ® (but not ™) used here. I don't know if it's considered the same or different to MR. Also in at least one of my Spanish-English dictionaries, ® is used in both the English and Spanish sections for some words. Neither ™ nor MR is used in either section. — Hippietrail 18:03, 19 March 2007 (UTC)

I CALL FOR A VOTE (can someone put that together?) bd2412 T 05:05, 22 March 2007 (UTC)

Ok, I have started a vote at Wiktionary:Votes/pl-2007-02/Trademark designations. Cheers! bd2412 T 03:39, 23 March 2007 (UTC)

A proposed Vote concerning Placenames

A vote was recently started concerning the criteria for inclusion for placenames. See Wiktionary:Votes/pl-2007-02/Placenames. This was not discussed beforehand and has degenerated somewhat. I propose that it be abandoned, and replaced with a simpler vote with fewer, and less specific, options as follows.

  1. The criteria for inclusion for placenames should be exactly the same as for all other words - broadly attested, used rather than mentioned.
  2. One or more additional criteria should be applied to placenames - details to be discussed later if this option is carried.

Please make your thoughts known here and provide a better first vote if you can think of one. SemperBlotto 09:31, 13 March 2007 (UTC)

I agree that the vote should be abandoned, it's quite a mess. A (rather lengthy, I fear) discussion is needed before a vote of this nature could be restarted. I think that the current criteria are good, but I don't believe it's a simple matter of picking one or the other. Past that I have nothing useful to offer, sorry. Atelaes 10:07, 13 March 2007 (UTC)
I agree that the vote should be abandoned, it seems we didn't plan it out very well. -- Beobach972 20:56, 13 March 2007 (UTC)
  • I apologize for inadvertently spurring it into existence (in a discussion further up, on this page.) --Connel MacKenzie 06:02, 14 March 2007 (UTC)
It's alright, you're right that we need a vote; we just need to clearly plot out all the options we'll have on it. -- Beobach972 15:36, 14 March 2007 (UTC)

The criterion I like to apply to proper names (and with little difficulty to all words really) is if it can be understood out of context. For place names this is a little more lenient than I had previously been judging them. In Athens, Georgia the word "Athens" means the city in Georgia, so if this can be verified with three independent citations, e.g. newspapers or what have you, out of context meaning "Athens" (as per the title of the page) instead of "Athens, Georgia", then the place of Athens, Georgia is one sense of Athens that is understood regionally. Outside of that Athens could only be assumed to be the city in Greece or taken just as a general, unspecified place name. DAVilla 18:54, 23 March 2007 (UTC)

I don't agree with (or maybe don't understand) that logic. First, if we assume Athens, Georgia did not generate 3 cites for Athens, surely the term Athens, Georgia is understood out of context and can be attested as such. For that matter, what place can't be understood out of context regionally, considering placenames belong to regions? The problem is that being understood out of context is not a criterion that establishes appropriateness for the dictionary. If I said the single word "blast", even a cytologist is not likely, in any region, to assume first that I mean an immature cell. Conversely, the cytologist in the US would understand Millard Fillmore, or even Mallard Fillmore, out of context. In general, I wouldn't say that all proper nouns should be removed (I just added Sahabi, for instance). But whereas English, Taoism, and Enlightenment have definable meanings with or without context, Skagway, Transamerica Pyramid, and Lü Buwei can only be described by pointing at the physical objects they reference. They don't belong, and neither, I think, does Athens, Georgia, unless it's an attributive or generic term. Dmcdevit 23:03, 26 March 2007 (UTC)
I don't think we should have an entry on Athens, Georgia, but we should most definitely have entries on Athens and Georgia! bd2412 T 23:09, 26 March 2007 (UTC)

Easter Competition 2007

This is an announcement to open the Easter Competition 2007. As with previous contests, the prize consists mainly of a warm woolly feeling inside, but the primary object is playful competition among Wiktionarians. --EncycloPetey 02:34, 15 March 2007 (UTC)

Results are now posted. --EncycloPetey 17:16, 10 April 2007 (UTC)

Interpretation of CFI

Someone else (dmh) said earlier in this forum that the CFI are too vague. I tend to agree.

A user created an entry for Friends, defining it as the US TV show. I nominated it for deletion, saying it was encyclopedic (that is, that "Friends" belongs in an encyclopedia). (Note that the entry currently does not contain this definition.) This provoked a heated discussion that comes down, in my understanding of the issue, to the interpretation of what the CFI say about inclusion of names, namely "A name should be included if it is used attributively, with a widely understood meaning".

It is being argued that, as "Friends" can be used attributively (as in "Jennifer Aniston, the former Friends star"), it should be in. Taking CFI literally, this means "Friends" is allowed in. My feeling, based on my experience of what does and does not get into Wiktionary, is that this is not the intention of that part of the CFI.

If the CFI are to be interpreted literally on names, then TV shows, movies, place names (no matter how tiny the places they refer to) and many other names are allowed in provided they have an attributive use ("Friends star David Schwimmer", "Gladiator star Russell Crowe", "Nowhereville resident Joe Bloggs").

Well, yes. If there are three such uses spanning a year in permanently recorded media. bd2412 T 04:30, 16 April 2007 (UTC)

However, if the CFI are not to be interpreted that way, and I don't think they are, then they need to be tightened up to state more precisely what can be included and what should not be. — Paul G 07:13, 15 March 2007 (UTC)

  • See the proposed vote two sections up. Should this vote be expanded to include ALL proper nouns? SemperBlotto 08:15, 15 March 2007 (UTC)
I think that might be a good idea. Proper nouns CFI seems to be a hot topic as of late. Atelaes 08:19, 15 March 2007 (UTC)
I agree that the issue at hand is whether or not to include proper nouns at all. I also agree that WT:CFI is rather vague in this area. My vote would be to include proper nouns (place names, people, companies, TV shows, movies etc) as long as the definitions are kept short, and the proper noun is widely used in spoken or written English. Some contributors have expressed reservations about this idea, because they fear that we might not know where to draw the line (i.e. include Rocky, but not Rocky II?). My take is that Wiktionary is in its infancy, and it is probably too early to be overly conservative about what to leave out. We are breaking new ground with Wiktionary, and this is definitely an area where we have a potential to surpass our traditional expectations of a dictionary.
For example, did you know that there are three different Chinese translations for the 2000 movie Gladiator (PRC: 角斗士; Hong Kong: 帝國驕雄; Taiwan: 神鬼戰士)? These three translations are, of course, not to be confused with the Chinese translations of the 1992 movie Gladiator (PRC: 终极斗士; Taiwan: 神鬼拳王)! How are these terms pronounced, what are their etymologies, which one (if any) is the literal colloquial term for gladiator in Chinese? Wikipedia does a poor job of answering such questions, but Wiktionary is ideally suited to answer them all. -- A-cai 09:11, 15 March 2007 (UTC)
I agree that we should take a liberal course towards including proper nouns, with some caveats.
  1. I think that we should include place names with abandon - from the country level down to the town, borough, or hamlet. In order to prevent ourselves from going crazy with 50 towns named Springfield, I propose a rule that if a place name is used for more than 5 places, then it gets a single line indicating that it is a commonly used place name (unless one of those places is a world city like Paris, or a capital like, well, Springfield (Illinois, that is)).
  2. I think we should include a line for any brand name, movie name, TV show title, band name, or song title for which we should otherwise have an existing Wiktionary entry (Friends, Sneakers, Pledge, Nirvana, Joy, etc.)
  3. I mentioned somewhere above that I think we should include the brand names of medications and cars (remember, we're talking about one-line entries here).
  4. With respect to people, I think we have to act on the presumption that any combination of a first and last name (Joe Smith, Marcia Clark, Reginald Denny) is simply non-idiomatic unless it means something other than merely an identification of a human being (e.g. Shirley Temple, Benedict Arnold) bd2412 T 11:10, 15 March 2007 (UTC)
That sounds generally O.K. to me, except for criterion #3 (corrected: #2). I don't think "is a capitalized version of a common noun" should be a criterion for including a proper noun, any more than "is a protologistic use of an existing word" should be a criterion for a normal word. —RuakhTALK 16:12, 15 March 2007 (UTC) and 07:20, 16 March 2007 (UTC)
Good work - this seems reasonable. I particularly like number 2, which would cover the current "Friends" debate; I have already suggested in the discussion on RFD that we should cross-refer to Wikipedia in this case, and I think we should do this for all words in category number 2 above.
Ruakh, did you mean "#2" when you said "#3"? As I understand it, the idea here is to acknowledge the existence of a capitalised form of a common noun, but not to give it any treatment here unless it falls into any of the other categories that we will have articles for. So, for example, bath should have a "See also" at the top (indicating another entry in Wiktionary) that links to Bath, the place in England; cavalier and mini would do the same (brands of cars); but friends and nirvana would include links to the Wikipedia articles on the TV show and band name respectively under the "See also" section towards the bottom of the article. We would do this not because these proper nouns deserve special treatment, but rather because users might expect to read about them here, and would be directed to Wikipedia; and because it will also go some way towards preventing contributors from adding definitions for the proper noun to the entry for the common noun. Users searching for proper nouns that aren't capitalised forms of common nouns in Wiktionary will get the "Perhaps there is an article on X in Wikipedia" link and can then find what they want in Wikipedia.
Articles on particular people, such as "Shirley Temple", are clearly encyclopedia material and would get no coverage in Wiktionary at all, that is, not even a cross-reference or link. Users searching for "Shirley Temple" will get the page that says "Perhaps there is an article on Shirley Temple in Wikipedia". Vanity articles created by users about themselves will of course continue to be deleted.
I think this is pretty much what we do already, even if it is not explicitly stated in CFI. What is therefore needed is a form of words for this that we can put into CFI to make it much more clear exactly what is to be included and what is not. — Paul G 07:00, 16 March 2007 (UTC)
Paul, I don't think you meant to use Shirley Temple as the example; it is the counter-example: it is the name of a drink, which is why we have the wikt entry. Any given person who is not a drink or synonymous with traitor would just be in Wikipedia. This all seems pretty good to me; but I don't think that we should have "Gladiator" as a film title because it is named variously in other languages; that should be in the Wikipedia articles (English and others) on the film; in particular there should be interwiki links. If WP is weak in this area, well, go improve it! (We don't have the invented term "sorcerer's stone" because J. K. Rowling thought—probably correctly—that the title had to be dumbed down for the American audience who wouldn't know what a philosopher's stone was ;-) If the Chinese names for Gladiator qualify in their own right, sure they should be included as ordinary words, with the same see-also reference(s) to w:zh etc. Robert Ullmann 09:53, 16 March 2007 (UTC)
Ah, my mistake. Yes, names of cocktails are certainly to be included, and many of these are named after people (famous or otherwise); and yes, we would have no entry for "Gladiator". People wanting translations of the name of the film should look at w:Gladiator and then follow the link for the language they are after, or request it if it is not there. The Wiktionary article "gladiator" would have an interwiki link in the "See also" that just says "Gladiator in Wikipedia" or something like that, without specifying that this is a film. (Who are we to say what article Wikipedia has or will have in the future at "w:Gladiator"? Currently, it's a page for the common noun, with a link at the top to the disambiguation page which contains a whole load of links, including not one, but three films with that title. So there you go — a Wiktionary article on "the" film Gladiator would be incorrect.) You make a good point about the treatment of languages that don't have capitalisation, such as Chinese.
Once there is agreement, I'll draft some text summarising what we are discussing here and add it to CFI. Or should it be treated as draft policy first? — Paul G 10:36, 16 March 2007 (UTC)

[Back to margin.] This discussion was only started yesterday, so I suggest that either it should be left a few days for other views to be added, or it should be written up as a draft policy.

However, subject to fine tuning, I think these are excellent criteria. The page that says "Perhaps there is an article on Marcia Clark in Wikipedia" needs to be made a bit more friendly (because it doesn't actually say that at present), and IMHO, our "See also" heading, when placed at the top for cases like this, should read "For additional senses see". Pop stars, et al, who use a single name, should be treated the same as single word film titles.

Incidentally, translations of place names can be covered by checking WP, in a similar fashion to film names, but we still need them here for their etymological value.

So to consider one example, say someone wants to find out what the name Sigourney Weaver means, and for some reason comes to wikt to find out, we seem to be aiming for the following:

  • Searcher finds no entry on wikt but is given page saying "Perhaps there is an article on Sigourney Weaver in Wikipedia"
  • Searcher finds there is such an article; in the article finds that Weaver is her family name, and that she chose Sigourney as her stage name to match a character in The Great Gatsby. (The article should perhaps say that this was etymologically a singularly appropriate choice, but it doesn't.)
  • Searcher therefore decides to check the individual words; Weaver's disambiguation page should direct the searcher to wikt for the meaning and etymology of the surname Weaver, and we should have an entry saying that, as a surname, it most commonly means descended from someone who made a living from weaving. Neither of these are in place yet, but since they are fairly obvious, that is not too important.
  • Searcher, having got the message that wikt deals with surnames, decides to try looking up Sigourney. At the moment there is nothing there but let's imagine...there is an entry saying
    1. Surname adopted in US by certain Huguenot families previously called Sigournay, named after their town of origin, Igournay, France
    2. Town in US, named after Lydia Huntley Sigourney
      Those seem uncontentious, and investigation on WP could find that the Huguenots were forced out of France by religious persecution, and brought their skills as weavers with them. (Neither WP nor wikt has an entry on Igournay at present.)
      But should we also have:
    3. Stage forename chosen by Susan Alexandra Weaver, after a character Mrs. Sigourney Howard in The Great Gatsby. Mrs. Sigourney Howard was herself named after Father Sigourney, a tutor of F Scott Fitzgerald.
      Or should we just have
    4. For additional usage see w:Sigourney Weaver.
      and leave the searcher to search w:The Great Gatsby to find out (actually, it's not yet mentioned there, but then nor is it yet in wikt).
      or should we not mention it at all? My preference is for the second option. --Enginear 14:09, 16 March 2007 (UTC)
You make some good points. Yes, we should certainly carry on discussing this for a while until we have clarified what changes we are going to make. For now, I'll put a note in CFI that the policy is under review, pointing to this discussion.
Regarding given names and surnames, we already have a policy on this - given names go in; surnames go in if they are etymologically interesting. So "Sigourney" should probably go in, as, even if Ms Weaver was the first to adopt it as a first name, no doubt there are lots of baby girls who have been named after her. "Weaver" would also go in because of the etymological interest, along with Archer, Smith, Taylor and other surnames derived from occupations.
I prefer the "See also" option for the info on Sigourney Weaver. I like "For additional usage", by the way. I would prefer the link to look like this: [[w:Sigourney Weaver|the Wikipedia article on Sigourney Weaver]], which makes it clearer what the user will get on following that link.
I'm intrigued... what was singularly appropriate about the choice of "Sigourney"? Perhaps you could update the Wikipedia article if this is of interest. — Paul G 12:29, 17 March 2007 (UTC)
Not really my bag to add it, but see [5]. --Enginear 16:23, 17 March 2007 (UTC)
I don't see how a surname can fail to be etymologically interesting. Virtually all surnames have some basis of derivation, whether by place, occupation, patronymic, even assignment as a form of derision. But isn't this discussion mostly about place names, brand names, and titles of media? bd2412 T 15:10, 22 March 2007 (UTC)
  • Are we getting anywhere with this discussion? Perhaps we should set out some specific proposals, e.g. my earlier proposal that brand names of cars and drugs should be included based on their respective demonstrably large populations of interest? bd2412 T 03:46, 7 May 2007 (UTC)
    • <incredulous>As dictionary entries?</incredulous> --Connel MacKenzie 05:17, 7 May 2007 (UTC)
      • Yes, I think we can do that. While names of cars and drugs are often brand names, in general usage, these terms are not used as brand names. Compare "I own two Toyotas", in which "Toyotas" refers to vehicles, with "The trademark Toyota is owned by (whoever)", in which "Toyota" refers to the trademark itself, which does not have a plural (IANAL, but I think trademarks can't be said to have plurals [or singulars, if already plural in form]). Hence trademarks can be said to have a usage that extends beyond the trademark itself. Likewise with drug names: "to take Viagra"; "to buy some aspirin" ("Aspirin" was originally a trademark, but no decent dictionary would exclude it).
      • And yes, can we please have some proposals drawn up that we can get some agreement on. — Paul G 09:52, 16 May 2007 (UTC)
  • Paul, that simply isn't true. You cannot say "Toyotas" to refer to vehicles; you can only say "Toyotas" to refer to a particular type. "What kind of Toyota do you drive?" is not the same as "What kind of car do you drive?". How about: "We sat on the roof, watching the Toyotas go by." <-- this is not coherent, unless you were sitting next-door to a Toyota factory. The "Toyota" example you gave is just normal use of a trademark!
  • Likewise, a director of a porn shoot won't ask an actor what type of Viagra he has taken; he'll only ask if the required pill has been taken. If a spammer says "G3NERIC V1AGRA" it is still a direct reference to the trademark! Aspirin (the drug originally extracted from willow tree bark by the native American Indians) refers to a much more generic item, which is why that original "trademark" was deemed to be invalid.
  • The more I hear arguments for inclusion of trademarks, the more I am convinced they are inappropriate in a dictionary. --Connel MacKenzie 15:33, 16 May 2007 (UTC)

Categories as Wantedpages but not Wantedcategories

I suppose there's a good reason why many of the top Wantedpages are categories like Category:xx:Slang or Category:en:Slang, and yet none of these categories seem to be in the list of Wantedcategories? A quick search didn't turn up any relevant discussion about this. - dcljr 17:11, 18 March 2007 (UTC)

The reason seems to be that those categories aren't actually included — {{#ifexist:…}}s are used in the relevant templates to prevent non-existent categories from being included — so the pages aren't actually added to those categories. IMHO they shouldn't show up at Special:Wantedpages, either, but this seems to be part of a more general problem; for example, when you edit a page that includes a template in a non-active part of an {{#if:…}}, the list of templates-being-used at the bottom of the page does list that template. —RuakhTALK 18:28, 18 March 2007 (UTC)
The problem with a lot of them is an interaction between a MW bug and the nav template. Connel has fixed this one.
The problems with the ones for xx: and en:, such as xx:computing, are from the context templates, which pass {{{lang}}} as a parameter even when lang is not defined; I pointed out to DAVilla that the calls on context/label should use lang={{{lang|}}}, but she insisted this was caught lower down. As you can see, it isn't, and other code uses xx and en and it gets picked up and used as a reference. One oddity about template syntax is that the variable namespace scoping is not at all what you think it is in some cases. May also be related to the same MW bug. (These templates are way more complicated than need be, not at all sure why.) Robert Ullmann 18:36, 18 March 2007 (UTC)
What a mess. This was working, at least for a little while. 10 cents to the person that can find the MW change that caused it! --Connel MacKenzie 16:19, 23 March 2007 (UTC)
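For readers unfamiliar with the template syntax discussed in this thread, here is a minimal wikitext sketch of the two points raised above. It assumes a template named context/label (the name comes from the discussion, but its real parameters and internals are not shown here, so the calls are illustrative only) and uses Category:en:Slang purely as an example target.

```wikitext
<!-- Illustrative only: the real context templates are more complicated. -->

<!-- No default: if lang is unset, the literal text {{{lang}}} is passed down. -->
{{context/label|lang={{{lang}}}}}

<!-- Empty default: if lang is unset, an empty string is passed down. -->
{{context/label|lang={{{lang|}}}}}

<!-- The category is only added when the category page already exists. -->
{{#ifexist:Category:en:Slang|[[Category:en:Slang]]}}
```

With the first form, a template further down cannot easily tell an unset parameter from a real value, which appears to be the kind of leak described above; the second form is the fix that was suggested.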

Context labelling of inflected forms of Regionalisms

Hello all, I had a small conversation with EncycloPetey (talk • contribs) about de-tagging inflected forms of Regionalisms, specifically the Geordie ones in category:Geordie. I felt that the category had become unnecessarily cluttered with plurals and verb forms, so I set about de-tagging the ones where the infinitives/non-inflected forms are marked already, with exception given to inflected forms of non-dialect words that are specific only to that region.

Anyway, we thought it might be polite to ask others first. But I do feel that tagging ALL inflected forms adds clutter.--Williamsayers79 00:01, 19 March 2007 (UTC)

Why not have a Category:Geordie plurals, and so on, like with English plurals? —RuakhTALK 00:34, 19 March 2007 (UTC)
It's a possibility but I'm not sure everyone will like it since Geordie is a dialect (albeit very substantial) of English and does not follow the standard language + POS naming convention. I remember my very first entry radgie nearly got killed by Connel because I mistakenly had Geordie as the language header :-> --Williamsayers79 00:57, 19 March 2007 (UTC)
I agree with your view that only the singular form should be tagged, which comes from my general preference of having lemma pages as the sole information containing entries, with inflected forms simply as soft redirects (with certain exceptions, of course). However, I have received flak for this view before, so take it for what you will. Ruakh's suggestion would make an excellent compromise if people are adamant on having the inflected forms categorized (although I must admit I think it somewhat pointless myself). In any case, I strongly feel that putting lemma forms and non-lemma forms in the same category is highly effective at making those categories useless. Atelaes 00:53, 19 March 2007 (UTC)
Yes I think you're right on it there.--Williamsayers79 00:57, 19 March 2007 (UTC)
  • I'm not sure what you mean by "tagging" in this context. I strongly believe there should be a label on each sense which is regional. On the subject of putting articles in Categories I have no strong opinion other than agreeing that flooding categories with inflected forms is a bad idea. If the problem is that some template is being used that adds a regional label and a category I'd say don't use the template on inflected forms but still use a regional label. — Hippietrail 17:50, 19 March 2007 (UTC)

Against demolishing the present Policy structure

One CM is trying, through the RFD method, to blow away the present policy structure which allows the gradual development of Policies. He is basically proposing that all discussion again return to the rowdy environment of the Beer Parlour. My experience was that the Beer Parlour was generally a never ending discussion that never reached any conclusion, that never developed a single policy in Wiktionary. I totally oppose his idea.

What I also have to question is the basically sneaky way that is being used to try to achieve this. When the change which would be brought about by these policy method deletions is advocating that all policy discussion take place in the Beer Parlour, isn't it rather odd that there is actually no mention in the Beer Parlour of this planned complete change to the very idea of policy development? In fact there seems to be no (easily found) coherent explanation of what CM is really proposing to put in place of that which he wants to demolish.

I see no merit whatsoever in what CM is proposing. It is purely destructive. It would leave no real way of developing policies. In the past, no policies were developed until this policy development structure was put in place and a concerted effort was made to develop policies beyond the discussion stage outside of the highly volatile forum of the Beer Parlour. In future, CM would have us believe that somehow we could make a leap from Beer Parlour discussion (more like a shouting match half the time) to a fully fledged, approved policy, without any intervening stages. My view is that the change is a recipe for killing off any future policy development. It is not a positive move at all.

I have to observe that it seems the proposal is quite naive about the whole need for policies, and the ways policies are developed in real world organisations. The genius of an idea may come from a Beer Parlour discussion. But to make it all the way to an Official Policy it needs to go through various development stages of serious consideration. It involves draft proposals, policy focus groups, white papers, proposals, and only at the end, a vote on the Official Policy. The Beer Parlour is not the place, nor the right tool, for this sort of back room work to take place to fully develop and refine a policy.--Richardb 10:58, 19 March 2007 (UTC)

What you refer to as the "present Policy structure" is entirely obsolete, it was not being used. All he is doing is marking the remnants for deletion.
The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template with the draft= parameter to describe the status as it is drafted and discussed here and on its talk page, then conduct a WT:VOTE and remove the draft=. It then takes a vote to make changes. That is the whole Torah, the rest is commentary. Robert Ullmann 11:24, 19 March 2007 (UTC)
Far from it. He has not made real changes. See Wiktionary:Policies and guidelines which still has all the steps described, and is thus, according to CM's own tagging, now Policy. :-) Show me some evidence that CM ever took this Policy change to a vote. Show me any serious discussion of the change. In fact, since you say "The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template", how about you show me where that policy is ? The whole point is that the discussion of any change, by either the past policy or the CM idea, should be on the talk page of Wiktionary:Policies and guidelines. But it is not. This has all the appearance of a totally unilateral move by CM, very illogical and very incomplete. Very destructive. Very much not properly discussed and voted on. Just a typical CM jackboot approach. He is a techo through and through. He should stick to techo stuff.
OK, we could use the {{policy}} template with the parameter. But, to me, this is a typical unnecessary complication much beloved of techos, hated by the ordinary user. And totally unnecessary.
You say "What you refer to as the "present Policy structure" is entirely obsolete, it was not being used." What you mean is that things were not moving in it. Perhaps some of the policy ideas captured were in fact stable, and could have been promoted. But CM has always been one of the last to do any real work on Policy. He much prefers to shout louder than anyone else and do things unilaterally. To be destructive of other peoples' work. (comment continues below)
Wait just a cotton-pickin' second! Are you talking about me or someone else? Your other comments may have had some basis, but what is this all about? Unilateral changes? Obsolete/ignored/superseded policies were stable? What on earth? No, those proposed policies were in DIRECT conflict with existing practices, and represented (in each case) one person's POV of how things should be. --Connel MacKenzie 15:31, 19 March 2007 (UTC)
Funny, the proposed policies stood for a year or so, and the fact they are stable generally means acceptance. During the process I constantly publicised what I was doing, and, contrary to your statement, quite a few other people did contribute to the ideas. Richardb.
(comment continued from above) I stand by the current policy of Wiktionary:Policies and guidelines. And it says we have the various development steps. That policy should not be changed without discussion and a vote. So it still stands. So that is the clearly stated current policy, not CM's half-baked, half implemented, half-forgotten proposal.--Richardb 11:45, 19 March 2007 (UTC)
Your abuse of Connel is completely, totally out of line! It constitutes a personal attack. Stop right now. (comment continues below)
I contributed heaps to Wiktionary for two years or more, till I got totally p'd off by CM's too destructive approach. I'll desist from calling a spade a spade when he stops trampling all over other people's rights.--Richardb 12:15, 19 March 2007 (UTC)
While some of that is true, that doesn't automatically give you the right to assume bad faith. Richardb and I have at times worked exactly towards the same goal, other times directly opposite (while both trying to achieve the same end result.) While I understand Richard's bitterness now, anyone who wasn't here for all of the fireworks probably does not understand the ins and outs of the situation. I'd like to request that no one fight for me, per se. Richard has some genuine complaints, along with some very major misconceptions about what has transpired, and those circumstances. --Connel MacKenzie 15:21, 19 March 2007 (UTC)
Thanks for the conciliatory note Connel. Now I've got your attention I'll try to be more polite. Richardb.
(comment continued from above) For everyone else's information, this has been discussed; a lot of it was on the IRC channel; Connel is in no way whatsoever acting unilaterally or improperly. There is a lot of cruft to clean up. Robert Ullmann 12:02, 19 March 2007 (UTC)
The IRC channel has absolutely no standing in deciding policies. You make no attempt to point to any evidence in the log, or the talk page, or any where, that this was ever discussed in writing. I can only assume that is because there is no written log of the discussion. So how does that fit into any sort of policy ? And are you going to point to the "policy" you purported to quote ? Or does the policy I pointed to have more standing. You are only demonstrating your own complete ignorance of the written policies of Wiktionary, and your unfounded faith in CM's good faith. Point to some real evidence, or just back off with your useless platitudes.--Richardb 12:15, 19 March 2007 (UTC) (In no mood to be polite with people who put politeness above actually following the rules. Connel is clearly way outside the written rules. If you can find any rules he has followed in this area, please point them out to me. Otherwise just - shut up! The most cruft to clean up is the useless waffle about being polite.)--Richardb 12:15, 19 March 2007 (UTC)
By the way, I checked the Beer Parlour Archives for January Wiktionary:Beer_parlour_archive/2007/January#63275614825 and found not a scrap of discussion about this issue, yet the RFD's were put up around Jan 27th. Did find a bit of discussion in the Sept06 Archives, but nothing to actually back up what Connel did.--Richardb 06:57, 20 March 2007 (UTC)
Is that a public expression of intent to wheel-war? That sounds like fun... --Connel MacKenzie 15:35, 19 March 2007 (UTC)
Wheel Wars is something that exclusionists indulge in, not Inclusionists such as me. I haven't touched a single one of your entries in regard to this debate (But couldn't resist RFDing your apparently useless, unexplained Category on "Pages with a shortcut"). But I do ask you to rethink. Inclusionists such as me just try to bury you in verbosity :-) See the note "A way to go" below.--Richardb 06:18, 20 March 2007 (UTC)

Try reading w:Wikipedia:Off-wiki policy discussion for the Wikipedia view on the standing of IRC discussions when it comes to deciding policy. To quote their highlights :-

  • "Consensus" in the Wikipedia context means consensus amongst comments posted on Wikipedia. Off-site discussions do not contribute to "consensus".
  • IRC can also be used for the purpose of consensus-building. Quite simply, Serious policy discussion should be common on IRC. When good ideas or proposals result from such a discussion, participants should publicly post a summary of the idea on Wikipedia.

So where is this summary of CM's idea posted on Wikipedia ?--Richardb 13:00, 19 March 2007 (UTC)

At present, I don't recall even which section of w:WP:AN it is archived under. I'll try to dig up links this evening. --Connel MacKenzie 15:24, 19 March 2007 (UTC)
I'm confused here, why would Connel post comments about Wiktionary policy on Wikipedia? --Versageek 21:17, 19 March 2007 (UTC)
May I ask for some clarification on specifically which of Connel's actions are being called into question here? Perhaps a link to the relevant diff, or at least the page which he deleted, or whatever it is. I'm totally lost, and would very much appreciate being brought up to speed. Thanks. Atelaes 20:44, 19 March 2007 (UTC)
start reading here it's this entry, and several that follow. --Versageek 21:04, 19 March 2007 (UTC)
The very point I'm trying to make. Connel has used the RFD process to change Policy. There is no one place (that I can find) that puts forward a proposal for change. The talk pages of the affected policy pages do not include any discussion for the change. Richardb

A way forward ?

Being optimistic, is this an indication of Connel being willing to actually work on developing policies? If so, I'm more than willing to spend a bit of time working with him, and anyone else. But, we have to do it the right way. Changes have to be proposed and publicised and slowly a consensus built. All completely transparent and in writing in Wiktionary, in the talk pages of the pages affected. No using RFD to push changes through. First signs of goodwill I'd like to see would be:-

  • Connel to withdraw the RFD for/from each of these pages.
  • Connel to put up a written proposal somewhere (probably in the talk page of Wiktionary:Policies and guidelines) as to what he proposes. If there was a considerable IRC chat about this, can we have a summary?

I would also suggest that we possibly try to align somewhat with the more mature Wikipedia. See w:Wikipedia:Policies_and_guidelines. They seem to have "Policies", "Guidelines", "Proposals", "Essays". Not so different from "Official Policy", "Semi Official Policy", "Draft Policy", "Policy Think Tank", but perhaps not so apparently rigid and "bombastic". But nevertheless a recognition that it takes stages to develop a policy.

Hope we can work together on this Connel, even though I can barely spare the time.--Richardb 05:29, 20 March 2007 (UTC)

Wikipedia needs many levels of policy development because of the number of people involved. While I agree that we need more than one level, we should be cautious of going for an over-complex and over-rigid framework more suitable for a large organisation. The result of excessive complexity is that the system falls into disrepute and is ignored...which has indeed happened here. --Enginear 12:31, 20 March 2007 (UTC)
A few notes on this. First, I agree that simply nominating these things for deletion may not have been the best approach. However, it should be noted that, at the very least, he did not delete them outright. RFD still allows the opportunity for discussion and debate (as the very fact that we are now having this debate shows). Second, I think it should be noted that many of these pages were in a state of being sidetracked and ignored when Connel nominated them. Third, I would really like to have a page that has a listing of all the policy pages (as I noticed one of the RFD'd pages was). Wiktionary has such a ridiculous amount of policy (and yet rightly so), and I find it hard to keep track of it all myself. And, while I am not a veteran like some folks here, I'm not exactly a newbie either. Yet I still find myself unaware of certain policies. However, most of the pages that were nominated for deletion do need a great deal of work if they are to remain useful, as they certainly do not reflect current practice (which is really what Wiktionary policy is, in reality). (comment continues below)
If they don't reflect current Wiktionary current practice, there are at least two ways to go.
  • Update the policy to reflect the current practice.
  • Modify practice to more follow policy, thus slowly edging away from the current poor practice.
Those who generally agree with the benefits of having written policies will tend to the latter, with some of the former. Those who generally disagree with having policies will tend to the former to some extent, but actually are more likely to just ignore any policies anyway. Which they are free to do. (Indeed that in itself is a Wikipedia policy). But no need for them to try to knock down policies which are useful to newbies, and to those who do want to try to build and use them. --Richardb 06:18, 20 March 2007 (UTC)
(comment continues from above) At least some of them merit (in my opinion) such work. Finally, I agree that it is sometimes beneficial to have policy discussions in a location other than the BP (for example, I found the discussions on the talk page of the About Greek to be much more focused than many BP discussions), as it largely limits the people involved to those who are interested and somewhat knowledgeable in the topics at hand. However, each and every single discussion of this type absolutely must have a note on the BP publicly announcing that the discussion is happening and where. Atelaes 05:55, 20 March 2007 (UTC)
Absolutely. The BP should always have a noticeboard of what policies are being discussed. And guess what. That is what is there right at the start of BP. But, even so, it's worth standing up in the Beer Parlour every so often and shouting "Anyone interested in seriously discussing ..... should go to ..... for the serious debate". Which, I guess, is what I've done. Was it just a bit of a tactic to also throw a couple of swings at "my mate" CM in the process, to get some extra attention ?--Richardb 06:18, 20 March 2007 (UTC)
I get enough of that from rolling back vandals, thank you very much. As Versageek was baffled earlier, let me try and assemble some of the relevant events. There was a Wikipedia blowup about Wiktionary, with a couple Wikipedia admins visiting Wiktionary and immediately running afoul, based on their assumption that this is Wikipedia.
The policies that I tagged for RFD were specifically called out as being what led those contributors astray. Each was undeniably obsolete. Each one was also long abandoned.
During this time, much of the confusion was resolved on IRC. The fallout of that, after the RFDs was my rearrangement of what the existing policies are, for visiting Wikipedians' sake. In a nutshell: WT:CFI and WT:ELE are the absolute pillars of Wiktionary. Discussions are (for better or for worse) held in the central WT:BP area. WT:VOTE is used to implement/validate new policies and practices. What I abandoned, was devoting a couple hours per week of my time to keeping them up to date...once the three Wikipedians in question got comfortable, there was no urgent incentive to pursuing the policy maze that Richardb had originally set up (for further simplification/dismantling.) NOTE: Richardb spent a lot of time and effort singlehandedly trying to implement a policy structure that he thought was appropriate...however, with the lower traffic of en.wiktionary.org, the system was enormously too complex, and overkill for the situation by several orders of magnitude. Remnants of his proposed policy structure (which everyone ignores) shouldn't be left around for Wikipedians to trip over...that was the impetus for the initial RFDs!
So, where do we go from here, indeed? --Connel MacKenzie 16:16, 23 March 2007 (UTC)

Revert first, look later

After a string of incompetencies by mister Connel Mackenzie I see everyone talks about recently (see my IP's Block Log for edifications), today I see another idi... um, fellow, reverting my changes and then his after probably actually SEEING what he has reverted. Now, I know that these people are busy, but to revert a change based on the comment, or worse, on Connel's side, for just editing a word that he knew as mostly vandalised seems to me incompetence, not to mention a violation of a certain statute, if I'm not mistaken, of this site's, that mentions ,,good will" assumed of one's modifications. That may apply to people that actually read the modification, but what do you call those incompetent idi... um, folks that just revert cause they don't like the edit summary or the word edited? 15:56, 19 March 2007 (UTC)

If you think any reversion of edits on Wiktionary is a violation of any statute, then you are indeed mistaken. Cheers! bd2412 T 15:59, 19 March 2007 (UTC)
Well, if you tolerate bans for no reason from admins or modification to the detriment of an article and take no action against that or see no problem with it, I must congratulate you on a job well done promoting vandalism. —This unsigned comment was added by (talkcontribs) 16:12, 19 March 2007 (UTC).
I didn't say that. All I said was, there's no statute violated. But take it however you wish. Cheers! bd2412 T 16:28, 19 March 2007 (UTC)
I'm not sure what you're saying. If you mean that no reversions violate policies, then I respectfully disagree. If you mean that not all reversions violate policies, then you're correct, but I'm not sure what your point is; anon wasn't claiming otherwise. Rather, he was saying it violates the "assume good faith" policy (w:WP:AGF; I don't know if Wiktionary has a similar counterpart) to revert an edit without looking at it. —RuakhTALK 19:55, 19 March 2007 (UTC)
No, I'm just nitpicking. We have policies. We do not have statutes. Cheers! bd2412 T 22:15, 19 March 2007 (UTC)
It would help if you referred to a specific reversion or edit, rather than making vague grumbling noises. We can't fix a generic problem without addressing the specifics first. --EncycloPetey 16:16, 19 March 2007 (UTC)
Anon is referring to http://en.wiktionary.org/w/index.php?title=Special:Log&type=block&page=User: and http://en.wiktionary.org/w/index.php?title=pizda&diff=next&oldid=2149808. Frankly, I agree with him/her: Connel MacKenzie (talkcontribs) seems to have acted indefensibly in this case. Anon made a series of contributions pertaining to Romanian, all seemingly reasonable and correct (I don't speak Romanian, but that's how they seem), culminating in a contribution to pizda adding the Romanian sense of that word (which, unsurprisingly, is the same as the sense of that word in the various nearby Slavic languages). Connel MacKenzie responded by reverting the edit and blocking the user, writing "don't mess with constant vandalism targets please". This is indefensible; the edit is quite reasonable-seeming, and he seems to have made no effort to determine whether it was accurate. If he feels that anonymous editors shouldn't edit this page, he should semi-protect it rather than block any who try. —RuakhTALK 18:55, 19 March 2007 (UTC)
I looked into this case a while back (the anon had posted a friendly little note on Connel's talk page). The extra history behind this is that someone had been adding a Romanian section, when Dijan had noted (a number of times) that a Romanian section did not belong here, but rather at pizdă. A number of different anons had attempted to incorrectly implement a Romanian section at this entry, all being reverted. Perhaps Connel acted a bit hastily in this block, but it is not as though there was no reasoning behind it. This page definitely did have a history of anons doing things that shouldn't be done to the page. To the original author of this thread, as EncycloPetey notes, you would do well to cite specific grievances, and to ask for specific remedies, instead of making incoherent claims and poorly veiled attacks. I agree that Connel did in fact make a mistake, but I have yet to find an admin who hasn't. However, a quick search of your own contribution history finds you making the understandable mistake of adding a false link [6], so I think it should be admitted that no one is perfect. If you feel that some action is necessary in response to Connel's mistake, then please propose one. Otherwise, I suggest you get on with your life. Atelaes 20:29, 19 March 2007 (UTC)
I'm confused: there's currently a Romanian section at pizda, and it's been there for almost three weeks with no comment. What's changed? —RuakhTALK 21:02, 19 March 2007 (UTC)
pizdă and pizda are different words. Cynewulf 21:09, 19 March 2007 (UTC)
No, they're not … —RuakhTALK 22:07, 19 March 2007 (UTC)
Well, the pizda entry says that one is articulated and one is unarticulated. However, I don't quite understand what that means. Anyone care to curb my raging ignorance? Atelaes 06:02, 20 March 2007 (UTC)
Romanian, like Bulgarian, marks definiteness of nouns in the ending. For this particular word, pizdă is the citation form and means cunt. The spelling pizda means "the cunt" (nominative and accusative); pizde is genitive/dative, and pizdei is the definite genitive/dative (of/to the cunt). —Stephen 23:01, 20 March 2007 (UTC)
As I've stated, I don't speak Romanian; but I understand it to mean that pizda means the poontang while pizdă means simply poontang (see w:Romanian grammar#Articles). —RuakhTALK 16:58, 20 March 2007 (UTC)

On a related note, perhaps Wiktionary should have a policy specifying when users can make automated reverts. Such a policy might specify that automated reverts are to be made only in cases of clear vandalism. I know the arbitration committee on Wikipedia criticized an editor for not explaining his reversions once. Without an explanation, many editors assume the worst as to why a reversion was made. Some editors are embarrassed by them because it makes it look as if their edits were so bad that an explanation is un-necessary. Users who write detailed summaries of their edits may feel like they're being ignored. Others may feel like their new-comer status is being highlighted by the use of such powerful tools.--Νικα 22:41, 20 March 2007 (UTC)

I have to apologize, but let's bring this back down to reality :) An anonymous contributor edited a vulgar word in a language that most of us do not speak or understand. Why should we trust the edits? The anonymous editor made no attempt to establish a track record of credibility (this is especially important when editing slang words). When I make an edit, people generally trust that it's correct because I have demonstrated, over time, that I am knowledgeable in the languages that I work on. Anybody can look at my edit history and decide that for themselves. This is key, especially when the word is poorly documented. If nobody can verify that you know the language, your only other option is to include where you got the information from (not in the history comment, but in the actual article under the references section). Remember, we're all anonymous here. Nobody should believe what anybody does unless it can be verified in some way. -- A-cai 02:37, 21 March 2007 (UTC)
Shouldn't editors assume good faith, though? And if someone (a sysop, editor — anyone) isn't knowledgeable about a subject, shouldn't they simply leave the entry alone? Or, if they find it suspicious, they can RFV it, rather than simply revert edits. But I could be misunderstanding your comments. Could you elaborate on exactly what types of practices you support vis-a-vis newer editors?--Νικα 03:29, 21 March 2007 (UTC)
Frankly I think we should do what Wikipedia does and lock down our most frequently vandalized words. After all, the meaning of fuck or fag is not likely to change radically anytime soon, making it far more stable as a dictionary entry than George W. Bush or Hillary Clinton are likely to be as encyclopedia articles. I suggest that for particularly vandal-prone words, we make the entry as complete as possible and then lock it down, with an invisible note at the top of the page to tell would-be editors to take their suggestions to the talk page. Cheers! bd2412 T 03:42, 21 March 2007 (UTC)
Well, before we lock something down, we should make sure it has translations in all (order of magnitude 10000) languages, all derived terms set, all "see also"s added, all synonyms and all alt spellings. -- unsigned
They have pretty thorough coverage of major languages as is. With respect to synonyms, we have WikiSaurus. Anything else, take it to the talk page. bd2412 T 05:20, 21 March 2007 (UTC)
In most cases, the correct action would be to submit suspicious definitions to WT:RFV. However, vulgar and contentious words tend to invite vandalism, which is why sysadmins tend to be quick to revert suspicious edits. Is this the correct course of action? I guess that depends on your perspective. I'm simply saying that if you make such edits, you should be able to demonstrate in some way that the edit is legitimate. Perhaps bd2412's suggestion is correct. Maybe Wiktionary should only allow registered editors to directly edit contentious words. This would not be out of keeping with what's going on over at Wikipedia. -- A-cai 04:05, 21 March 2007 (UTC)
Guys, you're sidetracking part of the issue. My problem isn't that these guys take one look at the definition and if they see something that they don't understand, they revert it. That would be... let's say, a little arrogant, but somewhat understandable. My problem is that, by all appearances, THEY DON'T LOOK AT WHAT THEY'RE REVERTING. Simple as that. They just hit the revert button, hell if I know where they have it as to not see what they're reverting, and then look at what they did... if they actually do, that is. It wasn't Connel's actions that made me take stand, one bad weed I can more or less understand, but it was yet another blistering show of ignorance: http://en.wiktionary.org/w/index.php?title=Australia&action=history which makes me think this is a regular habit for people with some experience around here. It didn't end in a ban like the last time, but it sure got me annoyed once more.
As for discussions about track record, achieving credibility... come on! Do you think every (or any, for that matter) anonymous contributor knows about the ground rules you set up for them? For me in particular, I just edit where I see appropriate to change/add/whateva. smth, I don't care about what words I edit or how many. paused
inserted response I understand your point of view as a casual contributor with respect to assuming good faith. However, you must understand that my comments about credibility have nothing to do with "ground rules." I'm simply stating reality: life is not fair! A sysadmin is not likely to give the benefit of the doubt to an anonymous contributor who makes a questionable edit (i.e. an edit that cannot be independently verified by a sysadmin). This is because too many people (registered or not) make bogus edits to words. One glaring example of this is editing a word for a language that you do not speak!!! You can complain about the sysadmins all you want, and in some cases, you may be justified. But life is a two way street; don't do things that will get you reverted, and you won't be reverted. I've added thousands of words to wiktionary over the last year. I have only had my edits called into question on rare occasions. In every case, I resolved the matter, not by cursing at the person who did it, but by citing evidence for the validity of my edit. end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
bd2412, nice of you to try to close the issue by nitpicking (taking advantage of the fact that I said statutes instead of policies... that was quite fair-play of you, I must admit).
inserted response Perhaps, you are unaware of the fact that bd2412 is a lawyer :-) end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
And for the record, as I saw mentions of my gender, I am a male :P Signing off is here default getaway with a friendly warning: mister Ullmann was strike two, if there is ever another strike of stupidity from someone that considers himself/herself superior enough to revert stuff just 'cause they don't like the editing message, as much as I appreciate this project's ambitions, I shall feel forced to use the ,,big guns" (dear old Proxy Switcher works like a charm ;) )to thank them in a civilised (NOT) order. Toodles! 14:44, 21 March 2007 (UTC)
This is a dictionary, so I feel I have the right to be particular about the meanings of words. A "statute" implies that violation thereof is unlawful. A policy is more like a guideline to be interpreted in accordance with the dictates of the situation. bd2412 T 20:33, 23 March 2007 (UTC)
Wow. All this from a troll/vandal, who has been reentering an item that has previously failed RFV/RFD, who not only resorts immediately to personal attacks, but gets support for those attacks? Why wasn't this section immediately rolled back? WTF is going on here? --Connel MacKenzie 15:45, 23 March 2007 (UTC)
I don't think he's a troll/vandal, and you've provided no evidence that he is. All of his contributions seem well meant. The only person he seems to be attacking is you, which I think is understandable, seeing as you had blocked him for no reason and have never looked back. (That doesn't make it acceptable, mind, but eminently understandable.) —RuakhTALK 16:43, 23 March 2007 (UTC)
User:Ruakh that is bullshit, and you know it. Someone who not only expressed intent to use open proxies, but is already intimately familiar with them is not a vandal? It is a good reason to review his edits in detail, but certainly no reason to feed the troll, nor to hide in fear from threats of vandalism. If he wants to post goatse on my user talk page now, we can certainly use the exercise of blocking new/residual open proxies. As to the pizda entry, go take a look at the history. Frankly, I trust Dijan's research more than Stephen's knee-jerk assumption of good faith, but I don't have a handy method of checking either, at the moment. Did the vandal resubmit with three citations? Is it attested? Come off it. It is run-of-the-mill vandalism. --Connel MacKenzie 17:36, 23 March 2007 (UTC)
Sorry for saying so, but you're the one bullshitting; I guess you find that easier than recognizing your error and apologizing for it? I'm also familiar with open proxies; I've never used one, but to be honest, if an administrator blocked me for no reason, I might decide to use one; does that make me a vandal, too? (Granted, I probably wouldn't try to circumvent an unjust block, as I'd more likely just say "fuck this" and give up editing entirely — but I can't say for sure one way or the other. I guess it depends whether I felt the problem was Wiktionary in general, or a single power-mad administrator.) It seems quite obvious to me that pizda is a legitimate Romanian word, whose definition should be something like {{form of|articulated (definite) nominative|pizdă}}. A Google search for google:site:ro "pizda" will show you instantly that pizda is much more common on Romanian Web sites than pizdă is (whether because people don't bother typing the breve, or because the definite nominative form is more common than the indefinite nominative, or what). Also, your attack of Stephen strikes me as just this side of crazy; if anyone here has made a "knee-jerk assumption", it's you. You blocked a user for a good-faith (albeit slightly misguided) edit, and now you're acting like his angry response justifies your having blocked him. —RuakhTALK 19:34, 23 March 2007 (UTC)
O.K., having said that, I see that now that you've re-blocked him, he's started to genuinely and blatantly vandalize under a different IP. So, congratulations; you've been outdone in the bad-guy department. *is done defending the anonymous-editor-turned-vandal* —RuakhTALK 19:51, 23 March 2007 (UTC)
Again (as if you didn't already) see comments below. He always was a vandal, from the very start. I suggest you redact your "outdone" comment. --Connel MacKenzie 20:27, 23 March 2007 (UTC)
Okay, here's a better example, untainted by all the vulgarities and slang nonsense. This reversion violates Assume Good Faith in my opinion. Deletions like this should be commented at the very least, and probably noted on the talk page as well. DAVilla 19:20, 23 March 2007 (UTC)
You're suggesting that isn't nonsense? DAVilla, that edit is nonsense - you have now (half a month later) dredged up a student's (now assistant professor or something) web-page as "evidence" that all astronomers make the same mistake as this one former student? If astronomers use the jargon term metallicity, with a similar meaning, then that term might merit an entry here - but such a blatantly bogus redefinition of metal? Without references? What is going on around here? --Connel MacKenzie 01:00, 24 March 2007 (UTC)
As this vandal likes to point to: User talk:Connel MacKenzie/archive-2007-3#Thanks for the ban. Note clearly, that any "good" contribs this guy has ever made are far outweighed by his initial, and constant, vandalism, interspersed throughout. This is not some guy who is "slightly misguided;" rather, he is an insipid troll. Frankly, the more obvious vandalism he's doing now is much easier to deal with, than the subtle mistakes he was intent on inserting into Wiktionary. --Connel MacKenzie 19:48, 23 March 2007 (UTC)
Sorry for butting in here as a relative newcomer, but may I suggest that Connel MacKenzie might do best to take a small step back here, and admit to a slight glitch. Glitches happen! I see good faith in general (although admittedly, cannot back up this faith with "evidence"). It is clear that Connel cares a lot for the Wiktionary project. --Keene 01:17, 24 March 2007 (UTC)
Nope. As evidenced, it was no glitch to block the vandal. --Connel MacKenzie 01:55, 24 March 2007 (UTC)
You seem to call users vandals all the time Connel. --Keene 02:16, 24 March 2007 (UTC)
I never tried to cite that definition as it was never posted on RFV. What I was doing was taking 5 seconds to give a reason for my reversion. If anyone does even a half-ass job of looking they'll see it's actually quite common in astronomy. The point is that there's really too much information out there for any single person to claim to know what kinds of entries are bogus or not. That's why you have to assume good faith. DAVilla 17:27, 24 March 2007 (UTC)
I have to partially disagree with DAVilla. It's not about what any one person knows or good faith etc. It's about the integrity of the information in wiktionary. "Take my word for it" is not a viable solution for any of the wikis. I will grant that there are plenty of words, that are often not found in mainstream dictionaries, which should be included in Wiktionary. However, someone saw the word somewhere, or it shouldn't be on our site. Here is my approach to this (mind you, I'm not stating Wiktionary policy, just my own opinion). For example, if I create an entry for a basic Chinese word such as 杯子 = cup, anyone can verify the definition from any number of on-line resources[7]. Technically, I should provide proof of the entry's validity, but I didn't in this case (laziness), because it's so easy to verify. Now take a look at 缓冲器. This word is poorly documented in other dictionaries. As a result, I was questioned about it by a contributor (see: User_talk:A-cai#bumper). My solution was not to say, "Trust me, I speak the language and you don't!" Why should that person have to take my word for it? He doesn't even know me. I say, good for him! My solution was to find proof, and then include that information in the article. -- A-cai 02:53, 25 March 2007 (UTC)
That's not what this discussion is about. I don't think DAVilla is arguing for assuming that someone is right or wrong, but for assuming good faith. No one should ever assume that someone is right or wrong. If you have a strong suspicion that something is wrong, you might request verification. If you strongly believe that something is wrong, you might undo an edit. But under no circumstances (in my opinion) should you assume that someone is wrong based solely on the age of their account or on whether an addition is sourced. Perhaps it has to do with my world view, but I believe that humans, in general, act in good faith. I think that they are good at heart. Look at the recent changes for this wiki and 99.9% of the changes you will see are made in good faith and are factually valid. Most of the additions are also unsourced. Assuming that these edits are by their unsourced nature incorrect would be a logical fallacy and also tragic. We saw with Essjay on Wikipedia that having an old screen name on the internet means nothing. The best way to ascertain accuracy is to discuss the content and not the editor.
Also, to get back to my other point: Under no circumstance should anyone be reluctant to explain why they have done something. If you have carefully examined an edit and you are reverting in good faith, then you should have no trouble explaining yourself. In fact, you should be eager to tell everyone. On the other hand, if you do not have a valid reason for making an edit, then you will be reluctant to explain yourself.--Νικα 05:44, 25 March 2007 (UTC)
Unless you're talking strictly in the abstract, you seem to be misunderstanding something here. No one asked the anon to justify his edit; rather, an admin reverted it and blocked him for having made it — and didn't even leave a note at his talk-page explaining why. It's fine to be a bit cautious while assuming good faith — we have to be, especially at oft-vandalized entries — but the admin made a strong assumption of bad faith without any support for that assumption so far as I can discern (though he maintains that there is support for it). —RuakhTALK 05:13, 25 March 2007 (UTC)
Let me clarify my position, if good faith is so important, then the anon must also assume good faith on the part of the sysadmin. The proof that you have cited of Connel's unfairness is not a slam dunk case in my opinion. It appears to me that Connel was making a good faith attempt to stop what he believed to be a vandal. Remember, the anon still has not demonstrated a proficiency in Romanian, nor has he offered any proof from a credible source of his definition (an example sentence might be nice. For example, see 上穷碧落下黄泉). Had one of those things happened, I might have been more likely to side with the anon. What he has done instead, and you can read it above, is threaten to make edits via some kind of voodoo proxy in the future, so that he can't be blocked as easily (rather than making an attempt to support his edits with evidence). With respect to the original potty mouth word that started this whole thing, until a fluent Romanian speaker comes along and sets us all straight, I'm not sure what else we can do at this point. -- A-cai 09:08, 25 March 2007 (UTC)
Someone is lucky that the anon was not civil after what, aside from the history of the page, could look like an unjustified slam. Had the anon not been the same contributor, had he acted civilly and been able to credibly source the definition, there would have been no justification for blocking without communication, regardless of the history of the page, and I might have recommended disciplinary action against the admin for violation of AGF. That the anon did not act civilly means the history of the page backed the admin. So I think the point is that Dijan knew what he was talking about, and that Connel is either just lucky or he really knows what he's doing, which I hope (and suspect) does not mean singling people out and driving them to acts of vandalism like this. DAVilla 11:45, 1 April 2007 (UTC)
"Unjustified slam"? I suppose it could look like that, if you are blind, perhaps. --Connel MacKenzie 04:09, 2 April 2007 (UTC)
Yes, that's almost exactly what I said. If you're blind to the history of the page, then it could look like (not equal to "is") an unjustified slam. The history of the page makes the story turn into one of an ambitious contributor not listening to reason. The incivility of the anon makes the story into one of an ambitious contributor not listening to reason. If the history were debunked with proof of the word, and if the anon had acted civilly, then the story would be completely different. If the history were debunked with proof of the word, and if the anon had acted civilly, then the block would have been inexcusable. But that is not the case. You blocked the right guy this time. Is that because you're lucky or you really know what you're doing? For a successful stockbroker, they say that's impossible to tell, the difference between luck and skill. And so it might be here. And so why bring up the question? To point out its irrelevance. You blocked the right guy this time, and everything else we can say, and pretty much everything I said, is nothing more than speculation on the difference between luck and skill. DAVilla 19:04, 2 April 2007 (UTC)
Well well. Just how many people are to be totally p'd off by CM, and as a result quit contributing, before someone decides it is time to rein Connel in? I've been around on wiktionary for a few years. And was active at the time Connel started and helped him for a couple of days. Only a few months later he was already into his stride of using the jack boot approach, and has been doing so for a couple of years now. He rarely takes a backward step, or looks backward at the damage he does. Whilst being aware of how much stuff he does, I still feel he is a real danger for the future of Wiktionary. It is still being run as a private club with a few arrogant people prepared to ignore all the rules and just smack users down. Which is probably why it is not taken seriously by many people. Connel - for a while, just take a back seat on being the policeman, and let's see if the world collapses around us, or if it becomes a friendlier place to contribute (even if it might be a bit smuttier).--Richardb 10:05, 14 May 2007 (UTC)
There are some things as a CU I must not discuss, particularly in a public forum. However, your assumptions that my block of a vandal was unjustified are misplaced. I assure you, that if I were to unblock every OP I've ever blocked, yes, en.wikt: would be a smuttier place. I can safely guarantee that it wouldn't be friendlier.
I'm almost curious as to how your misguided mode of thinking arrives at the conclusion that WT:VOTE is "run as a private club." Please spend some time constructively, rather than feeding trolls with baiting personal attacks. --Connel MacKenzie 10:55, 14 May 2007 (UTC)
Connel still manages to offend people? Absolutely. The latest victim: User:Keffy. His absence since then has left me, frankly, rather dismayed about the whole ordeal.
Connel doesn't back down? I'm not sure I completely agree with that. I've seen what I speculate to be a slight modification in his very personality. Not a gigantic change in behavior. Not an unwillingness to be the only person arguing his side, as yet. But an acknowledgement of error in a few incidents that, for lack of fanfare, may not have blipped on your radar.
Likewise I have noticed that SB has re-evaluated his position on what constitutes a worthy entry. Again, not a radical change, but definite in my view, and pragmatic in contrast to the idealism that they, and we all, try to uphold.
Those are two of the longest-standing members with whom I have ever had any real gripe. They also contribute a ton more than I do. My politeness requires humility on my part for their consideration of especially my own opinions. A more cynical view says that if they were made of concrete and not oak then they would have already fallen in the strong changing winds. But if you find knots, then they are not the concrete towers we might imagine them to be. (Apologies for the poor literary attempt.)
There is no one here for whom this equally applies, that simultaneously he more deserves CU status, and that it is more difficult to defend that fact, than Connel. Connel himself has said that he is looking for a replacement. Wiktionary is maturing, it is evident, and when the day comes, I expect that Connel will give up CU voluntarily. That will be the end of headaches for himself far more than for you. For you will discover that even then, apart from the natural maturing of Wiktionary, in which I should mention you have already been instrumental, not much else will change. DAVilla 16:48, 14 May 2007 (UTC)


Do we have guidelines anywhere for what it takes for a word to get this tag? It seems to me like a last-ditch plan for prescriptivists to defame words which thwart rfv/rfd :-) Several of the tagged words were not neologisms, so I removed the tag from them; many others would probably be better off deleted. The very nature of this tag just seems contradictory: either something passes RfV/RfD, or it does not :-) Though, I don't want it to sound like I have anything against DaVilla (who created the template), in fact I think DaVilla is a fantastic contributor and we can all learn much from their contributions :-) Anyway, if we are going to use this template, we could make it link to a page describing the specific, objective criteria used to make the classification. That is more in line with Wikimedia philosophy in general, and would no doubt stir joy and happiness in the hearts of all our readers!!!! :D -Signed, Language Lover

Word. I think we're much better served by appropriate use of {{context}}. —RuakhTALK 16:46, 20 March 2007 (UTC)
To clarify: the template's talk-page does say how it's to be used ("Use this template […] on pages that have passed the RFV process or are otherwise well sourced, but which do not appear in any of the six major English dictionaries […]"), but I'm not sure it's actually being used that way, and the current wording is grossly misleading. —RuakhTALK 16:53, 20 March 2007 (UTC)
I fail to see why inclusion in the "six major English dictionaries" is relevant. For one thing, Wiktionary is itself a major dictionary :-) For another thing, such a philosophy reeks of copycatting, I mean if we're just mirroring those dictionaries, how are we better than dictionary.com? For yet another thing, the classification of dictionaries as "major" or "non-major" is mostly arbitrary (the arbitrariness is of course obscured by lots of appealing to authority and such)-- why is OED "major" and Urban Dictionary not? Does the fact UD's contributors don't all have college degrees, mean that the words they speak aren't words? The way I see it, the "six major dictionaries" can look to us for inspiration/confirmation, not the other way around (and if it's not like that now, it ought to be our goal anyway) :) Especially with all the wonderful work all of you guys like Ruakh do :-) Language Lover 17:19, 20 March 2007 (UTC)
Although I have commented that the neologism template in its current state is useful, Language Lover's comments are so perfect, I can't help but second them.  What Language Lover said is the essence of "wiki is not paper" IMHO.  If a word exists somewhere out there, it should be here and others should be able to find definitions and usage notes on it here. — V-ball 17:24, 20 March 2007 (UTC)
A few comments. First, in my opinion, there is a glaring distinction between the OED and the Urban Dictionary. Yes, one of them is that the editors of the OED have degrees, and, in general, the editors of the UD do not. But more importantly, the OED is consistent, extremely well researched, and more representative of the language as a whole. UD includes definitions used by small communities, or sometimes solely of individuals, whereas the OED's definitions generally represent the semantic understanding of millions. That being said, there is, nonetheless, some merit in your comments, Language Lover. There definitely is a point where we should strive to carve our own niche in the dictionary world, and not simply try to imitate the OED. However, at the same time, we have to deal with the ambiguous line between descriptivism and prescriptivism. I think that most of the editors here are quite in favour of going with the descriptivist school, meaning that we are striving to describe language as it is actually being used, not trying to tell people how they "should" use their language. However, many of our readers are not in on that frame of mind. Many of them look to a dictionary to find the "correct" spelling of a word, or the "correct" context in which to use a certain word (I must admit that I do from time to time). If we do not make some distinction between correct and incorrect (in certain situations), then we are misleading our readers. Whether that is our fault or theirs is irrelevant. That being said, Ruakh makes an excellent point that the context tags might often be more appropriate and more useful for many places where the neologism tag is currently in use. Finally, thank you very much, Language Lover, for properly signing your comment. Atelaes 19:51, 20 March 2007 (UTC)

Good discussion from everyone :-) It is an unignorable fact that one purpose of a dictionary is to help elementary school students check whether words should be put in their book reports. The prescriptivist response is to single out "bad" words and "condemn" them. The progressive response is to take a stance of, we are the ones in the forefront, it's not a matter of the words being bad, but of the English teachers being out of touch. I think for the most part everyone agrees that tags like {{slang}} are an excellent compromise :) I doubt many readers will say to themselves, "I don't mind putting slang in this report, but I can't put neologisms in it!" :D

For the sake of being constructive, here's a possible guideline for neologism status. I'm just putting this up with little thought, hopefully others will expand on it.

  • To be considered a neologism, three out of the following four conditions are required:
    • The word (or sense) is known to have been coined within the past five years, by a single individual, for the sole purpose of coining it. (See santorum)
    • The word is not a straightforward construction by agglutination (ruling out common sense words like windshieldlike, podiumward, etc., which could easily be "accidentally" "coined" by an author without even realizing it; also rules out tsunameter)
    • In a 3/4+ majority of the word's citations, the author talks about the word itself, or defines it as soon as it is brought up; as opposed to the author naturally, seamlessly slipping it in among other words (this rules out things like lolicon)
    • The word is not an eponym, Latin/Greek construction, or other such construction, defined in peer-reviewed academic literature, government literature, etc.
In addition, the word cannot have more than twenty-five independent citations in preserved mediums (as described in CFI).

This is just a rough proposal, hopefully it'll inspire some terrific discussion :-D Language Lover 00:47, 21 March 2007 (UTC)
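For readers who find a rule easier to weigh when it is written out mechanically, the proposal above can be sketched roughly as follows. This is only an illustration of the "three out of four, plus a citation cap" logic as stated in the comment; the class name, field names, and the sample values are hypothetical and are not part of any actual Wiktionary policy or tool.

```python
from dataclasses import dataclass


@dataclass
class TermEvidence:
    """Hypothetical summary of what editors know about a term."""
    coined_recently_by_individual: bool  # condition 1: coined within five years, by one person, deliberately
    not_simple_agglutination: bool       # condition 2: not a straightforward compound like "windshieldlike"
    mention_fraction: float              # share of citations that discuss or define the word itself
    not_academic_construction: bool      # condition 4: not an eponym or Latin/Greek coinage from formal literature
    independent_citations: int           # count of independent citations in preserved media


def is_neologism(e: TermEvidence) -> bool:
    """Apply the proposed rule: at least 3 of 4 conditions, and at most 25 citations."""
    conditions = [
        e.coined_recently_by_individual,
        e.not_simple_agglutination,
        e.mention_fraction >= 0.75,      # the "3/4+ majority" condition
        e.not_academic_construction,
    ]
    return sum(conditions) >= 3 and e.independent_citations <= 25


# Made-up inputs, purely to exercise the rule.
deliberate_coinage = TermEvidence(True, True, 0.9, True, 10)
plain_compound = TermEvidence(False, False, 0.0, True, 3)
well_attested = TermEvidence(True, True, 0.9, True, 30)

print(is_neologism(deliberate_coinage))
print(is_neologism(plain_compound))
print(is_neologism(well_attested))
```

Writing it this way makes one property of the proposal visible: the citation cap acts as a hard veto, so a term that meets all four conditions still loses the tag once it passes twenty-five citations.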

Defining foreign-language verb forms

This is something that has been on my mind for a while. It seems to me that our current convention for defining non-English verb forms is not to give the translation, as with all other (as far as I know) non-English entries (and as recommended at WT:ELE#Variations for languages other than English), but to give a definition. What I mean is that just as, for instance, the entry for hola is "#hello, hi" which tells me what that word means in English, I would expect a verb form that I've looked up, like comido, abro, getroffen, or karju to tell me "eaten," "(I) open," "found," or "(you) shout," with the appropriate glosses to designate which senses are meant. Instead, the definitions given there, and commonly at all non-English verb forms, are in the form of "The past participle form of the verb comer." "The first-person singular of abrir in the present indicative." "past participle of treffen," and "Second-person singular imperative of karjua." These definitions are confusing and not very helpful for the reader. The meanings of first-, second-, and third-person might be common knowledge, but it's not readily obvious to the reader what conditional, subjunctive, imperfect, past participle, etc. mean, or how they are translated into English verb forms. Imagine going to a dictionary to look up fuesen and finding "familiar second-person plural imperfect subjunctive form of ir".

I can't make out any good reasons for them except for the ease of automation. I think we should make it clear that while bots might mass-generate easily created definitions like these, the ideal one should give a proper translation into an English word for the reader's reference, and should include a gloss if it is necessary for the reader's comprehension, as with other non-English entries. I'm thinking about a change like this one, which I think improves the meaning considerably: [8]. If I go fill in yéndose, should I write "leaving" as I've done for animado, or "present participle of irse"? Any thoughts on this? Dmcdevit 04:26, 21 March 2007 (UTC)

This is an excellent point you raise, one I've pondered a bit myself. One thing to keep in mind is that "translations" are really only approximations, there is no one-to-one correspondence between most languages and English. I think there's a balancing act going on. One type of reader uses the dictionary directly to translate text. For them, your suggestion would simplify things. The other type of reader uses the dictionary as a companion for learning a language. For them, the suggestion might make things seem overly complicated. See Ruakh's excellent examples below. Hmm, it is a subtle and interesting thing which you have brought up!!!!  :-D Language Lover 04:40, 21 March 2007 (UTC)
While you make a good point, I think that the technical information should not be removed. It is highly useful to many people. In addition, I think the "soft redirect" nature of these entries needs to remain. People need to know that they are not seeing the whole picture. I propose a format similar to that which is in place at λύῃς. Atelaes 04:55, 21 March 2007 (UTC)
I don't mind including that information, certainly, but I'm still wary of putting it in the translation, when it isn't. Could we move information like "Present active subjunctive 2nd singular form of λύω" to a "Usage notes" or "Etymology" (or "Verb form"?) section, or do something with it to differentiate? Dmcdevit 05:12, 21 March 2007 (UTC)
Oooh, I like λύῃς :-) You did a fantastic job with that word, Atelaes! :-) The only thing I'll add to your comment is this "best of both cakes" approach should be optional, to allow the quicker, more mechanical old method as well. One other thing, I wonder how one would directly translate the te-form of Japanese verbs? Is that possible? It seems to me it is impossible. Language Lover 05:00, 21 March 2007 (UTC)
Translation is surely impossible, as it's used in so many ways. An explanation (meaning by meaning, like in any entry) can be put in (and maybe te, or -te?), which is the relevant suffix. In the grammar taught in Japanese mandatory education there is no "te-form" (conjunctive form? as far as I know there's no Japanese term). It's analyzed as the ren'yookei (continuative form?) of a verb followed by the auxiliary "te". Similarly "masu", "rareru", "ta" (of the perfective(?) form), and many others, are handled as auxiliary verbs.
In the case of the conjunctive form and some others (such as the perfective form) linking to the English term explaining the form may be useful, as they're commonly considered separate forms in Western explanations of Japanese grammar. Perhaps perfective and conjunctive, with added meanings for how they are used in teaching of Japanese and link to Wikipedia article on Japanese grammar (which, unfortunately, doesn't even say what approach it uses, but looks like a mixture of various explanations). With the stem forms one can link like this: continuative form and have something useful there, such as a list of suffixes which attach to that form. -- Coffee2theorems 13:51, 11 May 2007 (UTC)
I think we should define the form in terms of the lemma, as is currently common practice here, for a few reasons:
  1. If the lemma has a number of different senses, then it makes sense for all information to be contained on the lemma page, rather than listing all the different senses for each inflected form (a maintenance nightmare).
  2. Forms correlate poorly across languages. When you define yéndose as "leaving", you ignore the fact that Spanish gerundios behave quite differently from English gerunds and present participles; yéndose often means "in leaving"/"while leaving"/"by leaving" rather than simply "leaving", and conversely, leaving often means "yéndome"/"yéndote"/etc. or "irme"/"irte"/etc. By using a standard template for gerundios, we leave for ourselves the possibility of having and linking to a useful Appendix:Spanish gerundios or whatnot, rather than giving a largely unhelpful "translation". (Note: I use the term gerundio because there doesn't seem to be a good English word for this form. Are we really referring to them in entries as "present participles"? That's grossly misleading, because Spanish gerundios are quite different from present participles in languages that have true present participles.)
  3. I think you underestimate people. I think most people looking up a Spanish word in an English dictionary have some basic familiarity with the terminology; and if any don't, it's at least helpful to know that yéndose is a form of irse, even if they won't get what form without actually knowing a bit of Spanish.
RuakhTALK 05:03, 21 March 2007 (UTC)
Yes, I was being too brief with yéndose, but your good point about poor correlation is true with all translations between languages, including gerundio, it seems, not just verb forms. That's why we suggest glosses to convey the proper sense. In any case, it may have been unclear of me to say that the current definitions are not obvious. I don't actually think they are definitions. Even in English, it would be like defining cats as "The plural form of cat". But that's more akin to a part of speech, not the real meaning of cats, which would be "more than one cat". It's not that people won't get it, but that it just doesn't convey meaning, except by being a degree removed from the actual usage of the word. Ideally, I would think it is best to put the translated meaning in the definition space, and then include the technical terminology specifying the precise tense/person/etc. in a section of its own. Dmcdevit 05:27, 21 March 2007 (UTC)
Your English comparison is a good one; but I think the solution is still to stick to an explanation of what the word is (plural form of cat, adverbial participle of irse, etc.), but to use italics so it's clear it's not actually a definition. By the way, I don't just do this for inflected forms; a while back, I rewrote English adjective sense #6 of gay in a way that gave no definition, only italicized explanation. In that case, it's because there doesn't seem to be an actual definition, whereas in the case of inflected forms, it's because I think an explanation of the word is much clearer than an attempt at translation, but it's the same result either way. —RuakhTALK 06:11, 21 March 2007 (UTC)
In your example edit, inspired is also the simple past tense of inspire -- what is the simple past tense of animar? (Assuming it has one, the French simple past isn't so common) Inspired is also an adjective! What would you propose for 言われた, passive past tense of 言う say? "was said" and nothing more? How about 言いました, past polite form? How would it differ from 言った plain past? Cynewulf 05:07, 21 March 2007 (UTC)
I have been thinking about the formatting of λύῃς and similar words for some time now. This word illustrates an excellent example of what Ruakh is saying. The subjunctive sense does not always mean "might" (although it certainly sometimes does), but has a whole array of nuances. Thus, the translation given is not entirely accurate, or at least not comprehensive. I think this illustrates an important conflict which goes on throughout all considerations on Wiktionary. Do we make it user-friendly or comprehensive? Often times we can do both, and so it is not an issue. However, sometimes we cannot. For example, a word may have a subtlety in meaning which is not adequately covered in less than a decent sized paragraph. But most users simply want a quick and dirty definition and are not concerned with nuances of meaning. In my opinion, we should always strive for both, if at all possible. In this situation, I feel that providing both serves the casual user who simply wants a quick definition of yéndose or λύῃς and then wants to get the hell out of here, as well as the linguist who wants to know what those words "really" mean. It seems that the two do not detract from each other. In addition, including both takes care of Cynewulf's excellent critiques. Atelaes 05:19, 21 March 2007 (UTC)
I guess this would have been a better way to do it? It's important to me that the usage notes are not an actual translation of the word though, and belong separate (as long as there is true translation, that is), and that we should encourage adding true translations and glosses to words that have only tense terminology. Dmcdevit 05:40, 21 March 2007 (UTC)
I think that's a reasonable way to go. However, as we fill out these pages more (which I see as a good thing), we should come up with a way to show that these are "stub" pages, in essence, and that there is (hopefully) a whole lot more info waiting at the lemma page. Any thoughts on that? Atelaes 05:48, 21 March 2007 (UTC)
I would have the bot move its additions to the usage notes section like in animado and leave a note, like "{translation needed}" sign (a template with a category?) after the # in the definition with a link to some explanatory page, in its place. Perhaps not the most aesthetically pleasing, but it seems like the most correct option. Most regular verbs in English with clear translations will be easy to add without needing glosses. Dmcdevit 05:56, 21 March 2007 (UTC)
I have to say I disagree with that. Perhaps it could put it on the definition line and then also put it in a cat. as you say. However, there are a lot of inflections (do YOU want to go through the 90,000 Spanish inflected forms?), and I think we should simply admit that that's a project which shall be waiting for some time. Atelaes 06:04, 21 March 2007 (UTC)
Well, I do, just not personally. :-) My thinking was that either format without clear translations is less than ideal, but moving the current content to a usage notes section (a bot could do that, I'm assuming) at least clarifies the entry. It's a work in progress either way. It's not a big deal though. I would like to at least update WT:ELE (or wherever it should go) with the preferred format, because it appears to me (notice that the two non-Spanish entries, German and Finnish, in my original post were not created by bots) real, live editors are now seeing the inadequate bot-processed creations as the conventional format. Dmcdevit 06:15, 21 March 2007 (UTC)
I think usage notes is a little less than optimal. Usage notes is supposed to be a place for pointing out quirks and such. We could instead make a new section header called "Grammar". Or we could put the grammar data in the line where the word itself appears in bold. For example:
kreota Future participle of krei (plural kreotaj, accusative singular kreotan, accusative plural kreotajn)
  1. which will be created
I guess that could cause trouble with words which have tons of conjugations on that line. But maybe such words should be dealt with like Japanese, with a separate conjugation table below? Incidentally, I think someone already pointed out the maintenance problem. If we do this, then any time we significantly change an unconjugated word, we'll have to make appropriate changes to all its forms... yikes :-) Language Lover 14:29, 21 March 2007 (UTC)

(Coming back to the margin.) I'd like to reiterate the point already made here (and which I made a long time ago, in another long-since-archived discussion) that giving translations of inflected forms, rather than grammatical information and a cross-reference to the uninflected form, is a bad idea. One reason for this is that the English translations may have many senses. The French word poser can be translated as "to set", but the English verb has dozens of meanings (take a look at it in the OED). So if I edit the page for posé (the past participle of poser) and just give the translation as "set", then — leaving aside the fact that the past participle is identical to the infinitive in English and so "set" is ambiguous (it needs a gloss) — it is unclear which of the senses of "to set" I am referring to.

Of course, you could (and should) include a gloss, and this would be one solution. However, suppose the word being translated has many translations, and someone adds, edits or deletes one for the uninflected form. Then, in theory, they would also need to update all of the pages for the inflections. If they make a mistake, that means a lot of pages to roll back, especially for languages like French, in which verbs conjugate to give very many different forms. If they didn't do the updates (which is very likely) then the pages end up giving different information, or, in the worst case, contradicting each other.

If there is just a cross-reference, none of this extra donkey work is needed, and users can still find all the information they need. Note that we already do this with English inflections: if the noun "foo" has the meanings "foo: 1. an X. 2. a Y. 3. a Z.", we don't give three meanings at the entry for its plural: "foos: 1. Xs. 2. Ys. 3. Zs"; we just say "plural of foo".

The grammatical information is useful for those who understand it, and those who don't can find out what it means by looking it up in Wiktionary or elsewhere. — Paul G 15:43, 21 March 2007 (UTC)

To cover the issue raised by Ruakh about differences in usage in different languages (such as "yéndose"/"irse" for "leaving") then this can be covered by giving usage notes and examples in the entry for the inflected form. — Paul G 15:45, 21 March 2007 (UTC)
What I'm not understanding here with the concerns about ambiguities is that I don't see why the inflected form posé is any more ambiguous in translation than the infinitive poser. However the word is translated at the infinitive, it should simply have the same translation in the inflected form, except the English should be inflected to the proper tense as well. Ambiguities are a concern for all non-English words in translation; how does pointing back to the infinitive (which is then translated) with a tense specification change that problem? What makes these verb form problems different from normal translation issues with ambiguous English equivalents, which we just have to deal with and try to clarify as best we can with glosses or notes or whatever else the situation requires?
(If it's mostly about the extra work (editing the uninflected form requires changes to all its children), well, that's true, but it doesn't strike me as a very compelling argument. Endless work is the nature of the project.) Dmcdevit 17:30, 21 March 2007 (UTC)
Oh, now I full on disagree with you. The point of having all the information at the lemma is so it doesn't have to be repeated. We can do a rather thorough job at the lemma, add twenty different English translations, an etymology, whatever we need to try and get it adequately covered. That in itself is difficult, but doable. Having all that at all its inflected forms is not doable. Not at all. This is the beauty of having non-lemmata as soft redirects. Once we state which part of speech and what their lemma is, we're done. We can work on the lemma for years, trying to get just the right translation, and it's no problem. However, if we include all the same info on all forms, inflected languages become a nightmare. I absolutely refuse to change all 200 or so forms of φιλέω every time someone adds a slightly better translation. And someone will; it's a pretty simplistic translation right now that I'm sure does not cover everything. Having full entries at inflected forms is simply not practical at all. Atelaes 18:32, 21 March 2007 (UTC)
There was recently a similar discussion re Translations of inflected forms of English words at WT:BP##Plurals_and_translations. To go back to your earlier example, I should like to see something like:
  1. (present active subjunctive 2nd-person singular of λύω) often You might loosen (see λύω for further information).
(but preferably less verbose). While I should like to see as much info as practicable at the inflected entries, eg I feel cites using that form should be included there rather than at the lemma entry (and this in itself will help clarify meanings), it will not be possible, for the foreseeable future, to give all the detailed info on meanings and (for English words) translations, that are at the lemma entry. However, this need not prohibit giving the most common meanings and translations of inflected forms, provided it is clear to the reader where they can find further info if they want more than a quick and dirty answer. --Enginear 19:43, 21 March 2007 (UTC)
Re: "[…] differences in usage in different languages (such as 'yéndose'/'irse' for 'leaving') […] can be covered by giving usage notes and examples in the entry for the inflected form": I strongly disagree. The solution is for irse (the lemma entry) to explain everything that's specific to the verb irse, and for Appendix:Spanish gerundios or the like to explain everything that's specific to Spanish gerundios (though this might actually be better as part of an Appendix:Spanish conjugation or something, rather than as its own appendix). It seems crazy to re-explain the function of gerundios at the entry for every single gerundio. —RuakhTALK 19:34, 22 March 2007 (UTC)

I've thought of another problem: non-analogous lemmata. In Hebrew, for example, the verb "Template:wlink‎ (halákh)" means "to go", but the actual verb form "Template:wlink‎ (halákh)" is the third-person masculine singular past tense (suffix conjugation); the infinitive is Template:wlink‎ (lalékhet) (well, or Template:wlink‎ (lékhet)), or Template:wlink‎ (halókh) — different linguists apply the term slightly differently to Hebrew — the take-home point being that no one uses any of these infinitives as the lemma). The current system handles this well: "Template:wlink‎ (halákh)" is translated as "to go", per universal tradition, and "Template:wlink‎ (lalékhet)" is explained as the infinitive of "Template:wlink‎ (halákh)", and so on. How would your system handle this? —RuakhTALK 21:21, 24 March 2007 (UTC)

A similar problem occurs in Latin. Verbs in Latin have five infinitives: present active infinitive, future active infinitive (with three sub-forms), present passive infinitive, and so on. I can't even begin to imagine trying to translate correctly the sense of each infinitive form on every one of the verb form pages. I'd much rather say "present active infinitive of verb X" and put the grammatical explanation into an Appendix. --EncycloPetey 18:52, 25 March 2007 (UTC)
I think definitions should only be given in the main article. If a definition is given for 10-50 inflections (some languages have a lot) and someone finds that the definition could be explained better, then he has to change the definition in all 50 articles. If not, we will get tons of more or less thought-through definitions for all inflections, all saying different things. It will be impossible to find the right discussion page among all these. Adding inflections by bot will be 100 times more work. And as a user, you will not be sure where to look for the information of best quality. Focusing on one main article will make the quality so much better than spreading it out over 50 different articles. Finding the inflected form should just be a way for the user to find his way to the main article, the article with all the information, including special information about certain inflections as well as perhaps a grammar table of the different forms. Creating "definitions" by bot, only stating the grammatical form, is the best way to keep it clean and simple, and adopting this standard will speed up the work considerably to include these forms and present them in a standardized and easily understandable way. As many of you have already suggested, inflections in one language often lack a direct equivalent in other languages, so giving the grammatical info is also a way to give exact and correct information effectively. Then, creating grammar tables in the main article will be a better way to serve the user, since he can see the inflection in the main article in its context, related to other inflections. ~ Dodde 01:07, 27 March 2007 (UTC)
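The soft-redirect argument made in this thread can be illustrated with a small sketch. This is not how Wiktionary stores entries; the `Lemma` and `FormOf` classes, the `describe` function and the extra gloss "to put down" are all hypothetical, chosen only to show that an inflected entry holding just a grammatical tag and a cross-reference picks up changes to the lemma without being edited itself.

```python
from dataclasses import dataclass, field


@dataclass
class Lemma:
    """A full entry: the only place where glosses are maintained."""
    word: str
    glosses: list = field(default_factory=list)


@dataclass
class FormOf:
    """A soft redirect: grammatical information plus a pointer to the lemma."""
    word: str
    grammar: str  # e.g. "past participle"
    lemma: str    # headword of the lemma entry


def describe(word, entries):
    """Return a display line, following a form-of entry to its lemma."""
    entry = entries[word]
    if isinstance(entry, FormOf):
        lemma = entries[entry.lemma]
        glosses = "; ".join(lemma.glosses)
        return f"{entry.word}: {entry.grammar} of {lemma.word} ({glosses})"
    return f"{entry.word}: {'; '.join(entry.glosses)}"


entries = {
    "poser": Lemma("poser", ["to set"]),
    "posé": FormOf("posé", "past participle", "poser"),
}

# Improving the lemma touches one entry; the inflected form follows along.
entries["poser"].glosses.append("to put down")  # illustrative gloss only
print(describe("posé", entries))
```

With full definitions copied to every inflected form, the same improvement would have to be repeated once per form, which is the maintenance cost several contributors object to above.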

Format of abbreviations

I've added a section to WT:ELE on how to format abbreviations. In particular, I mention that expanded forms should [obvious error corrected --Enginear 20:01, 21 March 2007 (UTC)] be in their usual forms and not capitalised just because the corresponding abbreviation is made up of capital letters (eg, the expansion of AI should be given as "artificial intelligence", not "Artificial Intelligence") and that expanded forms should link to Wiktionary or Wikipedia articles, as appropriate.

It looks sound to me, but please make any necessary revisions. — Paul G 13:12, 21 March 2007 (UTC)

I mention SNAFU there - it needs a gloss, as someone has already pointed out in RFC for that word. — Paul G 13:14, 21 March 2007 (UTC)
Thanks for doing this. It looks good overall, but I think I disagree on one point. If the expanded form doesn't have and doesn't warrant a Wiktionary entry, then I think the components should be wikified as links within Wiktionary. Whether or not the expanded form warrants a Wiktionary entry, the {{wikipedia}} template should be used to link to relevant Wikipedia articles, which can include any Wikipedia articles on the abbreviation itself (e.g. w:SNAFU) as well as any Wikipedia articles on the expansions (e.g. w:Recreational vehicle). —RuakhTALK 13:44, 21 March 2007 (UTC)
I see what you mean. The reason for linking to a Wikipedia article rather than the Wiktionary articles for the component words is that the user is likely to want to know what the whole expanded form means rather than its component words. But if we can do both, then great. Could you perhaps give an example to illustrate how this would work, and then we can update WT:ELE accordingly if people agree with your idea? — Paul G 15:21, 21 March 2007 (UTC)
The disadvantage of the {{wikipedia}} approach is that it doesn't make clear, for those abbreviations with several meanings, which senses can be found there. I am tempted to specifically write (see Wikipedia article) by the appropriate senses. However, that would lead to an error if the 'pedia article were modified. --Enginear 20:01, 21 March 2007 (UTC)
The problem you mention with {{wikipedia}} is not specific to abbreviations; it's a problem with any noun that has multiple distinct senses, and we need to formulate a general solution rather than a hackaround. (Actually, I think this is much less of a problem for the typical abbreviation, since Wikipedia tends to name articles after the expanded form, so the link text will make clear which sense is being referred to.) One option is to give that template a way to specify which sense is intended. —RuakhTALK 19:29, 22 March 2007 (UTC)
The {{wikipedia}} template allows the entry of the direct disambiguated link. --Connel MacKenzie 15:57, 23 March 2007 (UTC)
Don't forget that it's possible to use a directed {{pedialite}} template in-line or at the end of the entry under "See also". The specific article name may be entered as a parameter. --EncycloPetey 18:38, 25 March 2007 (UTC)
I honestly think the {{wikipedia}} template needs to be completely rethought. Intended to appear once on a page, it could not do more than link to the primary definition of a word. But in many cases, not just abbreviations, there are a good number of relevant Wikipedia articles that need linking to. Proper names such as Disney are an example, but also any word that has a more specific technical sense, or several common meanings such as trunk, etc. DAVilla 18:25, 27 March 2007 (UTC)
Part of the original problem was that Wikipedia links are not fixed; articles move around and are renamed. This happens less often today, than a couple years ago, but still enough to be a valid concern. IIRC, that is why we link to "disambiguation" pages whenever possible, rather than specific senses. --Connel MacKenzie 15:46, 16 May 2007 (UTC)

Our deletion logs are being harvested

It appears that any deletion with a deletion summary that contains "content was: 'text here'" gets harvested for the following site: http://www.in-vacua.com/interdiction.html

Now would be a good time for all admins to sign up for the "Replace text in deletion log comment." of WT:PREFS, so we don't accidentally expose personal info posted by vandals in the deletion log. fwiw, --Versageek 07:09, 22 March 2007 (UTC)
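As a rough illustration of what replacing the text in a deletion summary amounts to, here is a minimal sketch. It is not the code behind the WT:PREFS option; the pattern, the function name `scrub_summary` and the replacement text are assumptions made for the example.

```python
import re

# Assumed shape of the auto-generated part of a deletion summary:
# the phrase "content was: " followed by the quoted page text.
CONTENT_WAS = re.compile(r"content was: '.*'", re.DOTALL)


def scrub_summary(summary, replacement="(content removed)"):
    """Replace any quoted page content in a deletion summary."""
    return CONTENT_WAS.sub(replacement, summary)
```

A summary that does not contain the quoted-content phrase is returned unchanged, so a hand-written deletion reason would survive.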

Note: This has been de-Connel-ized to the Wiktionary namespace now. Please be bold rewording it. --Connel MacKenzie 21:54, 24 March 2007 (UTC)

The guy behind the site has posted a response, here. It might be good to send him a note explaining why we obscure that information. JesseW 23:43, 5 May 2007 (UTC)

And why is that, anyways? DAVilla 03:54, 6 May 2007 (UTC)
Erm, no, it might be better to ignore him. --Connel MacKenzie 05:53, 6 May 2007 (UTC)

X form headers

The question of not using X form headers (Verb form, Adjective form, Noun form) was never quite formally resolved; WT:POS says at one point it is under discussion, but the tables say that X form is deprecated.

Any objection to just treating this as settled and routinely correcting X form to X? (Which a number of people have been doing for a long time? ;-) Robert Ullmann 14:00, 23 March 2007 (UTC)

(Oh, the reason I ask is that AutoFormat is finding these with some frequency; should it be fixing them?) Robert Ullmann 14:09, 23 March 2007 (UTC)

I think that would be non-contentious for ==English== entries. But I recall some respected contributors claiming that X form was almost essential for some highly inflected languages. --Enginear 14:43, 23 March 2007 (UTC)
I don't recall that being the conclusion at all. As I recall, it was that making the "form-of" distinction is even worse for foreign languages, than for English. This is supposed to be targeted to English readers after all. --Connel MacKenzie 15:55, 23 March 2007 (UTC)
On the other hand, bot edits are supposed to focus on non-contentious edits, so this is probably outside the purview of AutoFormatBot. I don't recall seeing a proposal for it, by the way. Looks good so far, but should have more community input. --Connel MacKenzie 21:59, 24 March 2007 (UTC)
Indeed, it is out of scope (User:AutoFormat#Principles ;-) if it is not long resolved. Probably should be voted on and the resolution added to WT:POS. If you look at the control table (User:AutoFormat/Headers) "Verb form" is listed as POS, and non-standard. That means "Verb Form" will get changed to "Verb form", but not to "Verb". And the section will be treated as a POS section. Robert Ullmann 22:29, 24 March 2007 (UTC)
To be clear, I recall (and agree with) Connel's viewpoint on this, but I do not recall a clear consensus re highly-inflected languages, even though there was for English (there may have been a consensus, but I don't recall it). (But this is irrelevant if the change is only to regularise the capitalisation.) --Enginear 15:04, 25 March 2007 (UTC)
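The behaviour described for the control table (capitalisation is regularised, but "Verb form" is never rewritten to "Verb") can be sketched as follows. The table contents and the function name `normalise_header` are hypothetical; only the "Verb form" row (a POS header, non-standard) comes from the description above, and the real table at User:AutoFormat/Headers may be organised quite differently.

```python
# Hypothetical control table keyed by lower-cased header text.
# Values: (canonical spelling, is a POS header, is a standard header).
HEADERS = {
    "verb": ("Verb", True, True),
    "noun": ("Noun", True, True),
    "adjective": ("Adjective", True, True),
    "verb form": ("Verb form", True, False),            # from the discussion
    "noun form": ("Noun form", True, False),            # assumed by analogy
    "adjective form": ("Adjective form", True, False),  # assumed by analogy
}


def normalise_header(text):
    """Fix capitalisation only; never rewrite 'X form' to 'X'."""
    key = " ".join(text.split()).lower()
    if key not in HEADERS:
        return text, None  # unknown header: leave it untouched
    canonical, is_pos, is_standard = HEADERS[key]
    return canonical, {"pos": is_pos, "standard": is_standard}
```

Keeping the "standard" flag separate from the spelling fix means the contentious change (dropping "form") stays a human decision, while the uncontroversial one can be automated.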

Wiktionary:Things to do, Category:Wiktionary

Wiktionary:Things to do and the sysop pages in Category:Wiktionary need some attention. They are out of date. Thanks --Keene 01:01, 24 March 2007 (UTC)

Homophones as a L4 header

I formally propose that we modify WT:ELE to recommend Homophones as a Level-4 header under Pronunciation, just as we have L4 headers for Synonyms and Antonyms following the definitions. Homophones are important enough to warrant their own header, particularly as they may confuse English Learners. Such words should not simply be listed in-line within the Pronunciation section, since they are separate words, and not aspects of the entry under which they appear. Unless there is mass opposition, I'll start a VOTE on the matter in the next week or two. --EncycloPetey 19:14, 24 March 2007 (UTC)

I think you mean level four. (?) Pronunciations is L3 (unless under Etymology n) Not a bad idea. I've seen a number of them. Robert Ullmann 19:44, 24 March 2007 (UTC)
Yes, you're absolutely right. I've modified the text above (and section header) accordingly. --EncycloPetey 21:08, 24 March 2007 (UTC)
I think that's a good idea. Before it goes to a vote, though, we should probably have some discussion on how to make clear that homophones depend on dialect and speaker (Mary/marry/merry, witch/which, etc.). BTW, would this be used at words in all languages (or at least, all languages with non-phonemic writing systems), or only at words in English? —RuakhTALK 21:39, 24 March 2007 (UTC)
I'm not sure how much would be needed. Each entry should have its own pronunciation(s) marked by region. Are you thinking about cases in which the homophones exist only in a limited range of dialect? I could see that as an important issue, and would like to hear suggestions. I seem to recall having seen some odd examples marked before, but can't recall which words they were.
Yes, this would be used in all languages, BUT each would be specific to the language section in which it appears. There would not be any reason to link a German word as a homophone in an English section, just as we wouldn't cross-link Related terms between languages. --EncycloPetey 22:36, 24 March 2007 (UTC)
Before it goes to a vote, I'd rather see someone creatively come up with a new/better scheme for the L3 Pronunciation sections. The "look" of the Pronunciation sections currently is awful. Adding subsections to that would only make it worse. --Connel MacKenzie 22:02, 24 March 2007 (UTC)
Something should be detailing the format of the Pronunciation section. Perhaps at Wiktionary:Pronunciation? (or is there another page already?) Then referred to from ELE. As Connel says, it needs some style ;-) Robert Ullmann 22:07, 24 March 2007 (UTC)
It is my ultimate intent to have a fully-fleshed out style guide for the Pronunciation section at Wiktionary:Pronunciation along with a thorough summary at WT:ELE, but there are many, many issues to be resolved in the Pronunciation section and I am trying to attack them in small steps. Otherwise, we would have too many discussions going on simultaneously and none of them would be fully resolved. I started with the AHD --> enPR proposal, and am now tackling the issue of Homophones. I have a laundry list of other concerns too :) My thought is that the homophone issue makes a nice self-contained sub-issue that could be then formatted independently of the rest of the pronunciation section. We could deal with formatting the rest of the Pronunciation section next. --EncycloPetey 22:36, 24 March 2007 (UTC)
Well then, my vote is for having a bulleted, indented "Homophones" tag within the pronunciation section. I don't want to have to rewrite Dvortybot to account for the intervening section. If you are only going to make it uglier and less consistent, I don't see the point at all. --Connel MacKenzie 22:45, 24 March 2007 (UTC)
Who said anything about ugly or inconsistent? I'm suggesting we adopt a standard, and (if it assuages your concerns) this is the only subsection I can see as being worthwhile within the pronunciation section. Everything else should be part of a bulleted list (unless we come up with a better idea).
Part of the problem I have with a bulleted tag for Homophones is that such things don't show up in the Table of Contents (yes, I use them and like them). Having the section separate also eliminates the need to decide where to put the homophones. With a subsection header, it comes at the end of the pronunciation section every time. With your proposed bulleted item, it could show up anywhere in a list of items that may or may not all be included (regional pronunciations, various audio files, rhymes, and hyphenation, at least). This is part of what is making the Pronunciation section look "ugly" right now -- we have a mish-mash of items that all look different but are all set up in a list as if they had parallel format. --EncycloPetey 22:57, 24 March 2007 (UTC)
It is hard to see how to deal with regional homophones without using bullets, each region having homophones shown after its pronunciations. But you're probably much more in touch with the ideas than I am. --Enginear 15:14, 25 March 2007 (UTC)
Having now read EP's explanation below, I understand better, and think that his examples 1 & 3 are the best, for complex and simple cases respectively. --Enginear 19:42, 26 March 2007 (UTC)
I've found some entries where that would rapidly become a mess. Within a region, there may be more than one pronunciation of a given word. Each specific pronunciation has homophones both in and out of the region, which vary with which of the regional pronunciations is compared. One recent headache is sere. There's a UK (Commonwealth?) pronunciation of /ˈsɪə/, which is a homophone of UK sear and one pronunciation of UK seer. In the US, there are two major pronunciations of sere: /siːr/ and /sɪr/, with the former having a southern US variant of /siːɚ/. Only this Southern variant is a homophone of US seer, but it is homophonic with seer as generally pronounced in the US. The second US pronunciation is also regional, and depending on region is homophonic with either sear or sir, but not seer. The first US pronunciation is homophonic with sear, but not as pronounced in the Southern US.
Frankly, I can't envision any means of communicating even a fraction of that information cleanly if the homophones are interpolated between the various regional pronunciations, and we've only considered the US and the UK so far. I think it would be much better to list the homophones (and the rhymes?) in a Homophones section that is structured first by a bulleted list of IPA pronunciations. Each IPA pronunciation would begin a line of homophones, each identifying in parentheses the region (or dialect) for which it is a homophone.
*{{IPA|/ˈsɪə/|lang=en}}: [[sear]] (UK), [[seer]] (UK)
*{{IPA|/siːr/|lang=en}}: [[sear]] (US)
*{{IPA|/ˈsiːɚ/|lang=en}}: [[sear]], [[seer]] (US)
Of course this is just one possibility. I could imagine the structure of the Homophones section paralleling the main Pronunciation section by organizing along regional lines, just as the Synonyms and Translations sections parallel the list structure of the definitions:
*{{a|UK}}: [[sear]], [[seer]]
*{{a|GenAm}}: [[sear]]
*{{a|Southern US}}: [[sear]], [[seer]]
Or we could start off less ambitiously and just use:
*[[sear]], [[seer]]
...and although that would eliminate all the dialectical information, there are some words for which that simpler form would be sufficient. Please keep in mind that this is not the best example for the potential difficulties, but it happens to be one fresh in my mind and therefore easier to find and discuss.
My feelings are rather strong on this issue because the homophones are words with distinct entries, rather than elaborations of the entry in which they appear. Just as we separate the synonyms, antonyms, and related terms into their own subsections rather than interpolating them among the definitions, so I would like to see the homophones separated into their own subsection rather than interpolated among the pronunciations. Particularly since there may be more than one pronunciation in a given region listed on the same line, and not all of them may share the same set of homophones. This wouldn't happen with our definitions, where each definition gets a separate line, but in the Pronunciation section it is a possibility and happens not unfrequently. --EncycloPetey 18:27, 25 March 2007 (UTC)
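The first layout proposed above (one bulleted line per IPA pronunciation, each homophone optionally tagged with a region) is regular enough to be generated from structured data. The sketch below is only an illustration: the data is copied from the sere example, while the function name `homophone_lines` and the data layout are hypothetical.

```python
# Each group: (IPA transcription, [(homophone, region or None), ...]).
# Data copied from the sere example in the discussion above.
HOMOPHONES = [
    ("/ˈsɪə/", [("sear", "UK"), ("seer", "UK")]),
    ("/siːr/", [("sear", "US")]),
    ("/ˈsiːɚ/", [("sear", None), ("seer", "US")]),
]


def homophone_lines(groups, lang="en"):
    """Render one wikitext bullet per pronunciation."""
    lines = []
    for ipa, words in groups:
        items = []
        for word, region in words:
            item = "[[" + word + "]]"
            if region:
                item += " (" + region + ")"
            items.append(item)
        prefix = "*{{IPA|" + ipa + "|lang=" + lang + "}}: "
        lines.append(prefix + ", ".join(items))
    return lines


for line in homophone_lines(HOMOPHONES):
    print(line)
```

Leaving the region as optional data also covers the simplest proposal, where a plain list of homophones with no dialect information is all an entry needs.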

I think my preference is for something like this:
Note: homophones vary by dialect and speaker. Each of the following words is a homophone of ''sere'' for at least some speakers:
only preferably less wordy. To see exactly who treats those words as homophones, they'll need to look at the various pronunciation sections, but this both lists possible homophones (useful for language learners) and makes clear that they may not be homophones for everybody. —RuakhTALK 23:40, 25 March 2007 (UTC)

I agree with EncycloPetey that homophones need their own heading, just like synonyms, antonyms etc. It's not always easy to state clearly the regions in which a certain pronunciation is used, so I think that the information in parentheses should be optional. The good information about the homophone's pronunciation should be in the page entry for the homophone anyway, not in the page where the homophone is listed. So I reject the idea given by example 2, where homophones are divided by region. Though I think it's great to divide the list by pronunciation, as in example 1. It should also be possible to add homophones even without adding the IPA pronunciation, and sometimes the page entries aren't that complex, with many different pronunciations. Therefore example 3, with a plain list of the homophones, should also be acceptable, in my opinion. ~ Dodde 02:07, 26 March 2007 (UTC)

Please consider the value of getting as much information as possible to the user on the first screen seen. Making homophones a header "costs" about two (2) lines more than the current WT:ELE approach if there is only one line for the homophones and one additional line for each homophone if the generic approach for derived terms, related terms, and see also is used. That's a lot of prime real estate. To use it without much knowledge of the effect on average WT users in the service of conceptual consistency seems unwise. DCDuring TALK 22:56, 27 January 2008 (UTC)

See also

While we are talking about headers: this is one of the most common headers on the wikt, and not mentioned in WT:ELE. It gets used at L4 under a POS, typically after Synonyms, Translations, etc, but before (recognized) headers External links and References. Convention seems to be that See also is references inside the wikt and WM projects that are not Syn/Ant/Related/Derived terms. (It also shows up at L3 when it shouldn't, and sometimes when it maybe should, and also shows up at L2, which it clearly shouldn't!)

I'd think it ought to be listed in WT:ELE in that place in the sequence, as references to other words/indexes/related bits that don't fit in the preceding headers, but aren't external links, which follow. (Is all that clear as mud?) Robert Ullmann 19:44, 24 March 2007 (UTC)

I can see uses both as a L3 and L4 header. When phonemics has a See also listing phonetics, that usage could certainly fall under the POS as a level-4 header. However, when that Afar entry links to the *Afar edition of Wiktionary, that usage of See also should be at level-3. I can't justify including such interwiki links in a subcategory of the Noun part of speech. --EncycloPetey 21:13, 24 March 2007 (UTC)
That seems just about right. Something has to also say that the L3 use of "See also" has to be at the end of the language section, not intermingled with POS sections. Robert Ullmann 22:03, 24 March 2007 (UTC)
If anything, WT:ELE should specify it as an L3 heading. The L4 headings are inappropriate, and should be "disambiguated" at the L3 level instead. --Connel MacKenzie 22:05, 24 March 2007 (UTC)
I'd be fine with putting them all at level 3, or with using a combination of L3 and L4. --EncycloPetey 22:38, 24 March 2007 (UTC)
I would be wary that "See also" sections are just an invitation for random trivia and spam to accumulate. Anything that we actually want readers to also see can fit under an existing heading, and if not, a new heading could be considered. The phonetics in phonemics above is a related term (or derived term), and "See also" shouldn't be used. It might not be worth going through all the instances of "See also" and changing them, but I don't see any reason to codify that type of header into policy. Dmcdevit 03:20, 25 March 2007 (UTC)
No, not everything can be coded under an existing header, which is why the regular sysops use "See also" so often. For example, Semper uses it for taxonomic entries to link to subtaxa (e.g. to link Oleaceae to Fraxinus). It's how I chose to link to . These are just two cases where none of the existing headers are really appropriate, and there are many more similar situations besides. Although the "See also" isn't officially sanctioned in the ELE, it's used all over the place and has been for a long time. I rarely see unwanted detritus accumulating there, though it does happen from time to time. I don't see that as a new problem, though, since we have the same rate of additions of duplicate definitions. --EncycloPetey 03:50, 25 March 2007 (UTC)
I can't think of any cases where a "see also" section is necessary — lists of subtaxa fit more comfortably at Wikipedia or Wikispecies (though if there are just a few top-level subtaxa, or a few particularly important ones, then those should be mentioned in the definition line), and the astrological symbols are conveniently grouped into the interestingly named Category:Astronomical symbols, and phonemics can link to phonetics either in its definition line (in a "contrasted with" phrase, like at white-collar) or in the usage notes (in a "not to be confused with" note, like at affect#Verb), or both — but seeing as "see also" sections aren't going away anytime soon, it would be nice for WT:ELE to mention them and give guidelines on how to use them (where they should go relative to other sections, what kinds of links they should contain, how to format each link, how to order the links, whether and how to group the links, etc.). —RuakhTALK 05:35, 25 March 2007 (UTC)
To be clear, what I mean is that if there is a case where none of the existing headers work, I would much rather that the editor add a descriptive one than a "See also". So, I'd rather see someone using "Subtaxa" (or whatever) than "See also". Dmcdevit 17:12, 25 March 2007 (UTC)
The flip side of that is that it would proliferate the number of various headers, which we definitely don't want to happen. A limited set of headers at L3 and deeper means that (1) we can more easily search for and remedy spelling problems, and (2) we can have a short list for new users to learn and grow comfortable with. Too many extra headers makes it harder to control the structure of the data as we've been trying to do. The See also remains a much more flexible option, particularly when the user is directed to one of the Appendices. --EncycloPetey 18:04, 25 March 2007 (UTC)

The above discussion (appropriately) ignores the other use of "See also" at the beginning of an article to link to alternate Capitalised/non-capitalised spellings, and sometimes spellings with diacritics. Such usage needs separate consideration. --Enginear 19:47, 26 March 2007 (UTC)


What would be a good term or phrase to define a situation or just a word that becomes used excessively, to where it begins to annoy people? Something other than redundancy. Basically comes to mind as an example. From what I see on TV, law enforcement and military personnel are the main offenders. It becomes a mindless usage used in every other sentence. It ends up clouding conversation and not complimenting the talker. Another example could be the word absolutely. Where is the spelling checker on this thing? —This unsigned comment was added by Gord 6789 (talkcontribs) 04:03, 25 March 2007 (UTC).

I'd say a cliché, catchword, or buzzword, depending on the details; but you might want to take your question to Wiktionary:Tea Room, which is more suited to that kind of question. —RuakhTALK 05:16, 25 March 2007 (UTC)
Yes, cliché is the appropriate term here. You can also use "hackneyed phrase".
Wiktionary has no spellchecker. You can always spellcheck content in a text editor or word processor and then copy and paste it here. — Paul G 09:51, 25 March 2007 (UTC)
You can turn on Wiktionary's (primitive) spellchecker at WT:PREFS. Note that Firefox 2's spellchecker is quite superior. --Connel MacKenzie 06:00, 6 May 2007 (UTC)

Use of anchors in {{t}}

The template {{t}}, used for linking translations to other wiktionaries, is great, but it doesn't allow, as far as I can see, for the use of anchors. I've just been working on "vine", of which one sense is translated as "vite" in Italian. As this is also a word in French and probably several other languages too, I wanted to link the translation to the Italian section, thus, [[vite#Italian|vite]], but this won't work if the translation is given using the {{t}}.

I see that this was discussed when the template was created. Was it ever resolved? Couldn't the template be parsed to recognise an optional template following one containing a hash? — Paul G 10:04, 25 March 2007 (UTC)

It would be so very, very useful if our language code templates didn't contain wikilinks; then we could translate any code to the canonical language name in another template, and this case would be trivial: {{t}} could just always generate the anchors (#{{{{{1}}}}}). And it is easy to link the result of a template call anyway, so someone wanting (say) Scottish Gaelic linked could just use [[{{subst:gd}}]]. But it is impossible to unlink the result of a template call. But there is resistance to just unlinking all the code templates, even though it would be incredibly useful. Robert Ullmann 14:52, 25 March 2007 (UTC)
What about another series of templates for "unwikified" language names? {{n-en}}, {{n-es}}, {{n-gd}} etc. You are trying to use existing templates for something they were not intended for. (Actually, doing that might be a bit crushing to the WMF servers - checking multiple cascading templates on each translation, in 21,000+ entries?) --Connel MacKenzie 15:06, 26 March 2007 (UTC)

Move to WT:GP??? --Connel MacKenzie 15:06, 26 March 2007 (UTC)

@Paul G: right now, the template adds the link automatically, but due to this, it only works with languages in the WT:TOP40. It is this technicality that is discussed above. So go ahead and use {{t}}. You can see in the preview that it gives the correct link. H. (talk) 16:46, 26 March 2007 (UTC)

I go with Connel and suggest we use two setups of templates containing the language name, one containing the language names with wikilinks (or whatever it rules for the TOP40 and such is right now), and one for use with the {{t}}-template containing the language name without anything else whatsoever. I am not sure why Connel suggests the naming convention "n-". I think "t-" would be more suited since it will be used with t-template in translations lists. ~ Dodde 00:25, 27 March 2007 (UTC)
Thanks. I picked "n-" thinking "name" but really, any prefix will do. What it really should be, is a list of Wiktionaries that exist. So, at the template level, if the language code doesn't have a language name, the template would know not to link to the non-existent foreign language Wiktionary. The "top 40" list is great, for what it does, but this really is a separate problem/function. --Connel MacKenzie 16:12, 28 March 2007 (UTC)
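
The idea in this thread, a second set of templates holding only the plain language name so that {{t}} can always build a section anchor, can be sketched outside the template system. The Python below is only an illustration under stated assumptions: the `LANGUAGE_NAMES` table and the `translation_link` function are hypothetical names, not part of any Wiktionary template, and the table holds just a few codes.

```python
# Hypothetical lookup table: language code -> plain (unlinked) language name.
# This plays the role of the proposed "n-" / "t-" templates.
LANGUAGE_NAMES = {
    "it": "Italian",
    "fr": "French",
    "gd": "Scottish Gaelic",
}


def translation_link(code, word):
    """Build a wikilink to the word's section for the given language code.

    Falls back to a plain, unanchored link when the code has no known name.
    """
    name = LANGUAGE_NAMES.get(code)
    if name is None:
        return "[[" + word + "]]"
    return "[[" + word + "#" + name + "|" + word + "]]"


print(translation_link("it", "vite"))  # [[vite#Italian|vite]]
```

Because the table stores unlinked names, a caller who wants the language name itself linked can still wrap it in brackets, which is the asymmetry Robert Ullmann points out: linking a plain name is easy, unlinking a linked one is not.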

Medical Eponyms

I believe the preferred form for medical eponyms in the AMA style book is to omit the possessive 's. However, there is some debate on this - http://www.medtrad.org/panacea/IndiceGeneral/n5_dirckx.pdf

Wikipedia reports:

In 1975, the US National Institutes of Health held a conference where the naming of diseases and conditions was discussed. This was reported in The Lancet (1975;i:513) where the conclusion was that "The possessive use of an eponym should be discontinued, since the author neither had nor owned the disorder." Medical journals, dictionaries and style guides remain divided on this issue. - http://en.wikipedia.org/wiki/List_of_eponymous_diseases#Punctuation

Should our von Willebrand's disease be Von Willebrand disease as in Wikipedia? (For now, let's ignore Wikipedia's unfortunate use of the capital "V"!)

Ben 12:42, 25 March 2007 (UTC)

Therefore, the attested form you are suggesting we should move? That's not quite right. We should have entries for both forms with Usage notes describing the AMA's/The Lancet's prescription. Funny that they would use that logic - the person's attribution "owns" the disorder. Seems like a pretty weak excuse for trying to change lots of common disease names (which are used primarily by newspapers, not medical journals.)
If/when each new form is attested, we can add each "disorder" entry here. (The Lancet, itself, certainly counts as a "reviewed journal" - that is quite likely the publication for which that clause was added to WT:CFI.) --Connel MacKenzie 15:00, 26 March 2007 (UTC)

The underlying reason for omitting the apostrophe, I suspect, was to simplify spelling, especially when the name ends in "s." The Lancet is a very fine journal, but The Annals of Internal Medicine omits the possessive (Neil A. Goldenberg, Linda Jacobson, and Marilyn J. Manco-Johnson. Brief Communication: Duration of Platelet Dysfunction after a 7-Day Course of Ibuprofen. Ann Intern Med, Apr 2005; 142: 506 - 509. "......concern given the high prevalence of von Willebrand disease (1 in 100 individuals)...."). So does the Journal of the American Medical Association (at least since 1982 or so). The New England Journal of Medicine uses the possessive for the disease, but omits it for "von Willebrand factor."

At any rate, how should we proceed? Would it be necessary to find a published example of each form and set up a new page for each? Abels test and Abels' test? Osler's nodes and Osler nodes? Or, do we just put a usage note on one form that indicates the other is sometimes used? Ben 12:05, 27 March 2007 (UTC)

For a word/phrase to pass WT:CFI it should normally be possible to find three durably archived cites (or one in a refereed academic journal). However, if a word is categorised as a "misspelling" (or perhaps "misuse" though this is more contentious) a higher bar is set (not AFAIK defined). So is von Willebrand's disease a misspelling or misuse? I don't know, but I suggest that there has been a change of scientific fashion which is broader than medical usage.
Previously, those who discovered (or improved knowledge of) scientific entities were often linked to their discovery, as in Halley's comet or Weil's disease. But nowadays this is considered a flawed description -- Edmund Halley did not own "his" comet, and Adolf Weil did not suffer from "his" disease, as is perhaps implied by the descriptions.
So now, scientists refer to Comet Hale-Bopp (and indeed Comet Halley) and von Willebrand disease. For older phrases, I suggest that both are valid, perhaps with a note on the 's version that the usage is now deprecated within the scientific community. For newer discoveries, perhaps they should be treated the same, or perhaps the 's version is a misuse. Of course, if less than three (or one refereed) cites exist for a spelling, then the issue does not arise as it cannot even meet normal CFI.
To answer the specific question, I believe there should preferably be separate entries for each spelling which meets CFI, with cites for each. However, this doesn't mean that it is essential for a contributor to add more than one entry or add any cites at all. It is better to add a single entry for a term believed to meet CFI than add none at all; it is even better to add a "soft redirect" entry for "the other" spelling; having one or both entries cited is better still (some say this is best, while others of us would prefer two "full" cited entries). This is a wiki. Once a basic entry is in place, others can build on it (and usually will if its appropriateness is later challenged). --Enginear 15:45, 27 March 2007 (UTC)

I think we should have, as Enginear suggested, separate pages for each spelling and include relevant context labels, etymology or usage notes as appropriate.--Williamsayers79 13:31, 28 March 2007 (UTC)

Enginear, thank you for restating what I said more clearly. --Connel MacKenzie 16:14, 28 March 2007 (UTC)

OK, I like this solution, and I think I understand it, too, except: What is a "soft redirect?" Thanks --01:37, 29 March 2007 (UTC)

A "soft redirect" is what would be considered a "stub" entry on Wikipedia. The minimal Level two language heading, the minimal level three part-of-speech heading, and a "#" definition line using one of the form-of templates, such as {{alternative spelling of}}. --Connel MacKenzie 18:17, 29 March 2007 (UTC)
Basically, a "soft redirect" is an annotated link: hametz is an example of a simple soft redirect to a full cited entry at chametz, cat-flaps is a cited soft redirect, while cat-flap and cat flap are full entries each noting the existence of the alternative spelling. --Enginear 18:26, 29 March 2007 (UTC)
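
As a rough illustration of the three parts Connel MacKenzie lists (a level-two language heading, a level-three part-of-speech heading, and a "#" definition line using a form-of template), the sketch below assembles the wikitext for such an entry. The helper name `soft_redirect` is hypothetical, and passing the target as a single positional parameter to {{alternative spelling of}} is an assumption about how the template is called.

```python
def soft_redirect(language, part_of_speech, target):
    """Return minimal wikitext for a soft-redirect entry pointing at target."""
    lines = [
        "==" + language + "==",            # level-two language heading
        "",
        "===" + part_of_speech + "===",    # level-three part-of-speech heading
        # Definition line; the one-parameter template call is an assumption.
        "# {{alternative spelling of|" + target + "}}",
    ]
    return "\n".join(lines)


# Wikitext for an entry such as hametz, pointing at chametz.
print(soft_redirect("English", "Noun", "chametz"))
```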

Language sort order

In WT:ELE, we specify that languages (after English) are to be sorted in alphabetical order by the English name in L2 sections, likewise in Translations sections. Strictly, that means that Classical Nahuatl should sort under C, and Old English under O, etc.

I think it would be better if we sorted on the base name in these cases, (which is what people often do anyway), so that Old English sorts as "English, Old", while remaining "Old English" in the header. And would group with English on the page in this case. "English, Middle" and "English, Old" conveniently sort into reverse chronological order following English.

Prefixes treated this way would be Old, Middle, Middle High, Ancient, Classical etc. Or we could treat any language name that ends with a recognized language name as something to be "inverted" for sorting?

Or do we just stick to strict alphabeticity? (is that a word?)

See wine and vino Robert Ullmann 16:30, 25 March 2007 (UTC)

I'm not sure how I feel about that, but it adds another layer of complexity to the translations section. We already group some languages rather strangely, varying by editor. Should the various forms of Chinese (which are not called "Chinese") be grouped together (look at the entry for birthday)? Should all eight or more flavors of Sami (look at Monday)? In short, your question is part of a larger sorting issue for languages in the Translations section. For instance, would you want to group Scottish Gaelic with Irish (Gaelic) because the end of their name is "Gaelic", or separate them because we have arbitrarily decided to call Irish Gaelic simply "Irish"? Do we group Tosk Albanian together with Gheg Albanian because they both contain the word "Albanian", or do we separate them because they're not mutually intelligible anyway? And if we decide on a case by case basis, just how long a list of little sorting rules would be too long?
I absolutely do not agree with placing language families together in the translations section. We should be consistent, applying simple rules. Even doing it for Chinese, this opens a can of worms. Are we then to do the same for other language families? No, each language or dialect that is identified should be alphabetized independently. DAVilla 20:44, 26 March 2007 (UTC)
Your preliminary list merely scratches the surface of possible prefixing words; consider Western Apache versus Plains Apache, Moroccan Arabic versus Egyptian Arabic, Upper Sorbian versus Lower Sorbian, Inari Sami versus Lule Sami, and note that Tok Pisin is etymologically a compound as well (though I doubt the average user would guess that).
That said, I think it would be good to alphabetize while ignoring words like "Old", "Middle", and "Ancient", primarily because these describe a specific period of development in a language. I think it would be good to group the various forms of Arabic, and possibly the various forms of Chinese. However, this is a very tricky issue with many angles and I don't think I've gotten them all sorted out in my own head yet. --EncycloPetey 17:56, 25 March 2007 (UTC)
Sort of what I was thinking: the "age" qualifiers should be a secondary key (not ignored entirely). I think this will make a lot of intuitive sense to people, as well as being fairly simple to code where needed. (If the name starts with a word in the set, move it to the end, then sort.) I don't want to get into groupings, it is endless, and they overlap in various ways; this is (one of the things) that the alpha order was intended to avoid. Robert Ullmann 15:09, 26 March 2007 (UTC)
I don't think this would be very transparent to contributors, so it really makes sense to keep the rules very simple. If you really believe this sort order is desirable, then you'd have to be willing to allow for the naming convention to be "English, Old" etc. But in reality this is of minimal benefit. Olde English could just as easily be called Anglo-Saxon, and there are other languages where the "old" language is only known by an entirely other name. DAVilla 20:44, 26 March 2007 (UTC)
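
The rule proposed in this thread (if a language name starts with an "age" qualifier, move the qualifier to the end and then sort, so that it acts as a secondary key) can be sketched as a sort key. This is only an illustration: the names `AGE_PREFIXES` and `language_sort_key` are hypothetical, and the prefix set is the preliminary list from the proposal, which EncycloPetey notes is far from complete.

```python
# Preliminary qualifier list from the proposal; longest first so that
# "Middle High" is tried before "Middle".
AGE_PREFIXES = ("Middle High", "Classical", "Ancient", "Middle", "Old")


def language_sort_key(name):
    """Return (base name, qualifier) so the qualifier is a secondary key."""
    for prefix in AGE_PREFIXES:
        if name.startswith(prefix + " "):
            return (name[len(prefix) + 1:], prefix)
    return (name, "")


names = ["Old English", "French", "Middle English", "English",
         "Classical Nahuatl", "Ancient Greek"]
print(sorted(names, key=language_sort_key))
# ['English', 'Middle English', 'Old English', 'French',
#  'Ancient Greek', 'Classical Nahuatl']
```

The headers keep their usual form ("Old English"); only the ordering changes. As in the proposal, Middle English and Old English fall directly after English, in reverse chronological order.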

Does the ISO language definition code sort in an intelligible order? Ben

Not really. Some of the codes are similar to the English names, but that isn't the objective of the coding. For example, Mandarin, German, French, Dutch, Cantonese are in alphabetical order ... (cmn, de, fr, nl, yue ;-). There are wikts that use the code templates all the time, and sort on them (which produces a consistent, but often apparently random order). Robert Ullmann 15:09, 26 March 2007 (UTC)
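
The five languages in the example above make the point concrete. A small sketch (the `CODE_TO_NAME` table is written out by hand for illustration and covers only those five codes) compares sorting by code with sorting by English name:

```python
# The five codes mentioned in the comment above, with their English names.
CODE_TO_NAME = {
    "cmn": "Mandarin",
    "de": "German",
    "fr": "French",
    "nl": "Dutch",
    "yue": "Cantonese",
}

# Order obtained by sorting on the code, as some Wiktionaries do.
by_code = [CODE_TO_NAME[code] for code in sorted(CODE_TO_NAME)]

# Order obtained by sorting on the English name, as WT:ELE specifies.
by_name = sorted(CODE_TO_NAME.values())

print(by_code)  # ['Mandarin', 'German', 'French', 'Dutch', 'Cantonese']
print(by_name)  # ['Cantonese', 'Dutch', 'French', 'German', 'Mandarin']
```

For this particular set the two orders happen to be exact reverses of each other, which is presumably why it was chosen as an example.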
  • I think Hippietrail's work on the MediaWiki extension to group language names in a sane manner, is the much better approach. http://wiktionarydev.leuksman.com/ We really should be looking at improvements to his methods there, and adoption of his extensions here. --Connel MacKenzie 16:10, 28 March 2007 (UTC)
But I haven't seen anyone try to implement this within a section of an entry. How would this work apply to the various translations sections, and would such a format make it difficult for visiting translators to add or check translations? --EncycloPetey 21:59, 30 March 2007 (UTC)
Have you given Hippietrail that suggestion on http://wiktionarydev.leuksman.com/ yet? He may already have something up his sleeve... --Connel MacKenzie 16:00, 31 March 2007 (UTC)
Good idea. I've thought about this but not when I've been editing my todo list on WiktionaryDev. I'll add it now. — Hippietrail 17:19, 31 March 2007 (UTC)
I'm sorry that I don't understand what the extension does yet. Hippietrail, if you could automatically pass a {{{languagecode}}} and {{{languagename}}} parameter to every template included within any section, it would be useful to the utter extreme. DAVilla 12:36, 3 April 2007 (UTC)

Constructed languages

Hey, I know we've brushed over this topic before, but I really think it would be best to finally come to some sort of decision on the matter. Do we include constructed languages, or more specifically which ones? It seems rather clear that we do include Esperanto; I don't think there is much debate about that. But what about Quenya? It's an Elvish language constructed by J. R. R. Tolkien for his Lord of the Rings series. We currently have an anon cleaning up the section, and well, I guess I'd feel sort of shitty if a few months down the road we decide to squash all their hard work. We should either put a stop to it right now, or decide to allow this language. I must admit I don't have any strong convictions about it one way or the other. If any dictionary is ever going to include such things, we are certainly the perfect format for such a venture, not being limited by paper. However, this admittedly opens the doors to all sorts of nonsense. If I was forced to make a decision right now, I would say allow Quenya, but disallow certain other languages, such as Brithenig, simply because I like one language more than the other. But it seems that perhaps Wiktionary ought to have some more rigid standards than that. Any thoughts, anyone? Atelaes 23:12, 26 March 2007 (UTC)

I would prefer to put lexicons for minor constructed languages in the Appendix namespace in a single page rather than in the mainspace, but I'm not sure of a good metric for differentiating major constructed languages (Esperanto, Interlingua, Ido, Lojban, etc.) from minor ones (Quenya, Klingon, etc.). Words, and by extension languages, whose use is restricted to a single literary work like Quenya would seem to fail CFI in my opinion, but that doesn't settle the matter completely, considering other languages like Toki Pona. Dmcdevit 23:54, 26 March 2007 (UTC)
Does Quenya have an ISO language code? Wasn't that part of the standard (or at least rule of thumb) we were using? RJFJR 16:04, 27 March 2007 (UTC)
Yes, qya. But the relevant section of WT:CFI#Constructed languages says that uncoded languages are not acceptable, but coded constructed languages may or may not be; and gives a specific list. The current list (and policy) seems pretty good. I would think if someone wants to change the status of any given (coded) language, it just goes to a vote. At present Quenya does not meet CFI, it is explicitly listed as not approved (all of the constructed languages coded in 639 are explicitly listed as in or out).
So the question that presents is: do we want to change CFI to permit Quenya? Robert Ullmann 16:15, 27 March 2007 (UTC)
I think we should stick with what CFI states until given good reason to do otherwise. It's just that I've never heard anyone interpret that particular CFI paragraph so simply. Last time I brought up this issue, it was a whole lot of "ummmmm"'s and "I don't know"'s. Well, that certainly answers the question to my satisfaction. All that remains to be said is this: If anyone disagrees with this, speak now. If I don't hear a community uproar in about a week, I'm going to start going through that list and cleaning out all the Quenya, Brithenig, etc. However, the question also remains of what to do with all these entries. My instinct is to go with Dmcdevit's excellent suggestion of putting them all in their own indices. Atelaes 16:31, 27 March 2007 (UTC)
What, precisely, do you mean, "cleaning out" that list? You'd need a separate vote on each one, would you not? --Connel MacKenzie 16:17, 28 March 2007 (UTC)
By cleaning out the list I simply mean moving all the mainspace entries which do not meet current CFI (because they are part of a non-CFI language) to appendices. It does not mean that I'll be changing the list. I don't think that requires a vote. If you think it does, please say so. Also, does anyone know of a good example appendix which I can model the Quenya appendix after? Another question, should I leave a redirect (to the appropriate appendix) in place of the article, or just delete it entirely (after all the info has been moved)? Atelaes 05:38, 29 March 2007 (UTC)
Whew. Thanks for the clarification; I'm glad I merely misinterpreted it the first time. --Connel MacKenzie 18:15, 29 March 2007 (UTC)
I'm confused by that list. It says Interlingue is accepted, while Occidental is not; but according to our and Wikipedia's articles on them (Interlingue, Occidental, w:Occidental language), they're the same language, Occidental being an older name and Interlingue a newer one. Am I missing something? —RuakhTALK 17:34, 27 March 2007 (UTC)
That's correct. It appears Occidental isn't used at all, and is not very notable except as Interlingue's predecessor. Dmcdevit 07:14, 29 March 2007 (UTC)
My thoughts are that if these oddities do not meet the current CFI then they should be removed from the main namespace. There is no harm in having them in an index or appendix area, like the proto-languages.--Williamsayers79 13:06, 28 March 2007 (UTC)
BTW, Quenya is stretching it a bit anyway, but Brithenig really takes the biscuit!--Williamsayers79 13:06, 28 March 2007 (UTC)

The problem with this though, is that while I can say why Quenya is forbidden, as someone who isn't familiar with these languages, I can't tell why Novial, for instance, is included. I can't even find any indication it's noticeably more well-known or used than the others. Dmcdevit 07:14, 29 March 2007 (UTC)

Novial has some active speakers/writers/users. See, for example w:nov:Chefi pagine ;-) Quenya is just a vocabulary in a literary work (albeit a very notable one). Robert Ullmann 18:21, 29 March 2007 (UTC)
I suspected as much, though I was hoping for a more quantitative measure to differentiate between the non-literary conlangs. Dmcdevit 22:06, 31 March 2007 (UTC)

This isn't a keep/kill vote really, but I actually benefited from our Quenya entries just yesterday, when I read this, leading me to look up tengwar. If not for wiktionary, I probably never would have figured it out. In that sense, it is nice to have entries for Quenya words, and I'm tempted to say, "what harm can it cause?" On the other hand, obviously Brithenig words shouldn't be put in unless they see a massive increase in usage. I think Quenya really does fall right smack in gray area, and it really is a tough decision. I think it would be good if, assuming we move the Quenya stuff to appendices, when people search for a word which we don't have, search results might include the Quenya appendix (or other language appendices) if applicable. Although our appendices are awesome, I don't imagine many of our casual readers have discovered the many joys of appendices yet :-) Language Lover 23:36, 29 March 2007 (UTC)

One thing though, which I just realized, is that all our Quenya entries are morphologically transliterated into the Roman alphabet! If Tolkien's Elves really did exist, wiktionary would be next to useless to them, since we wouldn't have the words in their rune forms, even if those runes could somehow be transmitted into the search box!  :) Language Lover 23:39, 29 March 2007 (UTC)
As it turns out, there actually is a Unicode range reserved for Tengwar. How many people can see this:  ? Somehow I can. But, I admit that many people probably can't. In any case, I think it might be nice to have both Latin and Tengwar scripts. Atelaes 07:00, 30 March 2007 (UTC)
"Reserved" is not the word I'd use. The ConScript Unicode Registry attempts to coordinate the use of the Private Use Area for artificial scripts, and recommends the use of a certain part of the Private Use Area for Tengwar use; but so far, according to w:Tengwar, only one font supports it, and given the nature of the Private Use Area, this can never become standard. —RuakhTALK 21:11, 30 March 2007 (UTC)
Moving constructed languages that are used in one or more major works, but do not meet CFI as living languages, to appendices seems like a brilliant idea to me. -- Beobach972 21:32, 31 March 2007 (UTC)

Amending WT:CFI

I do think we should codify this better in WT:CFI though. If we agree that constructed languages whose primary use is restricted to a (series of) literary work and its fans do not meet WT:CFI, may be allowed in lexicons in the Appendix: namespace, but are not appropriate in the main namespace, shall we put that to a vote? Currently, CFI seems to imply that there is no agreement either way.

I can't think of a good metric for other ISO 639-3 languages. It has to do with how well used it is, but a measure of that would be nice, if anyone can think of one. Dmcdevit 22:06, 31 March 2007 (UTC)

Specifically, WT:CFI#Constructed_languages implies that there is no agreement; I would like to change the section to add a fourth bullet stating "There is consensus that languages whose origin and use are restricted to one or more related literary works and their fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the Appendix namespace." Dmcdevit 00:51, 4 April 2007 (UTC)

From the current list of languages in WT:CFI#Languages to include, which would be moved there? --Connel MacKenzie 04:45, 4 April 2007 (UTC)

Quenya, Sindarin, Klingon, and Orcish I think are the applicable ones mentioned in other categories. Dmcdevit 05:20, 4 April 2007 (UTC)
Sounds worthy of a vote, to me. Thank you. --Connel MacKenzie 05:26, 4 April 2007 (UTC)
I created the subpage: Wiktionary:Votes/pl-2007-04/Fictional languages. Any last suggestions about the wording, or should I make it live? Dmcdevit 06:01, 4 April 2007 (UTC)

Archiving of WT:RFV

I notice that for the last year Wiktionary:Requests_for_verification/archive, linked from the header on WT:RFV, no longer has a list of words which have failed RFV (and the Jan & Feb 07 archives don't exist at all). Is the list of failed words available somewhere else? If not, how are we meant to check if a word has failed in the past? Previously a search for it on the /archive page was sufficient. I vaguely remember this being discussed, but can't remember the resolution. --Enginear 11:57, 28 March 2007 (UTC)

Someone had volunteered to manually maintain that list. I've taken stabs at automating it, but have not had time to devote to completing my preliminary efforts on that task. --Connel MacKenzie 16:06, 28 March 2007 (UTC)
I have a suggestion for how to maintain an archive of RFV-failed terms without maintaining the terms themselves anywhere on wiktionary where search engines could find them and archive them and cloud future verification requests (which seems to be a community concern and reason against keeping them on Wikt): remove them from the RFV page, then put a link to the diff in the archive. What do you think? -- Beobach972 18:38, 28 March 2007 (UTC)
Thanks. That is an excellent suggestion, actually. I forget why it fell into disfavor in the past. As we move towards an automated solution, I think that merits another look - as it is a superior method. At any rate, the problem is implementing the solution - the day to day drudgery of someone actually doing it. --Connel MacKenzie 18:13, 29 March 2007 (UTC)

WordWeb 5 Freeware Dictionary

Anyone have experience with this yet? They seem to be using Wiktionary (Yay!) so it might be worth checking into, in detail, at some point. (I stumbled across it, here.) --Connel MacKenzie 18:22, 29 March 2007 (UTC)

Hmmm, they don't seem to be complying with the GFDL very well though. --Connel MacKenzie 18:29, 29 March 2007 (UTC)

RFVing of words with generous b.g.c. hits

Kind contributors, I wonder if anyone here would agree with the following proposal. I propose that if a word has at least 10 immediate citations right at the front of books.google.com upon a simple search, and they are independent, then the burden of proof should be upon the person who wants the word deleted, not upon those who want the word kept. With the current system, a person could go RFV cat, dog, the, and pencil, and the burden of proof would be on those of us who like those words, to spend some of our time writing citations for these obviously good words. For example, someone recently RFV'd usurpress, even though providing cites for this word is just a tedious task of entering it in b.g.c. and choosing some cites from there and typing them in here. We could better use our time than that, unless the person doing the tagging offers an actual reason why the word should be deleted :-) The oversight of words is an important part of the dictionary, but in cases where a 1 minute search will immediately make it clear a word passes CFI (without even resorting to controversial cites like usenet), I think such words' presence is definitely a good part of our dictionary :-) Language Lover 19:06, 29 March 2007 (UTC)

That misrepresents the current practice quite wildly. If something is "clearly in widespread use" there is no reason to issue an "rfvfailed" for it. But in theory, all entries should have references, so an RFV isn't the sinister thing you are making it out to be. Re: "oversight:" Wikipedia has a special meaning for this term; in general, I mean the common meaning of the word, not the Wikipedia special meaning. --Connel MacKenzie 19:14, 29 March 2007 (UTC)
Wow, thanks Mr. MacKenzie, this is an interesting aspect I didn't realize.  :-) You're definitely not considered one of the leading Wiktionary contributors without good reason!! :D See below (my response to Enginear) for more... Language Lover 20:28, 29 March 2007 (UTC)
I agree with Connel on this one. It does not seem to be the case that people actually are RFVing common words, and then just forcing someone else to work on them. From what I can tell, people are generally only RFVing rather obscure words, which are the ones which most desperately need cites. Ultimately, I think it must simply be admitted that part of the drudgery of entering obscure words is fighting for their existence. If people start RFVing dog, cat, and the, then perhaps the policy might need some revisiting. However, for now, I think it works well as it stands. Atelaes 19:26, 29 March 2007 (UTC)
And also, this is a wiki. We don't all need to do everything ourselves. If someone refers a word they don't recognise, without checking adequately, then as you say, it is quick for someone else to correct them (as I've just done for WT:RFV#lose one's rag). OK, it then takes time to add the cites to the page, but I've yet to see a case where a link to plenty of clearly appropriate cites had been added to the RFV page and yet the entry still failed. Either someone copies the cites into the entry, or someone maintaining RFV uses their discretion to strike it from the RFV page, or to leave it in place for another month until one of us has time. --Enginear 19:51, 29 March 2007 (UTC)
Wow, thanks for all this great discussion :-) Alright, it looks like people generally are cool with the current system, at least as long as no one starts indeed RFVing cat and dog.  :-) Altering the subject, what would you guys think of a new {{rfcites}} tag for words which one does not want deleted, but simply for which one feels some in-article citations would make our readers happier?  :-) Mr. MacKenzie brought up the great point that sometimes an article would improve with more cites, but is not one we want to delete outright. A tag similar to rfphoto would be entirely appropriate :-D Thanks Connel, you are a great innovator!!! :-D Language Lover 20:27, 29 March 2007 (UTC)
I think that an {{rfcites}} is an excellent idea for words which are obviously in use, but could use some cites simply as an effort at improvement. However, one thing which would need to be considered is some general guidelines as to what sort of entries could specifically use cites. Because, ultimately, all entries which don't have cites could be improved by them, but I imagine that {{rfcites}} would be used on entries that, for some reason, would especially benefit from them. Atelaes 20:50, 29 March 2007 (UTC)
Also of concern with such a template, is Eclecticology's idea of separating {{rfv}} and {{rfv-sense}} onto separate pages. That same distinction (for {{rfvcites}}) seems advisable, so this sort of confusion is (perhaps?) less likely to arise. Maybe. I don't feel like creating a separate scheme for them though, nor am I particularly inclined to monitor yet-another-maintenance-page. --Connel MacKenzie 15:54, 31 March 2007 (UTC)
While I have struck words for being clearly widespread use (and I probably should have done so for attender, but what the heck), I could not do so for usurpress as much as it is evident to me that it belongs. DAVilla 20:10, 29 March 2007 (UTC)

Wiktionary:Contact us and OTRS

You might notice our brand new "Contact us" link in the sidebar. It goes to the new Wiktionary:Contact us page. This is a feature of all Wikipedias, and it leads to various help pages as well as a link to the Foundation email address (OTRS). We decided to start answering Wiktionary-related emails at OTRS, and that's one of the primary reasons for the new sidebar link. I think we'll see what the volume is in the next few days and then determine what to do about volunteers, if anything. However, currently that page is mostly a copy of Wikipedia's page, with the content substituted for Wiktionary equivalents by me. Please spruce it up, make it better, and more Wiktionary-like. Dmcdevit 21:00, 29 March 2007 (UTC)


A few months back, default http://www.dictionary.com/ lookups started displaying translations.

Dictionary.com has always been the number one source for inadvertent copyright violations, on en.wiktionary.org. Particularly, from visiting Wikipedians unfamiliar with our rules and our particular copyright concerns.

To me, there seems to be a direct correspondence between the change at dictionary.com, and the increase in contributors entering questionable translations. It did not occur to me that d.com was the source of the translations, particularly for sockpuppets of people who did not speak those languages.

Could all sysops, when checking translations, please remember to check dictionary.com, to see if any patterns evolve? To me, edits like this are particularly disconcerting. Do we automatically block indef for stuff like this?

Thanks in advance, --Connel MacKenzie 15:33, 30 March 2007 (UTC)

I think block indef would be a large overreaction. However, you have a good point that this is something that should be watched for and stopped. Atelaes 18:08, 30 March 2007 (UTC)
Block them for a while and let them explain themselves. If they don't, indef block. That edit you picked out highlights the kind of arrogance some of these people display; a good blocking always brings them down a peg or two.--Williamsayers79 18:27, 30 March 2007 (UTC)
Well deal with it as you like, but please don't block the user who's editing jungle. I'm working with them. Atelaes 18:30, 30 March 2007 (UTC)
I agree. I don't see what's so disconcerting about that edit, Connel. Essentially, the only changes are (1) changing the outdated language name "Hindustani" to the more usual "Hindi", and (2) changing the POV so that it leaves open whether the word came into English through Urdu or Hindi (instead of definitely from one or the other). I don't see either of these changes as potentially a copyvio. --EncycloPetey 21:44, 30 March 2007 (UTC)
I wouldn't say that the term Hindustani is outdated. The term encompasses Hindi and Urdu, and it especially refers to the colloquial versions (mutually intelligible) of both standards (which have become so distant from each other, due to political reasons, that they're almost mutually unintelligible). --Dijan 20:33, 1 April 2007 (UTC)

Math words with many equivalent definitions

In math, there are some terms for which there are many definitions, such that the definitions are all actually the same, but that fact is not at all obvious. For example, computability theory is famous for having tons of definitions of computable functions, which seem utterly different, but turn out to be identical (but proving that takes a lot of work). I wonder how we should define such words. How about if we gave a broad handwavey definition, together with a link to a subpage which lists the most common formal definitions? What do you guys think? :) I'd like to make a page for semisimple, and I'd also like to add computability theory senses to recursive and computable.  :-) Thanks, y'all!!! :D Language Lover 06:21, 31 March 2007 (UTC)

Hopefully, in such cases there will be a comprehensive article on Wikipedia that may be linked. In such cases, a general definition and link to Wikipedia should suffice, since Wikipedia allows for a lengthier discussion. --EncycloPetey 08:00, 31 March 2007 (UTC)
With the current software limitations, I am pretty strongly opposed to "subpage-for-everything" concepts. The "Citations" tab was enough of a nightmare, but at least it has direct lexical relevance to dictionary-making. --Connel MacKenzie 15:56, 31 March 2007 (UTC)
It's not only in maths that precise definitions are required -- some of the terms used in linguistics, for example, seem equally complex to me, and certainly in physics/engineering, work is defined in several ways which all end up with the same result (though I accept this is a lot simpler than the words you are talking of). It's just that the math ones stand out more in a dictionary. I think there's a general consensus that "technical" definitions should be included, provided that they are not too distracting to that majority of users which only wants the everyday meaning.
In the long run, once someone works out how to do it, I like the idea of collapsible sub-sections for precise definitions, and again for cites <onto soapbox>(to me, the idea of Citations sections/sub-pages reliant on glosses is wrong: citations help to draw out the exact meanings of definitions so should be adjacent to them) <off soapbox>. But meanwhile, here's "one I prepared earlier". I did it a few months ago, and I don't really like it. But I've offered it for criticism before, and so far no one's improved the layout.
Obviously, it would look better, to most, if the (more precisely) sections were collapsed, and only opened up to those who wanted them...Now I've remembered it, I'll add some cites soon. --Enginear 18:14, 31 March 2007 (UTC)
I'm sorry, but most of that information strikes me as encyclopedic rather than as definitional. For one thing, what's with the list of common heat sources? Is this to imply that the use of the term "boiler" is dependent on what heat source is used? If a new heat source were discovered today and people started using it to build boiler-like devices and started referring to these devices as "boilers", would you consider that to be an extension of the existing sense? —RuakhTALK 21:58, 31 March 2007 (UTC)
I largely agree, which is why I said "I don't really like" the article. I did it 10 months ago, when I was far less aware of what should/should not be here. I am fairly sure that the short list of prohibited heat sources (hot water and steam) in the (more precisely) sections is definitional and never likely to change. Some of the rest should probably be in Usage notes (since it makes clear circumstances where the word should/should not be used, at least in the building industry), and some should be junked. Improving it is still on my "to do" list, but I won't be offended if you rewrite before I get to it (which will be a few days at least). --Enginear 20:00, 1 April 2007 (UTC)

Much ado about Graphemes

Since I see that the last edit to the Beer Parlour was on the "Connel is an asshole" topic (and I know we've all heard enough of that), I thought I'd try and start a discussion on something a bit more constructive: the formatting of letter entries. Seeing as we currently have about five letters in RFC, and, in my opinion, most of our letter entries are kind of messy and unstandardized, I thought a BP discussion on the topic was in order. In my opinion, the first topic which needs to be covered (and has been much discussed on the RFC page, without any conclusion) is what header a letter goes under. The precedent appears to be translingual, and perhaps it should stay that way. However, Stephen has made the excellent point that, while most Latin letters are used in a slew of languages, most other letters are somewhat more restricted. Take the letter β for example. As far as I can tell, it is used only in Greek. Now, bear in mind that I'm talking specifically about β as a Unicode character. Coptic also uses beta, but it is a different beta: . So, in that respect, it's not terribly translingual. How about the character 𒊕 (don't worry, I can't see it either), used in Sumerian and Akkadian? By the way, there is an insightful discussion about this topic centering around that particular character at Wiktionary:Requests for cleanup#𒊕. Another question which arises is what information should be included at the entry for a letter. Should it have a pronunciation for every language which uses it? Should these pronunciations all be within a single L2 header, or should each language receive its own L2 header with a pronunciation section within it? If they're all within a distinct Language header, what's the part of speech? And where would we include IPA in all this mess? Does a letter get an etymology? It certainly descends from something (in the case of β, it comes from the Phoenician letter 𐤁).
I imagine there is a whole slew of other issues that could be raised, but I figure that's enough to start. To facilitate this discussion, I've created the entry β/test, which is an identical copy of the β page. I figure we can use this page as a testing ground for different ideas without worrying about presenting the users with some half-baked crap. I picked a Greek letter because it's a bit less complicated than a Roman letter (fewer languages), and thus seemed rather more appropriate for testing. It even comes with its own special template: {{greek letter-temp}}, which is only used on this page. Hamaryns has asked that this template be cleaned up anyway, so I figure people can fiddle around with it a bit without screwing up all the Greek letters in the process. So, any takers? Atelaes 12:38, 2 April 2007 (UTC)

I’m not having any luck with the encoding [[&#55304;&#56981;]]. I wonder if you mean 𒊕 Wiktionary:Requests for cleanup#𒊕. The original works for me, as does the last, but not &#55304;&#56981;, it looks like <|=| H. (talk) 09:51, 4 April 2007 (UTC)
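For anyone puzzled by why the two-number form fails: 55304 and 56981 look like the two halves of a UTF-16 surrogate pair, which have no meaning when written as separate numeric character references. A minimal sketch of how the pair combines into one code point follows; the function name is made up for illustration, and the arithmetic is the standard UTF-16 decoding rule.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(55304, 56981)
print(f"U+{code_point:04X}")   # U+12295
print(f"&#{code_point};")      # &#74389;
```

So a single reference to code point 74389 (U+12295, in the Cuneiform block) is the form that can be expected to work.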
Yes, besides the fact that some scripts such as the Roman alphabet are used for many languages, other scripts are used only for one or two languages (e.g., Lao, Burmese, Thai, Korean, Oriya, Khmer, Tamil, Malayalam, Cherokee, and so on). Also, there are some letters in common scripts such as Cyrillic that are used for only a single language (e.g., Cyrillic is used by a number of languages, but the Cyrillic letter Template:RUchar is limited to Chuvash). And while some scripts are used by multiple languages, there are some languages that use multiple scripts (e.g., Serbian).
In my opinion, the way I set up the Cyrillic and Arabic scripts takes everything into account and works well. See for example ж and Template:ARchar. The principal header is the name of the script (==Cyrillic alphabet==, ==Arabic alphabet==) and the next level heading is ===Letter=== (in the case of alphabets), ===Syllable=== (for syllabaries), ===Logogram=== (such as Sumerian).
As you see in ж and Template:ARchar, the "definition" lines indicate the position in the alphabet and the pronunciation in the different languages that use that letter in that script, in alphabetic order.
After the script section that describes the letter, if the letter in question is also a word or abbreviation of some languages, these sections follow with second-level headers (==Russian==, ==Urdu==, etc.).
I always thought that the word translingual was very odd for this purpose. A few scripts such as Roman are used by many languages (translingual), but most scripts are not. And there are some letters in the Roman and Cyrillic alphabets that are restricted to a single language. To me, symbols such as !@#$%&*()-+=/., are translingual, because they are used not only by virtually every language that uses the Roman alphabet, but also by languages that use several other alphabets, including Cyrillic and Greek (although the meaning of specific symbols varies from language to language even if the typographic symbol does not).
We have had short discussions about this several times over the last couple of years, but nothing has ever been decided. I did the Cyrillic and Arabic alphabets and could do the same for many other scripts, but I don’t want to do it when this is still all up in the air the way it is. —Stephen 18:25, 3 April 2007 (UTC)
I think you set those up very nicely, but am unhappy with the L2 headers X alphabet. That’s why I would propose to just put them one level lower, with, indeed, Translingual as the L2 header. As I suggested in the page about the cuneiform above, we probably want to think of a better word instead of translingual. Maybe ==Symbol==, with L3 ===Cyrillic letter===, ===Roman letter===, ===Cuneiform logogram===, ===Diacritic===, ===Ligature===, ===Reading mark=== (for !;:.$, but there is probably a better English word for that), ===Mathematical sign=== (for +-%#∃∄∃∈∉...), ===IPA symbol===, ...
For the other use of Translingual, for things that are more than symbols such as µg, ff, mW etc., it may be kept, or we think of a second alternative. H. (talk) 09:22, 4 April 2007 (UTC)
Thanks for bringing this up. As some might have noticed, I’ve been spending a lot of time on the first few letters of the Greek alphabet. β sort of represents what I think it should look like now. But if you browse through the recent history, you see that I’ve come a long way to getting to that form. I would recommend reading that (and perhaps also the history of α, γ, ...) before commenting here. H. (talk) 13:08, 2 April 2007 (UTC)
Wait, I thought Wiktionary treated Modern Greek and Ancient Greek as separate languages? β is used in both.
At any rate, I've been thinking about this for Hebrew letters, and what I was thinking was:
  • letters are translingual; even if the letter is only used in one language's writing system, actual graphic references to the letter are the same no matter what language the referring text is in.
    • they don't have a pronunciation per se, though we could have e.g. an Appendix:Greek alphabet that gives that kind of information (though insofar as this borders on a discussion of the phonological history of Greek, it might be more appropriate on Wikipedia).
    • uppercase and lowercase letters are separate, for a few reasons, most notably that case mappings depend on the language (e.g., in Turkish "i" and "I" are different letters, with "i" having a dotted uppercase counterpart and "I" having a dotless lowercase counterpart), and that the Greek lowercase letters should have their uppercase counterparts as their etymologies.
  • names of letters are language-specific; for example, "beta" is an English word that refers to both β and Β.
  • the definition of a letter in an ordered alphabet should link to its predecessor and successor. (The reason I say this should go in the definition is that the same letter may appear in multiple alphabets — especially common with Latin-based and Cyrillic-based alphabets — in which case the letter's predecessor and successor may vary. For example, n should have a separate definition for the Spanish alphabet, giving ñ as its successor.)
  • if a letter is a specific form of an abstract letter (like β is of beta, and a Japanese katakana character is of a kana character), then it should link to the other forms.
  • So, for example, I think β should be something like "==Translingual== ===Letter=== β # Lowercase beta, the second letter of the Greek alphabet (uppercase form Β), coming between α and γ." (Plus the other definitions at that page, obviously.)
Is that reasonable at all?
RuakhTALK 16:47, 2 April 2007 (UTC)
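To make the point about language-dependent case mappings concrete, here is a small sketch. Python's built-in str.upper() and str.lower() are locale-independent, so the Turkish behaviour has to be layered on top; the helper names and the "tr" language code check are assumptions for illustration only, not anything Wiktionary or MediaWiki provides.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ, uppercase counterpart of "i" in Turkish
DOTLESS_SMALL_I = "\u0131"    # ı, lowercase counterpart of "I" in Turkish


def upper_for(text: str, lang: str) -> str:
    """Uppercase text, applying the Turkish i/I rule when lang is 'tr'."""
    if lang == "tr":
        text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


def lower_for(text: str, lang: str) -> str:
    """Lowercase text, applying the Turkish i/I rule when lang is 'tr'."""
    if lang == "tr":
        text = text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()


print(upper_for("i", "en"), upper_for("i", "tr"))
print(lower_for("I", "en"), lower_for("I", "tr"))
```

The same input letter ends up with a different counterpart depending on the language, which is the argument for giving uppercase and lowercase letters separate entries rather than treating one as a mechanical form of the other.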
Excellent conclusions. I especially like the distinctions you make between the symbols and their relationship to the alphabets, in case and in order.
Does any translingual word have a pronunciation? How many different expanded forms are there for №? Apparently translated as знак номера in Russian! How many ways are there to say TAXI? In Nigeria, /dag'zi/... DAVilla 19:20, 2 April 2007 (UTC)
Actually, I think this is a case where translingual is misapplied. As far as I know, languages that use the Roman alphabet do not use the symbol №. It is very familiar and perfectly readable, but it is quite unusual to actually use it in a text. Cyrillic is a better description of №, because languages that use Cyrillic do not have the letter N readily available (at least in pre-Unicode days), and therefore that symbol is specifically provided on Cyrillic keyboards (the uppercase of the number 3 or sometimes 4). Japanese uses various things for this, including , etc., and as far as I know, the symbol №, although it appears to be a Roman symbol, is Cyrillic only. —Stephen 18:57, 3 April 2007 (UTC)
Once again: if we simply think of an other word which does not have the connotation that it is to be used in more than one language, this could be solved. I am all for one header which applies to everything which is not a word in a language and thus does not fit under a ==language== header. H. (talk) 09:22, 4 April 2007 (UTC)
Indeed. Grumble, grumble, now I’ll have to redo a lot of my Greek letter work. And indeed you’re right that the lower case forms were derived from the upper case ones, I should have thought of that. Hm, more input is still welcome. H. (talk) 10:14, 3 April 2007 (UTC)
I don't think your template is useless. Just put it under a language-specific header. The example above would be placed in Greek. There's no reason not to define it in both the Translingual header and in the languages from which the letter actually derives. But realize that there could be more than one instance of the template on a page, and work towards making a more concise format when shared by many languages, e.g. on n. DAVilla 12:16, 3 April 2007 (UTC)
No. I don’t like that at all. The template is some sort of extra thingie, it would be ugly if there were more than one on one page. But it can be extended to allow for more than one previous / next letter, by using named params or something. I’ll give it a try, since the problem just arose for ζ (sixth in modern Greek, seventh in Ancient Greek).
I think n is a bad example, it really needs some cleaning up (which I’d be happy to do, once this discussion has settled, and I finished the Greek alphabet, and perhaps some other ones :-) ) H. (talk) 09:22, 4 April 2007 (UTC)
I had a go at {{greek letter-temp}}, to accommodate more than one previous letter, and put it into use in β/test. Have a look. H. (talk) 09:51, 4 April 2007 (UTC)

And by the way, do we also have to make distinctions between different forms of letters that are conflated in English? The two lower-case a's have different meanings in IPA. This is handled by unicode, as are, strangely, a number of other very similarly looking characters, but what about cases that are not? The number 7 can have a stroke through it in some parts of the world, two strokes in others. In Taiwan the left bar of the 5 extends upwards vertically. (I have even had my handwriting "corrected" by a local.) Print, in block letters, the words "island", "glands", and "sliding" and then compare them. You might be surprised! What about symbols that don't have a unicode equivalent, such as the happy face, many of the more obscure and antiquated astrological symbols, and some of the symbols used in print by various magazines, journals, etc. to indicate the end of an article? DAVilla 19:41, 2 April 2007 (UTC)

I think every Unicode symbol deserves its own page. No redirects at all. The fact that different languages use different orders in the alphabet makes templates like {{greek letter}} difficult to use. That’s a pity though, since they are nice. Anybody have an idea how to combine the two? One such table per language using the letter is absurd, but something similar would be nice. H. (talk) 10:14, 3 April 2007 (UTC)
What if they're exactly the same symbol, just with different uses? Why bow down to Unicode? DAVilla 12:20, 3 April 2007 (UTC)
It's not a question of "bowing down" to anyone or anything. What else could anyone possibly use to look up terms here? Since the distinction by spelling has already been made, it seems only reasonable to extend that same distinction by spelling (of headword) to individual symbols. --Connel MacKenzie 03:36, 4 April 2007 (UTC)
Good point. Unicode gives a short description of each symbol, maybe for starters it is possible to import that with a script? And even non-Unicode symbols are welcome, but do they still exist? H. (talk) 09:22, 4 April 2007 (UTC)
Does the happy face have a Unicode character? Might seem silly, but remember we're talking about a noncommercial symbol that's instantly recognizable internationally and used in writing today. DAVilla 16:16, 4 April 2007 (UTC)
Indeed, it has two: ☺ (U+263A, WHITE SMILING FACE, = have a nice day!) and ☻ (U+263B, BLACK SMILING FACE). —RuakhTALK 16:31, 4 April 2007 (UTC)
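On the idea of importing the Unicode descriptions with a script: a rough sketch of the lookup step, using Python's standard unicodedata module. The describe helper is made up for illustration, and which names are available depends on the Unicode version bundled with the interpreter; actually writing entries from this would be a separate bot job.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character, or a placeholder name."""
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    return f"U+{ord(ch):04X} {name}"


for ch in ("\u263A", "\u263B", "\u03B2"):
    print(describe(ch))
```

For the two smiley code points above this reports WHITE SMILING FACE and BLACK SMILING FACE, matching the names quoted in the thread.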
Wow! Unicode is so complete that counter-examples are clearly difficult to come by. Really compelling ones, that is. The handicap sign and boy/girl stick figures just aren't used in running text. Not that I'm aware of, anyways. For the more contemporary ones, I'm sure I've seen a little symbol for a TV here and there, as a fancy bullet or what have you. Nah, maybe just an icon. What are we down to, the ancient Chinese only coded in BIG-5? DAVilla 00:06, 5 April 2007 (UTC)
I just read up about Chinese in Unicode (due to the decomposing suggestion below): there are 70000+ Chinese ideographs in Unicode, so you’ll have to search far to find some which aren’t, but indeed, they do exist (there are some examples in the document referenced in the below discussion). And I’m pretty sure there are some obscure mathematical symbols which aren’t, yet. But eventually they all will be, I suppose. Hell, even the most abstruse cuneiform symbols are in there. H. (talk) 10:38, 5 April 2007 (UTC)
By indices, similar to Chinese characters. The symbol (that is, one of the symbols) for Pluto uses a combination of P and L. Going from the planet to the symbol is easy. In the other direction, if you found it online and wanted to look it up you could copy and paste it. But if you saw it in a book and you didn't know what it meant, there would be no other way of telling the computer "look at this and tell me what it is" than to decompose it.
I have no objection to making a separate page for each Unicode character. It's not certain that it's the ideal solution but it's certainly the most clear one. I would just hope that some of them are very closely linked, even tighter than a simple "see also" at the top. DAVilla 16:09, 4 April 2007 (UTC)
We might make exceptions for symbols that are only present for backwards compatibility purposes, though, such as CJK Compatibility Supplement: U+2F800–U+2FA1D. H. (talk) 10:38, 5 April 2007 (UTC)
You don't have any choice to make an exception in this case ;-) the WM software (correctly) maps to the standard character, so you can't make an entry at the compatibility code-point. FYI: User:Robert Ullmann/Han is a complete map of the CJKV/Han characters we have. Robert Ullmann 11:53, 5 April 2007 (UTC)
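A sketch of why no entry can live at a compatibility code point, assuming the mapping described here is Unicode canonical normalization (NFC) applied to page titles. The example uses the first code point of the range H. mentions; the variable names are illustrative.

```python
import unicodedata

compat = "\U0002F800"   # first character of the CJK compatibility supplement range
canonical = unicodedata.normalize("NFC", compat)

print(f"U+{ord(compat):04X} -> U+{ord(canonical):04X}")
print("title survives normalization:", canonical == compat)
```

Because the compatibility ideograph has a canonical decomposition to a unified ideograph, the normalized title is a different character, so any attempt to create a page at the compatibility code point lands on the standard character's page instead.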
I've altered β/test to conform to my vision of what the proper formatting should look like, which can be seen here. I suggest that others might consider doing the same, as it's much easier to see the stuff in practice than in theory. I've put my name at the top as an L1 header, in case others put their own versions, just so it'll be easier to keep track of whose version is whose. A few notes: First of all, it should be remembered that Ancient Greek did not actually use this letter, which is kind of interesting. We use minuscules in our Ancient Greek words because that is the general standard in other Ancient Greek works. I think it best to simply get the Wiktionary Ancient Greek section up to the standards of other lexicons before trying to outdo them. It's something that will certainly come up in the future, but it is not really germane to this particular conversation, and so I'll drop it for now. I've dropped a lot of the stuff which should really be on the majuscule version's page. All of the information which I feel is specific to the character (outside of the context of any specific language) I've put under the translingual header. Everything which depends on the context of a specific language, I've put under the headers of the languages. As for the template, I think that, with a bit more tweaking, it could be general enough to be used for most languages, and would be best used in the language sections on the letter entries. Atelaes 21:54, 4 April 2007 (UTC)
Good idea, I put my version in its own section below it: [9]. I borrowed some of your ideas, and interspersed mine with small comments, where suggestions are welcome. Most important I find that I use the template only once, with the accommodations I made to it to have multiple previous/next letters for different languages. I am not enough of a historian to decide on some points, which I put in the comments. H. (talk) 10:38, 5 April 2007 (UTC)
That is an excellent idea (I was thinking people would each just have a version, but your idea is much better). The facts are, ultimately, unimportant at this point, only the format. Atelaes 15:32, 5 April 2007 (UTC)

Some input please

It seems that only Atelaes and I are interested in this any more. What do others think of my suggestion to use ==Symbol== instead of ==Translingual==? Who else wants to experiment with β/test? Stephen, you at least should have a go. I want this settled, so I can continue with the Greek alphabet. H. (talk) 15:25, 6 April 2007 (UTC)

I can accept ==Translingual== for symbols that are used by numerous languages and even in different scripts (Roman, Greek, Cyrillic, etc.), such as !@#$%*()[]/:;,.?, but it strikes me as silly if the symbol in question is only used by one language and in only one script, such as க (a Tamil "ka"). There is nothing "translingual" about it. So, Symbol would be a better choice, although still a problem in some cases, since the alphabets used by some languages include digraphs, trigraphs and tetragraphs (e.g., Dutch IJ, ij). If a tetragraph can be considered a "symbol", then it wouldn’t be too bad.
However, if we use Symbol, then some "symbols" will be letters of alphabets, some symbols will be punctuation, some symbols will be numerals, and some symbols will be symbols (e.g., @#$%*)). That means that there would be cases where the L2 heading was ==Symbol== and the L3 heading was also ===Symbol===.
Besides ===Letter===, ===Symbol===, and ===Punctuation===, there will also be ===Logogram=== (e.g., Sumerian, where a glyph has both syllabic and semantic value), and ===Syllable=== (e.g., the syllabaries of Amharic, Oriya, Gujarati, Bengali, Thai, Khmer, Lao, and so on). Also, there are some true alphabets that only write "letters" that have been composed into complete syllables (e.g., Korean, Phags-pa).
So I still hold that the names of the scripts (Roman alphabet, Cyrillic alphabet, Greek alphabet, Cuneiform script, and so on) are the best choice for L2 headers, keeping the type of symbol (punctuation, symbol, letter, syllable, logogram) for L3 headers. But if it comes down to "translingual" vs. "symbol", I much prefer "symbol". —Stephen 05:10, 15 April 2007 (UTC)
There is a serious problem with using things other than languages at L2: there are hundreds of bots and programs that read the en.wikt, to add entries to other wikts, to extract various kinds of info, etc. Level 3/4/5 headers (if valid) are in a smallish set, 50 or so; a program can have a table of what it is interested in, and treat others as unknown/errors. But at level 2, the program cannot reasonably have a "complete" table of the languages (7000+ coded now), so the only way it can parse the heading is to recognize "Translingual" as not a language, and treat all of the others as language names. And that is what they do. If there is another open-ended set of headers at L2, with no syntactical indicator that they are not a language, the parsing is irretrievably broken. And we don't have any syntactical indicator. (If we were using XML or something, we'd use L2-lang and L2-thing or whatever.)
More abstractly, to maintain the ability to abstract the semantic meaning from the entry syntax, L2 must always be a language name.
The other point is that "Translingual" is exactly the right header for the Cyrillic and Arabic alphabets, each is used in dozens of languages. (And the letters aren't "symbols".) Things like the Tamil "ka" can just be under Tamil (as all of the Hiragana entries are under Japanese.) Robert Ullmann 12:00, 15 April 2007 (UTC)
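The parsing constraint can be sketched roughly as follows. This is not the code of any actual bot; the regular expression, function names and sample entry are assumptions for illustration. It shows the only classification rule available to a reader of the wikitext: "Translingual" is special-cased and every other L2 header is taken to be a language name.

```python
import re

# Matches wikitext headers such as ==Greek== or ===Letter===.
HEADER = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def l2_headers(wikitext: str) -> list[str]:
    """Return the text of every level-2 header, in page order."""
    return [m.group(2) for m in HEADER.finditer(wikitext) if len(m.group(1)) == 2]


def classify(wikitext: str) -> list[tuple[str, str]]:
    """Label each L2 header the way a downstream program has to."""
    return [
        (name, "translingual" if name == "Translingual" else "language")
        for name in l2_headers(wikitext)
    ]


SAMPLE = """==Translingual==
===Letter===
β
==Greek==
===Letter===
β
==Cyrillic alphabet==
===Letter===
ж
"""

for name, kind in classify(SAMPLE):
    print(f"{name}: {kind}")
```

With a script name at L2, the last section is reported as a language called "Cyrillic alphabet", and nothing in the syntax lets the program know otherwise.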

English to Arabic wordlist relicensed to GFDL

Arabeyes.org is proud to announce that its GPL English to Arabic wordlist was relicensed to GFDL to meet the Wiktionary needs. The source PO files can be found here. It already has a web interface named Qamoose. It can be a valuable addition to the Wiktionary. --Chahibi 01:23, 3 April 2007 (UTC)

I'm quite limited on Wiki-time right now, myself. Please (everyone?) see Help:Bots / WT:BOTS etc. (The help page is obviously my first draft - please be bold rewriting it.) I think if 20-30 of our current admins take an hour to install the bot framework, we'd have a respectable pool of bot operators to draw from (and much greater understanding of the advantages and limitations, all around.) --Connel MacKenzie 04:54, 3 April 2007 (UTC)

Words that are the same in other than English language.

What to do with words that are the same word as in English, in some language other than English, and with largely the same definitions? I.e. most of the time words that come from Latin or Greek, such as epsilon: in Dutch it means about the same as in English (of course), except for the computer science meaning. The question is: what to put in the Dutch definition line:

# [[epsilon#English]] (letter, mathematics, phonetics) 

i.e. a short gloss (but not so nice, and can get long if a lot of definitions coincide) or

# The name for the fifth letter of the [[Greek alphabet]].
# {{context|phonetics|lang=nl}} The [[IPA]] symbol that represents the [[w:open-mid front unrounded vowel|]].
# {{context|mathematics|lang=nl}} An [[arbitrarily]] small [[quantity]].

i.e. a repetition of the English definitions? H. (talk) 10:35, 3 April 2007 (UTC)

Other languages use a translation, not a definition, where possible, which means that the first option is better. However, there should still be a separate definition for each foreign sense of the word. That might mean making three definitions which all translate to the same English word, with three different glosses. DAVilla 12:08, 3 April 2007 (UTC)

{{trans-top}} and AutoFormat

At Connel's request, I added code to AutoFormat to convert top/mid/bottom only within Translations sections to trans-top/etc.

If you add {{rfc-auto}} to an entry when editing, the bot will find the entry, even if it has not run for a while.

The gloss is correctly folded into the template if it is ;... or ... a few variant cases won't work (see name), these show up in Category:Translation table header lacks gloss. This is only done in the Translations section; top isn't supposed to be used elsewhere, but often is. Robert Ullmann 11:06, 3 April 2007 (UTC)

The bot probably shouldn't touch anything under {rfc-trans} or {checktrans} either, or if it does then it should treat those cases specially, with the "gloss" being 'Translations to be checked' or similar. DAVilla 12:05, 3 April 2007 (UTC)
If the "Translations to be checked" header is there it won't. (You might be surprised at how often it changes "Translations to be categori{sz}ed" to the correct header ;-) Stopping at either of those two templates is a good idea; will do; it will just leave the rest alone. Robert Ullmann 12:11, 3 April 2007 (UTC)
Thank you. Shall I change all "{{top}}"s to "{{rfc-auto}}{{top}}"s?  :-) --Connel MacKenzie 03:39, 4 April 2007 (UTC)
Please don't. I've cleaned out a number of the table-header-lacking-glosses entries in that category and found the work to be tedious and mundane. In a few cases I actually had to write a gloss, or used one the bot missed, but on most pages it was unclear and all of those translations had to be ttbc'd, and adding ttbc tags is a repetitive chore. On the other hand in a few cases like summer I was able to do some research to discover when the second sense was added, and wound up being able to write a gloss after all, one that applied to translations in several dozen languages. I think we should strive for that kind of solution, not overburdening the translation work any more than it is, and I feel that there's a lot of clutter that we really don't need to be digging up until there's a more automated solution. In other words, marking those where a gloss does not exist does not solve any problems. It floods the more interesting work with trivial tasks that really only pass the buck onto the translators. I don't have an immediate solution, although hopefully some day about half of the checktrans traffic I think could be eliminated with a bot that were history-aware. Maybe someone else could clear out part of the category and get a feeling of what sort of things need to be done. DAVilla 15:45, 4 April 2007 (UTC)
Please note my "smiley"! --Connel MacKenzie 20:19, 5 April 2007 (UTC)
By the way, you'll find that the fewer the number of definitions, the easier it is to salvage the table. But the majority were ttbc'd as I said. DAVilla 15:48, 4 April 2007 (UTC)
If we were to do this, there is a much easier way (add the cat to {top}!); but we shouldn't do that yet. I've changed the code for now to not convert the templates where it can't find the gloss. (So as to avoid flooding that cat for now.) If you wanted to tag entries that have ''' or ; at the start of one line and {{top}} on the next, that might be useful. Then we can see where we are. I wonder how many instances of top outside of translations sections we still have? Robert Ullmann 12:06, 5 April 2007 (UTC)
Answers to my own questions: top is used about 24 thousand times, in just over 15 thousand entries; about 12 thousand do not have glosses. It is used about 700 times outside of translations/ttbc, where it shouldn't be used; mostly in derived and related terms. Robert Ullmann 15:57, 10 April 2007 (UTC)
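The tagging idea above (a line starting with ''' or ; followed directly by {{top}}) can be sketched as a simple pattern match. This is only an illustration, not the bot's actual code: the function name gloss_candidates and the sample wikitext are made up for the example, and real entries would need more careful handling.

```python
import re

# Hypothetical pattern: a line that starts with ''' or ; and is followed
# immediately by a line beginning with {{top}}. Such a line often holds
# the text that could serve as the gloss for the translation table.
CANDIDATE = re.compile(r"^(?:'''|;)(?P<label>.*)\n\{\{top\}\}", re.MULTILINE)

def gloss_candidates(wikitext):
    """Return the label text found directly above each {{top}} table."""
    labels = []
    for match in CANDIDATE.finditer(wikitext):
        # Drop surrounding whitespace and any bold markup quotes.
        label = match.group("label").strip().strip("'").strip()
        labels.append(label)
    return labels

sample = (
    "====Translations====\n"
    "'''season'''\n"
    "{{top}}\n"
    "* French: été\n"
    "{{bottom}}\n"
    ";warm period\n"
    "{{top}}\n"
    "{{bottom}}\n"
    "{{top}}\n"
)
print(gloss_candidates(sample))
```

Tables whose preceding line is anything else (the last {{top}} in the sample) are simply not reported, which matches the suggestion to leave those alone for now.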

Components of Chinese characters

I'm not sure if this has been discussed before (the discussion archives are a bit difficult to search), but the Chinese character entries are missing a decomposition into components, as described in wikipedia:Radical (Chinese character), subsection "Character decomposition".

The decompositions could be given as Unicode ideographic description sequences (see [10], figure 11-8) and if necessary also in some other format. It would also be useful to have indices based on them, as most dictionary programs have a way of doing component search and Wiktionary should too. Multicomponent search and other such complicated things could be left to external software which could just get the indices from Wiktionary. The ultimate wiktionary project could also provide the extended search functionality if/when it materializes.

Of course there are many characters that are hard to produce good decompositions for, but most are easy, and there's no need to fret over the details. Simple graphical decompositions provide good enough indices for searching. Actual radicals and etymologies etc. are also a separate matter. If there's some kind of decision on this then one could start adding the decompositions right away, just like stroke order diagrams are being added incrementally. -- 11:32, 3 April 2007 (UTC)

Thank you for your suggestion. A character decomposition section may indeed prove useful to someone wishing to know more about a particular character. I would anticipate that the most challenging aspect of such an undertaking would be the sheer amount of time and effort involved in inputting such information. Unless a non-copyrighted database containing this information is already in existence, we would have to type this information by hand, one character at a time, into Wiktionary. My hope is that some day, we will have enough Chinese speakers to tackle such tasks in a short amount of time. For the time being, there are only a handful of contributors that work on Chinese entries. Of these, I'm the only one fluent in Chinese that regularly contributes Chinese words (Mandarin and Min Nan). My main activities to date have been focused on two areas:
  1. creating entries for useful Chinese words and phrases that are not found in other Chinese-English dictionaries
  2. creating entries for words found in the Appendix:HSK list of Mandarin words
I also recently finished the Appendix:Amoy Min Nan Swadesh list, and completely revised the Appendix:Mandarin Swadesh list that originally came from Wikipedia. If you are interested in working on character decompositions yourself, there are several of us here who could offer formatting suggestions, proofreading etc. If this sounds like something you would like to work on, I would suggest that you create an account for yourself. Once you have done that, you should read WT:ELE and WT:AC. -- A-cai 12:18, 3 April 2007 (UTC)
It is something I'm interested in, but I don't tend to contribute much on wikis. I would contribute decompositions now and then if there was an accepted format for them. I don't know any Chinese though, only Japanese.
There's no public domain database of decompositions that I'm aware of, but there is a GPL one at [11]. GPL is unfortunately incompatible with GFDL, even though both are GNU licenses. You can do searches on the aforementioned database at [12]. E.g. if you enter 糸車口 it gives you a list containing 轡, and with 肉退 a list containing 腿 (because the 月 is 肉月 you have to enter 肉; I think it would be more useful to allow 月退 too as that's what it looks like graphically). It allows both the actual radicals and their meanings, e.g. both ⺅中 and 人中 give you 仲.
Anyway, as there's an existing (free, even if incompatible with GFDL) implementation, it's both possible and useful. I don't think there's need to do this in a short amount of time - it's not like this information will become obsolete any time soon. It will eventually be complete even if done little by little. If there were a few examples and maybe a category of "Character decomposition needed" like there is "Cantonese definitions needed" etc, a casual visitor like me might add a few when they see they're needed. I've added some entries from time to time for Japanese words and would do that for character decompositions if there were an accepted format for it. -- 12:59, 3 April 2007 (UTC)
How about you just go ahead, create one or two entries as you see fit, post them here, and then others can comment on it and make suggestions. Someone has to be the first... H. (talk) 10:09, 4 April 2007 (UTC)
Robert, I'm thinking that this is something that should be in your Template:Han char template under the translingual section. Do you think it would be a problem to add a variable to the template? If we use 字 as our model character, then the character decomposition would look like: 宀子. We would put this information under a variable called comp or something. For example:
{{Han char|rad=子|rn=39|as=03|sn=6|four=3040<sub>7</sub>|canj=十弓木 (JND)|comp=宀子}}
would produce:
字 (radical 39 子+03, 6 strokes, cangjie input 十弓木 (JND), four-corner 30407, composition 宀子)
That should do the trick I think. -- A-cai 23:05, 3 April 2007 (UTC)
I'd also like to see for example 字 listed on both 宀 and 子 or the proper indices. The radical is more important of course, but this dictionary is not limited by paper constraints. DAVilla 15:54, 4 April 2007 (UTC)
I think it would be nicer to use IDS descriptions instead of a plain list of components. E.g. 字 would be "⿱宀子", 轡 "⿱⿲糸車糸口" and 疑 "⿰⿱匕矢⿱龴疋". This way the layout and the count of each component are also present. IDS is originally meant to describe characters missing from Unicode to the reader, so having such descriptions would also be useful if the user's font is lacking some rare characters that are in Wiktionary. Having a list of these would also facilitate advanced searching in external software (such as browser plugins or free dictionary software). Simple indices should of course ignore this extra information, as that would get too complicated. Simple component lists (i.e. "宀子", "糸車口" and "匕矢龴疋" for the above) are not bad either, but I think the extra information with IDS is useful, too. -- 17:46, 4 April 2007 (UTC)
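To illustrate how a simple index could ignore the extra layout information, here is a minimal sketch that reduces an IDS string to a plain component list. It assumes the layout operators are the Ideographic Description Characters in the range U+2FF0 to U+2FFB; the function name components is invented for this example.

```python
# Ideographic Description Characters (layout operators such as ⿰ and ⿱)
# are assumed here to be the code points U+2FF0..U+2FFB.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB

def components(ids):
    """Strip layout operators and repeated parts from an IDS string."""
    seen = []
    for ch in ids:
        if IDC_FIRST <= ord(ch) <= IDC_LAST:
            continue  # layout operator, not a component
        if ch not in seen:
            seen.append(ch)  # keep first occurrence, preserve order
    return "".join(seen)

for ids in ("⿱宀子", "⿱⿲糸車糸口", "⿰⿱匕矢⿱龴疋"):
    print(ids, "->", components(ids))
```

Going the other way (from a component list back to an IDS) is not possible, which is the sense in which the IDS form carries strictly more information.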
This is probably obvious, but.. The component list should be restricted to characters that have entries in Wiktionary and be linked there. Index:Chinese_radical lists the radicals. There are some compatibility characters in Unicode that look the same but don't have Wiktionary entries, e.g. ⼥ (U+2F25) vs. 女 (U+5973). As a result some differences will have to be ignored, e.g. instead of using the compatibility characters ⻌⻍ one would always use . In the same vein characters like would be decomposed as and . Using instead of is better because that's what it looks like; similarly is better as and than and . -- Coffee2theorems 13:31, 7 April 2007 (UTC)

As it looks like the discussion has died, here's a concrete proposal (much the same as A-cai's above): Add parameter "comp" to Template:Han char, e.g. comp=⿱, and display it as composition. Indexing by these may be done later. At least for now such sequences should be limited to elements that have Wiktionary articles (or at least redirects) and all the elements should be linked. If such a parameter is added I'm interested in adding decompositions from time to time.

Examples for 10 random characters: 付=⿰, 鳴=⿰, 鬩=⿵, 蛾=⿰, 掴=⿰, 職=⿰, 潔=nothing for now, 核=⿰, 巾=nothing because it's atomic, 余=nothing for now. The "nothing for now" characters didn't have decompositions into Wiktionary characters that I'd consider obvious (although they can be decomposed), so I let them be. I believe most characters can be described this way. These are somewhat useful even without indexing (knowing the components helps in learning the characters for instance) and there's always the "what links here" page. -- Coffee2theorems 05:34, 30 April 2007 (UTC)

I added ids= (as being more specifically IDS than, say, "comp="). You should think about whether you really want to link them; if you do, you break the Unicode IDS sequence: a browser or extension that would render them cannot. Without linking, they are an IDS sequence both in wikitext and in HTML. (Note that we can always automatically link or unlink all of them later.) And you are correct above, we never use the compatibility characters, only the standard ones in Han Unified + Ext A + Ext B. Robert Ullmann 12:59, 2 May 2007 (UTC)
Thanks! I tested it on , looks like it works. Good point about breaking the IDS sequence, I didn't think of that. I still prefer linking for the following reasons:
  • links are helpful to the reader
  • visually (from the reader's, not software's point of view) the IDS sequence is correct, and the description is meant for reader's consumption
  • there's no widely used IDS renderer as far as I know, and special rendering is not required by the IDS specification
  • such rendering may not work at all correctly if later someone wants to use less obvious sequences (e.g. of the kind ⿰水十 instead of ⿰氵十 to represent 汁 for cases where a 氵-like alternative form character does not exist)
  • all the characters this is used for already exist in Unicode, and if e.g. ⿰氵十 were rendered as 汁, there would be no purpose in using the description at all (unless one could still copy/paste the parts, but still linking is better)
  • non-standard characters are easy to spot because they become broken links
  • as you say, this can easily be changed later automatically
Basically I suggested IDS because it contains slightly more information than a pure component list and may in the future be useful for indexing. Many (all?) electronic kanji dictionaries allow you to search by components (or do a search such as "kanjis which contain a component with this reading") and in SKIP codes there's precedent for indexing by structure (e.g. stroke counts of left and right parts). A full description is the most general way possible and there's little extra cost in it. -- Coffee2theorems 16:18, 2 May 2007 (UTC)
I added these for the easy cases of Grade 1 kanjis (though I may have missed some). I described as ⿱ despite its etymology, as that is what it looks like. I didn't add ⿴ for or ⿱ for yet. A common similar case outside grade 1 would be e.g. ⿰ for . As etymology is not such a simple thing and there's already a section for it, perhaps it would be best to use the IDS field for a graphical decomposition and leave the etymology to its own section. The other choice would be to use the etymological decomposition (e.g. ⿰ for ). One could also give both. Thoughts? -- Coffee2theorems 17:46, 2 May 2007 (UTC)
The etymology should (must) always be how the character originated. IDS is purely descriptive, as defined by Unicode. So they are definitely different in some cases. Robert Ullmann 17:51, 2 May 2007 (UTC)
I will consistently use a graphical description for the IDS field then. -- Coffee2theorems 11:53, 4 May 2007 (UTC)
Would be good to also add the etymology when you know it is different. Can be very simple: "From (flesh) + (hide/hidden)." Someone else can go into more detail if they have the reference information. Robert Ullmann 12:04, 4 May 2007 (UTC)

Getting backing before making drastic changes.

I've been reprimanded recently for making changes to some of Wikt's pages. Sorry for that. I'm still quite new tho'. To make my point, where does one go to get support for changes here? One example is my recent creation Template:Keene-un. This is a template which I figure is used to save time, and isn't a 'bot, so is it ok to use? Do I have to get backing to use it? Also I edited WT:ELE recently, making only minor changes to improve the flow of the page, but got blocked for it. Is Wikt so stringent as to worry about things like this? --Keene 23:14, 4 April 2007 (UTC)

I don't know what the policy is for having personal templates in the common space. I guess there should be some recommendation about it since it's easier to type {{subst:Keene-un}} or the like than {{subst:User:Keene/un}}. These kinds of templates are useful, and could be developed into a Go-failed button. Do make sure you do substitute it though, including the 5 pages listed under "What links here". DAVilla 00:14, 5 April 2007 (UTC)
I have done this {{xhan}}; of course I can just delete it myself when I'm done. I don't think it is a problem if you make sure the name doesn't conflict with various reserved spaces (2 and 3 letter templates, and things starting with 2 or 3 and -). "keene-un" seems reasonable. Make sure it says in noinclude tags that it is yours, and can be deleted if left around, and do tag it with {{delete}} when done.
As to the WT:ELE edit, you did more than "improve the flow"; you deleted important text, explaining that they should not be entered manually. (IMHO, the section could be reduced to just that sentence; it is the only thing most users need to know: don't add or modify iwikis!) Robert Ullmann 12:16, 5 April 2007 (UTC)
WTF? Why aren't you just using the preload templates? Is there a bug in one of them? --Connel MacKenzie 21:33, 5 April 2007 (UTC)
But why was he blocked for this? The edits don't appear all that radical. Granted, he deleted the last sentence, which was perhaps a mistake. But, it does not appear to be a malicious act on his part. As for the preload templates, maybe as a new contributor, he did not know about them. Am I missing something? -- A-cai 05:52, 6 April 2007 (UTC)
The 3-day block seems a little harsh. Aside from the last paragraph, the edits did not change the substance. But that issue is completely unrelated to this. DAVilla 21:38, 7 April 2007 (UTC)
Hehe... what are preload templates? *Language Lover deftly dodges all the thrown tomatoes and eggs* Language Lover 14:01, 6 April 2007 (UTC)
Is there no process by which contributors can go about making new tools? This template clearly had a more specific purpose than any of the preload templates provided. DAVilla 21:38, 7 April 2007 (UTC)

Thesaurus resource


--Connel MacKenzie 20:17, 5 April 2007 (UTC)

Before I start the pagefromfile.py to populate Wikisaurus with some real entries, does anyone have comments on this? --Connel MacKenzie 05:57, 7 April 2007 (UTC)
As thesaurus entries are generally interesting, I plan on not requesting the bot flag for these, to increase visibility, and throttle them to one entry per 20 minutes so people can fiddle with them. --05:59, 7 April 2007 (UTC)
I thought the argument against using a bot for the Thesaurus was that entries were too complex and required close scrutiny of the precision of a given term for a given definition. I would be interested to see what pagefromfile pages looked like, but I have to imagine that very few meaningful pages would emerge from them. - [The]DaveRoss 01:39, 11 April 2007 (UTC)
Wow I wish I had your mad skills at programming, Connel :-) A programming master like you is a great boon to the wiktionary. Let's turn Wikisaurus into a Wikisaurus REX!! :-) Language Lover 02:16, 11 April 2007 (UTC)

Time to whittle

Original by dcljr

The entry for time has become our longest regular definition page, at over 40K, thanks to hundreds of "Derived terms" added by User:Paul G in February. I wanted to bring people's attention to this because it seems to me that many of the added terms are unnecessary, being either technical terms that probably don't warrant their own entry here (such as acquisition feeding time or clot retraction time), terms that are [arguably] easily understood by considering their constituent words (such as at what time or closing time), or alternate forms of other derived terms (such as about time too, when about time is already listed). (Note: I've notified Paul G about this comment, in case he wants to respond.) - dcljr 22:06, 5 April 2007 (UTC)

I would be quite happy to keep them all (Wiktionary is not paper). They are nicely hidden, and we might even get around to defining some of them one day. I am a bit miffed that he has beaten my list of defined terms at poly- (definitions in progress). SemperBlotto 22:23, 5 April 2007 (UTC)
What I don't like about this is that it obscures the more critical words like timely in this huge list. I have suggested before another section called Compound terms which would take phrases and compound words, those formed by simple concatenation of words with or without spaces, and leave Derived terms for the remaining words, those being words formed as blends and in particular with affixes. However, I'd like to hear what User:AutoFormat has to say about this since he or she likes to revert my edits and is clearly more knowledgeable on what would be best for Wiktionary with regard to this matter. DAVilla 07:38, 6 April 2007 (UTC)
Sounds like a WT:VOTE is needed for "===Compound terms===" then? --Connel MacKenzie 06:07, 7 April 2007 (UTC)
Does anyone have a better suggestion for what to name them or how to define the differences? A good test case might be vineyard. Should I bundle into the proposal that their priority placement is much lower than Related terms, even lower than Translations? If it's a 3-level header then it isn't dependent on part of speech. Is it dependent on etymology? DAVilla 21:29, 7 April 2007 (UTC)
Compounds like at what time are completely transparent to fluent English speakers, but if you've ever studied other languages, you know that these are actually very idiomatic. The prepositions are mostly arbitrary. For someone learning English as a 2nd language, such constructs are not transparent at all. Now as for the bigger issue... I seem to be in the minority for being in favor of making lots of "/" subpages. If I were a supreme arbiter, I'd make a list of the most "important" derived terms, and below that, have a link to a subpage with the complete list of derived terms. :-) What does everyone think of that idea? Language Lover 13:56, 6 April 2007 (UTC)
Subpages are NOT supported for this stuff, by the WM software. Don't use subpages for anything other than "Citations" (which has only rudimentary SW support.) --Connel MacKenzie 06:02, 7 April 2007 (UTC)
Long pages are not bad, in and of themselves. --Connel MacKenzie 06:02, 7 April 2007 (UTC)
Derived and compound terms should be dependent on etymology, yes; rush hour is certainly unrelated to the Old English rysc. -- Beobach972 19:41, 9 April 2007 (UTC)
Well, yes, but currently, as derived terms, they depend on more than the etymology. Being level-four headers they would depend on the POS. This is deliberate and supported by Paul G. But I'm not the only one who has had difficulty in classifying them. At the same time, for those that are classified correctly, do we want to toss that differentiation out? I need to look at time again... DAVilla 20:38, 9 April 2007 (UTC)
Yes, this can be confusing, e.g. timer is derived from the verb, not the noun. But at least it's clear where that one comes from, and that's a bad example because it really should be a derived term anyway. Paul G had brought up two examples with seal, I think, that even he wasn't sure of, but those cases are rare.
There are also some terms that include "time" but are not derived from it, such as counter-time. So I'm not entirely certain that Compound terms even at level four is appropriate as a header unless we were to clarify that they are also derived terms, or if we can accept that they may not be. I do think being able to extract timely from that list would help a lot. DAVilla 23:52, 9 April 2007 (UTC)
While long pages aren't bad necessarily, they are usually bad anyway. Even though we aren't paper and we technically have the capacity for gigantic pages, they aren't generally easy to navigate or particularly useful beyond a certain size. 40k of non-prose text is HUGE, and I think that if anyone were to do a study on the readability of pages like time et al. it would be right down there with technical documents for lay persons...bad. We want to balance the inclusion of as much relevant information as we can stuff in there with cleanliness and readability. If we have everything anyone could ever want to know about a given term on a page, that is wonderful, but if no one is actually able to sift through the stuff that they could care less about to get to what they actually need, then what good have we done? I agree that that list should be cut down; we don't need every collocation and phrase ever written that includes "time" to be listed there, probably just idiomatic and other "interesting" terms belong. - [The]DaveRoss 20:50, 9 April 2007 (UTC)
The problem is that they are all idiomatic, or they shouldn't be listed. DAVilla 21:03, 9 April 2007 (UTC)
"Achilles tendon reflex time", "French Revolutionary Time", "QuickTime", "Hawaii-Aleutian time"...there is plenty in this list that doesn't belong: timezone names, random phrases containing the word time which aren't idiomatic. They are certainly not _all_ idiomatic. There are plenty there which should be on the page, but I guess what we are getting down to is that it is time for stricter criteria for "derived terms", "related terms" etc. sections, especially for the exceedingly large pages. - [The]DaveRoss 22:05, 9 April 2007 (UTC)
Hmm... part of the problem is that it's impossible to tell from the list what deserves an entry and what should be removed. "Achilles tendon reflex time" = Achilles tendon + reflex time as far as I can tell, but the expression "a stitch in time saves nine" was removed! Plus it's difficult enough keeping the list alphabetized. Someone decided to list old as time itself under "A" with as old as time itself. What is this, a topical list??
I'm moving the red links to Wiktionary:Requested articles:English/time so that if anyone wants to argue their inclusion they can simply create the page. DAVilla 23:52, 9 April 2007 (UTC)
Sounds like a good cleanup for this page, but I think a general discussion is called for regarding treatment of these sections. It is obvious that some delineation needs to be made, but where to draw the line? - [The]DaveRoss 00:01, 10 April 2007 (UTC)
Long pages aren't bad, you say? I just spent over half an hour, probably more than an hour actually, going through the derived terms at time. All I was doing was correctly alphabetizing the list (per below), removing extra words like "the" and trailing <!- comments -> (per below) many of which I intended to move in creating the actual page later, and standardizing other comments like <!- a stitch [in time] saves nine ->. I pushed the wrong button at some point and the browser paged back, which 50% of the time means I lose all of my work. I lost all of my work. So if you want the list to be managed, have fun managing it yourself. I've already rolled back my move to WT:RA, and it's not my fucking problem any more. DAVilla 15:36, 20 April 2007 (UTC)

Policy proposal

This policy is narrowly intended for pages with a great number of derived terms. However, it hashes out some specifics with regard to the Derived terms section in general, and may have implications on other such sections.

  1. The section is to be listed alphabetically. That means closely related words with different spellings—such as old-time and old times, or tact time and takt time—must be listed separately.
    Rationale: An ordering that is alphabetical does not necessarily coincide with one that is topical, even weakly so. Consider Taiwan time and old-timer, which would separate the above examples. Of the two incompatible orderings, only the first can be clearly defined in formal specification. It also has the advantage of being manageable by bot.
    Point of contention: It may be permissible to list on the same line terms that use the same letters but have different spacing or hyphenation, or use ligatures like æ=ae and ï=i which are conflated. However, note that these are not always synonymous, e.g. some time and sometime. Likewise summer time and mean time, as spaced, are systems of measuring time, while summertime and meantime are not.
  2. Only blue links are to be shown, with the exception of closely related words such as alternative spellings (which would be shown in the see-also at the top of a page and/or as alternative spellings in the language section) or inflected forms (where there are additional definitions, as would be shown in the see-also at top of a language section).
    Rationale: Red links are fine for giving an indication of what needs to be done, but an overwhelming number of red links are impossible to manage. To avoid removal, red links need comments if they do not appear idiomatic, such as short time and to time, or legal/medical terms. The term just-in-time is an example of one removed from time (by Paul G no less) perhaps because, lacking a comment, it did not appear idiomatic. On the other hand, these partial definitions, information that really belongs on the pages themselves, may not be removed after the page has been established. Furthermore there is no process for determining if the comments are correct, or if certain words are in fact idiomatic, other than the RFV process for entries themselves.
    While a sea of unverifiable red links does injustice to the page, and in my opinion more closely resembles requests for articles than a useful compilation, at the same time we cannot push requests off to another space when a closely related term exists in the Wiktionary. Doing so would be asking for a good number of pages that could be soft redirects, or very brief at the least, to be recreated from ground zero. This ties up time of knowledgeable contributors in wikifying the page, finding the existing alternatives perhaps much later, and then having to coalesce the information. At the same time, not allowing these red links to remain on the page might suggest that there is one principal spelling and no alternatives to a term. While that may certainly be the case for many spellings, for spacing and hyphenation in particular there is a good variety even among the major English dictionaries.
  3. Terms that are added in the derived terms of a derived term (especially one that is a string prefix; see below) should then be omitted from the page. For instance, space-time and time series are derived terms of time which themselves have a number of derived terms. Otherwise any blue link is acceptable.
    Rationale: Either this system or a more complete one is feasible, but this is more elegant since anyone looking for e.g. time series analysis, time series data, time series database server, time series model, or time series prediction (assuming those are all idiomatic) would be just as inclined to follow a link to time series. At the same time, terms that are not derived terms of the derived term time series in this example, such as time series animation, should not and would not be excluded from the listing at time. Another such example is space-time trade-off.
    Point of contention: Since words that are not string prefixes of the derived term, such as anti de Sitter spacetime, are alphabetized differently, could they also be included as a derived term of e.g. de Sitter, Sitter, space, and time? Presumably not of de?
    Point of contention: Are blue links unquestionable if they are redirects to other pages? One example is take time to smell the roses, which redirects to stop and smell the roses. In that case it is not possible to link to the primary title as a derived term. There are other cases where both could be listed, e.g. have a whale of a time and its redirect whale of a time. Should they both be?

DAVilla 05:38, 10 April 2007 (UTC)
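The alphabetical ordering in point 1 could be handled by a bot along the following lines. This is a sketch of one possible reading of the proposal, not an agreed specification: it assumes that case, spaces and hyphens are ignored when comparing, that æ is treated as ae, and that diacritics such as ï are dropped. The names sort_key and alphabetize are hypothetical.

```python
import unicodedata

def sort_key(term):
    """Build a comparison key that ignores case, spacing and hyphenation."""
    # Expand the ligature mentioned in the proposal (æ = ae).
    text = term.lower().replace("æ", "ae")
    # Decompose accented letters and drop the combining marks (ï -> i).
    text = unicodedata.normalize("NFD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Keep letters and digits only, so "old-time" and "old time" compare equal.
    letters = "".join(c for c in text if c.isalnum())
    # The original term breaks ties so the order is fully determined.
    return (letters, term)

def alphabetize(terms):
    return sorted(terms, key=sort_key)

terms = ["takt time", "old-timer", "Taiwan time",
         "old times", "tact time", "old-time"]
print(alphabetize(terms))
```

With this key, old-timer falls between old-time and old times, and Taiwan time falls between tact time and takt time, which is the separation of related spellings that the rationale describes. Terms such as some time and sometime receive the same letter key, so whether they share a line remains the editorial question raised in the point of contention.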

Stupid question: when you say "may not", do you mean "might not", or "must not"? ("these partial definitions […] may not be removed") Ordinarily they're distinguishable from context, but that sentence is kind of confusing me. —RuakhTALK 05:45, 10 April 2007 (UTC)
Might. Not dumb, thanks for pointing it out. DAVilla 13:25, 10 April 2007 (UTC)
I disagree strongly with the removal of red links. They are our friend, and tell us what terms are still to be defined. (Some of us actually define words here.) SemperBlotto 07:35, 10 April 2007 (UTC)
I want to agree with you, but if we are to find another solution then please acknowledge that not all red links imply the term is needed. Some of them should simply never be defined. The more questionable include "former + times", "Old + Father Time", "time-and-motion + expert", and "waste of time", and then there are the musical meters (now there's a can of worms). Longer phrases like "at the present time" and "this is no time for" might be better at shorter ones like present time and be no time for. And you can't know that all, possibly shortest remaining time and worst-case execution time for instance, are idiomatic until you look them up. I wouldn't have known man time was tosh™ until I saw the definition "a man's bowel movement". What would you say if I added rotation time as a derived term? Considering you've deleted the page before, I would hope that means you would be willing to remove it from the list. You've also deleted preposition of time, stoppage time, and even time limit, presumably for content one would hope? I suppose "at" is a too succinct definition of the first.
The hedges that grow on time are possibly some of the most laboriously trimmed. Do note that I added a number myself, of already existing entries, but I also seem to be the only one using clippers (May, Sept 2006). If you want to keep all of those links, please propose a system for keeping track of what is or is not worthy of inclusion. DAVilla 13:25, 10 April 2007 (UTC)
I also disagree with the removal of red links. A cleanup of the section in which those appear is the proper way to go, laborious though it may be. Red links show us what needs to be done, but at the same time, if you see a red link of which you think it should not be defined, and the page has no comment regarding some idiomatic meaning, removing it is probably less time-consuming than actually creating the page and defining the term. H. (talk) 15:19, 10 April 2007 (UTC)
I agree that we shouldn't be basing any sorts of content decisions on the "redness" of a link; whether or not we currently define a term doesn't hold water when deciding its relevance in this case. That leaves us still with the decision of how to choose what does and does not merit listing in a given headword's "derivatives" section, not an obvious set of rules.
I like the idea of second tier derivations being pushed onto the first tier derivation's pages (space-time continuum on space-time but not on space or time). How we should organize them...well I suppose that comes down to what we think they are actually used for. I am not exactly sure what the purpose of these sections are, but the purpose should define the form. - [The]DaveRoss 20:19, 10 April 2007 (UTC)
2B. Alternate proposal to #2. Derived terms are not to be <!-commented-> with definitions, context, or any other information specific to a term. Any red link can be removed by any contributor to the requested articles page indefinitely if he or she has any reasonable (if uninformed) doubt of the term's idiomatic status. If any of the terms are recent additions by non-regulars, the edit should be so commented, e.g. "indefinite removal to RA per DT policy".
Conduct: This provision shall not be abused. Contributors are advised to perform a simple search of any terms that appear to be jargon before deciding on them. Deletions can be rolled back if the contributor is not familiar with the RFD process or does not make a good-faith effort to abide by existing standards, as would likely be indicated by a removal of red links en masse. However, deletions cannot be rolled back simply because the contributor was wrong. Subjective opinion is allowed, and individual removals are not to be questioned. If the term has idiomatic status, the page can simply be created before a term in the list is reinstated.
At the same time, other contributors are not required to check the history before adding derived terms. While they are instructed not to reinstate terms they feel were removed incorrectly until that page exists, they are not liable for accidentally reinstating derived terms that have been previously removed, for instance one added by another contributor formerly and included in a long list of new additions.
Summary: En masse additions are okay. En masse deletions in the general case are not. Individual deletions are okay, and should not be reviewed unless the contributor intends to turn the links blue. Essentially this gives all contributors veto power on any term. However, this is a weak power since any link can be reinstated by simply creating the page.
Rationale: This proposed policy allows for a large number of red links and at the same time avoids vilifying the targeting of red links by those who are willing to tidy a page, to remove links that could never be blue. More importantly it avoids the need for commenting Derived terms. Comments are not visible to the outside world and are a waste of our time. DAVilla 10:32, 11 April 2007 (UTC)
While I like the spirit of this option, I question the functionality. One of the more annoying things about editing lists of red and blue links is that while you are editing you can't tell what is what. If we have large lists of variously commented terms in these sections they will quickly become difficult to edit and control. Is there some way we can prevent that from happening? - [The]DaveRoss 20:40, 11 April 2007 (UTC)
I don't understand. Why do you think the terms would be "variously commented"? DAVilla 23:33, 11 April 2007 (UTC)

Inclusion of derived terms

Wow, I'm surprised that my contributions to time have provoked so much comment. I'd like to add some of my own.

"Time" is, apparently, the commonest (clean) four-letter word in the English language, according to a question on The Weakest Link (they gave a source - I don't remember what it was, though). A large number of the uses of this word are, no doubt, in idiomatic phrases, and so, necessarily, the list is long.

The derived terms I have been adding to "time" and elsewhere are compiled from various print and online dictionaries (onelook.com is very useful in this regard, given that it allows for the use of regular expressions in searches). Many (or most) of the terms that I find I am unfamiliar with. Some are obscure or dubious. I prefer to err on the side of inclusion, figuring that if the terms linked to are not idiomatic or do not exist, they will be removed, but if I leave them out and they are worthy of inclusion, no one else might ever enter them. That is not to say I have entered everything I could find - there is plenty that was, to my mind, not idiomatic or too obscure that I therefore left out.

The derived terms for "time" took a very long time to compile and verify, needless to say, but they are there to be edited, so by all means whittle away any terms that fail CFI. However, note that many of these terms are in the OED with citations, or in reputable online sources. Terms that appear to be unidiomatic might in fact be idiomatic. I suggest checking the OED, other print dictionaries and onelook to confirm one way or the other before entries are deleted from the list. (Inclusion in any of these sources doesn't necessarily mean that a term passes Wiktionary's CFI, of course.)

All the terms for time zones that I found (mainly in Wikipedia) have been included. It's debatable whether these should be in. Some print dictionaries give "Greenwich Mean Time", so why not the others? The list of these is finite and fairly short. Again, delete if these don't pass CFI, but my thinking is that they do (all or most have Wikipedia entries).

Technical (including medical) terms certainly do belong in Wiktionary if they pass CFI. In fact, they are more likely to do so, as they often appear in print in journals and other scientific publications.

I have tended to list terms B derived from terms A derived from "time" under B rather than under A itself. For example, "a stitch in time saves nine" comes under "in time", I believe, with a comment to that effect. I think the "derived terms of derived terms" system is cleaner, but this might make it harder for users to find terms or make them think that terms have been overlooked. (Incidentally, this is why "just-in-time" has been removed from the derived terms: you'll find it under just in time, which is the phrase from which "just-in-time" is derived.) If there are inconsistencies (such as "Achilles tendon reflex time"), then please fix these.

In short, the list of terms derived from "time" is not set in stone. None of us are infallible experts on everything, so please edit anything that I have not got right, and if I might be so bold as to ask, possibly be grateful that I researched and entered these hundreds of terms? — Paul G 09:23, 11 April 2007 (UTC)

On the whole it's a good list, yes. I have no doubt that most of the terms, nearly all in fact, should be included. I will revert my change shortly. DAVilla 10:37, 11 April 2007 (UTC)

Wiktionary:About Ancient Greek

I realize this has been a while coming, but I feel that I've finally gotten this page to a point where it's ready to be accepted as official Wiktionary policy. Will everyone who has any interest in the state of Ancient Greek on Wiktionary please take a look. I've recently made a few minor changes to the page, in preparation for this. In particular, the Pronunciation & Romanization section has been updated. Unless something major comes up, it is my intention to start a vote in a week or so to make it official policy. Please, if anyone has any problems with the page (or is considering having problems), please bring them up now, before the vote. Thanks very much. Atelaes 04:12, 6 April 2007 (UTC)

I think the policies/guidelines there are great, but much of the page seems intended to inform the reader about Ancient Greek (especially the "Diacritics & Accentuation" section); I think that that information is fascinating and should be kept somewhere, but probably not at Wiktionary:About Ancient Greek. (Maybe it could be put at an Appendix:Ancient Greek or the like?) To a lesser extent, I don't think that Wiktionary:About Ancient Greek should duplicate as much of WT:ELE as it currently does; I really think Wiktionary:About Ancient Greek should simply tell people-who-understand-Ancient-Greek-and-have-read-WT:ELE the Wiktionary policies that are specific to Ancient Greek — which is to say, the specific things they'll need to know in order to contribute to entries on Ancient Greek words.
That said, I do have one minor policy/guideline quibble; I think primary-source attestations should go in unordered lists after the senses they correspond to, or in "Quotations" sections, or in /Citations subpages, like at entries for words in other languages. (My personal preference is for unordered lists in each sense, but WT:ELE says that there's no consensus yet.) I don't see what benefit there is in giving these in the "References" section.
RuakhTALK 05:09, 6 April 2007 (UTC)
Concerning the excessive information in the diacritics & accentuation section, I tend to agree. However, I was ordered to write that section (at gunpoint, I might add). Perhaps it should be trimmed down somewhat. As for the primary sources in the references section, I feel that to be somewhat of a shortcut, for the time being. Writing citations for the Ancient Greek entries is incredibly time-consuming, and I don't think it will happen much in the immediate future, although ultimately they should all get some. For an example of what all goes into them, take a look at θεῖον. I really don't like the convention (that a few people have tried) of simply scattering the sources throughout the definitions, as I think it's rather unhelpful and makes the entry look messy. Putting these sources in the references section provides a quick and easy (and temporary) way to reference the words. Atelaes 05:38, 6 April 2007 (UTC)
What's the difference between a gloss and a translation? Am I to understand that the gloss is in an original somewhere? If so it isn't cited as to which version it comes from, and it needs to be to give credit. If you like your translation better then why have the gloss at all? By the way, does the translation belong in italics or not? DAVilla 07:46, 6 April 2007 (UTC)
The difference between the gloss and the translation is that the gloss retains more of the original language, at the expense of English. It doesn't come out terribly well in these two passages, admittedly. A gloss is not an authoritative version, by any means. Rather, it is an attempt at as simplistic a translation as possible, which follows the word order, grammatical structures, etc. of the original. The translation is meant to feel like real English, but this often requires a bit more freedom with the language of the original. Its main benefit is to allow people who actually have some handle on the language to see an intermediate step between the original and the translation. Atelaes 07:55, 6 April 2007 (UTC)
By the way, I'm not sure how to reconcile "The normal standard for modern languages is three independent attestations. However, Ancient Greek, as a dead language, requires only one attestation." with WT:CFI. I didn't think language considerations pages could override CFI? Or is the thinking here simply that all surviving Ancient Greek manuscripts can be considered "well-known works"? —RuakhTALK 07:44, 6 April 2007 (UTC)
Yeah, I was expecting to get more flak on that when I first proposed it, but no one said anything. It's certainly open to debate, but I think that one citation should be the norm for all dead languages because they're not subject to the same flux that living languages are. And, yes, I would say that all Ancient Greek works would count as well-known works, at least within a certain context. Atelaes 07:55, 6 April 2007 (UTC)
I think it is OK for an "About Language" page to differ from both the ELE and the CFI. However, those differences should be clearly spelled out, and each About Language page must be voted in as policy. There are enough oddities and special cases in various languages that we can never hope to have a concise ELE or CFI document if we try to incorporate them all into those two primary documents. --EncycloPetey 22:27, 6 April 2007 (UTC)

I thought I'd explain my motivations for the most recent changes to θεῖον. First, I really hate ELchar. I really have no idea why it does this, but on my browser it puts all the characters into this weird loopy font that just looks ridiculous. Polytonic does not do this for me. My hope is that polytonic is allowing people to see just as many characters as ELchar is. Any feedback on this? Are people seeing more or fewer characters with the template switch? I see them all completely regardless of font templates. A second comment, I changed the indentation, because I think it rather important that the words in the three lines (most especially the original and the gloss) are in line with each other as much as possible. Responses? Atelaes 08:19, 8 April 2007 (UTC)

Either template looks fine on my screen. In fact ELchar is a little straighter and the present one more curvy, but not "loopy" or anything. But it needs to be one or the other, or I can't read it... rather, it doesn't show; I can't read it regardless.
I really have to say, Ruakh, that I don't like the new look. "Original" and "translation" are just unnecessary, and the word "gloss" is confusing. The only reason I knew it wasn't an annotation in the original text is the source itself. You know, the Bible is rather ancient and all. But in a modern work that's what "gloss" would mean to me. As to indentation, will there ever be a need to preserve a translation that was in the original work? I would think placing them at the same indentation should be preserved for that. Or maybe it would be enough to put our own words in italics. Or maybe we really ought to do both. I don't know if this has ever been discussed. DAVilla 00:26, 12 April 2007 (UTC)
That's O.K., I don't like it that much either. My preferred versions are the first two I did (http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2284749 and http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2290015 — they differ only in indentation levels, with one putting the translation on par with the original, the other indenting it less than the original and more than the gloss); the last one was just an attempt to line them up nicely, as Atelaes prefers. (Seeing as he actually understands Ancient Greek, I think it makes sense to trust his instincts.)
It's actually pretty standard to use the term gloss to refer to a pseudo-translation that maps each word in the original text to a word or phrase in the target language, sometimes with annotations like "-DATIVE" and whatnot. If you can suggest an alternative word (or short phrase) to use, though, I'd be O.K. with that.
You know, rather than give a separate gloss, we could do something like this:
γένος οὖν ὑπάρχοοντες τοῦ θεοῦ οὐκ ὀφείλομεν νομίζειν […]
(Most of those are probably wrong, but you get the idea.)
It would probably be a lot of effort, though. :-/
RuakhTALK 01:44, 12 April 2007 (UTC)
That is an interesting idea, but would indeed be a lot of work to implement on a regular basis. Also, I don't know how many users would get the idea, unless we had a little box saying, "scroll over text to see gloss" or something. I've fixed it, by the way. And yes, Davilla, this is virgin territory which I've never seen a discussion on, nor have I seen anything of the sort anywhere else on Wiktionary. We just might want to start a new discussion just on this, as we might benefit from the opinions and technical expertise of others. Unless I'm sorely mistaken, this will be setting a precedent for all other languages, as I have to imagine that we (eventually) want to have citations for our foreign language entries as well as our English entries. Atelaes 04:14, 12 April 2007 (UTC)

Placenames redux

Since the discussion earlier up petered out without a resolution, I want to bring this up again. We have a considerable number of placenames that seem to be in contravention to WT:CFI#Names of actual people, places, and things, which gives "A name should be included if it is used attributively, with a widely understood meaning." and "A name should be included if it has become a generic term." In essence, a placename still needs to meet the same attestation standards as any other term, since this is still a dictionary, not a placename database or encyclopedia. We should only include placenames with some significance towards our goal of defining words, not collecting geographical data. However, I seem to be able to find a large number of placenames that cannot be attested as generic or attributive, in my opinion. Consider Alagoas, Maceió, Abilene, Afula, Aeolian Islands, Lipari Islands, Ahmedabad, Aegadian Islands, Adyghe Autonomous Oblast, Adigoppula, (yes, these are just grabbed at random from the first page of the proper noun category) Bursledon, Titchfield, Tula, Thousand Oaks, etc. As you can imagine, there are a lot more. Not just the 50 in Category:English_counties and 100 in Category:Towns, but the many hundreds more in Category:Place_names. Clearly I can't just go on a deletion spree, can I? (It would take forever!) The main problem is that even if any of these have generic or attributive senses, and some, though not most, do, almost all of them are "defined" in the form of "A town in Oaxaca, Mexico." (Juchitán) That, to me, is an encyclopedia article (if a stubbish one), not about the word. So, what to do about these? I don't think they are adding much to the dictionary; this isn't Wikipedia. Frankly, I'd like to see most of them gone: all the ones that cannot be attested according to the standards currently in WT:CFI. Is there an efficient way to do this that doesn't involve hundreds of RFD listings? Or people violently disagreeing, ideally? 
Dmcdevit 09:03, 6 April 2007 (UTC)

I, for one, am fully supportive of such a deleting spree. Although, I imagine others might disagree. I think the CFI paragraph you quoted is quite clear on this, and should be followed. I suppose somewhat major placenames (at the deleting admin's discretion) should be placed under RFV, so that people are allowed the chance to save such words, if they care to try. But, Juchitán should just go, as far as I'm concerned. Atelaes 09:08, 6 April 2007 (UTC)
On the other hand, what place name entries can do here that they can't do at Wikipedia is provide translations. If I want to know, or inform others, what the Aeolian Islands are called in Yiddish, or Turkish, or Swahili, the place for that is Wiktionary. Wikipedia is willing to provide the name in the local language (in this case Italian), and the interlanguage links work for those languages that have an article, but not all other languages do have an article. Wikipedia does have lists of translations of place names, to be sure, but most of them have been nominated for deletion at one time or another on the grounds that the information there is more appropriate for Wiktionary than for Wikipedia. Angr 09:31, 6 April 2007 (UTC)
That's Wikipedia's problem though, not ours. Which is to say, that is fallacious logic: just because we can do something that another project doesn't, does not make that dictionary-appropriate. Wikipedia doesn't give translations (transliterations) of all of its Latin-script people, either, which is another tens of thousands of entries we could add (or phonebook entries or restaurant reviews, for that matter). But if Juchitán doesn't belong in a dictionary, neither does Juchitán in Hawaiian, if there were such a word. You might get away with sticking a compendium of placename translations in an appendix, but I still don't think they belong as articles. Dmcdevit 22:19, 6 April 2007 (UTC)
Juchitán gets [http://books.google.com/books?hl=en&q=Juchit%C3%A1n&btnG=Google+Search&ie=UTF-8&oe=UTF-8&um=1&sa=N&tab=wp 675 Google Books hits]. I'll bet you a thousand dollars that at least three of those are uses that I would consider "attributive" (but don't hold me to it right this moment, I'm going to be incommunicado on vacation until mid-next week). Cheers! bd2412 T 06:15, 7 April 2007 (UTC)
I was about to put it on RFV and say "prove it," but I'll just wait then. :) Part of the problem is that "attributive", as noted in a discussion somewhere earlier on this page, is a bit ambiguous. I think it's clear it's intended to mean having a specific meaning that describes something other than being simply in or from the place. So three cites of Juchitán being used in the same sense, as in (totally made up) "a Juchitán pizza" or "a Juchitán sandwich" meaning "a pizza with fish" or "a sandwich with fish". "A Juchitán pizza" just meaning a pizza made in Juchitán is not the spirit of the criterion, since any placename can be used to modify in that way. The problem with the current entry is that even if there is such an attributive use, it is certainly not the one defined, which gives encyclopedic data about the city's location. Dmcdevit 06:42, 7 April 2007 (UTC)
Saying "That's Wikipedia's problem though, not ours" shows precisely what Wiktionary's problem is. Wiktionary and Wikipedia are complementary sister projects, not two completely unrelated websites. Dictionaries, not encyclopedias, provide translations of individual words. Angr 11:06, 7 April 2007 (UTC)
Wikipedia does not exist in a vacuum, but Wikipedians tend to behave as if it does. They have no one but themselves to blame for their reputation. Perhaps if Wikipedians were inclined to cooperate, they'd find sister projects more willing to help where they can.
All that aside, it is a problem for Wikipedia, and not for Wiktionary. It is an encyclopedic concern; demographic statistics are the useful criteria - and that should not be in a dictionary. Including demographics has certainly met fierce resistance, in the past. --Connel MacKenzie 20:15, 7 April 2007 (UTC)