Last modified on 24 May 2014, at 11:35

Wiktionary:Beer parlour/2007/February

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Articles de qualité

Wikimedal.png

Here's an interesting idea the French Wiktionary has started: fr:Catégorie:Articles de qualité (that's "Articles of quality" in case that's not clear). The selected articles are given a Wikimedal (as shown at right) for being voted "pertinent, rich, and complete". (Note: I'm not sure how the French define "pertinent", since espéranto was one of their first selections, but what the heck.)

Is something like this worth pursuing, knowing that it may clutter the votes page? I had already begun a Model pages project of my own. This could be used as a more formal system. Thoughts? --EncycloPetey 07:18, 12 January 2007 (UTC)

DAVilla, can we just put the gold star on WOTD's?  :-)
Hmmm. Joking aside, the WOTDs do get a fair amount of scrubbing and polishing. I'm interested to hear what the criteria would be for a kwality article. Example sentences for every definition given? Translation sections with at least 18 languages represented? Etymology, pronunciation, audio, image?
As far as mechanisms, copying the WOTD scheme should work well enough. But I'm not sure that could be sustained. Maybe one (or two or three) a week?
--Connel MacKenzie 08:09, 12 January 2007 (UTC)
Yes, very few at a time please. You can see a short list of beginning criteria if you look at what we want to accomplish in the Collaboration of the Week page. Those criteria could apply to this as well, with the slight modification that they be completed rather than the goal. --EncycloPetey 15:57, 12 January 2007 (UTC)

This discussion in whole may be moot. Do we even have any articles of qualitý? DAVilla 14:04, 12 January 2007 (UTC)

Yes, listen is a quality article, and Central Europe is close. These are relatively simple articles, though, without a huge number of definitions or varying parts of speech. --EncycloPetey 15:43, 12 January 2007 (UTC)
Listen has no noun sense and doesn't explain the verb as used in "The new recording doesn't listen as well as the old one." (from AHD) DAVilla 06:13, 13 January 2007 (UTC)
Thanks. I was aware of the missing noun sense, but wanted to wait on adding it until I had such backing quotations. The additional verb sense is one that hadn't occurred to me, and for which I also have yet to see print quotations, but that might be the result of it being primarily a British usage and occurring later than the majority of the entries in Wikisource. --EncycloPetey 18:07, 13 January 2007 (UTC)

I like this idea. Here's my two cents. Instead of having it be checked off as a whole, why not have each section checked by translation projects?

Each project could say, "OK, the Spanish, French, and German parts are good." This would make it easier to keep control over the quality, IMO.

Also, when an article is "quality", give it a category so we know how many more we have to go.

To summarize, I'm all for it.

Bearingbreaker92 20:57, 17 January 2007 (UTC)

German spelling reform

How do we handle words that have become (or are going to become) obsolete because of the German spelling reform? As long as the final decision hasn’t been made, we probably want both, but I suggest the obsolete form contain a mention of what the new form is. I just edited miß. Do people agree with this system? Maybe a template could be made to handle this... henne 14:24, 15 January 2007 (UTC)

Ditto for French, although no one in France seems to be aware that there was a reform (w:Reforms of French orthography#The rectifications of 1990). Both spellings are officially considered correct for an indefinite period (with nobody using, teaching, or promoting the new spelling in France). But it makes sense to have both sets of entries and indicate their status somehow. CapnPrep 15:03, 15 January 2007 (UTC)
This also applies to Dutch, thanks to the "little green book" and three major spelling reforms. We do want the older forms, though, because the older books don't change or disappear as a result of spelling reform. For instance, I have a copy of a late 19th-century Dutch pamphlet on the Medieval names of Leeuwarden. It contains many words and grammatical features not present in modern Dutch, so (for now) I have to use a 19th-century Dutch dictionary to look words up. We will probably need to say something, for these languages, about when the word was ruled obsolete. A template could be used for this. --EncycloPetey 16:02, 15 January 2007 (UTC)
Even after the German government makes a final decision, Wiktionary will still want both spellings, clearly indicating which was considered correct for which time period. An English speaker, looking up the term, would not necessarily have any solid indication that a word was valid, otherwise. I think using a template for those entries is a very good idea. The wording and format of your edit looks fine. --Connel MacKenzie 19:26, 15 January 2007 (UTC)
Ok, I’ll make that into a template, eventually. henne 21:43, 15 January 2007 (UTC)
I imagine it could be done as an extension of the archaic and obsolete templates, provided they could accept a language and reform date parameter. --EncycloPetey 01:39, 16 January 2007 (UTC)
The format looks great to me. BTW, is there a consensus on the use/non-use of "archaic" and "obsolete"? -dmh 06:18, 16 January 2007 (UTC)
For English, no change has been made from "obsolete" meaning something fell out of use 100 years ago, "archaic" falling out of use 50 years ago, that I know of. But the delineation is obviously different for country-languages that have a central authority dictating spelling rules. I had meant that the ===Usage notes=== section be "template-ized", I hadn't noticed the use of the English language template {{obsolete}}. That needs to use its lang= parameter. --Connel MacKenzie 12:06, 16 January 2007 (UTC)
Where is this 50/100 year distinction documented? It's not on the talk page for {{obsolete}} or {{archaic}}. Note that the current text for the Category archaic is:
    Archaic terms are no longer widely used or understood, but will be found in older texts. Archaic English terms 
    should be attested in Modern English sources, that is, those from around the year 1550 and later.

...while the text for the Category "Obsolete" is:

    Words, phrases or usages no longer in current use, but found in older texts.
    That a word entry is listed in the category does not mean that the word is necessarily flagged as (obsolete). 
    It may be just one usage or meaning of the word that may be obsolete, whilst other meanings remain perfectly current. 
--Jeffqyzt 20:32, 16 January 2007 (UTC)
But the problem here is that we're talking not about obsolete words but obsolete spellings. This means that in some cases, a 200 year-old spelling could still be interpreted correctly by a native reader, even though the spelling has been reformed. I take back what I said. The {{obsolete}} tag wouldn't be appropriate for this, though the {{archaic}} tag might. --EncycloPetey 01:17, 17 January 2007 (UTC)
Ah, now it's coming back to me. I think archaic was supposed to mean "still understood but no longer widely used", which would include things like smite, whence or thou; obsolete was supposed to mean "neither widely understood nor used", which would include things like bodkin (as Hamlet uses it), hir (their, from Chaucer, not one of the several attempts at genderless pronouns) or swinker.
This is somewhat subjective, but so is "fell out of use". I prefer a functional definition to an arbitrary cutoff date. In the likely case of disputes, we do what we do best: dispute.
Note also that there is a fair bit of regionality involved. Dictionary.com suggests that swinker is only Archaic in parts of Britain. Likewise kin might be considered archaic in parts of the US but is very much current in others (I've seen it in CNN headlines, for that matter, so maybe that's not a good example). -dmh 04:44, 17 January 2007 (UTC)
I remember seeing an oldish discussion (and I thought agreement) of the definitions you mention for obsolete and archaic, and believe that functional definitions are preferable to fixed dates of usage, not least because archaic words may purposely be used in new publications to give a sense of period. Such words remain archaic by any normal definition. --Enginear 11:34, 17 January 2007 (UTC)
Oddly, I'm not finding the conversations before Template talk:dated. In that conversation, it looks like I'm the one suggesting 50/100 years (which I know for a fact is not true.) That is, Ec had previously suggested those limits in another context, and I thought the numeric limits had some amount of support. I suspect they may be hidden in a deleted section of WT:RFD from that era. --Connel MacKenzie 23:59, 17 January 2007 (UTC)
Please see my version of miß for an alternative format. The relevant part of the definition should be generated by a template. Ncik 14:28, 17 January 2007 (UTC)
Very nice! Should the link to miss really be miss? Never mind, it already is. Popups pointed to the wrong target. I think adding a definition/translation gloss (e.g. "to measure") is a really good idea too. --Connel MacKenzie 23:59, 17 January 2007 (UTC) (edit) 00:03, 18 January 2007 (UTC)

Fundraiser is over

MediaWiki:Sitenotice is still set to fundraiser. The drive closed yesterday. Please update the notice to the thank you notice. Gebruiker:Dedalus 13:18, 16 January 2007 (UTC)

Oh thank God! I thought it was going to drag on to 2008.
Glad we hit a goal, if not the goal. DAVilla 18:41, 16 January 2007 (UTC)
Is there some reason the broken "thank you" message is staying at MediaWiki:Sitenotice, or can it all be replaced with "-" now? (The single hyphen is the sitenotice "magicword" equal to "no message"...or used to be.) --Connel MacKenzie 23:02, 17 January 2007 (UTC)

User:Hamelg

What's the policy on spamvertising within a user's own page? Jonathan Webley 10:45, 18 January 2007 (UTC)

  • Users should be allowed some latitude, but in this case we are just being used as a link farm. He has made no other (non-deleted) entries. He should be blocked and the user page deleted. SemperBlotto 10:52, 18 January 2007 (UTC)
  • Just blanked the page for now. User might be inclined to contribute? In any case, it isn't vandalising a page or spamming the main namespace; possibly just misguided? Robert Ullmann 11:23, 18 January 2007 (UTC)

Deletion: a couple of suggestions

First, I recognize that the sheer volume of both wheat and chaff that CheckUser/admin folks have to deal with has only increased over the years. However, it's still an important part of Wiktionary (and Wiki*) to encourage new entries and to encourage partial work on the assumption that the community will be able to bring it up to snuff. So I wanted to make a couple of suggestions to our admins:

  • Please check the talk page, if any, before deleting. Most garbage entries will not have a talk page. Legitimate ones might, and I for one will be glad to take extra care to put useful information on the talk page if I know it will help. The case in point here is twistification, which I heard on the radio and confirmed by trolling through Gutenberg and b.g.c. Unfortunately, it wasn't easy to pull out a nice, concise definition from what I found. So I saved what I had in hopes that it could be improved later. As I understand it, this is how Wiki* is supposed to work. Unfortunately, I had once again not noticed that I was logged out when I made the entry, and through the completely understandable actions of veteran Wiktionarians, the entry got gunned. The problem was that there was a bit of useful information that would not have been easy to recreate (at least compared to the time it took to delete the entry).
  • Likewise the history. In this case, the history included "moved content to talk page" (by one of the veterans)

I would hope these would not be controversial. The following are more "If I ran the zoo" sorts of things, on which I don't expect ready agreement.

  • Agree on criteria for a "prima facie case" for a word. For example
    • Appearance in a "dictionary of record" (Webster's, OED, AHD, etc.)
    • A single citation from a "very likely" source (BBC online, New York Times, any print magazine with a circulation more than X).
    • A few citations from a "likely" source (less widely-circulated magazines, small technical journals, etc.) These might turn out to be typos, or not independent, or all within the same year, or whatever, but they might well pan out as real citations.
    • Some arbitrary number of raw google hits for a standard search, perhaps "term -site:term.com -site:term.org".
  • The idea here is not to claim that the term meets CFI, only that there's a reasonable chance it does, and it shouldn't be deleted without further discussion. If you see a likely new word, but don't have time to work up a full definition, then
    • Create the entry
    • Tag it with {{primafacie}}, which puts in a disclaimer, categorizes and whatever other magic is needed. The "prima facie" category would be a subcategory of RfV.
    • Add a note (with the tag? in the talk page?) stating the reasons. E.g. "in OED" or "found in link to BBC online" or a list of minor cites, or "3000 ghits".

Like anything else, this can be abused, but I doubt that the average "my English teacher is an idiot" or "hey look I know some bad words" vandal will bother. These should all be easily verifiable by an admin, who at worst has to go to askoxford or chase a few links to minor publications. The hardest one would be the major print magazine, but many of these publish at least some of their content online. We could certainly tweak the guidelines in any case.

The basic idea is to have a safe place for plausible but not-yet-baked entries, clearly marked as such for the casual browser. None of this is too far from various past suggestions, but I'm not sure that we've hashed out this particular proposal before. -dmh 16:59, 15 January 2007 (UTC)

(deleted) oh never mind. See choda. Robert Ullmann 18:38, 15 January 2007 (UTC)
Dmh, that notion has been repeatedly rejected, no matter how restated. (Note, this link is merely the most recent "round" of that debate.) --Connel MacKenzie 19:03, 15 January 2007 (UTC)
Now that's a VERY LENGTHY discussion :-). I confess I didn't read it all. Did someone make this particular proposal, or something very close to it? Particulars matter.
The basic problem here is that, in my experience, it's a lot harder to paste in those three valid citations than to find them. But if you don't do so (and this is the part that perpetually amazes me), people tend to assume they don't exist, rather than taking a minute or so to search b.g.c etc. So there's no room for a valid-looking-but-not-nailed-down entry. We have a good solution for this, actually, namely RfV. Having looked at the latest version in more detail, I have to say it looks pretty reasonable. Unfortunately, it doesn't seem to be universally respected. In the short time I've been back, I've seen at least one good-faith entry summarily gunned and another set of changes reverted without comment, after I had put considerable effort into them and explained that effort on the talk page. This is not encouraging.
Is there a standard template for cites? That could help. Does Wikipedia's <ref> tag work here? -dmh 19:37, 15 January 2007 (UTC)
What I (and others) have sometimes done on the RFV page or talk pages, when there isn't time to format cites properly, is to just give the internet link. Arguably, it would also be better than nothing on the entry page. That seems more in the wiki spirit than the other alternative, which I also sometimes use, of storing the whole entry off site until it is formatted correctly. (If you were talking of cites not available online, I suggest the time to find them is often much longer than the time to format them.) --Enginear 21:31, 16 January 2007 (UTC)
My experience is the opposite. It's often trivial to find a gazillion ghits for some term that supposedly doesn't belong in a dictionary, with between 0.1 and 0.9 gazillion of them being valid uses. Total time under a minute. Likewise for b.g.c and Gutenberg. Either it's there or it's not. Then I spend about 5-10 minutes cutting and pasting, double-checking the format, double-checking the links (usually :-) etc. This is the main reason I'd like to be able to say "look, the cites are there, let's put it on the cleanup list until we get them pasted in" (and I'm perfectly willing to do my share of that). -dmh 01:22, 17 January 2007 (UTC)
Hang on! There is a lot of work needed between finding the results of a Google search (or even a b.g.c. search) and knowing that you can find durably archived cites as recommended/required [there seems to be disagreement on this] by CFI. Even on b.g.c., where by definition the cites are durably archived, I recently found only one (or arguably two) out of 90-odd hits which were independent examples of use (rather than mention) as required by CFI. It is also necessary to open the link and read the extended context (to check the link works and to be sure which definition the usage of the target word relates to). With ordinary Google cites, it can also take some time to track down date, author, publication name, and publisher details. For many definitions, careful use of advanced search options is needed to tease out less common meanings (eg, see nope). The minute or two to edit the cite into the entry is often the quick part. Of course there is another fall back option of leaving a link to the search results of an advanced search, so someone else can verify if any of the hits are appropriate as cites.
I hope you are not adding apparent cites without checking their validity. --Enginear 12:09, 17 January 2007 (UTC)
Although ref/references tags "work" here, they generally are not used. WT:" is the formatting guideline for citations. (Shortcut = quote mark, for the quotations forma...oh never mind.)
I am surprised at your dour experiences. I know I don't submit RFVs without some checking. I still maintain that WT:CFI is enormously too weak. We are not meeting anyone's expectations, of what they might think of, as a dictionary. Not what they might think of, as a multilingual dictionary, either. --Connel MacKenzie 11:59, 16 January 2007 (UTC)
I'm not sure how we could broaden it any further without letting in just any protologism that springs to anyone's mind. I assume that's what you mean. For a while I was maintaining User:dmh/field sightings, which were interesting things I saw that weren't in Wiktionary but I was not going to waste time putting in just to see them gunned (the ones with (e) in parentheses were ones that Ec gunned in the course of, I think, a few days). I'd run across two or three in a typical day. -dmh 01:18, 17 January 2007 (UTC)
Are you trying to be patronizing or did you really not suspect that the CFI is letting through a lot of objectionable entries, e.g. outright illiteracies? DAVilla 21:23, 17 January 2007 (UTC)
For example? -dmh 22:54, 17 January 2007 (UTC)
Ingenuitive and frictive come to mind. --Connel MacKenzie 23:29, 17 January 2007 (UTC) And you thought the summary discussion was "VERY LENGTHY"? Each of these have several follow on conversations (equally lengthy) on those same pages. --Connel MacKenzie 00:09, 18 January 2007 (UTC)
Goodness (I seem to be saying that a lot here). Failure to keep ingenuitive out is a "Complete breakdown of the RfV process"? Frictive, which is used clearly and consistently in the cites and is clearly not a typo for fricative, is "an illiteracy" (see [1])? What the hell is going on here?
What, exactly, is the point of having Wiktionary in the first place, other than to keep us word geeks occupied and entertained? If it's actually an attempt to develop an open-content descriptive dictionary, then we include words that people use for their meaning, regardless of whether we like them (the words, or the people). This doesn't seem complicated.
Otherwise, we're left tilting at windmills trying to find criteria that will only let in "good" words. A couple of turns of the wheel ago, there was a flap over only letting in words from sources printed on paper. Right about that time, Google came out with b.g.c. By now it should be clear to all that all sorts of crap makes it into print, while there is plenty of perfectly good stuff online. So that attempt failed. Now what? Shall we narrow down the list of allowable sources to "reputable" ones? Who decides?
Leaving aside whether defining "correct" usage is a worthy goal for Wiktionary, prescriptivism has one inescapable inherent flaw: it's inherently subjective. Advocating for words that meet our CFI (which, for all the swirling controversies on this page and others, have remained stable for at least a year now) is not gaming the system. Trying to set objective-looking guidelines that keep out "bad" words is gaming the system.
Follow that path to its logical confusion and you have the proposal, put forth seriously more than once, that Wiktionary should limit itself to words found in other dictionaries. Assuming we agree on the canonical set of dictionaries, again I have to ask what is the point? Right now you can go to Dictionary.com and Askoxford.com to get well-researched definitions. In the brave new world you will be able to sleep soundly knowing that their coverage is at least as complete as ours. If you want a clue as to the latest slang, you've got the august and reputable Urbandictionary.com. If you want to see citations, presumably you'll want to see them for yourself through google and friends, and not some edited list aimed mainly at combatting the latest RfV challenge. Why, exactly, would anyone go to wiktionary.org?
Of course, if we're limiting ourselves to established dictionaries, we'll need to limit ourselves to senses listed in those dictionaries. Otherwise we would be letting in anyone's harebrained inventions, just because people use them. We'll also need to take care not to deviate from the established definitions as we write ours. Essentially we end up with the same content as Dictionary.com, only less of it, perhaps with the prospect of a "derived work" copyright suit thrown in. But by God the definitions will be in the public domain!
Or ... we could accept that people use words the way they use them, document that, and come up with something new, unique and useful. It would be more nimble than existing dictionaries in the same way that Linux is more nimble than Windows. It would be more reliable than Urbandictionary. It would be lively, with multiple points of view clashing and providing better perspective together than any single one could alone. It would be, in other words, Wiktionary.
People, all this wrangling over whether this word or that is attested, or really used in a given sense, serves a useful purpose. It keeps contributors honest and helps ensure that we only define what's actually used. I may grumble that starfucker gets challenged within milliseconds while HKSAR, aerial cableway, sedums and so forth sit indefinitely, but if we're bound and determined to have the world's best-documented articles on profanity, so be it.
If push comes to shove, all I'd ask is that if a word ends up passing the CFI, we accept that and leave it alone, and not take it as the beginning of the downfall of civilization.
[1] "An illiteracy" is just the sort of dodgy usage that authors of "No one uses English correctly but me" books like to pounce on. Illiteracy is an abstract concept. Saying "ain't" is a concrete action. Committing "an illiteracy" in print is surely a contradiction in terms. But then, prescriptivists aren't known for consistency.
Now put aside prescription and look it up in b.g.c. On the first few pages of hits, at least, you'll find lots of phrases like "an illiteracy rate" or "an illiteracy problem". A couple of pages down, all by its lonesome is, I kid you not, Usage and Abusage: a guide to good English : abusus non tollit usum., telling us that "He talked friendly to me" is "an illiteracy". The presence in the title of a dead language and an English word marked obsolete in at least one dictionary is a good indication of the stick-in-the-mud conservatism on offer.
We then resume the parade of page after page of hits using illiteracy to mean the inability to read. Evidently, the countable sense of "illiteracy" is peculiar to authors of prescriptive guides. A search for "illiteracies" confirms this. Ain't that something?
Oblexicography: The word "illiteracy" also has the extended meaning of "illiteracy rate", as, "Elbonia has an illiteracy of 2.5%". I've added it. There's also a sense meaning "a lack of understanding", as "an illiteracy of America's origins". I haven't added this, but it's there. I've also cleaned out the prescriptive claptrap in the "denigrated" definition. -dmh 07:20, 18 January 2007 (UTC)
Do all your screeds have to be so long-winded? Whiners like you have instigated a CFI that has kept both of those examples. We should say that they are misspellings, and nothing more. We should not have entries for them in the main namespace to do so. The microscopic handful of people that have misused language shouldn't be rewarded with their silliness set in stone by an increasingly mirrored Wiktionary. I must say, I find your behavior disruptive. Welcome back indeed. --Connel MacKenzie 07:53, 18 January 2007 (UTC)
Two nice, short, questions:
  • Are you advocating the "only terms in other dictionaries" approach? -dmh 13:20, 18 January 2007 (UTC)
Connel, I think you have hit the nail on the head with "We should not have entries [for misspellings/typos (and maybe for "illiteracies")] in the main namespace." I worry that while some typos are obvious, others, and most misspellings, can trip up someone learning the language, and even make the rest of us stumble. Perhaps uot, used by someone as an example, is in the obvious category, since, AFAIK, there are no other single-syllable English words using uo, so nearly anyone would think through simple transpositions. But while I would like, eventually, to see all the less obvious ones in a namespace somewhere, linked from homographic entries where appropriate, and searched for by desperate people (and nerds), they definitely do NOT belong in the main namespace. --Enginear 20:39, 18 January 2007 (UTC)
Right, but ingenuitive and frictive aren't misspellings. They're words that Connel and others just don't like (I don't like ingenuitive either, but I believe trying to remove it does much more harm than good). They belong in the main namespace. What we need is a nice way to separate simple typos and scannos from legitimate entries. I tried prevalence, but it won't work as is. I suspect it's going to be hard to find one, partly because there is such a thing as a genuine alternate spelling or regional variant, and partly because it's going to be hard to screen out typos without leaving too much room for arbitrary challenges of words people just don't like. I would love to be proved wrong on this one. -dmh 00:24, 19 January 2007 (UTC)
Excuse you? Do you have a spell-checker? Everyone else here does, and can see that you are 100% wrong. They are first and foremost spelling errors, but with an astronomically remote chance of actually being used in a "valid" context. --Connel MacKenzie 01:58, 19 January 2007 (UTC)
OK, let's try again. You're wrong and I can explain why, instead of just asserting it, if you've got the patience:
You didn't actually answer my question, so I'll assume that you mean ingenuitive is a misspelling of ingenuity (that's the top offering from Thunderbird's spell checker) and frictive is a typo for either fictive or fricative (ibid). If this is so, googling "ingenuitive" should turn up mainly places where "ingenuity" is the right word and the writer just goofed. So here are the top few of the 11K hits for ingenuitive (omitting the mentions in Urbandictionary etc.):
  • another ingenuitive bikini top style
  • their capacity for ingenuitive song-writing
  • I contest that atonal jazz is neither ingenuitive nor daring.
  • Ooh, very nice, very ingenuitive. I'd appreciate it being later re-scaled based on both city population and era, but right now I think it's plenty good
  • Envision v2 boasts a brand new engine, 100% template support, ingenuitive new features and an all-in-all community builder
  • I want to let you know about an ingenuitive ISP owner
  • As I said before, you just have to do something clever and ingenuitive
Need I go on? Not a one of them is a typo for ingenuity or anything else. Evidently ingenuitive is a relatively recent coinage, probably a blend of ingenious and creative or intuitive (UD seems to think the latter). It doesn't seem quite the same as ingenious. Maybe these people just don't know ingenious? It turns out that "ingenuitive ingenious" gets 725 hits, including
  • They are ingenious, ingenuitive, and imaginitive.
Others are not so clear-cut, since the words are separated and I'm not going to track it down. Some are people's definitions of ingenuitive as ingenious and intuitive, and in a couple of places it looks like they're just used interchangeably. Still want to say it's a misspelling?
Now what about frictive? Again note that in no case would fictive work. Two are clearly typos for fricative, but the rest are very clearly not. It may also be worth noting that several of these appear to be from patent applications:
  • Calculations stipulate the frictive or delaying force that hampers the motion of the projectile.
  • These measurements monitor the frictive drag force
  • Enhanced frictive, adhesive and attractive forces imaged at etch-pit edges on highly-oriented pyrolytic graphite by scanning force microscopy
  • The present invention relates to a method and frictive fluid composition for application to a shoelace knot
  • it provides a simple and rugged method of transmitting torque continuously across a wide range of varying transmission ratios while suffering minimal frictive losses and wear.
  • it is desirable to protect pristine glass surfaces from frictive damage
  • frictive phoneme follows another frictive phoneme with. slightly different characteristics (finally, a real typo)
  • High temperature frictive damage; glass strengthening; surface tensioning/viscosity
  • the disposition of tin oxide onto glass surfaces and the function of tin oxide when overcoated with polyethylene in minimizing glass frictive damage
  • This affricate and its voiceless counterpart (as in chew /t$uu/) he represented as in IPA with a stop plus a palato-aveloar frictive
  • The wall velocity is friction limited, but determining the strength of frictive effects involves determining the nonequilibrium populations
For comparison to actual misspellings you might try googling actaul or comparson. In almost every single hit, changing to the correct spelling yields a meaningful sentence.
I've added these to the appropriate talk pages. -dmh 04:44, 19 January 2007 (UTC)

Korean romaja entries

moved from Beer parlour talk page

There have been some Korean romaja entries I've just removed.

My question: should a "Category:Korean romaja" be created for these words?

I've already created this category Category:Korean romaja based on the existence of Category:Japanese romaji. The idea of allowing romaja entries is to enable people who can't type Korean to look up Korean words (why I think they are important), as well as to reflect the fact that romaja are actively used in communication (why User:Robert Ullmann thinks they are important). Kappa 17:45, 18 January 2007 (UTC)

Note that this is not a Korean/Korean dictionary. It is an English dictionary of all languages, including Korean. The hanja and romaja entries belong here whether you like them or not. We are not a prescriptive dictionary of current usage; we include older, archaic forms as well as newer forms in (e.g.) SK 2000 Revised. Robert Ullmann 17:49, 18 January 2007 (UTC)

Format of examples

I was surprised to discover that the WT:ELE had no recommendations for the format of example sentences. I thought of this after DAVilla made an edit to the page for listen that changed the definition section formatting from this:

  1. (intransitive) To pay attention to a sound, to note.
    Please listen carefully to what I am about to say.
    I like to listen to music.
  2. (intransitive) To wait for a sound, such as a signal.
    You should listen for the starting gun.

to this:

  1. (intransitive) To pay attention to a sound, to note.
    • Please listen carefully to what I am about to say.
    • I like to listen to music.
  2. (intransitive) To wait for a sound, such as a signal.
    • You should listen for the starting gun.

So, I thought I'd get feedback from the community. The issues, I guess, are threefold. (1) Should English example sentences appear in italics or not? (2) Should bullets precede example sentences? (3) Should we have a uniform policy on this added to the WT:ELE or not?

Personally, I like putting example sentences into italics because it reduces their visual intrusion among the definitions. The version with italics places the visual emphasis on the definitions, but the version with bullets draws attention to the examples instead. I also think that the use of bullets should depend on which way we go with italics; they seem unnecessary with italics, but necessary without italics. What opinions do other people have?--EncycloPetey 18:26, 13 January 2007 (UTC)

Are you looking in the wrong place? We used to have very specific example sentence format recommendations. --Connel MacKenzie 18:36, 13 January 2007 (UTC)
No. There is a good deal of information (in need of revision) concerning the inclusion of Quotations at Wiktionary:Quotations, but nothing concerning the format of user-invented example sentences. --EncycloPetey 18:43, 13 January 2007 (UTC)
Holy smokes. Well, rather than digging through the beer parlour archives to find out what the recommendation was two or three years ago, what do we want it to be?
Personally, I like the italics, absolutely detest the bullets, and think the inflected form of the headword should be in bold. I also understand that format (that EP used in listen) to be the current "accepted best practice." The italics are important, in that they visually remove the contrived stuff from the actual usable content. I think the bullets are overused as it is, even without adding them to example sentences.
Yes, WT:ELE needs to reflect the format. --Connel MacKenzie 19:15, 13 January 2007 (UTC)
I use bullets for two reasons. First, a while ago, at the time that the style for quotations also changed, there was a turn against italics in general (including rejection of multiple form-of lines in italics) on account of legibility. More importantly, bullets allow example sentences and quotations to be intermingled. 01:34, 14 January 2007 (UTC)
I guess the main thing I like about the italics is that that format is consistent. I'm not sure a 'bot could easily change them all, at this point. Whatever we decide on, we should probably show a unified format. --Connel MacKenzie 01:47, 14 January 2007 (UTC)
The concern from my perspective is mostly how we format the examples (and quotes?) that are intermingled with definitions. I do agree with DAVilla (why isn't your signature showing up by the way?) that bullets are useful and desirable before each dated quotation in the Quotations section. However, the format of the Quotations section and Citations pages is a separate issue and should not be affected by anything decided as a result of this discussion. --EncycloPetey 02:17, 14 January 2007 (UTC)
I'm not sure they can be totally separated -- after all, there should eventually be many cases where a definition has both an example sentence (or maybe more) and a cite or three. They need to look acceptable together.
I too am concerned that the definitions should have obvious primacy over both examples and cites -- I believe that, for usability, both examples and cites should be listed after each definition (rather than, say, referenced by gloss on a separate page). However, if it is hard to find the defs amongst the examples & cites, this usability is lost. (I believe we should also move towards having quotations shown/not shown according to user preference, but the issue would still apply to those who preferred them shown.)
Why do we not use <small> for examples and cites? (Even better would be a font that was only 20% smaller than normal, but that is not so readily available.) --Enginear 21:32, 14 January 2007 (UTC)
Because a long list of citations, such as the ones we have for listen on its citations page, causes visual and formatting problems if you try to include them among the definitions. Inserting a timeline and a dozen or more quotes between definitions (and hoping that consecutive numbering will still work!) doesn't seem realistic if we want also to keep the definitions visible and emphasized. The definitions get swamped even if the font size is reduced, and the numbering of definitions all but disappears from visibility. I once thought that having the citations on a separate page was a bad idea, but have come to see that this solves a number of problems, including the problem of where to put citations when we don't yet have an entry -- I've seen that happen several times.
In my opinion, any sentence(s) included among the Definitions should be the minimum necessary to provide context to distinguish and correctly interpret definitions. Any quote or example that does not assist clearly and cleanly in this goal should not be used. Any quote or example that is so long-winded as to lose the reader should not be used. Any quotes or examples added beyond the minimum number necessary to demonstrate usage should not be used. Now, the Quotations section is different. That section exists for the purpose of historical documentation of the word's spelling and usage through time. There, we want multiple quotes from each century, from varied authors, of varied backgrounds, in varied contexts, using varied grammatical structures. Sometimes a quotation may be included among the definitions, but such a quote serves a very different function. --EncycloPetey 23:07, 14 January 2007 (UTC)
Yeah, I actually agree with you. I only add quotations under definitions because I'm active in RFV, where it's sometimes necessary to indicate which sense a quotation applies to. I don't expect them to stay there for eternity, although it would be nice if they somehow stayed linked to the corresponding definition. But all that aside, listen is actually a perfect example of where the two have to be intermingled, on account of the Shakespearean quote for the archaic sense. DAVilla 23:31, 14 January 2007 (UTC)
I disagree. I would prefer to see (though I accept that most users wouldn't) quotes showing all shades of meaning, with dates, interspersed between the definitions. I am beginning to wonder if there is any reason, other than our own convenience re RFV, for the informal "middle ground" de facto rule of 3 cites max per definition if interspersed. Apart from that, is there any reason for any middle ground format or do we believe that users either want
  • Definitions interspersed with one (or occasionally more) example sentences per def (the requirement for "normal" use) or
  • Definitions interspersed with many quotes showing gradations of usage and how that usage has changed over time (the requirement for those fascinated by the words themselves, or for serious research).
If these are the only two views required, then we could achieve both without too much difficulty, either clunkily by a citations page onto which the definitions were copied (manually or semi-automatically) or better, by a facility for turning off quotes, having a comparable effect to turning off translations. (I accept that interspersed quotes would be more difficult to deal with than the single labelled block of translations, but presumably we could arrange some formatting to search for, maybe even by use of small print. Some other formatting thoughts have also appeared elsewhere.)
I do think that, for a word with many meanings, it is essential to use the definitions to split the quotes into manageable chunks. While this could theoretically be done by glosses, it seems perverse to use a shorthand for the def adjacent to quotes showing great detail of usage.
For those who don't have access to the "academic" front end for OED, it does have buttons to turn on/off individually Pronunciation, Alternative spellings, Etymology, Quotations, Date Chart and Additions since last publication. I'm not sure any of the others are relevant to us at present, though I suppose Pronunciation might if we ever include a load of dialect pronunciations, and there are some who suggest that Etymology moves the defs lower down the entry than they deserve. But the principle might usefully be applied to some of the sections we have and OED doesn't, eg Synonyms.
I don't understand why a Citations page should be suggested as a repository for quotes for which there is not yet a def -- surely, such quotes could equally well be put on the main entry page, to enable another editor to write defs to match above them. And finally, apologies for derailing this thread originally about format of example sentences. Should we split this off elsewhere? --Enginear 21:18, 16 January 2007 (UTC)
Moving back left <--

Enginear, I agree that there eventually needs to be a better way to organize the various bits of data on the page; something like the translations templates that allow show/hide functionality might be more generally applicable. However, what was initially proposed was formalization of the format of example sentences in WT:ELE. Perhaps we can agree on this in the meantime? I suggest that a vote is in order. I would be up for a yes/no vote on keeping the format of the preponderance of entries (i.e. the italics, bold headword, as per first example above.) However, we could also make it an option vote, listing different alternates (much as several other stylistic questions have been voted on recently.) What do people think? --Jeffqyzt 18:23, 19 January 2007 (UTC)

Use/mention, independence and CFI

There's a pissing match on Talk:bukkake over which citations are valid. It shouldn't matter there, since with 6M ghits — twice as many as preempt, for example — bukkake is clearly in widespread use. Nonetheless the general point is valid.

CFI gives the example of “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” as a valid cite, but the material on independence and the general rule that "A term should be included if it's likely that someone would run across it and want to know what it means" argues against it.

I'm of two minds, as I was when I put in the jib example to start with. In most cases it should be possible to come up with straight uses without definitions. On the other hand, it's a problem for terms for which online citations are rare. I have in mind older words and older senses like "curved fork used for cultivation" for hacker (see Talk:hacker). Another example would be terms found only occasionally in technical journals, but clearly in use in that community. It's quite possible that the first two of three uses of retrofrobulation come with definitions, but the third doesn't and the fourth, when it appears, won't.

All in all, I'd prefer to keep the jib example in CFI, but say that it fails independence. I don't see a problem with keeping use/define cites on the article, as they can make the entry easier to understand, but when it comes to RfV wars, they shouldn't count.

Conversely, cites that establish usage but give little clue as to the definition are probably better placed on the talk page. -dmh 16:01, 18 January 2007 (UTC)

Regarding the "independence" of the jib example, I don't see how that fails our WT:CFI#Independence criteria. That deals with cites from different works or in the same work referring back to the same original source.
Regarding a change of citation policy (allowing "mention" cites on def. page, relegating "use" but not very illustrative cites elsewhere), it might be better to have something along the lines of "particularly illustrative citations that fail use/mention may be added, but only if three citations that pass use/mention have already been added, either in the main definition page or in the page's discussion or quotation sub-pages." The actual location of the cites is still not standardized enough to be able to vary from policy (we have interspersed quotes, quotations sections, and quotations sub-pages, and formatting variations amongst those as well.) --Jeffqyzt 16:31, 19 January 2007 (UTC)
I think that's a good proposal. And yes, the jib example is use/mention, not independence. Ultimately it goes back to "and would want to know what it meant" in the general rule. -dmh 18:00, 19 January 2007 (UTC)

illiteracy

It is my understanding that a definition like "A denigrated word or phrase or grammatical turn, the use of which is brands the user as ignorant and uneducated (e.g., ain't, irregardless, or the use of the double negative).", apart from being bloated and ungrammatical, is blatantly POV. It's also only marginally accurate. I would expect that the vast majority of times someone says "ain't", the listener doesn't even bat an eye, much less think "that person is ignorant and uneducated". Giving an explicit list is a mistake in any case. Different people object to different things.

I would like to substitute "A denigrated usage."

I would also like to add the sense "The portion of a population unable to read." (i.e., illiteracy rate), as in "Elbonia has an illiteracy of 2.5%." I ran across this form several times while looking at the actual usage of "an illiteracy".

I have made both these changes (except that I neglected to remove the list in the first case), but both changes have been reverted as being POV, pushing some unspecified agenda and, bizarrely enough, "removing" a definition.

Could I get a quick straw poll as to whether these changes should stay or go?

-dmh 20:21, 18 January 2007 (UTC)

To begin with, I think it might be clearer if a word other than denigrated were used, as I didn't know the meaning of that word, and I imagine a lot of users don't either. At the very least, an entry for denigrated should be created, and linked to. Also, I think the whole section here is rather opaque, and could be worded better. Also, I think a compromise between the two viewpoints would be most appropriate. The wording does admittedly sound a bit POV, as dmh notes, but it's a worthy point that is probably worth keeping all the same. Perhaps it could go something to the effect of, "a word, phrase, or grammatical turn which is considered incorrect, sometimes with the implication that the speaker is ignorant/uneducated." Secondly, I've never heard illiteracy used in the second sense; it's always been Elbonia has an illiteracy rate of 2.5%. So, at the very least, you should cite an example of usage there. Thirdly, while this may be tangential, I think it important for both users involved in this debate to realize that neither one is acting as a vandal (i.e. inserting something they clearly know to be nonsense), but rather are simply trying to improve the article as they see it (albeit, perhaps somewhat aggressively and without regard to efficient communication). Cerealkiller13 20:43, 18 January 2007 (UTC)
I would object to "sometimes" in that wording, right off the bat. --Connel MacKenzie 20:52, 18 January 2007 (UTC)
  • Excuse me, but Dmh has suggested a stronger approach with regard to honoring appearance in other dictionaries. Please take a peek at some other dictionaries, to see just how common his strange definition is. Please also note that they are verbose in their wording, describing the contested meaning (contested by none other than Dmh, mind you) with pejorative terms.
  • I must note explicitly that Dmh's WP:POINT edits here were triggered by a discussion on this page, above.
  • Please don't use fictional references, like "Elbonia." That "example" really isn't helpful. --Connel MacKenzie 20:52, 18 January 2007 (UTC)
Upon further investigation, your objection to "sometimes" seems quite valid. Most other dictionaries seem to directly equate an "illiteracy" with an illiterate person. Perhaps, "a word, phrase, or grammatical turn which is considered incorrect and thought to be characteristic of an illiterate person." However, I fail to see the problem with the usage of fictitious references. The grammatical point seems to be made quite as well with Elbonia as it would be with some real country, and with rather less work. Cerealkiller13 21:04, 18 January 2007 (UTC)

Please accept that, except in one instance, I was not trying to make a point. That instance was the example for the third defn. I occasionally do such things, as do many people (e.g., the def of prætentious). I fully expected that example to be replaced by something more neutral. Otherwise, everything was straight-up Wiktionary. The (new) second def is attested. Try b.g.c for "has an illiteracy of" for several quick cites (a broader search will turn up more). There was absolutely no agenda behind that one. I put it in, without cites, in good faith. The (old) second def is certainly attested, but the def itself needs fixing. Cerealkiller's points and suggestion are good, though I might say "inference" (on the part of the listener) and not "implication" (on the part of the speaker). If the "illiteracy rate" def needs cites, then the "mistake" sense definitely does, as it's also rare and only seems to appear in — how can I put this neutrally? — books and articles about what usage might be considered incorrect. If one of the defs is to be removed for lack of cites, then so should the other. Finally, there's another sense, "ignorance", as in "cultural illiteracy" or "illiteracy of (some subject)". Very much like using "misnomer" to mean "myth", except that that's jumped on as "incorrect". Who knows why? -dmh 21:16, 18 January 2007 (UTC)

How about "a word, phrase, or grammatical turn thought to be characteristic of an illiterate person." N.B. I'm not objecting to the use of "incorrect" here. It's quite appropriate. It just seems redundant. -dmh 21:18, 18 January 2007 (UTC)
I think that's a highly appropriate definition. As far as citing it: [1] and [2]. The first one is OED online, and so you'll probably need a subscription of sorts to see it. I must admit that this sense does not seem to be obscure in any way. I have great faith in the folks at Oxford. As far as the "rate" sense goes, a quick google search of "illiteracy of" brings up only the first sense. Also, I highly agree that the sense of illiteracy as in being ignorant of a certain subject (regardless of an ability to read) is a good idea. Cerealkiller13 21:29, 18 January 2007 (UTC)
Again, try "has an illiteracy of" (or probably "with an illiteracy of"). I found the "rate" sense by trawling through the results of "an illiteracy". It really is there.
That's a reasonable segue, actually. I wanted to make a point to Connel, but I think I'll make it here since a lot of people here don't know me. From comments of Connel's and reading through w:WP:POINT, it appears that at least Connel (and who knows who else) believes I'm trying to game the system by getting all kinds of random words into Wiktionary that don't belong there. Not so.
I've entered (and found cites for) words like starfucker or twistification in good faith. I can honestly see no reason why they shouldn't be in a serious dictionary (and in fact, twistification is already). They certainly meet CFI, and I have no problem with CFI. In fact, I had a hand in drafting the current version.
In the past, I've often trawled RfV (or RfD before that) looking for words that were clearly attested and had no business being deleted. Certain kinds of word appear there disproportionately, particularly slang and profanity. No one seems to care whether hydrogen harmonicon or ignipuncture are attested. But I'll look up anything (e.g. bogotify and Santana wind). I want to tread carefully here, but preferentially RfVing words simply because they "shouldn't be in a dictionary" and not simply because they seem unlikely to meet CFI, comes dangerously close to gaming the system in order to undermine CFI. If you don't like CFI, and evidently some don't, propose changes.
I realize this presents the real danger of a vicious circle. I (and others, I believe) push hard to keep in words that clearly meet CFI. Those who don't like this work harder to RfV more words and try to raise the bar on citations — for the objectionable term at hand, at least. I'd rather get away from that, but these things are very hard to dig out of once dug into. -dmh 22:05, 18 January 2007 (UTC)
I hasten to add that as far as I can tell, people work within the RfV/CFI process almost all the time. The disputed cases are a fairly small minority. Even in those cases, I'm more inclined to believe unconscious bias (everyone has one) than a conscious attempt to game. -dmh 22:43, 18 January 2007 (UTC)
It may be POV to consider a person to be illiterate on account of a particular word they use, but it is not POV to warn that a word is an illiteracy if there are biased people who consider that word to be such. DAVilla 05:21, 19 January 2007 (UTC)
I'm fine with that (see my reply to Connel's straw poll below). The problem is that putting a list of "illiteracies" in the definition asserts that these are illiteracies the same way putting "table" and "computer" in the definition for noun asserts those are nouns. OTOH, a list would probably be OK with the proposed definition above ("a word, phrase, or grammatical turn thought to be characteristic of an illiterate person."), since we're asserting that the words on the list are thought to be characteristic of an illiterate person. Since the discussion seems to have gone quiet, I'm going to put that definition in (without a list). If that's not good enough, we can always edit. -dmh 16:36, 19 January 2007 (UTC)
WTH? 11 hours is hardly quiet! And you aren't pushing a POINT? --Connel MacKenzie 16:44, 19 January 2007 (UTC)
Dmh, it is clear you are thwacking this entry just to rattle cages. Your second definition has no bearing on this page - it is a definition for a derived (two-word) term. It obviously does not have precedence over the real definition. Removal of the examples is likewise POV; the very few examples illustrate the classic use very well, but you insist on removing meaningful information because they are pet terms of yours. I cannot AGF for your latest edit. --Connel MacKenzie 17:31, 19 January 2007 (UTC)

This needs to move to WT:TR. --Connel MacKenzie 16:53, 19 January 2007 (UTC)

No, Connel, I'm not pushing a point. I'm trying to get on with editing in good faith. I removed the examples because the cites provide perfectly good illustrative examples, and I thought we had a long-standing preference for real examples over invented ones (I've certainly been chidden for that before). I added cites supporting a sense I don't even like, for crying out loud. The quotes for the second sense have "illiteracy of" because searching for that was the easiest way of separating it from the other senses. You could just as well argue that the third def is just a definition of "is an illiteracy", which I chose for the same reason. If you want to dispute the analysis, provide evidence. I'm about tired of bald assertions with no backup. What cites have you looked through? Any, besides the ones on the page?

Interestingly, "illiteracy above" and "illiteracy below" almost always mention rate in some form, while "illiteracy of" generally doesn't. But there are occasional exceptions: "And, too, in my corner of the world, illiteracy is at 27%, with functional illiteracy above 40%." There's no of in the sentence, but illiteracy clearly means "rate of illiteracy" rather than "inability to read". -dmh 18:21, 19 January 2007 (UTC)

I took out the "brands the user" POV stuff because, well, see above. No one seems to have jumped up in favor of the old way, and with the input above we have something better than either the old one or mine. Hey, the process works! But is there any real reason, other than keeping you from reverting changes you don't like, to drag a perfectly ordinary discussion through the beer parlour and possibly on to tea room (where, admittedly, it probably should have gone first, if it was to move at all)?

I'd invite anyone to look at the old version and the present one (or my last, if you've reverted it again) and try to discern any hidden agenda from the changes. I've told you exactly what I've done and why. I've spent quite a bit of time looking at real uses. I've added information. I've helped improve a definition. Reverting that kind of actual dictionary work because you think it might be ... oh Hell, I don't even want to try to guess why. It's just rude. Please stop. -dmh

Because you are unable to check other references when you are making bald assertions?
  1. http://dictionary.reference.com/search?q=illiteracy
    a mistake in writing or speaking, felt to be characteristic of an illiterate or semiliterate person
    An error, as in writing or speech, made by or thought to be characteristic of one who is illiterate.
  2. http://m-w.com/dictionary/illiteracy
    a mistake or crudity (as in speaking) typical of one who is illiterate
  3. http://www.bartleby.com/61/62/I0036200.html
    An error, as in writing or speech, made by or thought to be characteristic of one who is illiterate.
Nothing that was there was POV, but your removal of correct information is. You know perfectly well that you met immediate resistance to the changes you did; pretending that a brief period of sleep is "long enough" for people to comment, or even notice your latest POV edit is absurd. --Connel MacKenzie 18:14, 19 January 2007 (UTC)
Yes, I would think that 36 hours was more appropriate before believing a discussion had gone quiet. Please both of you, your time is too valuable to spend bickering over a little-used word. Some general points were raised by each of you, which were valuable in setting some of us thinking, but the nitty-gritty of this word is not worth the effort. Please take the advice somewhere in w: go edit elsewhere for a few days. You can always return to this later. --Enginear 20:09, 19 January 2007 (UTC)
Fine, I'll leave the entry alone for a while. It's strange indeed that any of the edits in question would have to have this much ceremony about them. Wiki means quick, right? I'm still baffled as to how any of the changes I made were POV. I have two POVs (PsOV?) here: The third def is silly (but attested), and Connel would rather just say things like "that's a sense of illiteracy of" and "ingenuitive is a misspelling but I won't tell you of what" than take the time to find out what's really going on. Maybe I'm wrong, but they're my viewpoints. But do I put those into the definitions? No. The old definition says "ain't, irregardless and double negatives brand you as illiterate". This is a blatant POV statement about those constructions. It's like defining "Silly" as "behaving in a foolish manner, as for example ... ", well I won't name names. The new definition says that illiteracy is used to mean something that can make people think you're illiterate. This is an NPOV statement about illiteracy. The headword is illiteracy, not illiteracy, ain't, irregardless and double negatives.
The one POV sin I committed was the example for the third def. That's long gone. I didn't expect it to stand to begin with and in the event I took it out myself. -dmh 22:36, 19 January 2007 (UTC)

300K+ entries

Somewhere I saw an assertion that, at 300K entries, we're ahead of or competitive with print dictionaries. While digging up examples elsewhere, I've taken samples with "random page" a couple of times. I didn't record the results, but the majority of the entries I ran across are not English headwords. The major categories seem to be, in approximate order of prevalence:

I wouldn't say the Spanish inflections are the majority, but definitely the plurality. Scaling back down from the random sample, I'd guess there were closer to 80-90K English entries. Not bad, but OED2 has what, 500K? 1M? -dmh 22:14, 18 January 2007 (UTC)

Do we normally include possessives like vessel's? RJFJR 22:20, 18 January 2007 (UTC)
Anyone that asserts we are ahead of other print dictionaries is completely out to lunch. We still have less than 100,000 uninflected English headword entries. Last time I checked, something like 30,000 entries have "Translations" sections. --Connel MacKenzie 23:13, 18 January 2007 (UTC)
Your random search agrees with Wiktionary:Statistics, which shows that about one-third of our entries are English, and about one-third are Spanish (most of the figure for Spanish came after running the Spanish verb inflection bot). --EncycloPetey 23:53, 18 January 2007 (UTC)
Of the entries for a given language, do we or can we break down how many are simple regular inflections, soft redirects and such? As I recall there used to be a few heuristics for that. -dmh 04:02, 19 January 2007 (UTC)
Have you written any of those statistics programs? I'd like to see your results, if you have. --Connel MacKenzie 05:59, 19 January 2007 (UTC)
No. I don't even have a good idea of where to start (e.g., is there a good Wikimedia API out there someplace? Probably.) -dmh 16:28, 19 January 2007 (UTC)
There are many. http://en.wiktionary.org/w/api.php or http://en.wiktionary.org/w/query.php are the latest up-to-the-minute ones. Toolserver is of course, the logical place for such a thing. And http://downloads.wikimedia.org/ is (as always) the least abusive to the servers. http://sf.net/pywikipediabot has the bot framework. Meta: has a lot of api pages, Wikipedia has a few also (even though they have all been moved off Wikipedia several times, they tend to reappear there.) --Connel MacKenzie 17:00, 19 January 2007 (UTC)
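For the per-language breakdown asked about above (how many entries are simple inflections or soft redirects), one rough heuristic can be run over page wikitext taken from a dump. The following is a minimal Python sketch, assuming that each language section starts with a level-2 heading such as ==English== and that a section whose definition lines all use a form-of template counts as a soft redirect or inflection. The template list ({{alternative of|...}} and {{plural of|...}}) is an assumed, incomplete sample, and classify_entry and tally are hypothetical helpers, not an existing Wiktionary tool. Fetching the pages (from a dump or the API) is left out.

```python
import re
from collections import Counter

# Assumption: a level-2 heading such as "==English==" starts a language section;
# "===Noun===" and deeper headings are deliberately not matched.
LANG_HEADING = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)

# Assumption: these form-of templates mark a section as a soft redirect or inflection.
# The list is illustrative only, not an inventory of Wiktionary templates.
FORM_OF = re.compile(r"\{\{\s*(?:alternative of|plural of)\s*\|", re.IGNORECASE)


def classify_entry(wikitext: str) -> dict:
    """Map each language section of one page to 'form-of' or 'full' (hypothetical helper)."""
    result = {}
    headings = list(LANG_HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        section = wikitext[heading.end():end]
        # Definition lines start with "#"; "#:" (examples) and "#*" (quotes) are skipped.
        defs = [line for line in section.splitlines()
                if line.startswith("#") and not line.startswith(("#:", "#*"))]
        if defs and all(FORM_OF.search(line) for line in defs):
            result[heading.group(1)] = "form-of"
        else:
            result[heading.group(1)] = "full"
    return result


def tally(pages) -> Counter:
    """Count (language, kind) pairs across an iterable of page texts (hypothetical helper)."""
    counts = Counter()
    for text in pages:
        for lang, kind in classify_entry(text).items():
            counts[(lang, kind)] += 1
    return counts


if __name__ == "__main__":
    sample = ("==English==\n===Noun===\n# {{alternative of|preemption}}\n\n"
              "==Spanish==\n===Verb===\n# To listen.\n#: ''Example.''\n")
    print(classify_entry(sample))  # {'English': 'form-of', 'Spanish': 'full'}
```

Run over a full dump, tally() would give per-language counts of full entries versus form-of entries; the template list and the "every definition line is form-of" rule are the parts most likely to need tuning against real pages.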

Wiktionary:About Hebrew

Those with any knowledge of Hebrew, or with Wiktionary experience in non-Latin-alphabet-using languages, are invited to drop in at the brand-new Wiktionary:About Hebrew policy think-tank and put in their two cents. Ruakh 20:29, 19 January 2007 (UTC)

coördinator, preëmption, preëmptive, preëmptively, preëmptions, etc.

User:Doremítzwr is again busy creating complete pages with nonstandard spellings. We went through this some time ago and I seem to remember that we were allowing only {{alternative of|...}} for these things. User:Doremítzwr is insistent on keeping these spellings as the principal entries. (These spellings were used in the U.S. during the mid-1900s, but have long been abandoned. User:Doremítzwr seems intent on reviving them.) —Stephen 21:03, 14 January 2007 (UTC)

Also, preëmpt, preëmpted, preëmptible, preëmpting, preëmptor, preëmpts. —Stephen 21:45, 14 January 2007 (UTC)

Due unto the (seemingly well-founded) fear of having my work here deleted without warning, I make sure to add three citations per entry for most things I add here (thereby satisfying WT:CFI and preëmpting WT:RFV). Nevertheless, Stephen G. Brown decided to strip preëmption and preëmptive of their newly added content, reducing them unto alternative spelling entries. Both the fused and the hyphenated spellings thereof were uncited and contained far less content than the entries for the diæretic spellings that I had added.
The solution proposed by Widsith at Talk:coördinator, unto which I agreed, was to keep full entries for all alternative spellings. Unto his credit, Stephen G. Brown ceased deleting my work when I directed him thither. The discussion unto which he refers can be found, archived, here; as you will see, the conclusion reached thereïn is not as Stephen G. Brown remembers it.
These spellings are attested and standard (in accordance with the definition given at nonstandard), and some citations I have provided are from twenty-first century sources. They are functionally superior spellings (compared with both their fused and hyphenated forms), for reasons given in the above linked archived discussion, and here.
I don’t know what Stephen G. Brown is trying to achieve by bringing this to the Beer Parlour, but in any case, this is my preëmptory defence. Raifʻhār Doremítzwr 22:01, 14 January 2007 (UTC)
Well it's not policy, but certain influential persons here carry the opinion that "all soft redirects should be strongly discouraged", so replacing the content of the pages apparently wouldn't be appropriate, especially if there is anything that distinguishes the page such as quotations etc. What does it mean to have a spelling as a "principal entry"? Can they all be given equal weight? DAVilla 22:37, 14 January 2007 (UTC)
Do we really need this sort of disruption? --Connel MacKenzie 23:10, 14 January 2007 (UTC)
Perhaps not, though I think there remains, to date, a lot of confusion on the general issue of "what on earth to do with alternative spellings of the same term". My take on it from the reading so far is "create duplicate entries, and try to keep them in synch with the originals, rather than create redirects", but I'm not even sure that is consensus-correct at this point. Maybe the policy page on this topic (see my original appearance on this page >;-) needs an update, based on talk archives here and other relevant places? That is, disruptions of this sort might be less frequent if policy were clearer, and no longer marked "draft". NB: I side with Stephen on this particular issue; the diaeresis spellings are weird and obsolete (the fact that they are functionally better is kind of irrelevant; it would also be functionally better if we used upside-down question marks like the Spanish do at the beginning of written questions, or started using N-Adj[-Adj...] order with adjectives. But we don't actually do this in English.) I do think such variants should be recorded, though, if sourceable, and I wouldn't want to see content removed from the dictionary entirely on the basis that it appears in an alt. diaeresis spelling; if those spellings are to be reduced to some form of redir, the content (what Doremítzwr refers to above) should get moved to the "main" article, I suppose. At any rate, I seem to recall this overall issue touching on years-old flame wars about Commonwealth vs. US English, and so on, so I imagine there are lurking pitfalls to be found here... — SMcCandlish [talk] [contrib] 21:37, 15 January 2007 (UTC)

Oh ... my ... God ... If we let words like preëmpt into Wiktionary, there will be entries in Wiktionary for these words! What exactly are we trying to protect ourselves from by keeping them out? The New Yorker still likes this form (e.g., "In title and structure, Goodwin’s “Team of Rivals” echoes the idea, popular among management consultants, that a company can preëmpt the natural selection of the marketplace by having its divisions compete against one another internally." [3]) as do a few other sources, I think.

The interesting thing to me here is that the diaeretic form is rule-based: put a diaeresis, or diæresis if you like, over the second of two adjacent vowels that are to be pronounced separately. However, it only really applies where there is liable to be confusion. It doesn't apply, for example, to -ing. There may be a hard and fast rule; I would expect the New Yorker style manual has one. There may also be more than one hard and fast rule.

In any case, what is the harm of recording these uses, as long as they're attested? That people might think that they're in use, or have been in use? They are, and have been. If the problem is that people might think they're "correct", we can deal with that. We've dealt with the same issue elsewhere (with varying degrees of success).

I will repeat my strong opinion that trying to resolve disputes like this by deleting entries made in good faith runs completely counter to the whole Wiki philosophy. -dmh 22:00, 15 January 2007 (UTC)

No one has deleted the entries or suggested that they should be deleted. Just like many other similar cases, such as propeller, paralipsis, alright, kaputt, teddybear, and son-of-a-bitch, these terms should get Template:alternative spelling of, pointing to the standard spelling, which is without diaeresis or digraphs. These spellings were implemented in U.S. primary schools for about ten years from 1950 to 1960, then abandoned. Only a few of the students surviving from that era, and perhaps a small number of crackpots, still use these spellings. User:Doremítzwr doesn’t simply wish to document them, he wants to revive them, along with many other weird spellings that were not even part of that failed experiment, such as whereäs and prætentious, not to mention a slew of oddball plurals such as octopodes that he wants to normalize and even prescribe. —Stephen 22:24, 15 January 2007 (UTC)
Right. So the problem is the prescription, not the words themselves. I'm fine with the alternate spelling approach you describe. I thought it was SOP, but things change. As to whereäs and prætentious, the regular RfV process should work fine. Have we been through one round of it already with them? If so, yes, they should be gunned absent new evidence. If not, we need to go through one round.
In cases like preëmpt, the spellings haven't been abandoned, nor would I call the editors of the New Yorker a small bunch of crackpots. Even if they have been abandoned, we do keep words which are only historically attested, and so we need to keep them on that basis.
I said "a few of the students surviving from that era". I myself was in primary school at that time, which is why I, like the editors of New Yorker, know about this. I, however, along with the rest of the U.S., abandoned these strange markings. Again, I never suggested, and no one has suggested, that they be deleted. To repeat, they need to be kept, but marked as alternative spellings using the template I named. —Stephen 23:36, 15 January 2007 (UTC)
You and Chloë Sevigny, no doubt :-). -dmh 06:09, 16 January 2007 (UTC)
User:Doremítzwr, if you're trying to actively promote spellings and forms that you feel are right, or more logical, or whatever, please stop. As others have pointed out, Wiktionary is about recording usage as it is, not as anyone thinks it ought to be. There's a slightly tricky area where a prevalent usage is particularly likely to draw flak for being "incorrect", but except perhaps for octopodes, we're not in that area. As far as I can tell, we're either dealing with things like preëmpt, which have been current and are still used, albeit rarely, or things like whereäs and prætentious, which (unless there's new evidence) don't seem to have been used, ever. The former stay as alternate spellings; the latter go.
Stephen G. Brown, regardless of who has what agenda, it might help to back off a bit. Stridently criticizing others has no place in Wiktionary, unless I do it (that's a joke, Son).
You have arrived in the middle of this business. We went through all of this in great detail with User:Doremítzwr some time ago when he first started with this business. He was crestfallen to learn that we refuse to take a prescriptive stand and that we do not support his fantasies, but I was under the impression that he had come to terms with it and was cooperating (or, as User:Doremítzwr writes, coöperating). But he has suddenly jumped into it again, with a vengeance. I’ve referred the matter here because I am out of patience and do not want to deal with it any further. I have explained as well as I can, and I should repeat that no one has suggested that the entries be deleted ... only that they be marked as alternative spellings in the same way as alright and propeller. What he is doing is harmful to this project, but I’ve had enough and will now resume my normal work. —Stephen 23:36, 15 January 2007 (UTC)
We do need to clarify how CFI interacts with alternate spellings. It's a fair criticism that we appear to play favorites in requiring cites for preëmpt but not preempt. I think right now preempt is covered under the nebulous "clearly in wide use" concept.
There seems no harm in allowing cites on an alternate spelling page. However, unless there is a consistent difference in meaning between the two spellings, I would just as soon have the definitions in one place. That place would be the most prevalent spelling. If there is no single prevalent spelling — either because of regionality as in apologize/apologise or because two or more spellings are within an order of magnitude of each other — then one might consider duplicate entries. Note that the infamous color/colour are not duplicates, as there are senses unique to one or the other. -dmh 22:56, 15 January 2007 (UTC)
(Replying to —Stephen) I had thought another well-known mag still used the funny marks as well. The Economist perhaps, but I didn't see it in the issue I scanned. I'd also want to make sure that it's not more common in other English-speaking areas. Neither the Torygraph nor the Grauniad seems to use it, and like I said, it seems rare outside the Algonquin crowd, but it wouldn't hurt to check whether there's a regional element.
I believe that what may be bothering User:Doremítzwr here is that the usual attitude toward prescriptive frippery is "put up (cites) or shut up". This works fine in most cases, but in this case User:Doremítzwr is providing cites and the supposedly unobjectionable articles don't provide them. This gives at least the appearance of inconsistency.
It's cases like these where frequency matters. If all we have to go by is that at least three people have used preempt somewhere and at least three have used preëmpt somewhere, why not say that preëmpt is the primary entry and complain when material there is removed and the spelling is marked "alternate"? We do the same thing, I believe, with borrowings that are accented in other ways (and that, say The Economist would include with accents), but are often spelled without them. See for example our entry on deja vu, which for the last year and a half or so has worked just the way User:Doremítzwr is suggesting.
The spelling section in CFI is unfortunately weaselly. I pushed for following prevalence, period. Anything within a factor of 2 or 5 or 10 or so was to be considered roughly equivalent, so ghits are generally good enough to sort things out. For example, octopodes is clearly very rare compared to octopuses and octopi, which are about equally prevalent. However, Ec didn't like that on the grounds that some misspellings are more prevalent than the correct spellings. This seems like prescriptivist doublespeak to me, but I'd spent enough time butting heads over CFI in general not to worry about that particular fine point.
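The prevalence rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `classify_spellings` is hypothetical, the hit counts are supplied by the caller (for instance from web or b.g.c searches), and the default factor of 10 is just one of the thresholds floated here, not an agreed CFI value.

```python
def classify_spellings(hits: dict[str, int], factor: float = 10.0) -> tuple[str, list[str]]:
    """Pick a primary spelling by prevalence (hypothetical helper).

    Returns the most frequent spelling, plus every other spelling whose
    count is within `factor` of it. Those are treated as roughly
    equivalent, i.e. candidates for duplicate full entries rather than
    alternative-spelling entries.
    """
    if not hits:
        raise ValueError("need at least one spelling")
    # Most frequent spelling wins; ties go to the first one listed.
    primary = max(hits, key=hits.get)
    top = hits[primary]
    # "Within a factor of N" means count * N reaches the top count.
    equivalent = sorted(
        spelling for spelling, count in hits.items()
        if spelling != primary and count * factor >= top
    )
    return primary, equivalent
```

For example, with illustrative (not real) counts `{"octopuses": 1000, "octopi": 900, "octopodes": 20}`, the call returns `("octopuses", ["octopi"])`: octopi is close enough to count as roughly equivalent, while octopodes falls outside the factor and would be treated as a rarer variant.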
Right now there is no real standard for which spelling is primary, and in fact there's a precedent for exactly what User:Doremítzwr is doing. Plus, there are actual cites, including recent ones, so it's hard to dismiss as "fantasy". IMHO we need a better consensus on how to handle alternate spellings.
BTW it occurs to me that whereäs and prætentious are classic "defined in hopes of use" protologisms. -dmh 06:09, 16 January 2007 (UTC)
You say there is a precedent? For subverting the real spellings in favor of the obscure ones? Is that a precedent here? --Connel MacKenzie 12:30, 16 January 2007 (UTC)
I would vote to keep prætentious in exchange for the permitted use of a context label of the same name. :-) Seriously, though, how are the more common spellings subverted? The spelling pre-empt still points (redirects, yikes!) to preempt rather than preëmpt. I wouldn't object at all to what Stephen is doing if, instead of arguing against constructive edits, he were to simply wait for the two versions to get out of sync and then merge them at the more common spelling, a process that will likely flatten out years from now when all of us have moved along. In the meantime Doremítzwr is doing a better job at defining those words than the rest of us have done, at the same time releasing control of his work. I'm quite confident that those definitions are going to end up at the more common spelling in the end, and aren't you? So just let him have his fun for now, and let's start talking about how to objectively apply labels, measuring acceptance beyond the minimal inclusion criteria. DAVilla 19:31, 16 January 2007 (UTC)

The articles created by User:Doremítzwr should really be alt spelling entries, but there is no harm in having citations there. I have to insist that they are labelled {{rare}} or {{non-standard}} and also have Usage notes to make it clear that these spellings are by no means the norm.--Williamsayers79 11:32, 16 January 2007 (UTC)

The New Yorker magazine's use of spurious diacritics is intentionally prætēntîøůs. While I had thought a concession might be made to calm the situation, by using {{rare}}, it really is more appropriate to mark them non-standard. But to be complete, these forms need to be marked as {{offensive}} and {{cattag|non-standard}}, with a generic ===Usage notes=== disclaimer, explaining why they should be avoided. --Connel MacKenzie 12:30, 16 January 2007 (UTC)
Connel, I'm assuming your recommendation about marking them {{offensive}} is merely tongue-in-cheek. Surely, no one but a grammarian would find them so (they'd just find them weird). I'm sure there's a good technical reason why we haven't done it before, but why can't the {{alternative spelling of}} point to some kind of disambiguation-style page (the entry could say something like "this word may alternatively be spelled" <-- link would be to the list-page), thus bypassing the whole "primary" entry question? Of course, articles at the different spellings would diverge, but the current system certainly doesn't prevent that. --Jeffqyzt 20:03, 16 January 2007 (UTC)
Only partly. If someone is going out of their way to be pretentious, a sane reader should be offended. --Connel MacKenzie 01:23, 22 January 2007 (UTC)

Well said, DAVilla.

Connel, I thought I made it clear that deja vu was at least one precedent. When I looked, it was hard redirected to the fully accented version which is the "real" spelling. In French, that is. In English prose it should only appear in italics. It's (arguably) not even English, whereas the funny-mark version of preempt is English, just rarely used. I'm also saying that if all we go by is three cites, who's to say which spelling is primary?

In any case, "subverting"? "real spellings"? Goodness. Best we stick to our descriptive knitting, note how often variants occur and adopt a uniform rule for handling variant spellings. -dmh 04:58, 17 January 2007 (UTC)

Your statement re deja vu appears to be wrong in two respects:
  • I have checked the diffs for the last two years and during that time it has never had a hard redirect, although for most of the time, and at present, it has a soft redirect.
  • In the UK, at least according to OED2+, déjà vu is the only correct British English spelling, although out of all the (approx 30) cites using the term, as part of six different entries, they do have one use of déjà-vu and one (from 2004) for deja vu. The other approx 28 are for déjà vu.
So a soft redirect is acceptable according to WT:ELE, etc, even though my personal preference is for separate full pages. --Enginear 11:21, 17 January 2007 (UTC)
You're right about the redirect. The point about deja vu/déjà vu is that we insist that the accented form is correct for it, while we insist here that the un-funny-marked form is correct for preempt — and some insist that the funny-marked form is not even a word. I'm not claiming we have to be consistent in that regard. I just want to know the basis for deciding. As far as I'm concerned, the basis has to be actual usage.
BTW, is déjà vu even the correct French spelling anymore? -dmh 14:44, 18 January 2007 (UTC)

Note that these entries (preëmpt, etc., as well as campi and scenarii) have been created in part to support this user's continual edit-warring on the wikipedia. He routinely points to the entries he created or modified on the wikt to "justify" replacing (e.g.) "campuses" with "campi" (!). Why the 'pedia puts up with it I don't know; but I propose that we do not. If they can be cited, they should be tagged non-standard and rare (and adding a usage note about the failed '50s spelling reform is good). There certainly should be no implication that these forms are in any way accepted in standard usage. (See w:Poisoning the well history and talk pages, w:Community of Christ ditto) Robert Ullmann 15:37, 21 January 2007 (UTC)

Etymology references

A vote has been requested for determining whether to allow references in the etymology as entries. I'd like to hash out the options before actually calling a vote. The quintessential example is an entry that everyone would consider to be an encyclopedic topic. McDonald's might be included on other grounds... how about Abraham Lincoln? (Okay, I'm not being too original here.)

Some proposed solutions (originally by Connel, modified) are:

  1. All references should be to local lexical equivalents of their Wikipedia counterparts, which would be full entries or at least soft interproject redirects. (Hard interproject redirects don't work, which is probably a good thing.)
  2. Local references (as on Honest Abe) should be limited to shortened forms, e.g. "Abraham Lincoln" instead of "Abraham Lincoln" as I understand it.
  3. All references that are proper names (including McDonald's) should simply point to Wikipedia.

The current de facto common law whatever, which is to use local references only when they exist, is unstable. DAVilla 23:10, 14 January 2007 (UTC)

There are some causes for using wikipedia links, some for local entries. While I have some personal thoughts on this one, I've run low on the brain power necessary to articulate them. Too many cans opened at once and there are worms everywhere. --EncycloPetey 23:24, 14 January 2007 (UTC)
Note that languages in the etymology are pointed to Wikipedia. DAVilla 10:11, 15 January 2007 (UTC)
By the way, that is the strongest argument yet, that I've seen for #3. --Connel MacKenzie 19:38, 15 January 2007 (UTC)
I don't understand #2. On WP, if you created wikitext that read Abraham [[Lincoln]], it would be expected that no article existed for Abraham Lincoln himself, and that the Lincoln wikilink would go to an article about Lincolns in general, perhaps a history of a notable family. Since "lincoln" isn't a word in English (that I know of), why would one use Abraham [[Lincoln]] on Wikt. instead of [[W:Abraham Lincoln|Abraham Lincoln]]?
See Appendix:Names. The names themselves (not a historical figure who had that name) may have interesting etymological information. --Connel MacKenzie 12:09, 16 January 2007 (UTC)
But (not to sidetrack this thread too far) that interesting information is seldom of relevance to the etymology at hand. However interesting the origins of Lincoln may be, they have only the most tenuous connection to the etymology of Honest Abe; of far more relevance would be w:Abraham Lincoln. -- Visviva 04:04, 17 January 2007 (UTC)
Yeah, I don't really understand it either. Perhaps Connel had something else in mind? To my memory he'd written abbreviations and acronyms originally. DAVilla 21:11, 21 January 2007 (UTC)
Just to be clear, DAVilla, are you referring to other dictionaries, encyclopedias, etc. (e.g. Oxford English Dictionary) when you say "references"? Or do you mean something else? Is this a proposal to have local definition articles to refer to in Etymologies/References sections of entries? --Jeffqyzt 20:11, 16 January 2007 (UTC)
References = links. DAVilla 21:07, 21 January 2007 (UTC)
Although soft redirects are not a bad thing, I would strongly favor option 3 for most cases. The obvious exception would be where a proper noun is used attributively with a specific sense that the Wikipedia article is unlikely to cover (I can't come up with an example off the top of my head). -- Visviva 04:04, 17 January 2007 (UTC)
I have to reject #3. Otherwise, we would redirect every capital city name, country name, language name, constellation name, scientific taxon name, etc., and for these we decided long ago that we wanted them as entries, in part because we wanted to be able to list the translations into other languages. --EncycloPetey 04:46, 17 January 2007 (UTC)
Place names and language names we could always discuss as exceptions to this rule. Do you have examples for any of the proper-name science terms being used in the etymology? DAVilla 21:11, 21 January 2007 (UTC)

Put up or shut up

Following my previous assertion that any serious dictionary has clear rules and applies them consistently (or as Connel likes to rephrase and attribute to me: "there should be no rules at all and anyone should be able to do anything at any time"), I'd like to pose a question to everyone who wants to keep "illiteracies" and whatever else out of the main namespace:

What are the rules?

CFI hasn't changed significantly in months, but oh my, has there been a bunch of complaining about it. If you don't like it, fix it. Give new criteria that will do a better job of sorting wheat from chaff. These need to be clear and as objective as possible. "Illiteracies must appear in another namespace" is not objective. "Citations must be from respectable sources" is not objective. The three-cite/one-year/durably archived rule is far from perfect, but at least it's reasonably objective.

Keep in mind that our aim is "to describe all words of all languages".

Discuss. -dmh 22:47, 19 January 2007 (UTC)

Being "a usable, 'real' dictionary" is not objective, but it is better than a "free-for-all", which is. The discussion has started. In your words, "Put up or shut up". --Enginear 16:53, 20 January 2007 (UTC)
I don't see a proposed change in there. Do you have one?
I believe my position is clear, but just to be sure:
  • CFI is not broken. It lets in words people don't like. So what? It's not like someone looking up potato is going to even know that your favorite example of "nonsense" is here. Our stated mission is "all words of all languages", not "all words of all languages except the ones people don't like".
  • The RFV process is OK, except that people routinely use it abusively to try to kick out words they don't like, often without even checking whether they meet CFI. In the past, people would even delete entries that were known to meet CFI because the cites hadn't been put in the article. I would hope that's stopped. In most cases it should be enough to say "b.g.c gets 80 hits. Most are valid." or "b.g.c gets 80 hits, but none are actually uses. No independent web hits" (and so go ahead and delete). The rest is RFC, not RFV.
  • It would be good to standardize our handling of alternate spellings (in, noting non-standardness where appropriate) and typos (out).
  • It would be good to standardize our handling of non-standard/archaic/vulgar and other such, though that's been contentious for years.
What, exactly, would you say I need to "put up" here? -dmh 18:09, 20 January 2007 (UTC)
I suppose what I would say to this is that this entire conversation is utterly without merit. You say that your definition of illiteracy should have been put in there. But the simple fact is that it would have been a decent entry without that definition, and most people would have figured out definition #2 from #1. Connel seems to think that the whole of Wiktionary will utterly break down if we let a few obscure slang words in. The simple fact that you don't seem to realize is that neither one kills the project. If ricockulous gets in Wiktionary or gets deleted (it got deleted, as it turns out), it's not life or death for the project. But two people bickering like small children over inanities, filling up the discussion pages, is really distracting. The simple fact is that CFI works, in a general sense. Sometimes it does kill words that, technically, should have been kept. However, those words are nearly always obscure words that don't have a very high priority for being on Wiktionary. It is quite impossible to put down in writing an exact, completely objective set of definitions for CFI; it always has to rely, in part, on the judgement of the people running RfD and RfV. It would be in the project's best interest if this whole discussion were just dropped, and the next time a word that you had a thing for gets cut, learn to live with it, because it probably was an odd, obscure word, and its loss to Wiktionary is not severe. If you want to put obscure words in Wiktionary, you're certainly allowed to do so; it's simply harder. You have to fight tooth and nail to prove that it's a used word, what time period it's from, what region, etc. And that's really the way it should be, because otherwise, as Connel said, we'd become the Urban Dictionary, and no one wants that. Cerealkiller13 19:11, 20 January 2007 (UTC)
I agree that much of the flaming here has been pointless, and I'll own a certain amount of it. But let's stick to the topic. The idea that "obscure things should be hard" seems harmless, but it's not (as someone working on ancient Greek ought to recognize :-). Words don't get RfV'd here because they're obscure. They get RfV'd because people think that CFI is too broad and such things shouldn't be in Wiktionary. If CFI is too broad, offer a fix. Don't try to game the RfV process to discourage people from entering terms you don't like ("you" is generic here; I don't think you (CK) do that). If you can't fix it, then RfV only if your quick google search turned up absolutely nothing promising, or you have some other good reason to think that the term doesn't meet the CFI we're currently using. Otherwise AGF and leave it alone. -dmh 20:20, 20 January 2007 (UTC)
(To be fair, quite a few RfVs are legitimate "no google hits" challenges. I only dispute those if there is other evidence available, generally because the term is old. This doesn't happen often.) -dmh 20:31, 20 January 2007 (UTC)
  • Dmh, that repeated LIE is borderline slander. You know perfectly well, 99.9% of what is nominated on WT:RFV has already failed preliminary tests before being nominated.
  • As far as your obfuscation: the earlier conversations (which you are going out of your way to undermine) were observations that #1) CFI is broken, and #2) requests for ideas on strengthening it. The obvious solution is to require at least 10 valid print citations spanning ten years. But just as obviously, that would put too much burden on our already overtaxed volunteers. Your ridiculous assertions that CFI should be weaker, are not appreciated by someone who patrols Special:Recentchanges, such as me. --Connel MacKenzie 22:07, 20 January 2007 (UTC)
How about starfucker, which I entered with a cite from a nationally published magazine in the article, and which gets over 80 b.g.c hits and 185K web hits. Which 999 other terms failed preliminary tests? I've already pointed out that "quite a few RfVs are legitimate" and I don't doubt the good faith of many of the others.
In any case, we now have a concrete proposal:
  • Require at least 10 valid print citations spanning 10 years.
(whatever a "valid print citation" is)
To that I'll add another that I've seen:
  • Restrict Wiktionary to terms occurring in some agreed-upon set of print dictionaries.
Does anyone support either of these? Connel, I believe you said you don't support the first. Maybe we could generalize it a bit to:
  • Increase the number of cites needed, or the time span, or both.
So, what does anyone think about any of the above? -dmh 23:20, 20 January 2007 (UTC)
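The current three-citations-spanning-a-year attestation rule and the stricter ten-citations-spanning-ten-years proposal differ only in their thresholds, so both can be expressed as one count-and-span check. The sketch below is hypothetical (`meets_attestation` is not an existing Wiktionary tool): it assumes each date belongs to a citation already judged independent and durably archived, and it reads "spanning a year" as the latest citation falling on or after the anniversary of the earliest one, which is one interpretation among several.

```python
from datetime import date


def meets_attestation(cite_dates: list[date], min_cites: int = 3, min_years: int = 1) -> bool:
    """Hypothetical count-and-span attestation check.

    Assumes every date belongs to an already-vetted citation; this only
    checks how many there are and how far apart the extremes fall.
    """
    if len(cite_dates) < min_cites:
        return False
    earliest, latest = min(cite_dates), max(cite_dates)
    # Assumed reading of "spanning N years": the latest citation falls on
    # or after the Nth anniversary of the earliest one.
    try:
        threshold = earliest.replace(year=earliest.year + min_years)
    except ValueError:
        # Earliest citation is dated 29 February and the target year is
        # not a leap year: use 1 March instead.
        threshold = date(earliest.year + min_years, 3, 1)
    return latest >= threshold
```

Calling it with `min_cites=10, min_years=10` models the stricter proposal; the counting logic stays the same and only the thresholds change, which is why the "increase the number of cites, the time span, or both" generalization is easy to express this way.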

In parallel to whatever we do (or don't do) with CFI, here's a solution to RfV wars that's so simple it hurts. I wish I'd thought of it sooner:

  • Anything that turns up on dictionary.com or askoxford.com is automatically in. RfV can be removed on sight.
  • Everything else is automatically RfV. We then follow the existing process.

This should remove any dispute about what belongs on RfV. It lets in everything that's in current dictionaries (with maybe a very few exceptions that the online sources don't cover). It requires people submitting new terms to show verification. It's quick and easy to check, and unambiguous. Less work all around. -dmh 01:56, 21 January 2007 (UTC)

I was cautioned against systemic copyvio solutions such as that. I think dictionary.com might take exception to being considered a bona fide part of WT:CFI. --Connel MacKenzie 07:21, 21 January 2007 (UTC)
Could you elaborate? I can completely understand how "Wiktionary will include exactly the same set of words as dictionary.com" would be a copyvio. This is different. This is "words not in some agreed-upon set of easily checked dictionaries are subjected to extra verification before (possibly) being included in Wiktionary." I agree this is a serious issue, and could be a showstopper for this approach, but I'd like to know more. -dmh 14:20, 21 January 2007 (UTC)
Can I elaborate on how I didn't pursue it, because I was advised against pursuing that route? What? You are talking about using another secondary source to systematically verify this secondary source. How could that not be a problem? --Connel MacKenzie 20:06, 21 January 2007 (UTC)

Hmmm... have to disagree. First, ten years is too long. We would have to wait several years yet to put in texting. And in the case of words that are part of a highly specialized legal, technological, or scientific vocabulary, ten hits may be too much to ask from gbh. I think the current CFI is fine, but agree that requiring editors to bear this low burden of proving a word should be a strict process. bd2412 T 13:26, 24 February 2007 (UTC)

Nouns/Proper nouns

I was wondering if I could get a quick clarification on what exactly it is that defines a proper noun. I am specifically asking about nationalities. Certainly Greece is a proper noun. But, is Greek a proper noun? In what formats? I speak Greek. I am Greek. I am a Greek. Which of these is a proper noun and which is simply a noun (I suppose the second one is an adjective, actually). I even saw "proper adjective" as a heading under American. Thanks. Cerealkiller13 19:36, 19 January 2007 (UTC)

Broadly speaking, proper noun is a catch-all term for nouns that are always capitalized, though sometimes days of the week and months of the year aren't considered proper nouns. In your examples, we have a proper noun, a so-called "proper adjective", and a proper noun, respectively. It works differently in other languages, though; in French, for example, only the last of those would be a proper noun (and it's even less idiomatic in French than in English). Ruakh 20:34, 19 January 2007 (UTC)
In French, only the last case would be capitalized, but even this one would not be a proper noun. In French, a proper noun is a place name, a person name, a company name..., i.e. a noun referring to something specific, and that you should not use with an indefinite article (in common uses) because it would be meaningless. In French, there is no such concept as a proper adjective. Lmaltier 21:28, 19 January 2007 (UTC)
For some people it's quite simple: a noun starting with a capital letter is a proper noun. I have always thought that the concept proper noun was (primarily) based on semantical and (morpho)syntactical principles, and not on spelling conventions. Semantical arguments could be uniqueness and (morpho)syntactical ones could include combinatory restrictions or absence of plurals, as Lmaltier argued. If spelling were decisive, Greek (the language) and Greek (the inhabitant) would both be proper nouns in English, grec (the language) and Grec (the inhabitant) would have different POSes in French, and griego (the language) and griego (the inhabitant) would both be (common) nouns in Spanish. This seems fairly inconsistent, especially if we take into account that these words have the same meanings in these three languages and that there is a clear distinction between the language (uncountable) and the inhabitant (countable). Just like there is a difference between the two nouns in The Scotsman reads The Scotsman.
I agree, it's not simple: Picasso (the painter) is undoubtedly a proper noun, but what about Picasso in Only the very rich can buy a Picasso? A democrat is surely a (common) noun, but what about a Democrat (or democrat?) as a member of the American Democratic Party? In my mother tongue (Dutch) all names for adherents of political, philosophical or religious movements start with a lowercase letter, and I can only think of them as (common) nouns. Does it make a difference if you (must) write Christians, Socialists and so on? Just to consider: is the word I a special personal pronoun because of its capital letter? By the way, Wiktionarian is an example in Template talk:en-proper noun, but guess what POS it has in its own article (and has had all these years). The most important thing is to agree on the POS to assign to noun classes such as languages and nationalities (if possible, the same for all Wiktionary languages). 84.197.181.104 22:39, 19 January 2007 (UTC)
A proper noun is a word that refers to a specific person, place, or thing. Thus, Athens is a proper noun because it is a specific place. Aristotle is a proper noun because he is a specific person. Greek (in reference to the language) is a proper noun because it is a specific thing. In general, proper nouns are capitalized while (common) nouns are not, but there are exceptions, such as Jack or Jill, which can refer to specific people but are generally used as names. There is a gray area when it comes to words like Greek (person) or Englishman, which are capitalized nouns referring not to specific persons but to a specific group of people. Additionally, in older books (a century old or more), it is not uncommon to find names of specific intangible ideas capitalized, such as "Honor", "Liberty", or "Socialism", but in modern English these are neither capitalized nor generally regarded as proper nouns. --EncycloPetey 00:42, 20 January 2007 (UTC)
The CGEL makes a distinction between proper noun (POS) and proper name (a type of noun phrase). "The main use of proper nouns is as head of an NP that serves as a proper name" (p. 328). "The central cases of proper names are expressions which have been conventionally adopted as the name of a particular entity — or, in the case of plurals like the Hebrides, a collection of entities" (p. 515). Their examples are:
  • names of particular persons or animals (Mary, Smith, Fido)
  • places of many kinds (Melbourne, Lake Michigan, the United States of America)
  • institutions (Harvard University, the Knesset)
  • historical events (the Second World War, the Plague)
  • names of days of the week, months of the year, recurrent festivals, public holidays
"Proper nouns are word-level units belonging to the category noun. Clinton and Zealand are proper nouns, but New Zealand is not. America is a proper noun, but The United States of America is not, and nor are The United States or United and States on their own." "Proper nouns function as heads of proper names, but not all proper names have proper nouns as their head." (p. 516)
They don't use capitalization as a defining criterion, and they have no notion of "proper adjective". I am Greek: adjective. I am a Greek: common noun. I speak Greek: proper noun. The Scotsman reads The Scotsman: both common nouns (but The Scotsman is a proper name when referring to the newspaper). The museum bought a Picasso: proper noun (but the NP a Picasso is not a proper name). CapnPrep 00:53, 21 January 2007 (UTC)
Very clarifying text, CapnPrep. So, in the sentence We went to see Catch Me If You Can, the NP Catch Me If You Can is a proper name, though it doesn't contain a proper noun, or even a noun. I am now more convinced that capitalization is not a criterion for defining a proper noun. I saw that Bordeaux (the city and the drink) is listed under the header Proper noun in both senses, but Cognac (the city) is under Proper noun whereas cognac (the drink) is under Noun. Even if the capitalization differs between the two drinks (in Dutch both drink names start with a lowercase letter), I can't understand their POS distinction.
I am also wondering if absence of articles is a valid criterion for defining proper nouns, given examples such as the Nile, the Thames, the Shetlands, the Knesset, the Kennedys and many others. 212.29.160.170 02:18, 21 January 2007 (UTC)
The article is added at the NP level, so for H&P this is a property of proper names, and they can go either way: "A proper name is inherently definite. This excludes the inclusion of an indefinite determiner, and makes the marking of definiteness unnecessary. We distinguish, then, between strong proper names like Kim or New York, where there is no determiner, and weak proper names like the Thames or the Bronx, where definiteness is redundantly marked […] Plural proper names are always weak." (p. 517) If a singular, non-pronominal NP has definite reference without any article, I think this is a good clue that it's a proper name (and if the NP consists of a single word, then it's likely to be a proper noun). CapnPrep 02:59, 21 January 2007 (UTC)

While this discussion has been far more interesting and educational than I really expected it to be (thanks to everyone who contributed), I must admit that I was more concerned with what a proper noun is specifically on Wiktionary, as opposed to in general. Certainly this is one of those questions which has scholars debating throughout the ages, and yet the fact remains that we have immediate formatting concerns. It seems to me that there should be some sort of policy set up, so that people know which words to put under which header. Certainly any decision which we could come to would be imperfect and not convey all the senses that a word could possibly have (for example, an excellent sentence I ripped off of Wikipedia: This is very much like the Paris of my youth), but I think an imperfect decision would still be better than none at all. I couldn't find any specific policy page anywhere about this, but I'm still rather new to the project, and so could well have missed it. If there's a Wiktionary old-timer who knows of a BP discussion on the issue or a policy page, please direct me to it. Otherwise, does anyone think it would be a good idea to come to some sort of consensus on the issue? Cerealkiller13 08:01, 23 January 2007 (UTC)

Proposal to allow an exception unto WT:CFI

For cambriphone (and possibly other -phone words, such as italiophone and the like). As explained in its RfD discussion, cambriphone seems like it would fail WT:RFV. However, cambriphone was a necessary tag to distinguish the two pronunciations for eisteddfod (the other tag was anglophone). As it is a necessary bit of Wiktionary jargon, I propose that the entry be allowed to exist, contrary unto the rules of WT:CFI. There is a precedent for this in the allowing of protologism. In the meantime, I ask that the entry for cambriphone not be deleted until the discussion concerning this proposal is concluded. † Raifʻhār Doremítzwr 13:57, 21 January 2007 (UTC)

You want CFI to be modified / waived to allow a word that you have said that you just invented? No. Robert Ullmann 15:40, 21 January 2007 (UTC)
Proposal is from a known disruptive troll — declined. SemperBlotto 16:26, 21 January 2007 (UTC)
Is this word in use in other dictionaries? That is, used similarly to the way you’re proposing, as opposed to being defined or what have you? Then it could be cited as such, even being in a dictionary, since it would constitute use in that way. Otherwise we’ll just have to employ a wordier explanation.
P.S.: The CFI should be careful to exclude example sentences in dictionaries, although they could be considered use as well, because they are contrived to support a definition when that causal relationship should be the other way around. A perfect example of this is the Idiot’s Guide quotation of “knobology”, contrived as an example sentence, which does not fit the observed pattern of use. DAVilla 20:05, 21 January 2007 (UTC)
Good point, it was in the spirit, but not explicitly mentioned. I've added it in. --Enginear 20:50, 21 January 2007 (UTC)
This word has been in use in my small circle of logophile friends for a few months. I imagined that it was rare, but not this rare. It’s a protologism it seems. I suppose we could, as you suggested, employ some circumlocution of some kind, or we could just leave it there. (The question is, in doing the latter, would most people be able to guess the meaning of cambriphone?) What do you suggest?
Thank you, DAVilla, for having the decency and logical integrity to discuss my proposal on its own merits, rather than damning it by its association with me. † Raifʻhār Doremítzwr 20:30, 21 January 2007 (UTC)

(DAVilla: it is best not to feed trolls; he said he invented it himself (now there is a small circle of friends; just like Eddie's friends that all used ex**rnt, and were sure it was a word ;-).

I never said anything to agree with him. I was only double-checking his understanding of when words are legitimate. Just because I'm not criticizing does not mean I'm feeding him. Just because I'm not criticizing does not mean I'm supporting him on this point. DAVilla 16:15, 22 January 2007 (UTC)

Everyone: you really ought to look at this edit for amusement. All of this user's edits are pushing completely non-standard usage and typography; they all need to be reviewed and corrected. A lot are non-standard plurals, like seneschaux [(French) sénéchaux] instead of seneschals, the standard English plural of seneschal. Etc., etc. Robert Ullmann 21:31, 21 January 2007 (UTC)

Blocked this user 1 hour: removal of non-standard tag from campi. Note this is his second warning and second block. Robert Ullmann 22:47, 21 January 2007 (UTC)

(my favorite quote from hunting around to see how common the ill-educated use 'campi' was: "When I was in college, a fellow student was being interviewed by some news show. Everyone figured it would be some good publicity for our school until he said campi for the plural of campus (which is campuses). You could just hear some of the jaws dropping." ;-) Robert Ullmann 23:49, 21 January 2007 (UTC)

Wiktionary vs. Urban dictionary

A repeated complaint is that Wiktionary is in danger of turning into Urbandictionary.

Compare choda with the Urban Dictionary entry for Choda.

From what I recall of UD a year ago, if anything UD is in danger of turning into us. -dmh 14:30, 21 January 2007 (UTC)

I suppose I should also point out that UD consists solely of slang and such, while Wiktionary contains hardly any. Take a random sample if you don't believe me. I hear the contrary assertion a lot, probably in part because we see a disproportionate amount of it on RfV and elsewhere, but repeating an assertion does not make it true. -dmh 15:07, 21 January 2007 (UTC)

The thing I've been "repeating a lot" lately is that UD a year ago is not UD of today. So, is this another irrelevant point, or do you have some further disruption you wish to base upon this? --Connel MacKenzie 15:36, 21 January 2007 (UTC)
"a repeated complaint ..." Straw man. A lot of the cruft that shows up in RfV is UD type stuff, but that is because almost all the new wikt entries do not need to be reviewed. A word appearing in UD is always suspect, but that just means suspect. Robert Ullmann 17:06, 21 January 2007 (UTC)
Right. That's exactly my point. Wiktionary is actually in no danger whatsoever of ending up like UD. Therefore, there is no need to deviate from "all words of all languages" to solve this non-problem. Personally, when I'm looking at new entries, I completely ignore UD. It's a secondary source. -dmh 19:50, 21 January 2007 (UTC)
So I should run the bot to tag with {{rfv}} all entries that don't have three times as many citations as definitions? While that is an admirable eventual goal, I think doing so now would be a little counter-productive. --Connel MacKenzie 19:58, 21 January 2007 (UTC)
No. I'm not sure what that has to do with anything. What you should do is:
  • By all means suggest corrections to CFI if you think the current definition of "word" as "something people use to convey meaning" is too broad.
  • Otherwise, not RfV anything with dozens of b.g.c hits or thousands of ghits without a specific reason (e.g., everything seems to be mentions, not uses), as these practically always meet CFI.
  • Not assume bad faith so easily on my part. I realize my sometimes sarcastic tone has not helped, but what I hear back from you about what you think I'm up to is often very, very far off the mark. When in doubt, refer back to the bit about "Wiktionary should have clear rules and apply them consistently."
  • Not take my criticisms of some of your positions so personally. Being a CheckUser isn't easy. It involves sorting through a whole lot of patent garbage. Though it may not appear so, I very much appreciate the several kinds of grunt work you do when not arguing with me. I'm not criticizing your desire to keep garbage out. I'm arguing that your personal notion of garbage is, in some but by no means all cases, at odds with Wiktionary's stated mission. -dmh 22:11, 21 January 2007 (UTC)
dmh, I find your tone very objectionable. Who are you, a single member of a cooperative community, to tell the rest of us what we "should do"? You have persisted in wasting people's time in arguments to the point where those of your stated aims which have some merit are likely to be ignored, in accordance with one of our aims: don't feed trolls. Wiktionary as a whole, and RFV in particular, works on the basis of cooperation. We don't all do every type of task. Some, including Connel, tend to submit words to RFV without detailed checking. Others of us rarely submit words, but frequently research to find cites, which results in some of the words being found to comply with CFI. THIS IS A WIKI WORKING AS IT SHOULD. None of us work for you. We are volunteers working in a way which is intended to benefit the aims of our community in addition to our personal aims. If you want to tell us what we should do, rather than cooperating with us and changing things only through consensus, then you are indeed a troll, and will be treated as such. --Enginear 13:46, 22 January 2007 (UTC)
I apologize for any offense. My comments were in reply to Connel and directed solely at Connel. Please re-read the above with that in mind, particularly "I very much appreciate the several kinds of grunt work you do." I am in no way asserting that anyone here is working for me. -dmh 14:13, 22 January 2007 (UTC)
So, I should be taking offense, then?  :-)   Enginear, please note that Dmh has been "around" Wiktionary longer than I have, with a couple periods of extended absence. Overall, he is pursuing his points in what he thinks is good faith, (and what I do not think is good faith.) But he certainly is a very well-established contributor in this community. I agree that the recent tone has diminished the arguments on both sides tremendously...if disparate "sides" even exist. My POV is that our current CFI is broken because it allows too much; Dmh's POV seems to be that CFI is broken because it excludes too much.
Dmh has recently raised several issues with CFI & RFV that have been controversial since inception. The tone I do not appreciate is the dismissal of concepts that have been battled over (sometimes bitterly) in the name of one or two odd examples.
As a final note, I don't know of any term I've ever submitted to RFV without that term failing some preliminary test. But that is beside the point; eventually we should have citations for every sense of every entry, right? --Connel MacKenzie 19:47, 22 January 2007 (UTC)

RfV doesn't look broken

I just had a look at RfV for frivolous challenges. There don't seem to be many, if any, recent ones, and the discussion is pretty civil and evenhanded (particularly compared to here :-). I note Connel supporting a challenged word as attested, and myself arguing that an attested word isn't idiomatic and questioning someone supporting a word without evidence. So much for the stereotypes.

I strongly dislike the tone of some of the discussions I've been involved in here. That includes my end, though lately I have made an effort to stick to the point and support my points with evidence. Obviously the previous discussion, including some of my own remarks, has poisoned this, leaving an impression of ill will where none is meant. I would like to try to climb down from most of what's been said and return to two basic points:

  • There is clearly dissatisfaction with CFI as it stands. I don't know how to fix it. I tend to think that CFI is fine as it is and perfectly in line with "all words of all languages", but if people are unhappy I'd like to find a way to deal with that so that we have clear guidelines that everyone's reasonably comfortable with. Marking and otherwise separating terms and senses is obviously part of the basis for a solution. There is a lot of work to be done to nail that down to practical rules. I would like to be part of that work. At the very least, we need to deal with context (e.g., "this is seldom (or almost exclusively) used in formal writing") and connotation (e.g., "this is considered substandard in most of the US" or "this term is liable to offend"). I'm agnostic as to whether the right solution is namespaces, categories, hidden text that only appears when specifically asked for, something else, or some combination.
  • There is a likely need to deal with spelling variations and typos/scannos. Common but not commonly accepted spellings should probably be handled as with other non-standard constructions (see above). Personally I'm undecided whether we should handle outright typos the same way, or try to exclude them. My first instinct is to exclude them (shall I repeat that? :-). I'm concerned, though, about a slippery slope towards eventually excluding legitimate variants. It also seems useful for someone having a bad day to be able to look up uot and see "likely a scanno for out or not", say "D'oh! Of course!" and have a somewhat better day.

Please consider my proposal on spelling to be included by reference here. Beyond that, I will not be responding further on any of the topics previous to this one. If you feel a need to make a further point, please leave a note on my talk page. I promise I will read it. I also promise only to respond if there seems to be something constructive to say. Please take a non-response as a polite "point taken". I'm tired of fighting, and I doubt I'm the only one. -dmh 16:57, 22 January 2007 (UTC)

Thank you. We have a project here which we all know is far from perfect. If we spend most of our time attempting to upset the imperfect bits (and I don't believe anyone in the conversations you are referring to has done so), we will succeed, and the project will fail. If we just spend most of our time fighting, the project will probably fail. If we spend most of our time cooperating, even when we are not fully happy with the solution, then we will probably be able to make it work.
I see two issues which need attention. Firstly, there are undoubtedly a few people who try to "test" the system, to see if they can make it fail. With the present ratio of admins to (let's call them) vandals, only a summary response is possible to much of the vandalism (and even if a more careful approach were practicable, I don't personally believe it would be a good use of effort). My guess is that the admins get this right over 90% of the time; certainly complaints about summary deletion run at well under 10%. I contend that the "speedy delete" and "blocking" systems are not broken. It is a pity they are needed, but they do their job pretty well.
Then there is RFV. I have yet to see strong evidence that anyone could slide a cigarette paper, well a sheet of cardboard anyway, between your views, and mine, and Connel's. It's just that we express them differently. You say, to paraphrase: RFV's not broken, but there seem to be issues with CFI (which differ between editors, but include that you are instinctively opposed to including typos which CFI would accept), and...(we've yet to hear your views on how to deal with the way "inappropriate/wrong" [which I am purposely not defining] entries get propagated very quickly after they are listed in Wiktionary and thereby make us a less useful tool). The right solution might be namespaces, categories, hidden text that only appears when specifically asked for, something else, or some combination.
I say "RFV's not broken, but there seem to be issues with CFI (which differ between editors, but include that I simultaneously want somehow to be able to locate all words, including (clearly stated) misspellings and typos, providing they are citeable in use, and add separate cites [ie cites exactly as CFI is presently written] for each sense BUT also I want to have a simple format, to suit most users, and I don't want misspellings/typos and other "inappropriate/wrong" entries to be found by search engines and thereby make us a less useful tool). The best suggestion I have yet heard to solve this and several other problems is to move such entries to other namespaces, accessible via links from the main namespace, but where appropriate, set to discourage search engines."
Connel says "RFV's broken", and I wish he wouldn't because I feel it is misleading. What he goes on to say is (in my paraphrase, culled from a number of threads): There's room for almost everything, but some of it shouldn't be in the main namespace, but should be in other namespaces, accessible via links from the main namespace, but where appropriate, set to discourage search engines. In fact, consider the radical suggestion that the main namespace should include only those definitions of English words which are in common, standard, use worldwide, with all regional, and/or rare, and/or non-English, and/or jargon, and/or slang, and/or excessively vulgar, and/or archaic, uses appearing in a number of separate namespaces which could be included in searches or not by user preference, and some of which would not be visible to web searches. Each namespace would therefore have its own CFI. Many of the items currently deleted (ie wasted) consequent to the present RFV would in the future find an appropriate home in one of the namespaces, though very few would be in the main namespace. Therefore it can be said that RFV's broken. And for the present, where all definitions are found by web crawlers, CFI is broken, but not only have we not thought of any manageable way of tightening it, but even if we managed, a lot of entries which might in the future be usable would be wasted. (I suppose that's more acceptable than saying Wiktionary's broken, but...)
I have taken some liberties in the paraphrases, so I will not be offended at being rubbished if I have seriously misinterpreted you.
And by the way, I think we (at least most of us, but I now realise not dmh) have recently been interpreting CFI as requiring 3 cites/definition, rather than 3 cites/entry. That isn't actually what it says, but it works for me. That difference explains some of the disagreements above. It's a small point compared with the larger ones we're discussing, so I won't propose it for a vote. But if anyone else wants to, fine. --Enginear 20:19, 22 January 2007 (UTC)
"RFV is too lenient" then? :-) The arbitrary choice of three citations was made so that specific examples of the time could pass; that is, three was seen as a decent compromise between verification and posing an inordinate burden upon the sysops here who end up doing that verification. There is no reason the arbitrary limit must stay at three when it could be at ten or fifteen or thirty or a hundred. Other references (e.g. the OED) have much higher thresholds. Now that more sources are easily available, I do believe that the arbitrary limit of three is too low.
"RFV is too lenient" also in that it allows for groups.google.com as citations. These are not proofread sources, but they are "durably archived." Including such items as valid citations of use (that usually haven't even been run through a spell-checker, let alone proofread or edited by an editor) tends to skew en.wiktionary toward nonsense, instead of a usable reference.
All of my proposed technical solutions are attempts at finding a balance. If a place for "bad words" can be found that doesn't dilute the main namespace, then we'll have less vandalism and fewer disputes. --Connel MacKenzie 20:42, 22 January 2007 (UTC)
Dipping my toe back in the water ...
I also agree with "3 cites per meaning". As in "you'll be able to find 3 cites for some particular meaning, thereby producing a valid entry, but I can't tell you which one in advance." A cite for meaning A and a cite for meaning B and a cite for meaning C won't do. I've rejected these before, myself ... e.g., wob didn't go in because of the various nonces people had cooked up for it (and gosh, there were a bunch). It went in when consistent cites for "IWW" member turned up (and now it's out again ... WTF?). It's all up there in "Rules of thumb ...", but you might have to parse it a bit carefully, in which case, apologies. To my knowledge it has very rarely, if ever, happened that a term had more than two total cites without having some particular meaning common to three of them.
Enginear: Without endorsing every single word, I think your paraphrase is reasonable. I hope any discrepancies will become clear in the sequel.
There is a major split in opinion regarding non-proofread sources. I have absolutely no problem accepting them as evidence that people use particular words. With certain clear exceptions, people don't just string random sequences of syllables together on blogs, chat rooms etc. They follow the same rules as everyone else (from a descriptive linguistics point of view, not from a primary-school grammar point of view). In the process they create useful new words, and they also display a number of regional and other socially-driven variants that are generally barred from print sources. They certainly don't pay close attention to spelling, and that's a potential issue, but there's a qualitative difference between spelling typos and variations in usage and grammar. A descriptive dictionary should try hard to record variations in usage and (to a lesser extent) grammar, pointing out which are considered non-standard, but not eliminating them entirely. This is particularly true in a community-run internet dictionary that purports to describe all words of all languages. One of our few inherent advantages is nimbleness.
As to crawlers. I've pushed before (and above, I think) for a "limbo" area, possibly a different namespace, for entries that aren't up to snuff yet. Collect the incoming flotsam and jetsam there, filtering out obvious garbage. RFC entries should go in the same bucket (in fact, incoming works in progress are RFC entries as far as I can tell). Personally I would think that a big honking disclaimer at the top would be good. It would read something like

"Wiktionary is an open dictionary. We receive many entries that do not yet meet our standards for a complete article. This is one. We also receive entries for words that do not meet our CFI [link]. This may be one. We have not verified the information herein."

The second part would go away on RfVpass. It might also be good to leave a stub behind on RfVfail, i.e. "We checked but we don't think this is actually used" would be more informative than a blank result. Such templates can be applied quickly and could help shift the perception from "these guys will let anything in" to "these guys are on top of it". I could even see people starting to use this as "I don't know what this word means. Can you tell me?" — a much lighter weight request for a new entry. Handled well on our part, this could be a Good Thing. All IMHO. I don't think these are necessarily original ideas, or the best answers, and probably some aspects have been considered and rejected. But right now the important thing is to get suggestions on the table and talk about them.
Finally, as others have pointed out, it's important to handle dodgy entries calmly and clinically. There's absolutely no reason we can't document that some word means your favorite disgusting practice. We just do that and move on. I remember as an adolescent being excited that Webster's (I think) included fuck, but being profoundly disappointed by the entry itself. Someone browsing for potty words to get their jollies will quickly learn that UD is the better source. -dmh 23:09, 22 January 2007 (UTC)

{{inflected form of}} or more specific

Is {{inflected form of}} to be used for all verb and noun forms in all languages, or do we prefer more specific information? Personally, I find it a pity if, for mis in Dutch, it would only say this is an inflected form of missen, rather than which inflected form (i.e. both imperative and first person singular present tense). For this purpose I created {{first person singular present tense and imperative of}}. Similarly, a whole lot of such templates could be made. Does anybody have a comment on this, or some better ideas? henne 14:44, 15 January 2007 (UTC)

Take a look at what was done for the various conjugated forms of Spanish hablar and nadar. The information was added by a bot. --EncycloPetey 16:15, 15 January 2007 (UTC)
Perhaps this should move to WT:GP; I agree that each form should be identified within the template, but don't have the patience to delve that deeply into the technical aspects today. --Connel MacKenzie 19:05, 15 January 2007 (UTC)
I have continued this at WT:GP. Saltmarsh 10:32, 24 January 2007 (UTC)

RfV rules of thumb

Having done dozens of these:

  • If there are 10 web hits, there are probably not 3 legitimate cites.
  • If there are 100 web hits, there are probably 3 legitimate cites.
  • If there are 1000 web hits, there are almost certainly 3 legitimate cites.
  • If there are 10 b.g.c hits, there are probably 3 legitimate cites.
  • If there are 100 b.g.c hits, there are almost certainly 3 legitimate cites.

Based on that, I'd add another zero and set "clearly in widespread use" at 10,000 web hits or 1000 b.g.c hits. Of course "clearly" and "widespread use" are subjective but the more pertinent criterion is "highly likely to meet CFI". I believe these numbers satisfy that, but if anyone with similar experience would care to confirm or disconfirm, I'd be interested -dmh 14:27, 21 January 2007 (UTC)

I disagree with your numbers under certain circumstances, specifically when it comes to gamer terms and 'leet' speak. My experience with these terms suggest:
  • If there are less than 1000 web hits, there are probably not 3 legitimate cites.
  • If there are 10000 web hits, there might be 3 legitimate cites.
  • If there are 50000 web hits, there are probably 3 legitimate cites.
Most of the time, web hits for these terms are in gamer forums, blogs or usernames on websites, which fail to meet the 'durably archived' criterion required for citations on Wiktionary, and often do not show usage in context. --Versageek 15:11, 21 January 2007 (UTC)
Absolutely unacceptable. We have had plenty of examples of >1,000 b.g.c. hits, that haven't had a single usable citation. Numeric indications can never be used in the manner Dmh suggests. --Connel MacKenzie 15:40, 21 January 2007 (UTC)
Similarly, the proposal misses the point that the hits may not be for the usage being defined. There have been many cases on RFV (perhaps 5% of submissions) where there are plenty of hits, but for a totally different definition (or definitions) to the one proposed in the entry.
It also fails to address properly the English usage of a word much more common in a foreign language, eg château.
More generally, it misses the point of how to address RFV-sense. There were 3500 b.g.c. hits and 28,300,000 Google hits for nope when I tried just now. What does that tell us about the verbal sense of nope or the rarer noun senses, or the geographical name Nope? Nothing much.
So while the relative absence of b.g.c. hits or g hits for a word suggests that it does not meet CFI, a plethora of hits merely indicates that it is worth checking further. Having said that, studying the first hundred hits, without necessarily following the links, is enough to "OK" a good proportion of meanings (though in the case of nope it took a trawl through several thousand to find the rarer senses). --Enginear 15:21, 21 January 2007 (UTC)
"Rules of thumb" aren't useful (oh, except that 0 usually means 0 ;-). Raw Google and g.books counts are interesting, but mean very little in themselves. Only looking at the sort of hits you are getting will tell you anything. rel gets 55.9 million hits ... Robert Ullmann 16:33, 21 January 2007 (UTC)
I added rel. Thanks for the pointer! -dmh 23:59, 21 January 2007 (UTC)
That nonsense has been deleted. Thanks for the heads up. --Connel MacKenzie 00:22, 22 January 2007 (UTC)
OK, time out. How is a perfectly ordinary abbreviation, which appears in other dictionaries (albeit I misfiled it under rel and not rel.), and which obviously meets our CFI, considered "nonsense" to be deleted on sight? -dmh 01:39, 22 January 2007 (UTC)
The question no one has asked is what this rule of thumb could be used for. I agree that counting numbers of hits (as noted above) is usually worthless for deciding whether or not an entry should be deleted. However, it can be used when trying to decide whether or not to nominate an entry for deletion. Low numbers of web hits suggest deletion should be investigated. Of course, the inverse (that high numbers of hits suggest it should not be investigated) is not a usable criterion. With high numbers of hits, one must pursue other criteria, in particular the nature of those hits must be looked into. They may all be blogs or gamer sites. They may all be pages in which the "hit" is for a corporate name or in which the "word" is part of a URL. But low numbers of hits can be used as an initial rule of thumb to investigate the possibility of deletion. --EncycloPetey 18:47, 21 January 2007 (UTC)
Another counterexample is fauxtography, which was quickly picked up in reference to a single news story, but couldn't be cited in the span of a year. Your rule of thumb is best used in combination with other factors, and only as a guide whether to pursue sourcing. DAVilla 19:52, 21 January 2007 (UTC)
I wasn't addressing rfv-sense, only rfv. Teasing out senses is harder than merely determining whether a word is used at all. And I also pointed out that in some cases there is clear extra information. For example, the word is common in another language. This happens, but not too often and it's generally obvious when it does. If fauxtography picked up only one hit, it would either fall under "fewer than ten web hits" or fail to reach "at least 10 b.g.c hits". So I don't think any of the examples given above is particularly relevant.
Fauxtography gets 551,000 web hits. It doesn't get any Google book hits, but you didn't mention books when you said "If there are 1000 web hits, there are almost certainly 3 legitimate cites." DAVilla 05:40, 22 January 2007 (UTC)
Versageek raises a good point, however, in that one should check google groups (which are archived) and not necessarily web hits in general (which aren't, or aren't known to be). So please substitute google groups hits for web hits above.
Finally, Connel asserts plenty of cases with >1000 b.g.c hits that turned up nothing. Connel: If there are plenty, then surely you can list three. Please do so. -dmh 23:37, 21 January 2007 (UTC)
You wish me to do your homework for you, because you have been absent for about a year, and wish to disrupt every possible aspect of Wiktionary you can? Stop now, or you will be blocked. --Connel MacKenzie 00:22, 22 January 2007 (UTC)
No. My personal experience is that when there are even dozens of b.g.c hits, with a few exceptions like obvious scannos, it's very easy to pull out three valid cites. You have made an assertion that runs counter to that experience. I'm asking you to back that assertion up. Frankly, there have been several assertions you've made (e.g., frictive is a misspelling, dmh wants Wiktionary to be a free-for-all and this most recent that I want to disrupt every possible aspect of wiktionary I can) that just don't match the facts. The statement you made should be simple to verify. Give me three terms with >1000 b.g.c hits and fewer than three good cites (excluding scannos). I'll likely respond one of two ways: 1) Here are three good cites for each of those terms or 2) You're right. I hadn't thought of that possibility. Honestly I'm open to either one. Or you could even say "those cases are explained by something I haven't thought of". That would be fine, too, and I, at least, would learn something. This would all help us get to more complete and consistent rules that would pre-empt a lot of bickering, just like the independent cites rule has pushed protologisms to LOP.
This is not abuse. This is asking you to back up your assertions in a debate. If you don't like the content or tone of this discussion, walk away. I'll understand. -dmh 01:06, 22 January 2007 (UTC)
dmh, six people have replied to you above, and you have ignored almost everything said. That is abuse. You say (elsewhere) that you appreciate Connel's time spent on his "grunt work", but above you ask him to waste his time justifying his statements in a discussion where five of us agree in principle with him, and none of us agree with you. That is abuse. Your statements mark you out as someone who has not cited significant numbers of words with multiple meanings, yet you write as if you are an expert. That is abuse. You are keen to tell others in the community what to do, but so far you have failed to follow your own advice: "Walk away". --Enginear 14:02, 22 January 2007 (UTC)
I don't believe I ignored any of what was said. Most of it was not germane to my point. I'm not talking about separate senses. Of course you can't say "there are many b.g.c hits for nope, therefore nope is a verb". You can, however, be quite sure that nope is a word that meets CFI. There will be three valid cites for at least one sense, not any particular sense. I had thought I'd made this clear at the outset, by referring to rfv and not rfv-sense, but if not I would think it would be clear from "I wasn't addressing rfv-sense, only rfv."
As for my digging out citations, see illiteracy, anarchy, Talk:hippopotomonstrosesquippedaliophobia, Talk:frictive, Talk:ingenuitive, the discussion of Santa Ana wind vs. Santana wind (and thanks again for the b.g.c work), Talk:scenarii and probably others for recent examples. If you like, I can go into why not all of those references are up to CFI standards. I haven't been around for several months, but apart from that hiatus I've been searching for uses and teasing out senses for years. In my experience, the best way to find out what's going on with a word is to look openly at how people use it.
I really don't see how it's a waste of Connel's time for him to justify his own statement. Frankly, his refusal, and any number of other similar cases, mark him clearly as someone who makes assertions, doesn't back them up, and refuses to change his mind in the face of contrary evidence. In this case, he's quite probably wrong again. Very likely any three examples he'd care to cite with >1000 b.g.c hits would meet CFI as we state it or have obvious extenuating circumstances, which is my claim. My understanding from him is that he's reluctant to cite any examples because he feels the words shouldn't be in Wiktionary despite CFI. He didn't put it quite that civilly, of course. His words were "I can only assume you'll continue with your bizarre justification of previously failed terms". As I've said: Fine. Either fix CFI or accept it as it is. -dmh 15:18, 22 January 2007 (UTC)

BTW, I don't see any evidence for nope as a verb, though it does sound plausible, maybe something like "I made a reasonable suggestion but he just noped me." Looking for forms like "to nope", "would nope", "noping " etc. in b.g.c, I see:

  • Scannos for "hope" or "rope"
  • A verb sense of nop (computing, "no-operation").

-dmh 15:55, 22 January 2007 (UTC)

This discussion of nope is purely tangential, but I believe what Enginear is referring to is the recent RFV of that term; the rfv'd (and very hard to find) sense was "a bullfinch", not the verb sense. The verbal "to hit" sense was discovered by Enginear during the search for the bullfinch sense (once again, kudos to Enginear for the research!) Neither is contemporary or common, and both are overwhelmed by the more common adverb/noun negation senses; in addition, you might expect unusual constructions due to the fact that it's archaic/dialect. There is no verb "negation" sense at the article. You'd be hard pressed to pass either the noun-bullfinch or verb-hit senses based on a Google-based rule of thumb alone. --Jeffqyzt 20:26, 23 January 2007 (UTC)

Misspellings

A while ago I promised a proposal for dealing with misspellings, which I hope could help screen out true typos while letting in legitimate alternates.

My working definition of a misspelling is "a variation in the spelling of a term due to accident or to ignorance of the commonly accepted spellings." This specifically excludes regional variants, particularly US/UK, and common alternations. Note that invented spellings such as those of e. e. cummings or James Joyce fail independence unless others pick them up, in which case they're legitimate (few if any are actually picked up). Unless, of course, we want to record them as appearing in a well-known work.

This won't work as it stands, since we can only guess at a writer's motives or state of knowledge. So instead, look for a pattern:

  • Another closely related spelling exists, with the same meaning (that is, you can substitute one for the other without changing the meaning of the sentence).
  • That other spelling is much more common in the same context.

The second rule is still vague. If we go this direction we'll have to come up with some general guidelines. Borderline cases would be resolved by discussion as with anything else.

"Closely related" is also a bit vague. In practice there are several common causes:

  • Dropping a letter.
  • Transposing letters.
  • Doubling a letter.
  • Replacing part of a word with a homophone (e.g., "ee" for "ea" or vice versa)
  • In the case of scannos, replacing a letter or sequence of letters with one of similar appearance (e.g., one of "n","h","m" or "r" with another, or "cl" for "d").

There may be others as well. We could probably codify a standard list of likely sources of typos/scannos/phonetic spellings.

I'm not claiming these rules are perfect, but they seem reasonable in a number of cases:

  • occurred is significantly more common than ocurred (even on the web, overwhelmingly so in print), and occurred can be substituted where ocurred appears with sensible results.
  • In US/UK cases like color/colour one or the other is overwhelmingly common in one or the other context.
  • In a case like scenarii one could either argue that scenarii is overwhelmingly common in particular technical communities, or fall back on the general rule of intent, or claim that they're simply misspellings.
  • In a case like frictive, the alternates fricative and fictive don't mean the same thing, so frictive clearly isn't a misspelling of either. The three are simply words that can plausibly be mistyped for each other.

I'm sure there are issues to be worked out, but I hope this is a start.

-dmh 00:44, 22 January 2007 (UTC)


I couldn't disagree more about frictive. The likelihood that someone typed that in error, while intending either fictive or fricative, is an order of magnitude greater than the likelihood of them intending frictive. The use is only likely in limited contexts (biology: circulation, and engineering: heat dissipation physics). In all other contexts, it is a misspelling, so we should mark it as such. --Connel MacKenzie 01:14, 22 January 2007 (UTC)

Wait, that's not the way we handle definitions for uncontested words, so it shouldn't be the way we handle them here either. Therefor and effect as a verb aren't listed as misspellings, although for the average Joe they are now most commonly such. DAVilla 05:47, 22 January 2007 (UTC)
Connel, did you look at the list of actual hits for frictive above? I tried the experiment, and it directly contradicts what you say here. By contrast, something like actaul behaves like a misspelling as I describe it. -dmh 14:37, 22 January 2007 (UTC)
Whoa, yes indeed, I did. What were you looking at? www.google.com or books.google.com? (I was looking only at the latter.) --Connel MacKenzie 06:32, 24 January 2007 (UTC)
DAVilla, this is all about Dmh's new proposal, to perhaps solidify how we should approach misspellings. I don't see anything in Dmh's suggestions that match current practices. So I was suggesting an additional concept to perhaps clarify what is meant by "misspelling." At least, in the case of frictive...perhaps many others, also. Basically: anything that appears in red in a spellchecker, yet appears in an unabridged (or topic-specific) dictionary somewhere. So, how should they be dealt with? Including them as "valid" entries with no indication of the common misspelling, does our readers a tremendous disservice. Not including them at all is too heretical a thing to say, for something that has passed RFV. So what compromise can be found? --Connel MacKenzie 06:32, 24 January 2007 (UTC)
We don't have a logical way of dealing with misspellings, because, as Dmh pointed out, we have never dealt with the topic, other than to dismissively say "we don't want entries for them." Certainly, the fact that certain words are much more likely to be misspellings or misuses, is of much more value to readers than an unrelated definition, right? --Connel MacKenzie 07:41, 22 January 2007 (UTC)
The principal principle should be for Wiktionary to be useful. I believe the only really useful solution is to list two definitions, i.e. the misspelling AND the correct one. Otherwise, people seeing the correct usage in authoritative texts will merely doubt the correctness of the misspelling tag.
I agree for the most part. It may be simplest just to let every misspelling in and tag it as such, using rules like the above or something better. However, statistically, there are relatively many rare typos and relatively few common misspellings. It would be nice to be able to list ocurred as a common misspelling without having to list occuured or occuredd or whatever, even if it turns out that these have appeared enough times to pass CFI. It would be nice to be able to use print sources for spellings, since they're more consistently proofread. E.g., "A less-common spelling will only be listed if it appears in print some number of times over some span." Unfortunately, we can only verify print usage on line via scanned texts, and by that measure "uot" would be attested as an alternate for both "out" and "not".
Frankly, I'm not sure this is really a problem. There may not be as many typos as we think, and in any case I doubt anyone will be going out of their way to enter them. It's more a matter of when a typo comes along, saying "Wherever this appears, a much more common spelling would work in the same sentence, and there doesn't seem to be any specific context where this spelling dominates; therefore, it's a misspelling." -dmh 14:37, 22 January 2007 (UTC)

Run-off vote on Translations - wiki links

Wiki links in translations have been discussed and a preliminary vote held. You may wish to vote - the choices being between the status quo and 2 options. —Saltmarsh 07:25, 24 January 2007 (UTC)

's

What happened to the rule of thumb I wrote here? It's been deleted, but there's still a reference to it on the page. I'm OK with it having been deleted if the rest of the page covers the same content, but the reference to it suggests it ought to be restored, either wholesale or in a revised format. — Paul G 21:11, 21 January 2007 (UTC)

It appears to have been deleted by Widsith on 18 August 2006. I don't know if it was moved elsewhere as the edit is uncommented, and the talk page does not exist. That would be a blunder in my opinion, whether the information belongs there or not. DAVilla 16:04, 22 January 2007 (UTC)
It doesn't seem to be in the archive page for August either. --EncycloPetey 16:33, 24 January 2007 (UTC)

Translation Sectioning

Since the BP's quieted down a bit as of late, I guess I'll throw this question to it. There was a bit of a conflict over the formatting of uncle, see Wiktionary:Requests for cleanup#uncle. A new style of formatting was then introduced in an effort to satisfy both parties. Is this a reasonable way to format entries in which many other languages make distinctions that English doesn't? If not, why? If so, are there any other situations in which we would use a similar solution (I ask because, if this gets support, the next step is to write a template for it, in which case we'd need to know what sort of situations it'll be covering, in order to word it properly)? Thanks for your input, and I apologize that simply reading the comment requires a bit of research, but there was no other precise and concise way to put it. Cerealkiller13 08:11, 23 January 2007 (UTC)

This seems a good way of solving the problem. May there be situations which become intractable, and which need some other solution? I don't know what this could be - Inuktitut (Inuit), apocryphally, has 40 words for snow. —Saltmarsh 07:19, 24 January 2007 (UTC)
Indeed. I think that there would need to be some rule of thumb, otherwise every word would end up with eighty translation tables. In cases where the majority of languages make the same distinctions as English, it would be best to simply let the translation boxes correlate to the English definitions. But where a sizeable number make distinctions, especially along similar lines, that's when this solution would be put into effect. Perhaps a possible rule of thumb would be: if a third of the languages currently listed make more distinctions (and along similar lines) than English, then we create separate boxes. When fewer do, we don't. But it seems that, for lack of interest in the topic, uncle may well remain the only entry with such a procedure in place. Cerealkiller13 07:31, 24 January 2007 (UTC)
I strongly agree that for familials we should use this template. Wider use might be too hard to delineate, until people see it in use. --Connel MacKenzie 19:59, 25 January 2007 (UTC)

User:Barbara Shack

I'd like some direction here. Does everyone agree that all this user's contribs should be rolled back/deleted? I had meant to keep an eye on these, after the initial dodgy edits, but there seems to be some significant damage done, now. I'll test out the VoA tool on this user if there is consensus. (That is, one click, to (undo) or [delete] all contribs.) --Connel MacKenzie 18:58, 24 January 2007 (UTC)

Feel free - I've had a look and it seems most of the contributions are born out of total ignorance. I think we may consider blocking too?--Williamsayers79 19:33, 24 January 2007 (UTC)
Not all. Some of her formatting is misguided though. --EncycloPetey 20:32, 24 January 2007 (UTC)
Sorry, can you give an example of a good edit, please? --Connel MacKenzie 19:55, 25 January 2007 (UTC)
You've changed the question. I had answered that not all of her contributions are "born out of total ignorance". I didn't say that her edits were good. --EncycloPetey 18:53, 26 January 2007 (UTC)
Certainly ALL of her contributions need to be looked at as they are mostly bad in one way or another. Some of them just need to be corrected or rewritten though. Not the best thing we could be doing with our time. SemperBlotto 11:41, 25 January 2007 (UTC)

Reconstructed languages

crossposted from Wiktionary_talk:Criteria_for_inclusion

I object in the strongest possible terms to the unilateral imposition of 'policy' on the part of User:Robert Ullmann. Refusing repeated invitations to constructively state his position on Wiktionary_talk:Reconstructed terms, and following a failed deletion request, he just made unilateral changes to Wiktionary:Reconstructed terms to suit his whim, without bothering to give any explanation beyond '1/2 rewrite', knowing perfectly well his changes would be controversial.

I am not seeking to impose any fixed opinion of mine, I am looking for intelligent debate among people aware of the issues involved. Robert Ullmann's suggestion has some merit, but it also has flaws, and as long as he just keeps imposing it without debate, there is no way of ironing them out. Robert Ullmann does important work on wiktionary. But he has very idiosyncratic views on etymology and language reconstruction, and no interest in, and consequently no knowledge of, the matter. It is bad enough that he abuses his admin privileges to chastise me over alleged violation of CFI (which has still 'Semi-Official status'), but to insert such a "policy" into CFI after the fact, and after realizing that it had not in fact been there at the time he chose to chastise over it is simply wikityranny (making up your laws as you go along), indefensible under wikiquette, and unacceptable on any Wikimedia project. Let him either discuss the issue amicably, or step down from policing about it.

I do invite anyone interested in the topic to seek for a solution acceptable to everybody, but I will not put up with such bullying tactics. IDbachmann 10:48, 26 January 2007 (UTC)

Knowing, willful violation of result of policy vote. This user knows perfectly well that the change to CFI was discussed and voted on, and that it reflects both consensus as well as the status quo pro ante. His whinging that there was no debate is flatly false. (and note that the "change" was a clarification of existing policy: PIE has no ISO code, fails CFI)
His referral to a "failed deletion request" is insane: I was the one who cancelled it! The changes were implementing the result of the successful policy vote.
He is a disruptive troll, pretending to be a victim because we do not accede to his POV demands. He cares nothing about a "solution acceptable to everyone" (which we have, except for him): he wants his way regardless. He is precisely determined to "impose a fixed opinion of [his]".
Blocked for one week, for knowingly removing the result of a policy vote from WT:CFI 3rd warning, 2nd block. I believe he should simply be permanently banned. (note: I'm perfectly willing to review the block. Dbachmann: that does not mean I will discuss the block with you.) Robert Ullmann 11:37, 26 January 2007 (UTC)
It should be noted that with two policy votes on PIE/Proto- languages, spanning a month, properly announced on Beer Parlour, user Dbachmann did not vote in either one. He prefers to simply make demands, and whinge about not getting his way. Robert Ullmann 11:47, 26 January 2007 (UTC) (did vote in the first, missed it because it is obscured by more whinging)
I was not aware of that vote (which closed four days ago; it would have helped to point to the vote when changing CFI). I am of course prepared to respect consensus. RU has been kind enough to point me there after blocking me. It would have been even nicer to ask for my concerns, instead of having me lead a monologue on Wiktionary_talk:Reconstructed terms, which was supposed to be a policy debate (not vote). Regarding RU's characterisation of me as a troll, I suggest you review my recent contributions (my most recent entry being daṇḍa). I am a professional linguist, and I am rather irritated nobody thought it necessary to even listen to what I have to say. If RU's block goes unchallenged, I will conclude that wiktionary is indeed run by authoritarian cowboys with an attitude problem and no grasp of wikiquette, and I will gladly step down and leave the project to its fate, Wikimedia or no Wikimedia. 130.60.142.152 15:50, 26 January 2007 (UTC)
I strongly agree with the block. No verification of Dbachmann's credentials are possible, nor plausible. The tactics I have witnessed mirror closely en.wiktionary.org's most infamous copyvio vandal. If the unrequested addition of policy or policy-like pages continues on his return, particularly in opposition to other vote results, I would support extending the block in gradually increasing increments. --Connel MacKenzie 19:44, 26 January 2007 (UTC)
Addendum: I have requested comments from this Wikipedian's talk page on Wikipedia. Something is horribly amiss here. --Connel MacKenzie 20:14, 26 January 2007 (UTC)

Judging by this exchange, Wiktionary is managed like a petty fiefdom, eh? Admins/sysops calling experts in the field "disruptive trolls", handing out blocks, and spurning linguistic and etymological accuracy for the sake of being able to impose their will on their scant sphere of influence at Wikimedia. The arbitrariness and patent lack of critical thinking concerning what constitutes copyvio here is embarrassing. I'm glad en.wikipedia has evolved past the stage of letting petty tin-horn dictators bully and ridicule those whom they disagree with. - WeniWidiWiki 20:39, 26 January 2007 (UTC)

(edit conflict) I've looked at the talk pages of User:Robert Ullmann, User:Dbachmann, Wiktionary talk:About Proto-Indo-European, Wiktionary:Reconstructed terms, WT:CFI, and the various edit histories. In addition, I've looked at the contribution history of User:Dbachmann. In User:Dbachmann's favor, he has a long list of seemingly good-faith entries and edits, and has apparently been working in good faith to create PIE entry standards (even if his views are controversial). Also, he has on occasion participated in exchanges (via user talk pages and article talk pages) in which he has apparently adapted his methodology to meet community standards. Working against him, he has had previous instances where he failed to change behaviour after being warned on his Talk page. It seems as if User:Dbachmann has had difficulty understanding where decisions affecting his contributions are being made, and has also made the common mistake of assuming that Wikipedia policy is the same as Wiktionary policy.
User:Dbachmann's assertion that he was unaware of the 2nd policy vote regarding PIE being formally assigned to Appendix status seems plausible. In fact, it is currently not obvious that various policy votes are ongoing (which is something we should change), unless one happens to peruse the Beer Parlour on a regular basis, and even then a small notice (as this one was) can be easily overlooked in the vast stream of other postings. There was no notice made at the relevant policy page itself (by which I mean Wiktionary:Reconstructed terms, which pre-existed this user.) --Jeffqyzt 20:45, 26 January 2007 (UTC) Also, in both of the previous disputes, User:Robert Ullmann was the administrator that performed the blocking, and apparently some bad blood has arisen.
It seems as if User:Dbachmann's reversion of the policy page was quite possibly in good faith, although rash; it would have been advisable to ask questions first. If User:Robert Ullmann considered the reversion of the reconstructed terms page to be vandalism, it would perhaps have been prudent for User:Robert Ullmann to request another administrator to perform the block, considering the history between these users. In addition, page reversion per se is not anti-policy even though DBachmann's earlier re-adding of data was. However, given the history of User:Dbachmann ignoring warnings on his talk page, it seems not unlikely that a block would have been issued in any case. It seems that an unfortunate lack of communication has led to a bad outcome all round. I would not say that a permanent ban is warranted. As far as the block period goes, while this is technically the second block for this user, and per the WT:BLOCK policy for blocks for behavior which is counter to policy, productivity or community 7 days is the short end of the spectrum, the user's first block seems to fall under the "only if other communications fail" short-term block category, and perhaps shouldn't be considered for "sentencing." --Jeffqyzt 20:41, 26 January 2007 (UTC)
I am pleased to see this discussed; and any sysop is welcome to change the block. (If you are wondering why a previously unknown user showed up in the middle, see w:Wikipedia:Village pump (miscellaneous)#trans-wiki diplomacy.) My problem with Dbachmann isn't his position on issues, and isn't the issues under dispute: it is his whinging everywhere he can find that he is being victimized. Connel started a vote to eradicate PIE; I tried to find a middle, consensus, ground. Which we have. But this user is still playing victim. Go read the first 'graph of his own page on the 'pedia: "... it will become more and more important that disruption is unceremoniously dealt with (i.e. that offenders are blocked quickly), for the protection of the sane and fruitful editing process." Robert Ullmann 21:12, 26 January 2007 (UTC)
I completely disagree with the notion that this should be "unceremoniously dealt with." This is a recurring problem that only better inter-project communication can address. Simply blocking Wikipedia admins because they think of this as Wikipedia may help in the short-term, but is absolutely counter-productive. --Connel MacKenzie 21:30, 26 January 2007 (UTC)
I concur, it should not be "unceremoniously dealt with"; that is a quote of how Dbachmann himself feels that disruptions should be dealt with by sysops. I was suggesting that his whinging should be viewed in light of his own statement. Robert Ullmann 21:39, 26 January 2007 (UTC)

Block removed (one day) Robert Ullmann 07:55, 27 January 2007 (UTC)

New vote: WT:VOTE#Idiom translations on English entry

Another vote has started, for (hopefully a minor) clarification. --Connel MacKenzie 05:31, 27 January 2007 (UTC)

capiche and {{alternative spelling of}}

Capiche

The English Wiktionary is not big on policies. Maintaining flexibility has been the primary reason for opposing mass-policy writing.

Currently, Wiktionary:Redirects has a recommendation for never replacing content with a redirect. I interpret this to mean a hard redirect (#REDIRECT [[]]) or a soft redirect (misspelling, alternative spelling, form of, etc.)

This nascent policy has been a standing practice for a very long time and very well established. Earlier this evening, I found myself in a heated debate. I instinctively know the spelling "capiche" to be the English language spelling, yet numeric google hit counts were thrown at me as "evidence" that the Italian spelling "capisce" is favored.

Setting aside that single specific dubious claim for a moment, what should be done in such a circumstance? Obviously, the English normalized spelling came from the borrowed term, not directly from the Italian. Certainly, the different spellings (numerous variants are listed) have different connotations, right? Some may be limited to certain regions, while others may have different shades of meaning. The point of dispute seems to be the definition of what is a word.

So I'd like community clarification: it is never OK to remove valid content, and replace it with a soft redirect, right? The entry at capiche currently does not even contain a definition gloss, nor the citations it obviously should have, nor pronunciation(!), nor links to the eight or nine alternate spellings.

--Connel MacKenzie 08:14, 8 January 2007 (UTC)

The content on capiche should never have been removed; if I'd seen it I would have reverted and blocked the offender on grounds of vandalism. I think it would be OK to add the {{alternative spelling of}} line in, but useful info has been removed. This edit is yet another example of people barging into Wiktionary and promoting their own agendas for the English language; this is becoming more commonplace, e.g. scenarii, anarchy, both examples where contributors try every possible ploy to instate their own view over others.--Williamsayers79 10:45, 8 January 2007 (UTC)
The "offender" did merge the material from two sources though, so on the whole it was a lateral change if not positive. It was certainly well-intentioned, and by no means vandalism. DAVilla 14:25, 8 January 2007 (UTC)
While in the "heated discussion," I too, used the "v-word." True, it is very clearly a good-faith edit, but POV (unwittingly.) For that reason, I think it would be helpful to have policy clarification for soft redirects. --Connel MacKenzie 16:15, 8 January 2007 (UTC)
Williamsayers79, yes, I agree. That is why I think it is important that we close some of the existing loopholes in our policy scheme. --Connel MacKenzie 16:20, 8 January 2007 (UTC)
I read redirect as a "hard redirect", not a soft one. A change like this is probably permissible if it's handled correctly, although it isn't always preferable and should probably be discouraged on those grounds. In this case, the variant was probably thought to be inferior but, as you claim, turns out to be quite common. Some of the others are undoubtedly a bit wacky, and probably don't deserve anything more than an alternative spelling... or misspelling if they're even verifiable. But we don't yet have objective criteria for that. DAVilla 14:24, 8 January 2007 (UTC)
I think you mean, "we don't have any distinct criteria for those." We certainly do have our existing CFI which does not narrow it down (yet) for this type of case.
Your interpretation of it as applying only to hard redirects implies that enough ambiguity exists for us to need policy clarification on the topic. --Connel MacKenzie 16:15, 8 January 2007 (UTC)
In the last year, my stance has shifted. I now believe that the rule should be the de facto one "never replace content with a hard or soft redirect, unless, exceptionally, the community consensus is that the original entry was inappropriate." The mechanism for the exception is already in place and has very occasionally been used in that way without anyone really noticing: the original entry is RFDed; in, I think, one or two cases in the last year, the decision has been to replace with a soft redirect (with or without moving the content to a more appropriate location). --Enginear 21:34, 8 January 2007 (UTC)

Just to return to the content at capisce for a moment:

  • raw Google hits? Sure, capisce has a lot more. But limit the language to English (e.g. exclude Italian!), and the difference is in the noise range. (duh). (where was this heated debate?) Robert Ullmann 16:28, 8 January 2007 (UTC)
    • On IRC; it started out candid but became a little too excited to repost. If I've missed any relevant points, I'm sure they'll be added to this conversation (but I don't think I did.) --Connel MacKenzie 20:13, 8 January 2007 (UTC)
  • and "capeesh" has 20x more than either. This is the spelling used by Richard Condon (Prizzi's Glory) and Robert Parker. Robert Ullmann 17:57, 8 January 2007 (UTC)
  • the Bound citation is not evidence of the spelling, only the spoken use of ~capeesh. The script says: "You understand?". And the English subtitle says "[Italian]" (;-) Robert Ullmann 18:13, 8 January 2007 (UTC)
This may be getting off topic for BP, but would you suggest that a phonetic transcription should appear in that quote in place of the word? Dfeuer 01:52, 9 January 2007 (UTC)

I recall seeing "kapeesh?" or "capeesh?" various places as a kid, never capiche or capisce (dictionary.com doesn't list "kapeesh"). Then I went to Italy and heard it in regular use and thought "Cute! They've adopted another bit of American slang, just like 'OK'." Then I talked to an Italian and found out that they were just speaking Italian. The typical exchange was

Capisci? (Do you understand?)
Capisco. (I understand) or I think more common, at least where I was in Lombardy, Capito. (Understood).

Note the spelling. "Capisci?" is what we generally mean by "Capeesh?". "Capisce" is third person, but I believe it can also be used as second person formal, so "(Lei) capisce?" would mean "Do you understand (sir or madam)?". I forget whether people actually tended to say "Capisci?" or "Tu capisci?", likewise for "Capisco." vs. "Io capisco.", but I'm pretty sure "Capito" was used by itself. I would bet that at least some dialects of Italian drop the "Tu" and "Io" in cases like this (but see below). I would not necessarily claim "capisce" as a "correct" spelling of "capeesh" or whatever, since it doesn't mean the same thing. I'm also unsure where the Frenchified "capiche" comes from. As a general rule borrowings tend to acquire the inflections and often the spellings of the borrowing language (and inevitably people who know the origin claim that this is "incorrect"). "Capeesh" isn't correct Italian, but it's perfectly good (informal) English. As to frequency of the various forms, Google hits are probably a good rough indicator, as long as we can filter out actual Italian. With that in mind:

  • kapeesh gets about 23K hits
  • capeesh gets about 20K hits
  • capiche gets about 120K hits. Interestingly, many of these seem to be proper names. Only 2 of the first 100 hits were French, and one of those was clearly referring to a proper name. Google thinks 900 or so of the total hits are French. On the one hand, if we filter out cases where "capiche" is not used as a verb, we're probably closer to the first two spellings. On the other hand, it's probably popular as a proper name because people take it to be the verb. E.g., a screen-grabber utility named "capiche" would play on the connotation of "got it?". But why this spelling? As far as I can tell it's not a French word, at least not a widely-used French verb, and if it were Italian, "capiche" would be pronounced "ca PEE keh".
  • capisce gets about 4.5M hits. Of these, Google thinks about 150K are in English (interestingly, these seem to be heavily overrepresented at the top of the list), and it's the same story as "capiche": lots of proper names. Google thinks that 1.5M are in Italian. A quick survey indicates that it is generally used in third person, not second ("he understands", not "Do you understand"). So while "capisce" could conceivably be used to mean "do you understand" in Italian, I'm not finding it. Where are the rest of the hits? Not sure. Perhaps unclassified? At this point it's anybody's guess whether they're mostly English, mostly Italian, or some other language entirely.
  • capisci gets about 1.5M hits, 34K English, 1.2M Italian. The Italian ones include the occasional "Mi capisci?" (do you understand me), but in a very quick check I didn't run across a lone "Capisci?". It's probably there, but not common. "Capisci X?" (Do you understand X) is pretty common.

My guess is that "capisci" is rare in English, and that a large portion of the missing "capisce" hits are English but not tagged as such. Even without any untagged hits, "capisce" is clearly the most common English spelling, even though it doesn't match Italian. So ... the main entry should be under "capisce" (along with an Italian entry and a note explaining that English "Capisce?" is more like Italian "Capisci?" than "capisce"). The others, as far as I'm concerned, can soft redirect to it, as long as there aren't any separate senses unique to those spellings. Personally I'm not against soft redirects, or in some cases even Wikipedia style disambiguation pages, but if the community has decided otherwise in my absence, I'm fine with that. -dmh 19:56, 11 January 2007 (UTC)

I've just seen that the Italian verb capire hasn't got a conjugation table - I shall add it (and the words) tomorrow. SemperBlotto 20:01, 11 January 2007 (UTC)
For what it's worth, the use of capiche in Burro Genius seems to suggest that the author of that book believed it to be a Mexican Spanish or Spanish-American word. Dfeuer 22:27, 11 January 2007 (UTC)
Two cents, maybe not even that: The only mouths this Italian-American ever used to hear that word come from were Italian-Americans, then urbanites, then a few others when it showed up on television (Bruce Willis used it on the television show "Moonlighting" in the late 1980s) and in the movies (Mafia movies, of course). I assume it progressed something like the way many Yiddish words did into American English (with a smile and the aroma of ethnic cooking in the mental atmosphere), then perhaps into English elsewhere. I always heard it pronounced "Cabeesh", which I assume is a regional pronunciation in some part or parts of southern Italy. Perhaps the "Do you understand?" meaning is also of some Italian regional origin and just spread like kudzu throughout Italian-American neighborhoods and then beyond (the way final Italian syllables get dropped). I have never, ever heard the word without a question mark after it; in fact, it's always been an interjection with a question mark after it and always, always in America means "Do you understand?" I've also never heard it used without a touch of humor and perhaps a hint of aggression (maybe picked up from the Mafia-movie transmission). Noroton 21:44, 28 January 2007 (UTC)

Duplication versus see instead

I think our discussion on IRC was a little more general, so I'll split off a new discussion. There are many terms with multiple spellings and variants; for example, capiche is also spelled capeesh, capice, capisce, capisci, capish, coppish, kabish, and kapish (quickie references: WordWizard.com, Poets.org, Urbandictionary, and Yahoo! Answers). The definitions, translations, alternative spellings, synonyms, antonyms, et cetera are identical. The etymologies are usually identical, and a few sections may be different.

The debate centered on how to process such terms, and we discussed two very different methods:

  • Duplication: Duplicate all relevant information on all pages. The definitions, etymology, references, translations, alternative spellings, synonyms, antonyms, et cetera should be on every variant page.
    Benefits:
    • Reduces load time by providing all the information immediately, without requiring that users click a link.
    • Does not favour any particular variant; pointing all variants to a single one suggests that it is the right spelling, rather than one of many alternate spellings.
    Disadvantages:
    • Greatly increases the difficulty of editing entries, and discourages edits. A spelling correction to capiche would need to be repeated on capeesh, capice, capisce, capisci, capish, coppish, kabish, and kapish.
    • Adds additional workload to keep all variants up to date; many users may edit a single variant without bothering to update every variant.
    • Fractures the ability to provide comprehensive information. A user visiting capiche will not see the corrected definition added to capish, the expanded etymology on capeesh, or the synonyms on capisce.
  • See instead: Move all duplicated information to a single variant, preferably the most common (since that will be the most visited), and link to it with a template such as alternative spelling of, misspelling of, or plural of. Any information specific to a particular variant can be added alongside the see instead link:
    ==English==
    ===Etymology===
    etymology specific to this variant.

    ===Noun===
    * {{alternative spelling of|blah}}

    ===References===
    references specific to this variant.
    Benefits:
    • Very easy to create and update pages for variants, and allows for divergent information.
    • Every reader is given the best and total information available.
    • Editor workload is minimized by reducing the number of pages to edit for every change, which increases productivity and spare time.
    • 'See instead' is the traditional method used by dictionaries; the American Heritage Dictionary provides two definitions for ear shell, one of which is "See abalone" and the other of which is specific to that word.
    Disadvantages:
    • Implies that the variant with the information is right, rather than one of many alternatives.

I tried to present all the arguments impartially above, but I'm sure I'll be corrected if not. I prefer the 'see instead' method, for which several templates already exist and are available from the edit tools. —{admin} Pathoschild 21:07, 8 January 2007 (UTC)

I think Pathoschild summed things up pretty well. There is at least one other alternative, which has its own advantages and disadvantages: Create templates for common portions of similar entries.
Benefits:
  • Very easy to add a new variant spelling.
  • Easy to make a change that applies to all variants, keeping articles in synch.
  • No multi-click problems.
Detriments:
  • Difficult to set up.
  • Confusing for new editors.
  • Tricky to split up the parts when they need to diverge.
I am very much opposed to trying to maintain completely separate entries for each spelling, although Connel may be able to support separating capisce from (most of) the other spellings, and making capeesh the primary spelling among those variants. -- the "vandal", Dfeuer 22:19, 8 January 2007 (UTC)


Firstly, IF we wanted to do it, it would be possible to link in a NPOV fashion. In the above example, capiche and ALL the others, would have a soft redirect: "See capeesh / capice / capiche / capisce / capisci / capish / coppish / kabish / kapish". OK, that's alphabetist, but in other respects it's NPOV. But all that happens is that everyone then needs an extra click to see their information. Everyone suffers the same imperfect service.
So why not a hard redirect (which I used to think was the answer, although that was a minority viewpoint)?: because a "full" entry will always have some items specific to that spelling, certainly cites, possibly differing etymologies, in many cases homographs in different languages.
It is therefore clear that, if a soft redirect is used, a user will have to look in two places to get all the info on a word -- not a problem for deep research, but a pain if only looking for a simple definition. It makes the dictionary less accessible to those only dabbling in it, and I suspect most of us want to encourage that market as well as more esoteric uses.
BUT, wiki is not paper. Why make life awkward for users when we can give them all the info they want in one place? I know of three reasons:
  • To improve data integrity, ie to help reduce errors due to unnecessary duplication. The suggestion of using templates might help achieve this in the short term while still making the entries appear separate to the user (ie reader). I believe the medium term answer may be to use cleverer software, rather than squidgyware or whatever the term for us editors is. It feels as though some tool, working rather like the one that allows "diffs" to be viewed, could compare versions and (semi-)automate the editing of multiple copies, or indeed store only a single copy if there were sections long enough to make that worthwhile (which I increasingly doubt).
  • To make less work for editors -- very important, but potentially ameliorated as above.
  • Because the amount of information in one place may overwhelm casual users. This is an important issue, but peripheral to the present thread (for recent thoughts touching on it, see here; there are also earlier threads about Multi-level Wikt, user choice to hide etymologies, cites, translations, etc, which I will leave it to someone else to link). For the present purposes, I suggest that a front end which splits the information about a word which a particular user wants to see is a Bad Thing. It may sometimes be necessary (eg see above link) but we should not do it if there is an alternative practical solution.
In my opinion, in view of the ameliorations available, these reasons do not represent adequate excuse for the reduction in ease of use created by having the information for a particular spelling split between a spelling-specific page and another reached via a redirect. Para inserted 13:41, 9 January 2007 (UTC) to clarify my viewpoint --Enginear
There is however an alternative way of applying the soft redirect to a composite entry, which needs at least brief consideration: rather than the composite containing only the information common to the heterographs, it could be used to contain all the information for all the spellings in one place, which might be useful for in-depth research -- for example, the differences between color and colour [there goes that alphabetism again] are as interesting (to me at least) as the similarities, but are difficult to see with our current separate entries. However, I suspect only a small proportion of users would want this (and they could potentially be served by the same modified diff engine mentioned earlier). For other users it would be unwanted clutter. (I also have a sneaking suspicion that if we were too lenient in allowing entries for typos, we would end up with a single composite entry trying to define all short English words...well perhaps that's an exaggeration, but I think it would become unworkable.)
Sidenote: the MW software allows you to 'diff' different articles, by revision number. color/colour: diff --Connel MacKenzie 04:40, 9 January 2007 (UTC)
That's great! I can see a way of doing it: first go to one word, use the History tab, click on the latest entry, and copy the "oldid" number from the page URL; then go to the other word, click the History tab, select a diff between any old version and the latest version, paste the first word's current revision number in place of the old version's, and hit return. Or is there an easier way? --Enginear 13:41, 9 January 2007 (UTC)
I haven't found an easier way to do it, no. I suppose you could ask w:User talk:Lupin for a feature within popups that opens a dialog to ask for the entry to compare against? I know popups has a way of finding that last revision (obviously) so it can't be too impossible. --Connel MacKenzie 13:05, 16 January 2007 (UTC)
Erm, I just wrote diff.js which does it. It is linked in as the last item (currently) on WT:PREFS. It is not the sort of thing to leave turned on all the time though. Ugly as sin. And it steals focus sequence, and for long titles, wraps to next heading line stealing precious vertical pixels. But it works, for firing off a diff between color and colour. (In Firefox, anyhow...haven't tested anything else yet.)
On second thought, I shall rewrite it to trigger off the "{{see}}" thing's output instead. I hope that is wrapped in a named class... --Connel MacKenzie 15:57, 16 January 2007 (UTC)
OK, at the bottom of WT:PREFS you can turn off the "(diff)" links that are now added to the {{see}} items. If the "other" page has newer revisions, it seems to choose the revision closest to it, instead of the latest. But hopefully with such a tool, revisions will naturally become more synchronized anyhow. A null edit should be able to reset them though...worst case a minor addition of a space to the older article will invariably do the trick. --Connel MacKenzie 19:20, 16 January 2007 (UTC)
Brilliant. Have to go out tonight. Will try it tomorrow --Enginear 20:21, 16 January 2007 (UTC)
So on balance, in the last year I have changed my mind on this issue, and now support a long term goal of having substantial entries for all heterographs. Meanwhile, if someone wants to add a word with a full entry for one spelling only, and they or another decide to add soft redirects to it from other spellings, they should not be discouraged. But nor should anyone be discouraged from adding full entries for new heterographs, or changing existing redirects into full entries. --Enginear 23:33, 8 January 2007 (UTC)
So in the short/medium term, if someone wants to make a change that applies to all the spellings, they should change all the articles? Or Alice should change one, Bob should change another, Charlie should change a third, and then Derrick, Erwin, Frank, Gwen, and Hilda should argue on all the various discussion pages (some on one, some on another, some on several) about the problems they have with the entries, and then move things from one to the other to the other? To me, that sounds like a maintenance nightmare and a huge waste of time. Dfeuer 01:50, 9 January 2007 (UTC)

Connel's response

  • Well, reading the inaccuracies here, I am reminded precisely why the IRC conversation was so heated.
  1. It is completely untrue that different spellings even can share etymologies. They are different words representing the same concept, perhaps even the same pronunciation.
  2. Wiktionary.org exists for readers not editors. You don't want to update 7 entries? DON'T. It's a wiki - someone else (or their semi-automated bot) will.
  3. The different spellings almost invariably represent different regional spelling normalizations.
  4. The different spellings usually carry different connotations/shades of meaning.
  5. To anyone even remotely linguistically curious, the differences between words (particularly as they are only starting to diverge) is of enormously greater importance, than the editor's "maintenance" concerns.
    • Tell me, should ain't and is not share the same page?
  6. Huge waste of time? Writing a free dictionary is a huge waste of time. Why stop short of being accurate?
  7. It is a flat-out lie, that "alternative spelling of blah" saves a click for the reader. During slow times, when a single page load takes 10 seconds to two minutes, that is a devastating difference.
  8. With the "alternative spelling of blah" method, a reader is almost guaranteed to have to click through their first page hit, particularly when many spellings exist.
  9. Entries are violently stifled from growing as they should, to represent the divergent connotations. A change to any one entry's definition affects (mistakenly) all spellings.
  10. The color/colour "shared section" method is difficult for most new editors to grok, and makes even the most seasoned veteran pause. The optimal solution is to put the redundant data where it belongs - in the entries of the redundant spellings, allowing them to diverge just as they do in natural language.

In any dictionary, nothing is more distinctive than the spelling of a word. This is of astronomically greater importance to a multilingual dictionary. Disparaging one spelling in deference to another spelling is always non-NPOV. Accurately describing the relationship between two spellings must always be preferred. --Connel MacKenzie 04:40, 9 January 2007 (UTC)

Bit of a note: Some of what you say only holds true for synchronic examination of language, but we are also interested here in diachronic examination -- the look at language through many times, rather than a single one. Some spelling differences are temporal, and have nothing to do with geographic regionalism. Would you consider the various spellings of Shakespeare's name (used in his own signatures) to carry different meanings? At an earlier time, before the widespread use of printed dictionaries and formal education, spelling was a bit more fluid. We therefore have to incorporate a bit of temperance in declaring all recorded spellings independent. --EncycloPetey 05:10, 9 January 2007 (UTC)
The primary arguments seem to be that there could be different information on the pages, and on the other hand that there isn't always. Why not just say that replacing information with soft redirects is disallowed when any of the content conflicts? (That would include cases where a user is uncertain that certain information applies to all variants.)
The consolidation that originally raised the issue was a beneficial change in my view. A click is one thing, but I certainly wouldn't want to have to search around to get all the bits and pieces of information about a single word. In fact I probably wouldn't have even thought to, instead considering the entry to be as yet incomplete. With the soft redirect pointing to the merged content, I don't have to look around, or wonder if I should, because I know it's all in one place.
At this point if you want to add information to one of the variants, your workload is reduced because you don't have to piece things together. Just override the soft redirect and add the region or etymology or whatever else is conflicting information. That too is a positive change in my view. The only problem is where to point the other redirects if they exist, and regardless of your views, I don't really know if that can be resolved without saying "no soft redirects". DAVilla 06:13, 9 January 2007 (UTC)
Part of the "heated argument" was when I suggested restoring capiche as the primary form (with cites, of course) and was told in no uncertain terms not to. In my view, restoring capiche as the primary form is simply undoing "vandalism" :-) as a chunk of valid content had gone missing. It was not "vandalism" - instead it was two sysops conferring to reach what they thought was correct, at the expense of valid content. If it is that easy for two smart guys to make a major mistake even after deliberation, I can't imagine how many times I've erred when shooting from the hip. Did Shakespeare use different spellings to convey different moods, or vary spelling depending on who his target audience was?
By the way, to add citations to capiche is more than slightly odd at this point, with that entry not even containing a definition (instead only the soft redirect.) Should I even bother? --Connel MacKenzie 07:26, 9 January 2007 (UTC)
I agree with Encyclopetey re your points 1 & 3, but otherwise I agree with you, in particular that the extra click required for "Alternative spelling of..." or "See..." can be a pain. Re Shakspear/Shakspeer/Shakespeare/etc, of course he did, and so have other good authors since. Re capiche I don't understand why, under our present guidelines, you can't return it to being a full entry. --Enginear 19:40, 9 January 2007 (UTC)
DAVilla's solution (soft redirect, but duplicate if there are differences) sounds good to me. Connel MacKenzie pointed that the etymologies might differ, but the etymologies for the capiche variants are identical at the current level of detail: they're all borrowed from "Italian capisce, third person present tense form of capire (to understand)". If someone decides to add greater (referenced) detail that differs from other variants, they could easily create a full entry there.
Adding references to soft redirects is useful to show that the word actually exists. Adding citations could arguably be useful to show that it's used, but could just as well be added to the linked-to variant's page. —{admin} Pathoschild 17:50, 9 January 2007 (UTC)
Point of clarification: soft redirects would still not be encouraged, but in my view they should not be discouraged, as in such cases at least the progress is forward. DAVilla 15:47, 10 January 2007 (UTC)

Dfeuer's response to Connel's response

1a. It is completely untrue that different spellings even can share etymologies.
Different spellings can reflect (perhaps subtly) different etymologies. You might reasonably argue that capisce and capeesh deserve separate etymologies, because capisce derives its spelling from the Italian, whereas capeesh, a phonetic spelling, derives from spoken use. It is therefore possible that the word entered English through a spoken route and a written one, which could, maybe, be demonstrated through a tremendous amount of difficult research and included in the etymology. If not, it is still possible to note in the etymologies that one preserved the Italian and the other didn't, but that fact is obvious to anyone looking at the etymology anyway, considering that it appears just below the list of spellings. You may also be able to show that capisce/capeesh/capiche are pronounced differently than capish/coppish, which could reflect interesting regional differences. On the other hand, I don't see how you can, even in principle, distinguish the etymologies of capeesh/kapeesh or capish/coppish. Furthermore, as a reader, I would find it much more interesting to read a single entry containing a discussion of these differences than to read seven entries and then try to piece together an explanation of my own for why the seven entries are different and what that might mean.
1b. They are different words representing the same concept, perhaps even the same pronunciation.
The editors of most or all major dictionaries disagree. While the use of a word in writing is an important record of its history, words are typically spoken far more than they are written, and the spoken usage is considered the primary one. A dedicated literary critic might find meaning in the various ways Shakspear spelled his name, but I don't think anyone will ever find a meaningful difference between the spellings privily, pryvely, prively, pryuely, and priuely.
3. The different spellings almost invariably represent different regional spelling normalizations.
Can you give any evidence for that? Even if it applies to current spellings, I don't think it could possibly apply to older spellings. In the time before standardized spelling, different spellings almost invariably represented the tastes of the individuals writing the words, and could vary from moment to moment.
4. The different spellings usually carry different connotations/shades of meaning.
I could easily be convinced that different spellings occasionally carry different connotations, and would agree that in such cases the entries need to diverge. Can you provide evidence that this is true of more than, say, 3% of the words in Wiktionary?
5. To anyone even remotely linguistically curious, the differences between words (particularly as they are only starting to diverge) is of enormously greater importance, than the editor's "maintenance" concerns.
10. The color/colour "shared section" method is difficult for most new editors to grok, and makes even the most seasoned veteran pause. The optimal solution is to put the redundant data where it belongs - in the entries of the redundant spellings, allowing them to diverge just as they do in natural language.
You focus a lot on the evils of false commonality, where entries should diverge but don't. I see a problem also with false divergence. If I look up two words (or two spellings of one word), and I see that they are defined in substantively different ways, I will generally assume that those differences are real, and my choice between the two will be guided by the differences in definition. If in fact these differences are there simply because an editor changed one and neglected to change the other, then I, the reader, have come away learning something that simply isn't true. We want to record the language as it changes naturally. That doesn't mean that we should allow our dictionary to change naturally, independently of the language.
7. It is a flat-out lie, that "alternative spelling of blah" saves a click for the reader. During slow times, when a single page load takes 10 seconds to two minutes, that is a devastating difference.
I'm not a web programmer, but might it be possible to arrange for "soft redirects" to be supported by Javascript (where available) so that the target of the redirect will be loaded along with the initial page? Dfeuer 08:33, 10 January 2007 (UTC)

Wiktionary:Spelling variants in entry names

Wanton changes to this Wiktionary page during the start of this discussion are not helpful. One of the ideas in all the mess I just rolled back has potential, but barging ahead with the changes, without any discussion at all is pretty foolhardy. --Connel MacKenzie 04:12, 9 January 2007 (UTC)

The user who changed the page did not participate in the discussion; perhaps they're not aware that there is a relevant discussion. —{admin} Pathoschild 18:27, 9 January 2007 (UTC)
I've left a note on his Wikipedia User Talk page --Enginear 19:27, 9 January 2007 (UTC)
"The user" referred to is apparently me, so I'll just reply here. 1) Connel: the changes were not a "foolhardy" "wanton" "mess", they were simply bold, and made in good faith after considerable thought, and after not seeing the issues I was attempting to address adequately covered on the talk page (though various threads there touch on aspects of them); exaggeratory hyperbolic descriptions of the edit are not particularly helpful or civil. I'm glad you found at least one idea in it that "has potential"; everyone wins! 2) Draft guideline pages are made to be edited; they beg for it. If you want to "own" it directly until it is in more solid form and less likely to be edited by people who were not its principal authors, put it in your user space until you feel it is ready to be a more formal proposal. If you want to have a cadre control it, create a project and put it in the project's namespace until it is ready for primetime. But please don't flame people for making good-faith edits to "Wiktionary:"-space draft projectpages that make it clear that they are drafts. Whether or not I'm a long-time participant in that document's Talk history is of no relevance to whether the edit has value (indeed, the opposite can often be true - long-time participants sometimes do not see the forest for their personally-tended trees). 3) It's not a big deal; reverts are easy. Again, I'm glad some value was found in part of what I added, and I hope someone more deeply involved in that document will incorporate it in some way that doesn't make you all freak out over molehills and lambaste people in the BP about it. NB: I have to note that reverting a good-faith edit with admitted value rather than improving it or finding some way to work it in to the document more harmoniously is considered a bad idea - cf. M:Rollback#Don'ts, second bullet. 
4) I honestly don't see how I am supposed to be expected to know about this particular "capiche" discussion (which seems to be the referent of "this discussion" [Connel] and "the discussion" [Pathoschild]), which is nowhere near the projectpage in question, much less see how it is of such supposed vital relevance to the page I edited. Frankly, I don't see much relevance at all to the material I added in particular. The "capiche" spelling issue appears to be covered by a point in that projectpage already, part I did not edit (namely "Types of Spelling Variants" entries 1-3, and 5, depending on which variant of "capiche" one is talking about). 5) It's fine to disagree of course, but please: I don't need my ankles chewed. Maybe Wiktionary is somehow developing norms of editing, and responses to edits, that are wildly divergent from the WikiMedia culture in general (exemplified at Wikipedia and Meta)? I can't recall ever being categorized as a "[non]participat[ory]" and "[un]aware" maker of "foolhardy" "wanton" "mess[es]" anywhere else in the wikisphere. — SMcCandlish [talk] [contrib] 20:57, 9 January 2007 (UTC)
I would not rate the English Wiktionary as the friendliest wiki around, but despite the short tempers, ambient hostility, and lack of policies, everyone does more or less work in good faith. One reason Connel MacKenzie took your edit so seriously is that Wiktionary's draft policies are the de facto guidelines. Please stick around; a few new users might even out the community. :) —{admin} Pathoschild 06:56, 10 January 2007 (UTC)
Re: "Wiktionary's draft policies are the de facto guidelines" — Noted! Hardly the place I'd expect to find toes to step on, but now that I know they are there, I'll watch out for them. And, yes, I'll stick around. I've been on Wiktionary almost as long as Wikipedia; I'm just a total gnome here. I make very few edits, and they are usually twiddles, not major things. This was my first policy-sphere foray on this Wiki* (a review of my Wikipedia activities will demonstrate that I've been quite active in WP policy issues, especially at W:Wikipedia:Notability, so I don't think I was "randomly wading in" here). I was just somewhat ill-preparedly doing so.  :-) Being bold can have its down side. — SMcCandlish [talk] [contrib] 08:54, 10 January 2007 (UTC)
I'm glad you're not put off. Personally, I find Wikt no less friendly than the real world. I rarely encounter hostility towards me, but there is a significant amount towards Connel, to which he unfortunately responds. I have not been around long enough to know which came first. He does, however, like many of us, say what he thinks rather than playing political games. Knowing that makes it easier to respond civilly to him, even when he has, due to his personal experiences, wrongly assumed bad faith.
If you want to be significantly involved in policy matters at Wikt, you really need to keep up with the Beer parlour discussions. For whatever reason, we almost always discuss them here rather than on the policy talk pages. There is also a tendency to fly kites on Connel's user page -- in spite of his frequent requests to have the discussions here instead -- so it's worth watching that too.
As time goes on, we are finding a growing number of ways in which procedures and formats for a dictionary aiming to cover "all words in all languages, with the descriptions, etc being in English" need to be different from those which best suit an encyclopedia. Many of us arrived here with inappropriate preconceptions, which are modified as we gain experience and listen to those who have been here longer. Certainly, having now done about 700 edits, I am conscious I still have a lot to learn.
You had the misfortune to be in the wrong place at the wrong time. Inappropriately, IMO, "spelling variants" is the most argued over issue at Wikt, and the resulting animosity has caused at least one previously prolific editor to walk away. Again, I have not been here long enough to know, but it has been suggested that the origin lies in a horribly POV US v UK battle over whose spelling was "best". IMO, this is highly unfortunate, since together the US & UK populations still form a relatively small minority of the English speaking world, and this battle detracts from the similar needs of other countries (eg I was told yesterday that the standard spelling of okra in Ghanaian English is okro).
A related issue is how to gloss words where the spelling is considered "correct" (ie students won't be marked wrong for using it) in one country, but "unacceptable" (ie they may be marked wrong) in others. And this tends to get caught up with the issue of how to deal with "misspellings"; see another current thread above. (I have purposely used emotive prescriptivist words there. Most of us at present want Wikt to be descriptivist (perhaps aka NPOV), but accept the need to flag in some way those words which are "not standard English" in any or all different countries.)
Most of the arguments that I have seen about spelling variants in the year I have been here have either been raised again, or at least alluded to, in the threads currently on this page. It was thus a particularly emotive time to find the relevant draft policy page being edited. You really were in the wrong place at the wrong time! --Enginear 12:00, 10 January 2007 (UTC)
Sounds like fun. :-/ Anyway, thanks for the heads-ups with regard to "how things are done around here". Some of them sound rather strange (policypages have Talk pages for a reason, don't they? Heh.) But I'll remember them. — SMcCandlish [talk] [contrib] 23:37, 10 January 2007 (UTC)
  • I apologize for the "personal attack" nature of my comments. --Connel MacKenzie 14:43, 10 January 2007 (UTC)
Thanks, and accepted.  :-) — SMcCandlish [talk] [contrib] 23:29, 10 January 2007 (UTC)
  • I have to say I rather regret partially reverting to some of SMcCandlish's edits, especially the part about not noticing the introduction of an entire paragraph, which I didn't intend to push through. Thank you for not calling me on that mistake. DAVilla 15:57, 10 January 2007 (UTC)
Cool by me either way; if the material eventually provides a minor clarification in the policypage, or even just provides fodder for discussion, then my mini-mission was accomplished. — SMcCandlish [talk] [contrib] 23:29, 10 January 2007 (UTC)

Summary of above

There are too many separate tangents in the comments above to react to, so I'm starting over here.

  • An encyclopedia's headwords represent concepts.
  • Any dictionary's headwords represent spellings.

The most important data we have about a word is not the definition; it is the spelling. The definitions are what make that useful. But there is nothing that can distinguish entries better than spelling.

  • This is 7,000+ times more important for a multilingual dictionary, such as Wiktionary.

I have not corrected the entry above, because I view this as a content dispute with Pathoschild/Dfeuer. Out of respect for them, I am not "attacking" their entry even though it is wrong. I am trying to attack the underlying misconception. So Enginear, yes, feel free to correct the entry, when we've reached a consensus.

It is very wrong to assume that different spellings share etymologies. That assertion has been made and remade above (without citations, by the way), yet I still cannot agree. Clearly, the absorption of an Italian pronunciation into the language will be represented differently (in spelling) based on what is adapting the borrowed term. When 100% of etymological information is speculative by definition, it seems specious to demand citations for some 3%, suddenly.

In this dispute, it has seemed to me that Pathoschild has called for Wikipedia reinforcements. That is a good thing; the more people involved in a conversation, the more likely an amicable solution can be found. But because the fundamental philosophical difference (is this an encyclopedia, or a dictionary?) has not been stated as a starting definition, bizarre Wikipedia-like arguments have been made that do not support building a dictionary.

Using javascript to display the contents of another page whenever an "alternative spelling of" tag is found has many drawbacks; the page load time is comparable to the user clicking through to the other entry, anyway (and the technical effort to accomplish it does not seem to be a worthwhile expenditure of technical resources). Furthermore, such a solution is inherently unstable (=high maintenance).

  1. It is my view that all hard redirects should be prohibited.
  2. It is also my view that all soft redirects should be strongly discouraged.
  3. It is also my view that all automated synchronization of soft redirects with their referents should be prohibited, and manual synchronization discouraged/watched carefully.

I wish to emphasise that last point: minor omissions and minor corrections are all that should be synchronized between, say, color and colour. Efforts to reword dialectical definitions to become common with some other dialect should be vigilantly quashed. (That would be analogous to writing the English definition of the English word epee in French, otherwise.) Our current guidelines and practices do not yet address this final point, adequately.

--Connel MacKenzie 15:21, 10 January 2007 (UTC)

We cleared this up on IRC, but for the record I never 'called for Wikipedia reinforcements'. In fact, I was inactive on en-Wikipedia between 8 December 2006 and 11 January 2007, both on wiki and in the IRC channels. I joined the discussion when I noticed Dfeuer's edits, and Connel MacKenzie joined later for the same reason. The only coincidence in this case is SMcCandlish editing a draft policy page about spelling variants, and they seem to be a longtime lurking Wiktionarian.
If the community favours duplication, I'm inclined to grudgingly duplicate entries—though I reserve the right to grumble about it ;). However, I fail to see the point of your third numbered statement, that manual synchronization should be discouraged. If someone adds a meaning to colour which is not present in color, why should he be 'strongly discouraged' from adding it to the latter as well? —{admin} Pathoschild 06:11, 11 January 2007 (UTC)
  1. Talk:color
  2. Talk:colour
  3. Wiktionary:Beer parlour archive/April-June 05#Color/colour
  4. Wiktionary:Beer parlour archive/June 06#"color-colour" doesn't work
  5. Wiktionary:Beer parlour archive/March 06#First quarter 2006 US vs. UK flamewar
Happy reading! --Connel MacKenzie 07:02, 11 January 2007 (UTC)
Connel: I think there's a misprint. In your first post on this thread, approx 6th para, did you mean I am not attacking their entry?
Wow, thank you. I've added "not" there! --Connel MacKenzie 08:13, 12 January 2007 (UTC)
I think we may be splitting hairs over etymologies: I accept it is technically true to say that, eg, the Etymology for chametz is "Transliterated from Ancient Hebrew חמץ using transliteration system x", while the Etymology for hametz is "Transliterated from Ancient Hebrew חמץ using transliteration system y". But at the practical level, we are likely to say "Transliterated from Ancient Hebrew חמץ" for both. If that is what you mean, then I agree. I also agree that it is possible one of the above transliterations was influenced by one being borrowed from another language along the way, (along the lines of color/colour) but in the chametz/hametz case, I don't think we have any evidence for that at present.
I disagree with your suggestion that "Efforts to reword dialectical definitions to become common with some other dialect...would be analogous to writing the English definition of the English word epee in French." I suggest rather that it is the same problem that the Portuguese wikt presumably has with entries relating to Portuguese or Brazilian usage, or that the Esperanto wikt would have defining epee#English and epée#French. If we accept that your and my language are both dialects of English, as we do, then we need to write definitions as far as possible in language most of each other's compatriots will understand correctly. You should not define to postpone as to table#Verb and I should not define to propose as to table#Verb. Actually, we are both aware of both those meanings of table and so are both qualified to define the verb table. Either of us could edit it for the above definitions successfully. However, for the noun senses, you could probably edit the computing usage straight off, while I would need to check some references first.
So there are two issues:
  • We should only edit existing definitions if we believe we understand them better than the previous editor, and
  • We should attempt to use language which will be understood correctly in as many as possible of the dialects of English (or at least the main dialects which are thought of as "standard" in different locations)
If one of the meanings of color is identical to a meaning of colour (and perhaps there is none), then, if we find it worded differently without good reason in the different entries, we should question whether one of the definitions is misleading. There may well be a good reason, eg the relationship with the surrounding definitions, but if we want the dictionary to be accessible to all English speakers, the question should be asked and, if appropriate, the definition amended. But this should only be done by someone who (or some group of editors which) is fluent in both dialects. The prerequisite of manual synchronisation is to use the brain, and the first use should be to ask "Am I more competent to do this than the last editor?" and if not, to leave it (or send it to the Tea room). So I believe manual synchronisation by editors competent in usage of both words should be encouraged, but yes, it should also be carefully watched, in case someone overreaches themself. --Enginear 12:27, 11 January 2007 (UTC)
Thank you - that is the most reasonable thing I've read on this topic, to date, Enginear. --Connel MacKenzie 09:49, 12 January 2007 (UTC)
Connel, it looks to me as though one of the main reasons you want to keep separate entries for separate spellings is that separate spellings sometimes reflect regional variations, as in color/colour, and you believe that the word colour, used in much of the world, may evolve separately from the word color, used primarily in the United States (I believe). There are two major problems with this:
  1. It may make meaningless distinctions: There is enough communication between English speakers in different parts of the world that English remains a largely unified language. That is, if an American sees the word "colour" used in a certain fashion, they will assume without a second thought that the word "color" can be used similarly, and a Briton reading the word "color" will make a similar assumption.
  2. It may fail to make meaningful distinctions: A word will often have different meanings in different places yet still have the same spelling. A good example would be the word "napkin".
No matter how we organize the dictionary, there will be problems. Personally, I think it would be more useful to have a single article for the various spellings, with notes about how the words (and spellings) may be used in different regions or by different groups (as print dictionaries usually do) than to separate them altogether. Dfeuer 22:58, 11 January 2007 (UTC)
Dfeuer, my objection to it is that procedurally forcing them to merge 100% prevents the proper distinction, when needed. There are about 140 terms that I know of that are wildly different for US/UK. Forcing them to merge, or even just "encouraging" them to merge, causes misconceptions like Pathoschild's above, where he was ready to synchronize color/colour. But that doesn't even begin to scratch the surface of Indian English, South African English, Jamaican Patois, Canadian English, Australian English, etc. Even within the US, we aren't providing the expected, needed distinctions. Can you say "Fo' shizzle ma nizzle" on a farm in the deep south? Can you say "Howdy y'all" on the street in NYC? (Well, of course you can, but will you survive?)
We don't have the same concerns that most dictionaries have. Most dictionaries are not multilingual dictionaries. As much as I do have expectations of what a dictionary should be, we have the multilingual constraint added here that makes several of the above "solutions" (based on paper dictionaries' approaches) impossible.
Kappa had a "bill of rights" thing for a word. I assert that his "bill of rights" should apply to spellings. (Actually, I think he did assert that, himself.) Not for ease of editing, but for ease of reading and looking up entries. If entries are given room to diverge, they will do so naturally as a reflection of the English language, and the contributors adding the entries for their preferred spellings. As long as things remain sufficiently cross-referenced, the parts of entries that should be kept in sync, will be.
Making the distinctions in 'Usage notes' I think is to be encouraged. But if there is no policy in place to prohibit wanton merging in the name of "making editing easier," the only certainty is that a good entry like capiche remains at risk of being destroyed. Enginear stated, very nicely, a potential guideline to use. His suggestion merits some experimentation and then further discussion.
--Connel MacKenzie 09:49, 12 January 2007 (UTC)
I've jokingly said "Howdy y'all" on the streets of New York City and survived. The secret is to say it so slowly that no New Yorker hears both words. (Indeed, few would hear more than a single syllable as they scuttle past.) Depending on the exact locality of your upbringing, the necessary speed may even come naturally.
So, no soft redirects? Are we going to have to rethink {{plural of}}? DAVilla 14:39, 12 January 2007 (UTC)
At the very least, yes, {{plural of}} should allow for a gloss, if not a repetition of the multiple definitions it refers to. As for fo' shizzle/howdy: my example meant to emphasize the embarrassment resulting from misuse; fatality was an exaggeration. --Connel MacKenzie 17:57, 12 January 2007 (UTC)
Sorry it's taken me so long to respond. Call me a dunce, but the only reason I see not to merge color with colour is that the wiki software isn't good enough to do so without making one of them the primary spelling and the other a secondary one. The way I would get around this, which would be wildly unpopular, would probably be to make both of them redirect to a page with an ugly name like 328f5a9. As for the multilingual aspect, it makes absolutely no sense to me for the different languages to share a namespace. I would subdivide them, so that en.wiktionary.org, fr.wiktionary.org, etc., each contained many separate dictionaries. I'm sure there's a reason this hasn't been done, but I still think it should be. Dfeuer 22:11, 22 January 2007 (UTC)

Proposal to exclude from Wiktionary all English possessives formed by the addition of either a bare apostrophe or an “’s”

Following from this discussion from WT:RFD:

Moved from tag on page:

I do not think that we should include possessive forms. They are very regularly formed, but there seems to be some disagreement concerning inflecting terminal ‘-s’ singulars and plurals; therefore, it is within the realms of possibility for us to have multiple entries for differently inflected possessive forms. Sure, we have vessel’s, but what about vessels’ / vessels’s? Anyone can take off the possessive suffix of a word to look it up. Who is honestly going to look for vessel’s? I reckon that everyone would simply search for vessel.

  • From RFV:

I’m short on time. See the entry. Raifʻhār Doremítzwr 08:58, 19 January 2007 (UTC)

Moved, see above. DAVilla 00:19, 20 January 2007 (UTC)
“Obviously” this shouldn’t be here (don’t need possessive forms), but I cannot seem to find anything in our CFI or elsewhere that proscribes them. If we don’t have this spelled out, we should begin a Beer Parlour policy debate. --Jeffqyzt 19:03, 19 January 2007 (UTC)
That is my gut feeling too, but I can’t think of any logical reason to back it up. Since we include all other inflections of nouns, including genitive cases in other languages, shouldn’t we include the English possessive case too? --Enginear 20:33, 19 January 2007 (UTC)
I’ve said before that I don’t have a strong opinion one way or the other here, but I will make four observations. (1) Possessives in English aren’t as regular as we usually suppose; I have seen arguments begin over the correct possessive form of s-terminal proper nouns like Jones. (2) Possessives will not be obvious to learners of English; consider that Spanish forms its possessives using a prepositional phrase, German by declension, and I can’t begin to guess what Semitic languages like Arabic and Hebrew do. (3) The possessive forms of pronouns (e.g. its, his, hers) do not include an apostrophe; only possessive nouns do. (4) The pronunciation of possessives is at least as unpredictable as the pronunciation of plurals; the additional s may be pronounced /s/ or /z/, and this is not easy for English learners to predict (it was one of the rationales for including plurals). I don’t know whether these points make a strong case for including possessives, but I’ve said everything I can think to say on the subject. --EncycloPetey 04:44, 20 January 2007 (UTC)
  • Delete. As an aside, the apostrophe is also used for elision, as in “The vessel’s been to sea.” Jonathan Webley 16:32, 20 January 2007 (UTC)
  1. Which is the very reason why we should not have possessives. Everyone knows that vessel becomes vessel’s, but there is some disagreement as to whether vessels becomes vessels’ or vessels’s; and what about crisis? Do we write crisis’s or crisis’; and in the plural, do we write crises’s or crises’? (That one even causes me some confusion.) Even if we, personally, use one form, we can instantly recognise that the use of the other is intended as a possessive. As Wiktionary doesn’t seem to be in the habit of prescribing, it’s best to avoid the whole issue of whether one ought to use “’s” or just “’” to form possessives of certain words. Not having possessives would avoid arguments like those concerning your guys’s.
    You’re underestimating how hard people around here will try to argue about unpopular words. — Keffy 00:33, 21 January 2007 (UTC)
  2. Forming possessives is a very basic and fundamental part of grammar in English, and is one of the first things learners are taught. If forming possessives really needs to be explained somewhere, then it should probably be in a Wikipedia article linking from possessive, genitive, -’s, or some or all of the above. If some Wiktionary users are incapable of forming possessives, then it is highly unlikely that those said users will be able to understand most of the entries’ definitions, due unto the high standard of English used therein.
  3. Clearly, we should retain every word inflected for the possessive otherwise than by adding an ’s or a bare ’, which (I think) means that we should retain only my, mine, our, ours, thy, thine, your, yours, his, her, hers, its, their, and theirs.
  4. Whether one ought to pronounce the “’s” as /s/ or as /z/ is a very minor point, and probably varies a lot from dialect unto dialect.
There is a simple rule for this: it is /s/ after /p/, /t/, /k/, /f/ and /θ/; /ɪz/ or /əz/, depending on accent, after a sibilant (/s/, /z/, /ʃ/, /ʒ/); and /z/ otherwise. The same rule applies to plurals formed by adding s or es. I don't have a reference to give, but that's the rule. — Paul G 18:08, 27 January 2007 (UTC)
I’m going to propose a policy in the Beer Parlour to disallow possessive forms. † Raifʻhār Doremítzwr 21:37, 20 January 2007 (UTC)
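Paul G's rule is effectively a lookup on the word's final sound. A minimal Python sketch, assuming the final phoneme is already known and supplied as an IPA string (the function and set names are illustrative, not taken from any existing tool). It also adds the affricates /tʃ/ and /dʒ/ to his sibilant list, since church's and judge's take the same /ɪz/ form:

```python
# Finals that take the /ɪz/ (or /əz/) form. Paul G lists /s/, /z/, /ʃ/, /ʒ/;
# the affricates /tʃ/ and /dʒ/ are an addition in this sketch.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}

# Voiceless non-sibilant finals that take a plain /s/, per the rule above.
VOICELESS = {"p", "t", "k", "f", "θ"}


def suffix_pronunciation(final_phoneme: str, reduced_vowel: str = "ɪ") -> str:
    """Return the pronunciation of the -'s / -s suffix after `final_phoneme`.

    `final_phoneme` is the word's last sound as an IPA string (an assumed
    input format); `reduced_vowel` is "ɪ" or "ə" depending on accent.
    """
    if final_phoneme in SIBILANTS:
        return f"/{reduced_vowel}z/"
    if final_phoneme in VOICELESS:
        return "/s/"
    # Everything else (voiced consonants and vowels) takes /z/.
    return "/z/"


# Prints one per line: cat /s/, dog /z/, bus /ɪz/, judge /ɪz/
for word, final in [("cat", "t"), ("dog", "ɡ"), ("bus", "s"), ("judge", "dʒ")]:
    print(word, suffix_pronunciation(final))
```

The same function covers plurals formed with s or es, as Paul G notes; the hard part in practice is knowing the final sound, which spelling alone does not reliably give.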

Just a clarification: The ’s that forms possessives is not a noun inflection. It is a clitic that attaches to the end of the entire phrase. The last word of the phrase, which ’s might seem to be a suffix on, can be anything at all: noun (the candidate’s promises), verb (the candidate who won’s promises), preposition (the candidate I voted for’s promises) — anything!

So unless we are eventually intending to add “possessive” forms for every single word in the English language (won’s, for’s, with’s, tiny’s…), there’ll have to be a line drawn, and that line might as well be here, as soon as the resulting form is predictable. Ditto for plurals ending in s’. Ditto for the ’s that’s a contraction of is. Ditto for ’re. (Except on pronouns, of course, where the result may actually be regular, but you’d never know that unless you were told.) — Keffy 00:33, 21 January 2007 (UTC)

I agree completely. If “+’s” isn’t even an inflexion, then there is even less of a reason to have entries of words so suffixed. † Raifʻhār Doremítzwr 02:59, 21 January 2007 (UTC)

Another clarification: there is no disagreement among grammarians on how to form the possessive of plural nouns that are formed by adding an s: you always add just an apostrophe. Hence vessels’. Consult any good grammar book. The possessive of singular common nouns ending in s is always formed by adding ’s; hence crisis’s. As this sounds odd, “of the crisis” is usually preferable. Now, as for crises, this is not formed by adding an s, so, strictly, the rules suggest that the possessive should be crises’s. Again, for euphony and editors’ peace of mind, “of the crises” is probably better.

Sorry, but that conflicts directly with the rule that "'s" is never added to words ending in "s". Can you provide a reference (or three) for that, please? --Connel MacKenzie 00:26, 22 January 2007 (UTC)
As far as I am aware, there is no such rule. Apostrophe-s is indeed added to words ending in s, but not to plurals formed by adding (e)s. Hence the bus's journey, but the buses' journeys. The s after the apostrophe is omitted when forming the possessive of singular proper nouns ending in s depending on pronunciation: hence James' or James's, depending on whether you say /dʒeɪmz/ or /ˈdʒeɪmzɪz/. Reference: The Oxford Guide to English Usage, second edition, 1994. It's possible that forms such as the bus' journey arise through a misunderstanding of this rule. Certainly I have never seen the bus' journey supported in any grammar book. There must be an s after the apostrophe; otherwise "the bus journey" and "the bus' journey" are identical when spoken aloud.
Now that I've given it further thought, the possessive of crises has to be crises'. — Paul G 17:58, 27 January 2007 (UTC)
1990, Harold Van Winkle, Elements of English Grammar EEG Rules explained simply, page 176
This text very clearly indicates that there is such a rule. That is, rule #2 indicates the possessive of plural nouns get only an apostrophe if they end in s. (I'd love to cite that exactly, but that would be a copyvio, and given the tenor of conversation elsewhere on this page, that would be quite silly.) Rule #4 indicates when apostrophe + s is added: only for possessive plurals that do not end in s. The first sub-note of that rule extends the rule to all sibilants: words ending with the sound of s, z, sh, or zh, especially if there is more than one syllable in that word.
1982, ... 2003, The University of Chicago Press, The Chicago Manual of Style, page 281
The general rule very clearly states the rule for plural nouns is to add an apostrophe only! 7.18, 7.19, 7.20, 7.21, 7.22 and especially 7.23 all reinforce sub-rule specific cases where anything (not just plural nouns) ending in a sibilant gets only the apostrophe.
Would you like more summarized citations? Certainly, you can visit your local library and verify each of these, and many more. Muke had requested I provide direct citations for something-or-other, so I have about 20 grammar books from the library on hand at the moment. I am not comfortable repeating the exact text, in this discussion, as that might be considered a copyright violation.
In conclusion: yes, it is wrong to add 's to form possessives of words ending in sibilants. --Connel MacKenzie 19:29, 27 January 2007 (UTC)
I accept Connel's evidence but disagree with his conclusion. What we have demonstrated between us is that there seem to be two rules in effect. The Oxford University Press states that 's is added to singular nouns ending in s (eg, bus's, Thames's) except in the case of names ending in -es when it is pronounced /ɪz/, which add just an apostrophe (because the pronunciation requires this; eg, Moses'). Apostrophe-s is used with French names ending in s or x (eg, Degas's), again because this is required by the pronunciation; words ending in a sounded sibilant other than s have 's added (eg, Fernandez's, Asterix's). The guides Connel mentions disagree with several of these rules. It looks like the rules for UK English go largely by the pronunciation, while the rules for US English look at the final letter.
I think it's likely that what we have here is a UK/US divergence in grammatical rules (given that each of us is quoting references from his own country - at least, I take it Connel's references are both American sources) and so we need to reflect this in any article where the distinction might arise. — Paul G 20:49, 27 January 2007 (UTC)
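The points of agreement in this sub-thread fit in a small function: plurals ending in s take a bare apostrophe, other plurals and non-sibilant singulars take ’s, and sibilant-final singulars are where the conventions diverge. A minimal Python sketch under those assumptions (the name possessive_candidates and its flags are hypothetical). It returns both spellings for the disputed case rather than picking a convention, and it does not model the finer UK/US differences Paul G lists, such as names in -es or French names:

```python
APOS = "\u2019"  # typographic apostrophe, as used in this thread


def possessive_candidates(noun: str, plural: bool, ends_in_sibilant: bool) -> list[str]:
    """Return the possessive spelling(s) of `noun` under the agreed points.

    `ends_in_sibilant` describes the final *sound*, so the caller must
    supply it (an assumption of this sketch); spelling alone is not enough.
    """
    if plural and noun.endswith("s"):
        # Plurals ending in s: bare apostrophe (vessels’, crises’).
        return [noun + APOS]
    if plural:
        # Plurals not ending in s: add ’s (children’s).
        return [noun + APOS + "s"]
    if ends_in_sibilant:
        # Singular, sibilant-final: the conventions disagree, so return both.
        return [noun + APOS + "s", noun + APOS]
    # Singular, non-sibilant final: ’s everywhere (vessel’s).
    return [noun + APOS + "s"]


print(possessive_candidates("Jones", plural=False, ends_in_sibilant=True))
```

Only the disputed branch yields two strings, so any entry-per-form policy would have to either accept duplicate entries there or prescribe one convention.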

I wrote some extensive usage notes on forming the possessive at ’s but unfortunately they don’t cover odd cases like “crises”.

In any case, this is somewhat tangential to the discussion — I would say that, as we already do with inflected forms, possessives need not be added, but if they are, there is no reason to remove them. What we certainly should not be doing is adding them as a matter of course every time a new noun is added to Wiktionary, or suggesting to users that this is what they should be doing. — Paul G 21:01, 21 January 2007 (UTC)

I hereby propose that all English possessives formed by the addition of either a bare apostrophe or an “’s” be excluded from Wiktionary. Please add your names unto the lists either in favour or against this proposal. Discuss further thereunder.


In favour:


Against:

  • Kappa 03:19, 21 January 2007 (UTC) “and in the plural, do we write crises’s or crises’” — that’s what made up my mind.

Further discussion:

Concerning Kappa’s rationale for voting against:

“and in the plural, do we write crises’s or crises’” — that’s what made up my mind

The point is that it doesn’t really matter — I’d take both “crises’s” and “crises’” to be two different attempts at forming the possessive form of crises. Moreöver, I wager that the vast majority of other readers of English would too. Either we include possessives and prescribe a single, correct way of forming them, or we exclude them altogether — what sane alternative is there? † Raifʻhār Doremítzwr 03:29, 21 January 2007 (UTC)

This is NOT where votes happen. A vote should be proposed here, but carried out at WT:VOTE. --EncycloPetey 03:40, 21 January 2007 (UTC)

OK. Just regard this as a preliminary vote, before we start the real vote at WT:VOTE. † Raifʻhār Doremítzwr 13:49, 21 January 2007 (UTC)
Connel’s observations
  1. Automated tools are beginning to use en.wiktionary.org... so a lookup of vessels’ is probable. (Well, restated: lookups of plural possessive forms are very likely. Do we have plans to implement stemming logic on lookups?)
  2. The plural possessive form of crisis is crises’.
  3. Are naming collisions with words in other languages possible? Probable? I don’t know. The Old English contributors here, in particular, may wish to comment further.
  4. The forms + “’” and + “’s” for forms not ending in a sibilant, are the only forms without both dispute and confusion. Is this proposal only for them, or accidentally wider in scope?
  • Support general concept of excluding regularly formed possessives, assuming the above four points are addressed before it becomes a real WT:VOTE. --Connel MacKenzie 07:05, 21 January 2007 (UTC)
  1. I don’t understand what you’re talking about here, so I can’t address this one.
    The preamble above stated that such lookups wouldn't happen, because humans are too intelligent to expect weird word forms, and would instead enter the root form. The point is that human lookups account for a smaller and smaller portion of lookups here. E.g. someone highlights a paragraph, then clicks "check in Wiktionary." Even if something other than the simple javascript lookup being used widely now gets used, the older Macbook extensions, FireFox extensions, Google extensions, about.com extensions and others will continue to have this problem. --Connel MacKenzie 16:59, 21 January 2007 (UTC)
    If people are writing automatic lookup software that uses Wiktionary, that's great. But it's their job to incorporate the appropriate logic into their software; it's not our job to distort our database so that they no longer have to. I'd even be willing to distort the database for a good reason, but this ain't one of them. Anyone who publicly releases lookup software that can't perform the most minimal punctuation stripping (and will, for example, blow up on the first plain ASCII British text with a quotation that it comes across) is, to be blunt, incompetent. I'm not willing to lift a finger to save them from the consequences of their incompetence. (Similarly, we shouldn't be trying to jump to 300,000² entries, one for every pair of current entries separated by an en-dash, on the off-chance that someone is too stupid to strip off en-dashes too.) -- Keffy 03:46, 24 January 2007 (UTC)
  2. That’s resolved then. (Though only according unto the way in which we both form possessives; I’m sure there are people who would write crises’s. Remember your guys’s?)
  3. Umm... how is this a problem?
  4. Yeah, there is no dispute (as far as I am aware) about forming possessives of words “not ending in a sibilant”; absolutely everyone just adds “’s”. However, because there is some disagreement as unto how to form possessives of words which do end in a sibilant, it is almost certain that we’ll end up with two possessive forms for many sibilant-terminal words (such as having both vessels’ and vessels’s). Unless, that is, we prescribe one rule for forming possessives unto the detriment of the other. That is our dichotomy.
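A minimal sketch of the kind of punctuation stripping Keffy describes, assuming a lookup tool that tries the surface form first and then falls back to the form with a possessive ending removed. The helper `possessive_candidates` is hypothetical and not part of any existing Wiktionary tool. It deliberately leaves ambiguous cases (such as a British closing single quote that looks like a plural possessive) to the fallback order rather than trying to resolve them.

```python
# Straight and typographic apostrophes that may mark a possessive.
APOSTROPHES = ("'", "\u2019")

# Surrounding punctuation to strip; apostrophes are excluded on purpose
# so that the possessive marker survives this first pass.
STRIP_CHARS = ".,;:!?\"()[]\u2013\u2014\u201c\u201d"


def possessive_candidates(token: str) -> list[str]:
    """Return lookup candidates for a token, surface form first (hypothetical helper)."""
    word = token.strip(STRIP_CHARS)
    candidates = [word]
    for apos in APOSTROPHES:
        if word.endswith(apos + "s"):  # e.g. crisis's -> crisis
            candidates.append(word[:-2])
        elif word.endswith(apos):      # e.g. vessels' -> vessels
            candidates.append(word[:-1])
    # Drop empty strings and duplicates while keeping the original order.
    result: list[str] = []
    for candidate in candidates:
        if candidate and candidate not in result:
            result.append(candidate)
    return result


for token in ("vessels\u2019", "crisis's,", "dog"):
    print(token, possessive_candidates(token))
```

A tool built this way would look up each candidate in turn and stop at the first hit, so regularly formed possessives never need entries of their own.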
I would mind that very much. EP specifically asked people not to vote here; I wished to indicate conceptual support without causing further aggravation. Adding my name to the list above would just serve to annoy him. --Connel MacKenzie 15:54, 21 January 2007 (UTC)
Fair enough. It’s pretty irrelevant anyhow. † Raifʻhār Doremítzwr 20:40, 21 January 2007 (UTC)
Yes, there are many people who would consistently write “crises’s” but they are consistently wrong. So what? --Connel MacKenzie 15:54, 21 January 2007 (UTC)
Hyperdescriptivism commands that we give crises’s an entry. Moreöver, it demands that we treat it and crises’ as æqually valid. That’s hardly workable (or is there someone willing to defend such a situation?). The choice we have is betwixt having no possessive forms (therefore avoiding the whole issue) and prescribing a single rule for forming possessives; I don’t mind which, but we have to choose one. † Raifʻhār Doremítzwr 20:40, 21 January 2007 (UTC)

Quotations (citations) in foreign language articles.

Is it OK (even to be encouraged) to add quotations to foreign language entries? If so, should we provide an English translation of them? (I added a quotation with no translation to the Italian word deflusso more or less at random.) SemperBlotto 11:39, 25 January 2007 (UTC) p.s. Google books has LOTS of foreign language books.

I've done this a few times in French, I think it's an excellent idea, and I think a translation is useful (because often the citations will demonstrate how a single word in a foreign language may be translated by different English words depending on context). Widsith 12:09, 25 January 2007 (UTC)
I'm not sure what help it is, without translation. Seems harmless enough, though. To my eye, it looks almost like a speedy delete/{{notenglish}} for an entry that might have been intended for it.wiktionary.org. (I did a double take, looking at the example above.) --Connel MacKenzie 19:53, 25 January 2007 (UTC)
Someone looking up a foreign language word most likely knows the language, otherwise how would they have occasion to look up a word. Also for RfV, as Robert mentions. Translations certainly are the norm for paper dictionaries though (at least, their example sentences have translations). --Eean 20:34, 25 January 2007 (UTC)
I disagree, when I become fluent in Greek I will look up words in Βικιλεξικό, until then I will need a bilingual dictionary hopefully with quotations and translations of them. —Saltmarsh 07:30, 26 January 2007 (UTC)
A-cai does this fairly routinely with entries for Mandarin and Min Nan, and WT:AJ advises it for Japanese kanji entries, either example sentences or quotations. There should always be a translation! (Of course, that can be added later, so it shouldn't be an absolute requirement for adding the quotation; but the person adding the quotation almost certainly knows enough to add it.)
And citing foreign language entries up for RfV pretty much requires this. (e.g. SDF) Should be encouraged. Robert Ullmann 20:30, 25 January 2007 (UTC)
Agree for many reasons. Should be encouraged. DAVilla 00:23, 26 January 2007 (UTC)

OK. I have added English translations for the two quotations (changing a quote that was just too archaic to translate easily). There is, of course, a tradeoff to be made - in the time I took to complete one word with two meanings and therefore two quotations to be searched for and then translated I could have added half a dozen words with no quotes. I shall add such quotes sparingly. SemperBlotto 12:36, 26 January 2007 (UTC)

I wouldn't worry too much about it. Wiktionary is a work in progress. Maybe someone who doesn't like looking up sentences will get a real kick out of translating your sentences. :) --Eean 00:58, 28 January 2007 (UTC)

PIE, Proto- languages and copyright

I've been very interested in constructed and reconstructed languages for a long time (decades ... ;-). Esperanto, Novial, various Inter-ling-whatevers. They are very similar ideas: "reconstructed" languages attempt to figure out what forms sets of cognates have originated from; "constructed" languages (usually) try to unify sets of cognates back to some pan-language form.

Most of it is incredibly Euro-centric ...

We have a serious problem with the reconstructed language entries: they are not citeable. At all. Note that we can cite everything from Modern English to Sumerian Cuneiform, with real usage, whether a quote from the New York Times or text inscribed in a clay tablet by a real, living, breathing, Sumerian scribe.

But all of the reconstructions are conjectural; and all of the work is copyright. The word-forms themselves are conjectured by the authors of the various academic works, and are within the copyright. There is public domain work, from the 19th century, but it is mostly dismissed by modern theory. (And note there are competing modern theories; also see Proto-Indo-European language for an overview.) Recent work, such as Indogermanisches Etymologisches Wörterbuch, Pokorny 1959, is well within copyright.

So to add any information to a PIE entry, the wikt contributor must:

  1. use an obsolete public domain source
  2. violate the copyright of the theorist, or
  3. conduct original research

None of these is acceptable (although the first might be if still current). So it is hard to see how a Proto- language entry can add anything to the wikt. Robert Ullmann 16:12, 26 January 2007 (UTC)

I reject your claim that unattested reconstructions of Indo-European stems are in any way analogous to wordforms in constructed languages; the former are science, the latter engineering. —RuakhTALK 17:49, 26 January 2007 (UTC)
I just said they were ideas that were similar. But reconstructed languages are not science, since they are not proveable or falsifiable. They are conjecture, possibly useful, which is all they claim to be. Anyway, this is about copyright. Robert Ullmann 17:55, 26 January 2007 (UTC)
The authors of the books, dictionaries, and journal articles in which PIE is discussed have copyright over their original ideas, but those original ideas involve way, way, more than mere reconstructed forms. The reconstructions themselves are the common property of the entire community of historical linguists. No one "owns the copyright" to a reconstructed form like *h₁eḱwo- because the reconstruction evolved over more than a century through the work of dozens of different researchers. To take a similar example, consider the Native American language Yaqui. Only one dictionary of this language has ever been published, in 1999, so it's still copyrighted for a long time to come. Will you ban Yaqui words from Wiktionary on the grounds that they're copyrighted? As for your suggestion that something isn't science if it's not provable or falsifiable, that will certainly come as an unpleasant shock to the physicists who work on string theory, which is also neither provable nor falsifiable. Angr 18:49, 26 January 2007 (UTC)
Angr, that is absolutely absurd. There are 15,000 speakers of Yaqui; each of them are welcome to add their knowledge here. There is no possible way you can suggest that anyone currently speaks Proto-Indo-European, and therefore has that factual knowledge of the language subject. --Connel MacKenzie 19:50, 26 January 2007 (UTC)
The Yaqui language itself isn't copyrighted; but if someone copies entries from that dictionary, yes, that's a problem. If the PIE proponents (like you) want to add information here, you have to show PD sources. (Which, if what you say is correct, should be no problem at all.) As to string theory, the scientists are completely clear on the issue, and unlikely to be "shocked" ... I'll quote the pedia, emphasis mine: "No experimental verification or falsification of the theory has yet been possible, thus leading many experts to turn to one of several alternate models, such as Loop quantum gravity. However, with the construction of the Large Hadron Collider near Geneva, Switzerland scientists may produce relevant data." Take note: string theory's acceptance as science directly depends on whether it can be proven or disproven; by contrast, PIE can never be proven or falsified, therefore can never be science. Robert Ullmann 20:02, 26 January 2007 (UTC)
I disagree. Evolution (at least macroevolution) is rather outside the scope of experimental observation and thus, on a certain level, is outside of science. But it is still taken to be science within the biological academic community. The reason is that it makes predictions which can be tested. The same holds true for hypothesized language. However, unfortunately, proto-languages do not get nearly the scholarly attention that evolution does, and so is much more shadowy, so relating them always as "possible sources" is wise in this case. But they are not unscientific intrinsically. Cerealkiller13 20:18, 26 January 2007 (UTC)
There is definitely a relationship between science and falsifiability, but a statement can be scientific without being falsifiable. For example, imagine that I hold a block in my hand for a moment, and then set it down. I can say, "If I had let go of that block right then, it would have fallen"; and this is a true, scientifically valid statement. But it's utterly unfalsifiable; I can't go back in time and have let go of the block to see what would have happened. —RuakhTALK 20:25, 26 January 2007 (UTC)
Back to Connel's statement, "There are 15,000 speakers of Yaqui; each of them are welcome to add their knowledge here." That would be original research on their part, though, and so equally unacceptable, right? Angr 20:26, 26 January 2007 (UTC)
I also want to reiterate a point I made at Template talk:proto that was never responded to: To suggest that all trace of PIE be removed from Wiktionary is essentially to suggest that all etymological information be removed; after all, the claim that Old English hægl is derived from Proto-Germanic *haglaz is no more conjectural than the claim that Modern English hail is derived from Old English hægl, even though the PGmc word is not attested and the OE word is. All serious dictionaries include etymological information, however conjectural it may be; why should Wiktionary be any different? Angr 20:29, 26 January 2007 (UTC)
Here on Wiktionary, unlike Wikipedia, we require citations of use, not citations of other secondary sources. I understand this is very different from Wikipedia and I am sympathetic to the certain confusion it causes. However, that does not change the fact that if you were to enter all (or many of) the words in the Yaqui dictionary you refer to above, that obviously would amount to a systematic copyright violation. Also, while the definitions of a word fall under fair-use clauses, etymological information cannot automatically be afforded the same protection. As you indicate, etymology is not fact but conjecture. How Wikipedia's policy of WP:NOR fits in to Wiktionary is not very well defined. I do not think that controversy about NOR on Wiktionary has ever been fully investigated, nor adequately discussed. Since that seems to be one crucial factor here, it is perhaps time to explore that further.
None of that, however, refutes my conclusion that 100% of P-I-E information is either copyright protected, obsolete or original research. By choice of one P-I-E form over another, one would inherently (in the entry title) divulge which source they were copying.
Additionally, I have never heard a convincing argument as to why we should include P-I-E forms at all. The linguistic technical details are first and foremost likely to confuse the average reader, rather than what it should be doing: clarifying the origin of a term. With the proto forms all being originally derived from etymological information, I cannot for the life of me see how they can be considered authoritative origins of a word. But alas, the community stated otherwise, at WT:VOTE (fully acknowledging visiting Wikipedians with little or no other Wiktionary involvement, outside of P-I-E topics.) --Connel MacKenzie 21:18, 26 January 2007 (UTC)
We do include PIE, simply in a different format than attested words. And, while you do make a good point that etymologies nearly always involve a certain amount of guesswork, the existence of hægl is not a conjecture, even if its connection to hail may be. Note that the connection between attested words and non-attested is not different in form than the connection between attested words and other attested words. But rather, the format of the non-attested words themselves is different. And Connel, it should be noted that only considering regular Wiktionarians, the PIE decision would have been the same, albeit by a slimmer margin. Cerealkiller13 21:29, 26 January 2007 (UTC)
I disagree: the existence of *hægl is conjecture, as is its relation to hail. As to the vote: I was simply noting that Wikipedians' votes were counted, without question. --Connel MacKenzie 21:35, 26 January 2007 (UTC)

I asked a friend of mine who's doing graduate work in Old English about this, and here's an excerpt from her email. Just thought folks would like to know (it also jibes with what the OED states): Hægl is a real word, and it comes into ModE as hail--in the sense of ice falling from the sky. It is attested in various places, including The Seafarer: hægl feol on eorþan, corna caldast (hail fell to the earth, coldest of grains). The rune form for h is also called hægl. Hail, as in 'hail, well met', is from OE hæl and/or Old Norse heill, both meaning health. It is attested in various places, including Bible translations and medicinal texts, and functions as the modern word in Elene: Héht hé Elenan hæl ábeódan (he bade them greet Elene). Cerealkiller13 20:48, 27 January 2007 (UTC)


<< The Old English word hægl is attested; it isn't conjecture. Saying Wiktionary requires citations of use still doesn't allow for the possibility of Yaqui speakers adding words based on their own knowledge of the language. Basically what you're saying is that languages that have only been recently written down, so that all attestation is still copyrighted, cannot be included in Wiktionary at all, because of the absurd idea that the fact that the Yaqui word for dog is chuu'u is somehow "copyrighted", just because the only verification of the fact is to be found in a copyrighted dictionary. As for a reason to include PIE forms, the best reason is that all serious dictionaries include etymological information; I don't think you'll ever get consensus for removing etymologies from Wiktionary. Angr 23:27, 26 January 2007 (UTC)

Note: Copyright law's fair use doctrine would allow the use of an individual sentence from a copyrighted work for the purpose of illustrating the meaning of a word used in said sentence. —RuakhTALK 23:56, 26 January 2007 (UTC)
Angr, thank you for the elaboration and correction. Yes, I meant the P-I-E term itself is conjecture, not the Old English term.
I would appreciate it if you would stop misrepresenting things I say. I have never suggested removing etymologies from Wiktionary. I have suggested, and will continue to suggest, that copyright violating etymological information be systematically removed from Wiktionary. --Connel MacKenzie 02:11, 27 January 2007 (UTC)
So you keep saying, but what you fail to understand is (1) the information you want removed isn't copyrightable, and (2) removing the information will inevitably lead to the removal of all etymological information. How can I give the etymology of a word if I'm not allowed to say where it came from? Angr 07:38, 27 January 2007 (UTC)
I'm very sorry, but that is a very silly assertion. Nothing is "copyrightable"? Removal of copyright-violating material leads to the removal of all material? What kind of sensationalism is that? Perhaps if we returned to the topic at hand: what general knowledge source could possibly exist for current P-I-E forms, other than a copyright-protected source? Do you speak P-I-E? --Connel MacKenzie 07:58, 27 January 2007 (UTC)
Etymological information is copyrightable; where it represents novel research and conjecture it usually is: this is most of PIE et al. There is also etymological information in the public domain; this includes a lot (but not all) of the derivations from non-reconstructed languages. Copying an etymology from Webster 1913 is fine, copying from the OED is a copyvio, unless it can be sourced elsewhere (e.g. is not the OED's original research). Robert Ullmann 08:08, 27 January 2007 (UTC)
Individual reconstructed PIE words, stems, and roots are not copyrightable because by themselves they are no one person's creative work. The reconstructed stem *h₁eḱwo- has no author and represents no novel research, and neither do any of the other reconstructed forms used in etymologies. Angr 13:35, 27 January 2007 (UTC)
Copyright has nothing to do with "one person's creative work". It has to do with publication of information not previously in the public domain under copyright. Show us a public domain source. If the PD source exists, then (any given) work's copyright doesn't cover that information. Otherwise, it does. If the only sources for *h₁eḱwo- are in copyrighted texts, it matters not if it is one or a dozen. If author C takes work from author B takes work from author A, all under copyright, the relevant permissions are their problem, but we still can't use it. Robert Ullmann 15:55, 27 January 2007 (UTC)

We have very serious concerns about copyright, that are different from (but in some ways similar to) the pedia. People understand easily that they can't copy from other encyclopedias into the wikipedia; but they don't understand that they can't copy from other dictionaries into the wikt. In the 'pedia case, people naturally use secondary sources to write (tertiary source) 'pedia articles. In the wikt, entries must be from cited usage and/or PD sources, they can't be copied from dictionaries in copyright. Yes, this makes some languages difficult, but that's just the way it is. In everything from Modern English to Sumerian Cuneiform, it is possible to cite usage. But in PIE et al, the only sources (for the modern theory) are copyrighted texts.

This is particularly annoying to people because the copy is so short. They say "fair use"! No. The purpose of that dictionary was to describe this word, the purpose of this dictionary is to describe this word, so copying the description is a copyright violation. Even if it is four words of definition, or "< PIE *haglaz".

So we just can't extract stuff from (e.g.) Pokorny. I know that's annoying. But that's what we live under. If there was a subject area covered only by other encyclopedias, the 'pedia basically couldn't do anything with it. There just isn't much we can do; we aren't going to allow original research either. Robert Ullmann 15:55, 27 January 2007 (UTC)

Copyright does have to do with one person's creative work, because only creative work can be copyrighted. Information that doesn't meet the threshold of originality cannot be copyrighted, and reconstructions like *h₁eḱwo- and haglaz don't meet that threshold. We can't copy whole entries word for word from Pokorny, of course (not even translated into English), but he doesn't hold copyright over the information that *h₁eḱwo- is the reconstructed PIE word for "horse". Don't forget, that reconstruction is not his! He was not the one to propose the reconstructions in his dictionary; all he did was gather the reconstructed forms used throughout Indo-Europeanist literature into one convenient location. And when he did that, he wasn't violating the copyrights of the original scholars either -- a reconstructed PIE root or stem is simply too trivial. The roots that are going into the PIE appendix aren't Pokorny's, or Watkins's, or anyone's -- they're common knowledge in the field of Indo-European linguistics ("common" in the sense of "shared by everyone"), and Pokorny's dictionary is only being used as a reference to enable readers to verify that the root means what we say it does, and to read Pokorny's whole entry (which, since it isn't being copied here, will contain more information than the WT entry). Angr 16:34, 27 January 2007 (UTC)
Then you should have no difficulty at all working from, and citing, the public domain source(s) of that "common knowledge", right? So readers can refer to the source you used (which won't be Pokorny) if they like? Then an additional reference to Pokorny is fine, we have such references to (e.g.) OED2, when (of course) not used as the source. Robert Ullmann 16:37, 27 January 2007 (UTC)
One point you are still missing: an individual root or stem is definitely not "simply too trivial" to be covered by copyright; because those works (like this) are about individual terms, they are very precisely covered by copyright. This is similar to the comparative case of books and song lyrics: you can copy a page or two out of a book and it is fair use, but even a single line of a song lyric is strictly copyright. (That's why when a character in a book says a line of a song or a poem, you will always find the notice that it was "used with permission" in the front of the book.) Robert Ullmann 16:46, 27 January 2007 (UTC)
No, there is no single source -- public domain or copyrighted -- for common knowledge. That's what makes it common knowledge. That's why Pokorny himself could list all those reconstructed terms in his dictionary without citing individual sources for them and without violating anyone else's copyright. The publisher of the only Yaqui dictionary cannot claim copyright over the fact that chuu'u is the Yaqui word for "dog", and Wiktionary is allowed to have an entry for chuu'u identifying it as such despite the fact that the only verifiable reference for that happens to be a copyrighted dictionary. And the PIE reconstructed forms and their meanings are no different. Angr 17:09, 27 January 2007 (UTC)
Angr, technical details of reconstruction are not common knowledge! Please don't mix in "verifiable reference" here, as that is completely irrelevant. (Confer {{nosecondary}}.) The Wikipedia "verifiable references" concept does not apply to citing usage of a term. And please stop making apples to oranges comparisons about living languages...they have no bearing on the topic at hand, whatsoever. --Connel MacKenzie 18:45, 27 January 2007 (UTC)
Also, I disagree with your assessment as to why Julius Pokorny was allowed to use others' material in Indogermanisches Etymologisches Wörterbuch. The w:United States copyright laws certainly did not apply to him when he wrote it. --Connel MacKenzie 18:54, 27 January 2007 (UTC)
The technical details of reconstruction are common knowledge among Indo-Europeanists in the sense that they don't originate from anyone. No one is the author of individual reconstructions. Comparison with living languages is entirely appropriate because the only difference between the common knowledge that *h₁eḱwos is the PIE word for "horse" and the common knowledge that Pferd is the German word for same thing is the set of people who know it. Showing that reconstructions are actually used in the sense of {{nosecondary}} (and not just listed in dictionaries of PIE roots) is easy; they occur all over the place in books and articles about Indo-European topics -- precisely because Indo-Europeanists actually don't spend their time reconstructing individual roots and stems (which as I mentioned above is trivial) but rather using the reconstructed roots and stems to argue for their theories about the PIE language: how the noun system worked, the relation between stative verbs and the perfective inflection, various sound laws, etc. That's the sort of information that's copyrightable, not the shapes of the reconstructed stems. As for Pokorny's dictionary, the Swiss copyright laws (which don't even have provisions for fair use!) certainly did apply to it. Angr 19:43, 27 January 2007 (UTC)
That does not match my understanding of current US copyright law. You are suggesting a legal test has been used to determine what to call a language, despite it being a constructed language? And you are suggesting that a legal precedent has been set that would afford Wiktionary some protection for individual definitions of astronomically highly technical terms that no one alive can possibly speak? But even if what you say were to be true, in court, there is still no way you can refute that the whole body of P-I-E data entered here would be a systemic copyright violation. The "common knowledge" would be people speaking P-I-E, and that cannot exist without a time travel machine.
US copyright law has changed quite significantly in the last forty-seven years...I can only presume that Swiss law has followed accordingly. --Connel MacKenzie 08:39, 28 January 2007 (UTC)

(unindent). OK, I'm confused by the assertion made, seemingly by both sides here, that information is subject to copyright. Unless I am gravely mistaken, at least under US law this is not the case; while the presentation of information is subject to copyright, the information itself is not (even if novel). Otherwise, anyone reporting the results of a new scientific study (for example), or any scholar citing that study as a source, would be guilty of copyright violation. Right? I have no legal training, so perhaps I'm just out to lunch on this one. But if this is a mistake, it's fairly widespread; see Google:"information is not subject to copyright", w:Bridgeman Art Library v. Corel Corp., US Copyright Office Basics, etc. Puzzledly yours, -- Visviva 15:04, 29 January 2007 (UTC)

Wiktionary:About Ancient Greek

Alright folks, as if you haven't gone through enough of these already, I've made another About language page: Wiktionary:About Ancient Greek. Please take a look at it and edit it, pick fights on its talk page, and scream at me. A lot of the stuff is probably quite controversial, so bear in mind that it's a work in progress. I realize that not everyone knows a lot about Ancient Greek, but please take a look anyway, and ask me for some context if needed. That said, I don't really know that much about Ancient Greek, and so may well have made some mistakes. If I have, please feel free to fix them, or simply tell me to do so. I'm hoping this will spur some excellent debates. A pronunciation section is forthcoming. Cerealkiller13 09:09, 28 January 2007 (UTC)

US state nicknames

I have added Yellowhammer State, and I think that we should have all the others. But I think that an American could do it better than I could. Could someone please improve my effort by adding an etymology and maybe put it into some sort of category? Then, at their leisure, add all the others? Cheers. SemperBlotto 08:54, 28 January 2007 (UTC) p.s. Wikipedia has the nicknames in the infobox of each state.

I can only see two instances of Yellowhammer State on b.g.c that aren't obvious mentions. Some other nicknames may be more widely used (Show Me State), but generally I don't hear these used outside ads for tourism. Cynewulf 16:22, 28 January 2007 (UTC)
Check out Wikipedia's "List of U.S. state nicknames" -- there are even official ones! Seriously, the ones in boldface appear to be the most common. "Nutmeg state" is the most popular in Connecticut, "Bay state" in Massachusetts and "Ocean State" in Rhode Island. Google will show that the popularity extends to business names and names of organizations and nonprofits. Noroton 22:13, 28 January 2007 (UTC)
I think that this is rather trivial. I live in Maryland and I haven't even heard it called "Old Line State; Free State", according to wikipedia. Bearingbreaker92 03:43, 29 January 2007 (UTC)
Most of these are just legislated into existence at the whim of the various state legislatures. The Show Me State is an interesting example, because it drew from an already famous speech by w:Harry Truman, but most of these have little interest outside of the various state advertising and tourism boards. I would tend to consider this encyclopedic, not dictionary, content. --Jeffqyzt 15:46, 30 January 2007 (UTC)

Tables on Category:Given_name_appendices

  • OK, a few days ago, I was foolish and moved a table there from an individual name page. It was supposed to be temporary, until I could figure out what to do with it. It seemed like a good idea at the time, but now the tables are multiplying. They certainly don't belong on that page.
    Are the tables "dictionary material"?
    If they are, where do they belong? --Versageek 20:56, 28 January 2007 (UTC)
An Appendix? --EncycloPetey 21:03, 28 January 2007 (UTC)
Oh, I see. The contents of the category page itself. Yes, that looks like it should be in Appendix:Dutch given names. The Category itself (the grouping of appendices) seem useful as well. --Connel MacKenzie 09:03, 29 January 2007 (UTC)
I've moved it. Jimp 05:50, 30 January 2007 (UTC)

Hanja and Korean language

The en Wiktionary treats Hanja (see Wikipedia:Hanja) as if it were the Korean language. That's a serious mistake, which I've pointed out elsewhere. Chinese characters that are used by tradition in Korea are called Hanja, and the literal meaning of Hanja is nothing other than "Chinese character". This alone should suffice to show that Korean is different from Hanja. Since the invention of the Korean writing system 'Hangeul' (see Wikipedia:Hangeul), we Koreans transliterate Hanja into Korean writing. Such a transliterated word is called 'Hanja.eo' (한자어) and is recognized as a Korean word of Chinese etymological origin. When you sometimes find Hanja printed in a Korean newspaper, this doesn't mean that Hanja is Korean; rather, some writers prefer Hanja as a written medium to the Korean Hanja.eo, which can sometimes be ambiguous.
For example, instead of writing the Chinese characters '漢字' (Hanja), we use the transliterated word '한자', which is, however, pronounced differently from Chinese (see the IPA for Mandarin). Just as you may find the foreign loan 'Feng shui', which comes from Chinese '風水', in a modern English dictionary, but not the Chinese characters themselves, so you will never find Hanja entries, i.e. Chinese characters, in any Korean dictionary, because they don't belong to the Korean vocabulary.
What I worry about is that some Wiktionaries may go wrong if they import Hanja entries from the en Wiktionary, which are currently categorized under Korean language (see this category), assuming that they are Korean words. The only way to avoid such a silly mistake, as well as to correct the false categories on en, is to classify Hanja into a subcategory (e.g. Korean Hanja) under Korean language, as the ko Wiktionary does. 아흔(A-heun) 17:01, 10 January 2007 (UTC)

I agree with you. In Korean words, the hanja should only be mentioned in the etymology section in most cases. Korean hanja do not enjoy the same status as Japanese kanji. I believe the North Koreans don’t use hanja at all. —Stephen 17:35, 10 January 2007 (UTC)
Agree that most of those should be in Category:Korean hanja, which is rather underpopulated at the moment. But it is worth noting that hanja is commonly included in the running text, without transliteration, in scholarly Korean, even up to the present day. I've read (or attempted to read) recent Korean texts that were more than 50% hanja. So there is something of a gray area here. -- Visviva 12:59, 11 January 2007 (UTC)
It's true that Hanja is still widely used in Korea, although "more than 50%" is rather exaggerated. There are also some Koreans who want to keep using Hanja and even regard it as a mark of education. However, the trend favours transliteration. As a lexical principle, one should bear in mind that Hanja is not Korean. --아흔(A-heun) 16:33, 11 January 2007 (UTC)
In my opinion the fact that they are symbols used in running text to convey meaning in the Korean language makes them part of that language, from the point of view of a descriptive dictionary at least. Similarly symbols like 1, 2, 3, % etc. Kappa 08:54, 12 January 2007 (UTC)
I am not sure that Hanja in the Korean language is a kind of symbol. I think one should distinguish between Korean as a language in its own right and Hanja as a conventional medium in Korea that is actually superfluous, because you can write a text entirely in Korean without Hanja, and every Hanja in a text can be replaced by its Hanja.eo. --아흔(A-heun) 15:30, 12 January 2007 (UTC)

It should be pointed out that Hanja is part of the writing system for Korean. While hangeul can be used for any given word, places and names usually use hanja, and (as observed by Visviva) it is used, sometimes extensively, in some texts. The idea that Hanja is "not Korean" is either extreme POV, or a misunderstanding of what the word "Korean" means in English. Or some of both.

This user removed Category:Korean nouns from 日本語, and refused to replace it, because "it is not a Korean word, but Hanja". Of course, it is a Korean word, written in Hanja. (meaning "Japanese language" ;-) Or it isn't a Korean word, so it isn't Hanja. (It is Kanji of course...) But since it is used in written Korean, it is a Korean word written in Hanja. (Google gives a million hits for 日本語 on Korean-language pages in the .kr domain; some of these are of course Japanese text appearing on the same page, but lots are in running hangeul/hanja text; 일본어 only gets a million and a half.) Anyone care to try to explain this better? Robert Ullmann 21:32, 18 January 2007 (UTC)

w:Hanja is a good explanation; A-heun referred to it at the top of this discussion, but seems to have missed that it explains the extensive (and current) use of Hanja to write Korean. Perhaps he thinks it shouldn't be used, but it is. Quite a bit ... And it directly contradicts what he said above about Hanja not appearing in dictionaries: "In modern Korean dictionaries, all entry words of Sino-Korean origin are printed in Hangul and listed in Hangul order, with the Hanja given in parentheses immediately following the entry word." Robert Ullmann 21:54, 18 January 2007 (UTC)
He might actually think Hanja shouldn't be used in Korean texts, but I think that wasn't his point. What he tried to explain was a general perception among Korean native speakers that Hanja is a foreign character set, considered not a genuine constituent of the Korean language but something auxiliary to it. And, I suppose, based on this he argued that we should treat entry titles in Hanja as mere transcriptions, distinct from those in Hangul, which is considered the genuine Korean character set, so that we can present them in a culturally correct manner. --Tohru 04:34, 20 January 2007 (UTC)
Note that 漢字 (Hanja) is also used in Korean to refer to what we would call the written Chinese language (see for example [4], which is very interesting, it discusses in part the grammatical structure of 漢字, what we would call the Chinese language), it means "Chinese writing". Part of the problem here is that the English use of "Hanja" is specifically the Han Chinese script used to write Korean, while the Korean use of 漢字 (Hanja) means the characters used to write Chinese, or Korean words from Chinese, or the Chinese written language. *sigh* Robert Ullmann 22:27, 18 January 2007 (UTC)
Shall we categorize 漢字 as Korean hanja noun instead of Korean noun? I do see many Korean hanja compounds when Googling. I need clarification as Chinese Wiktionary also has so many Korean hanja compounds.--Jusjih 11:58, 1 February 2007 (UTC)

UK tag, bang on

I just tagged both senses of bang on as "UK", as they're not widely current in the US and I have heard both of them in England.

However, as with many of these, I have little if any idea whether they're used in Australia, New Zealand, South Africa, the Bahamas or even Canada or some corner of the US.

It would be nice to have a tag for "This isn't heard in my particular region" or "I've heard this in some particular region" without the implication that everyone everywhere else does or doesn't use it. For that matter, it's practically impossible to say "no one here says this" with certainty, or to speak (even as a native) for all of the speakers in a large country (or commonwealth :-).

I believe there have been discussions about this issue before, but I wanted to know what the latest practice is, so I can use the right tags and (with luck) avoid stepping on toes by suggesting people say things they don't or vice versa. Is there a formal "commonwealth" tag, for example?

There are really (at least) three sensible designations for regionality, for a given term and region:

  • Current here.
  • Not current here.
  • Unknown

but do we really want to see a table of 17 different regions with a designation for each?

-dmh 17:06, 29 January 2007 (UTC)

I think your analysis is, um... bang on, but like you I haven't yet thought of an excellent solution. For meanings which come to RFV or the Tea bar, it half works when some of us note whether we've heard it or not, or have seen cites originating from particular regions, and then the gloss is altered accordingly.
I suspect there are particular groups of regions which "usually" hang together, but it would be good to, firstly, have evidence of that and, secondly, not to rely on that categorisation except as a first stab.
I am particularly concerned that at present we concentrate on usage in US, UK, Aus, and occasionally Canada and Ireland. I have two issues with this: firstly, and perhaps trivially, many words are used only within some parts of those regions. Far more importantly, we never seem to hear about usage in, say, India and China, who between them have more English speakers than all of US, UK & Aus put together. I admit I know very little about either of those areas, or the other large English-speaking areas I have not mentioned, but it seems likely that there are at least some regional variations there.
If there really were just 17 clearly defined areas, then I would be tempted by your 17 sets of radio buttons or equivalent, even though it would be difficult to see how to integrate that into any reasonable layout, given that different meanings are often current in different regions. They could all start at "not known" until "locals" confirmed usage or otherwise. (Personally, I think there should be sets of 5 "buttons" rather than three. Perhaps the solution is a miniature graphic of the world with the relevant areas marked in 5 colours; though that might noticeably increase either storage required or processing required per page load, depending how implemented.)
I am nervous that for a significant % of entries, we would find we wanted to subdivide one of the "official 17 [or however many]" areas, which would bring the whole system into disrepute. But perhaps the solution would be to just stick firmly with whatever areas had been chosen, a bit like the present position with "POS" headers.
So in short, I agree that it is an area we should improve, and would support your idea if no one thinks of a better one. --Enginear 20:21, 29 January 2007 (UTC)
To throw more water on your fire, Enginear and dmh, also keep in mind that in addition to geographical variation in comprehensibility, there is also temporal variation. So we'd need some way to represent a word's ability to be understood in four dimensions. So, let's have a little animated spinning globe that shows the word fanning out from point of origin, blanketing regions with varying intensities of color depending on how commonplace its use is. </tongue in cheek> Don't forget that the same distinction can apply for other languages as well - Québécois French vs. Parisian French vs. Moroccan French, Portuguese as spoken in Portugal vs. that spoken in Brazil, etc. The existing system, while imperfect and perhaps eventually insufficient, at least has the virtue of being simple. --Jeffqyzt 21:23, 29 January 2007 (UTC)
And don't forget, the temporal thing goes both ways; many an older usage is preserved in England but not in the U.S., or vice versa. So yes, animated spinning globe. :-) —RuakhTALK 21:38, 29 January 2007 (UTC)
If we're not sure exactly how to label a term or sense, but it's clear that it's not universal, perhaps it would be best to use a regional tag. We can then have a category for terms that are identified as regional without more information, hopefully encouraging people to discuss them and determine more specific labels that are still accurate.
If a term or sense appears only in certain parts of the U.S., we can reasonably label it U.S. regional; I don't think it's necessary or worthwhile to indicate what regions unless we can find specific reputable sources to back up such claims. (Anecdotal evidence is often unreliable; for example, I've heard some Southerners use coke in a way that really seems to mean soft drink, but I've heard other Southerners insist that no Southerner ever uses it that way. I don't know what the truth is.)
If a term or sense is primarily American or primarily British, you'd nonetheless expect to find it in various not-primarily-English-speaking countries, as they tend to look to Britain and the U.S. to guide their English; I don't think that means we need to label those senses U.S. and China or whatnot.
RuakhTALK 21:38, 29 January 2007 (UTC)
I have rebooted my brain, and see things more clearly now. I more or less agree with everything above, except that, while the last para may well be true, it is for the Indians and Chinese to make that assertion, not those of us from US & UK. And again, there is the temporal aspect -- in 1800, I doubt there was much difference between US & UK English, but look at us now.
I have thought of three reasons why I might want to know where, and in the last case also when, a word was used:
  • To ensure my target audience would understand me correctly (in 2007). The essential therefore is to flag usages which are understood differently in different regions (or in some cases have opposing meanings both used in the same regions). I suspect issues of this type will become less important as more and more material is published/broadcast worldwide rather than locally. I suggest that the more important item is to flag that there are opposing definitions, so that the user is aware of a potential problem. The issue of which region or regions use which definitions is secondary to being aware of the problem so that the writer can consider if clarification is advisable.
  • To ensure that I would not be criticised (or if I was still studying, marked wrong) for using an inappropriate/incorrect (or incorrectly spelled) word (again, for use in 2007). Personally, I think we should be more robust in our descriptions where this is the case. We should have a clear shorthand for "many UK/US/etc schoolteachers/exam markers find this usage unacceptable". This is not a case of us being prescriptive. We need to be descriptive of a real prescriptivism by others, and we can do this without commenting on whether it is good or bad. As the number of entries increases, the proportion of entries to which this applies is likely to increase. But again, it is more important to flag that there is a potential problem than to explain exactly where the boundaries are. Depending on the circumstances, a user can play more or less safe.
  • For serious study/general interest. For this, the user might want to see whatever evidence we have, and the temporal aspect is important. Leaving aside the notion of pulsating time-globes, and other such marvels which even Wiktionnaire might consider garish and offputting ;-), the best method for recording the raw data is, IMHO, blindingly obvious, since we are actually a scriptionary rather than a dictionary, noting evidence of written, rather than spoken, usage -- we should add the country of origin (perhaps even the town where known) after the date in each cite. Any other method of summarising the data can be bot-generated later if required.
So to summarise, the first two uses can be dealt with easily by glosses as at present. High accuracy is not essential (for comparison, OED normally only uses symbols for "predominantly US" and "predominantly UK", with very occasional [amended by Enginear, 14:18, 31 January 2007 (UTC), to: sometimes with] glosses to break down usage more finely). The third use can be achieved by the compact and zero-loss method of adding the location to each of the cites, which represent our evidence of usage. --Enginear 20:30, 30 January 2007 (UTC)
Dialect/regionalism-wise, the labels I see most often in the OED are U.S., U.K., Sc. (Scottish), and dial., often with modifiers like chiefly, now, orig., and exc. arch., not to mention colloq. and so on. For other regionalisms, it seems to prefer a full note, as at prepone, v.2: "In later use, most frequent in Indian English." (That said, Vegemite, n. has the label "orig. and chiefly Austral.", so don't take what I just gave as hard-and-fast description of the OED's policy.) —RuakhTALK 22:54, 30 January 2007 (UTC)
Yes, having done a small non-"random" sample [actually the thirty words in OED2+ in alpha order starting at random], I agree Sc is about as prevalent as the dagger for UK or the || for US [later note: Please ignore this and follow Ruakh (see below). --Enginear 21:02, 31 January 2007 (UTC)] (but there were no other countries mentioned in that tiny sample). And indeed, about half of the words have one or more of these three, which was more than I previously thought. So I defer to Ruakh. --Enginear 14:18, 31 January 2007 (UTC)
Dagger for U.K.? Tramlines for U.S.? I guess you must be using a print edition or something (I use the online edition, as I have access through my school's virtual private network); even so, though, what you're saying is odd, as in the online edition the dagger is the "obsolete status marker" and the tramlines are the "alien status marker" (as described here), and it doesn't make sense to me that the symbols would differ in meaning from one to the other. —RuakhTALK 17:07, 31 January 2007 (UTC)
I stand corrected again! I picked those symbols up from a print dictionary about 20 yrs ago (perhaps not even an Oxford one) and made the mistake of assuming I knew what they meant without checking the key to the online version I use, which states exactly what you say. Thanks for pointing it out, or I might have gone on another few years misunderstanding it. --Enginear 21:02, 31 January 2007 (UTC)

Possible problem

User Dbachmann was blocked and posted under his IP address 130.60.142.152 (above; this is a class B IP net, with some DHCP addressing, University of Zurich), which is a violation of WP policy. I don't know whether we say explicitly that this is prohibited, but as he is a WP admin, he should know better.

Note that I removed the block after one day, review by Jeffqyzt (thank you) indicating it was probably too long, or not appropriate.

Now look at the history of aśvamedha created today, marked for cleanup (should be at अश्वमेध). Created by 130.60.142.151.

He isn't blocked. (and he knows that, he added to my note on Connel's talk page) Maybe he forgot to log in? Robert Ullmann 23:01, 29 January 2007 (UTC)

(no, he didn't "forget"): When the entry was tagged for cleanup, another IP-anon user modified Wiktionary:About Sanskrit to "justify" having the entries at the IAST instead of Devanagari, then pointed to About Sanskrit from WT:RFC. That IP user is 83.78.31.94, which is BLUEWINNET, a service provider in Zurich. Robert Ullmann 23:23, 29 January 2007 (UTC)
Don't forget that Dbachmann tried to mobilise friends on WP to his cause. It's conceivable that he has also mobilised some of his university friends who do not have log-ins. Not that a concert party needs to be dealt with much differently to an individual. --Enginear 15:45, 30 January 2007 (UTC)

no, this IP is indeed mine. I made a single edit under an unblocked IP while I was blocked, to this page, asking for my block to be reviewed. My other logged out edits were not block evasion (I had been unblocked before that), and thus not in violation of policy, WT or WP. The fact that RU jumps on the entirely valid aśvamedha entry to allege (once again, falsely) that I violated a policy, shows that by now his interest is just in stalking me and unrelated to wiktionary. His view seems to be that a revert of an unjustified edit on his part constitutes vandalism by definition. I would argue that in such cases, there should be debate on talkpages, and only uninvolved admins should issue blocks for misbehaviour (such as incivility or edit-warring).

I would prefer not to use my account in the future for such entries as I may add (when cleanup of a Wikipedia article requires exporting lexical information). I did describe my "cause" on w:VP, the equivalent of BP here, which does not amount to "mobilizing friends". I did not ask anyone to edit on my behalf, or round up any sort of support off-wiki. Every single one of my edits to Wiktionary has been in good faith and informed by linguistic expertise. If the WT community does indeed back up the practice of wikistalking and "block first, then vote on the policy, don't discuss" by RU, I will recognize the effort as wasted and forgo such contributions in the future. I will only enter further debate, content or policy related, if the WT community asks RU to step down from blocking me [revised to: will agree to ask an uninvolved admin to review the case and issue blocks to either party in dispute according to their behaviour]. I argue that anything else is a fundamental violation of the wiki principle, but I cannot be bothered to fix Wiktionary if that's how things are here. 130.60.142.151 16:45, 30 January 2007 (UTC)

See also w:User_talk:Connel_MacKenzie#wiktionary. Incidentally, it is RU's opinion that aśvamedha "should be at अश्वमेध", an opinion that a bona fide editor would discuss on Wiktionary_talk:About Sanskrit. Instead, RU is again using his admin buttons to impose his whim (which, it transpires, has more to do with opposing whatever I happen to come up with than any actual background knowledge of Sanskrit). Well, RU, you are free to move aśvamedha to अश्वमेध, I can't stop you, can I? Squabbles over preferred writing system aside, the entry is still perfectly valid, and I am growing tired of having to defend myself for adding valuable content to the project. 130.60.142.151 17:24, 30 January 2007 (UTC)

While I personally have thought that we should have romanizations entered for many more languages than we do, I can recognize that it is clearly very unreasonable to defy existing conventions (WT:ELE/WT:CFI) for Sanskrit, without significant justification. I wish to note the pointed lack of announcement of the "About Sanskrit" page, and its obvious conflict with longstanding practices with regards to Sanskrit. To effect such a change to policy, to now include romanizations, one would need to #1) garner support for the idea, #2) start a discussion on this page regarding the topic, #3) start a one-month (or longer) WT:VOTE for the change, and then if successful, #4) update the About Sanskrit page. Yes, it is sad, that we are becoming more Wikipedia-like, in our bureaucratic layers. --Connel MacKenzie 06:09, 31 January 2007 (UTC)

I would feel happier if an admin other than Robert dealt with Dbachmann in future, for fairness's sake; perhaps RU could give Connel (or me) or someone else a prod if he feels something is amiss. Otherwise it seems to me to be getting a bit emotional. Widsith 10:30, 31 January 2007 (UTC)

I would word that to make it easier to demonstrate fairness. I believe Robert has actually been fair, but it would be good to be able to demonstrate that simply, without the need to review all the evidence as some of us have. --Enginear 14:26, 31 January 2007 (UTC)
I think he was fair too, I just think we need to address Dbachmann's concerns by encouraging other editors to look at his work. Widsith 14:40, 31 January 2007 (UTC)
I concur, this is a good idea. Robert Ullmann 20:46, 31 January 2007 (UTC)

New word

Is there anywhere I can submit a word I created? This unsigned comment was added by 58.168.229.249 (talk • contribs) 06:52, 1 February 2007 (UTC).

Appendix:List of protologisms. bd2412 T 07:54, 1 February 2007 (UTC)

Pronunciation aids: do we list them as linear or bulleted lists?

Like this: enPR: /?/, IPA(key): /?/, X-SAMPA: /?/

Or like this:

Which is it? (I personally prefer the latter.) † Raifʻhār Doremítzwr 21:56, 31 January 2007 (UTC)

The former might be more useful if there’s more than one possible pronunciation, though. (Schedule, for example.) – Minh Nguyễn (talk, contribs) 00:28, 1 February 2007 (UTC)
Then we could have, for example:
Which, I think, looks better. What do you say? † Raifʻhār Doremítzwr 00:40, 1 February 2007 (UTC)
No, in general we’ve sorted them regionally as is recommended in ELE. For example:
Changing the ELE would require a vote. --EncycloPetey 02:00, 1 February 2007 (UTC)
I see. That might be OK when there are three or four different pronunciations, but looks really messy when there are fewer. See eisteddfod now, and consult its history — doesn’t the pronunciation section look so much better the way that I’ve arranged it? I recognise that changing WT:ELE would require a WT:VOTE, and I may call one in time. In the meantime, preliminary discussion is useful. † Raifʻhār Doremítzwr 12:27, 1 February 2007 (UTC)
  1. Anglicised:
  2. Vernacular:
No, I don’t think so because it’s not as concise as it ought to be, particularly given that IPA and SAMPA communicate the same information. Your suggestion also leaves me wondering what your designations mean. Is “Anglicised” the same as RP or some other UK dialect? Whose “vernacular” pronunciation is given? I prefer (if I’m interpreting these headers correctly):
These line headers I think are easier to interpret. --EncycloPetey 18:37, 1 February 2007 (UTC)
OK, linear listing it is then. “Anglicised” & “Vernacular” used to be “Anglophone” & “Cambriphone”, but as the latter term was unverifiable, those designations were taken out. “Anglicised” & “Vernacular” were the designations suggested by my conversation with zigzig20s. RP and Wales, however, are not suitable substitutes, as I’m not sure if there is an established RP pronunciation for eisteddfod, and the “Cambriphonic” pronunciation is used outside Wales, by people who know how to pronounce the word correctly. I wouldn’t mind using “correct” and “bastardised”, but I don’t think that that would go down well with most around here… ☺ † Raifʻhār Doremítzwr 22:46, 1 February 2007 (UTC)
OED2 agrees with your “Cambriphonic” (or in their words "non-naturalised") pronunciation, and lists a RP pronunciation which differs from your “Anglophone” pronunciation. --Enginear 18:30, 2 February 2007 (UTC)
Sorting regionally, as EncycloPetey has shown, is the tidiest in my opinion. --Williamsayers79 13:42, 1 February 2007 (UTC)
I know. It was your changing one of my bulleted pronunciation lists into a linear list which motivated me to ask this question. † Raifʻhār Doremítzwr 17:16, 1 February 2007 (UTC)

I think an exception, however, must be made for very long words, such as antidisestablishmentarianism. † Raifʻhār Doremítzwr 22:49, 1 February 2007 (UTC)

Books of the Bible

So, at some point, someone started a category called Category:Books of the Bible. Started, but didn't complete. It's been on Rfc for nearly a year now, and it would seem that Vildricianus left before he got a chance to clean them up. I've decided to do so, but here's the problem. The template that has been used specifies the order of the books, among other things. The simple fact is that, unless we want to prefer one religious group over another (which seems like it goes against Wiki policy in so many ways), we can't specify the order, as different religious groups have different books in their bibles and order the ones they have in common in different ways. I considered modifying the template to reflect this, but there just isn't any other way than to simply lump them all in the category, and let people pick out the ones that apply to them. If we want to be really persnickety we could redo the template and allow it to have which versions have a particular book, but it would be really messy and not terribly useful. Among the template's functions that we don't want to lose are links to Wikipedia articles on the respective books and Wikisource text from the KJV for each book. I propose that the template be dropped entirely, and replaced with the category, 'pedia link, and 'source link put in the old-fashioned way. If no one comes up with a better solution in a few days, I'll just go ahead with it. Yes, that is a threat. Cerealkiller13 04:10, 3 February 2007 (UTC)

I think the template should simply list all the versions we know of (I guess in alphabetical order?), with one row each. Is there a fairly complete list of bibles and their ordering available? Maybe Genesis/Experiment would be a good place to hammer it out. Compared to other "messy" templates, I don't think this will be all that ugly. Alternately, perhaps you could show what you mean by "old-fashioned" at Genesis/Experiment two? --Connel MacKenzie 04:30, 3 February 2007 (UTC)
Wanna bet? It's messy. There are more than half a dozen major sequences of the 60-80 books. --EncycloPetey 04:36, 3 February 2007 (UTC)
Once upon a time, I tried to conceive a fair system to replace what's currently up, but quickly discovered that even the three Biblical book sequences I knew of weren't all of them. Take a look at the mess Wikipedia has on the right-hand side of the page for the Book of Hosea. I would not want to see a lengthy mess like that overwhelming our entries, so I aborted my cleanup attempt.
So why not just remove the templates? The key problem with interwiki links is one reason. The Wikipedia article on Genesis is titled simply w:Genesis, but the article on Daniel is under w:Book of Daniel. In other words, the names of the articles are inconsistent. Likewise, Wikisource has no text named s:Genesis. Rather, the KJV text of that book is under s:Bible (King James)/Genesis, the Wycliffe translation is under s:Bible (Wycliffe)/Genesis, and so forth. Any desire to keep interwiki links needs a better solution. --EncycloPetey 04:33, 3 February 2007 (UTC)
I've put up a proposal at Genesis/Experiment two. For some reason, I can't get the title of the Wikisource link to display properly, but the link works properly, and we shouldn't have these problems on the real namespace. As for the inconsistencies, we could spend a week trying to figure out a template which will do this for us, or you could just let me do the gruntwork of finding it all by hand in a few days. It's not as though God's putting out a new book every year. Well, I suppose he is, but we don't have to worry about those :). Cerealkiller13 05:04, 3 February 2007 (UTC)
The problem is that the {{wikisource}} template isn't designed for a second parameter. --EncycloPetey 05:31, 3 February 2007 (UTC)
Fixed this in {{wikisource}}, Genesis/Experiment two seems to look alright. Robert Ullmann 05:52, 3 February 2007 (UTC)
Thanks much Robert. Cerealkiller13 06:00, 3 February 2007 (UTC)
I'll be the first to admit that Wikipedia's format for w:Book of Hosea is stupid; that doesn't preclude us from doing something reasonable. If no one beats me to it, I'll try to show you what I meant at Genesis/Experiment three (or four, or five or whatever it's up to by then.) While Cerealkiller's format is clean-looking, I feel it has lost too much of what he initially suggested we should have (with which I agree.) --Connel MacKenzie 05:26, 3 February 2007 (UTC)
There seem to be a lot of Bible-related terms in the Wiktionary, and I'd propose that we have a catch-all category category:Bible or category:The Bible (whichever floats your boat!) and put the existing categories category:Biblical characters and category:Books of the Bible under it too. This will make life a lot easier. What do you think?--Williamsayers79 10:11, 3 February 2007 (UTC)
category:Bible created and some articles included too.--Williamsayers79 14:31, 3 February 2007 (UTC)
Question: Will Category:Books of the Bible include books from everyone's Bibles? It's not simply a question of Judaic and Christian traditions because even among Christian traditions, the Catholic, Orthodox, Protestant, Syriac, and Ethiopic Bibles all contain different sets of books. --EncycloPetey 15:01, 3 February 2007 (UTC)
I was thinking that it should. Although, perhaps what we could do is rework the template so that it includes a checklist of sorts with the six or so major versions, so that when the template is added, it would add the appropriate categories to the entry, such as Category:Books of the Catholic Bible and Category:Books of the Syriac Bible, but not Category:Books of the Protestant Bible, etc. Take a look at Genesis/Experiment three. The template's incomplete (and damned ugly), but it gives you an idea of what I'm thinking. We would probably have to restrict ourselves to perhaps six or seven major English Bible versions, but I think that would be sufficient. The template is admittedly not terribly user-friendly, but it's not exactly brain surgery either. Cerealkiller13 23:09, 3 February 2007 (UTC)
Oh, and the Wikisource link doesn't work yet, but I think it will once moved to an actual namespace. Cerealkiller13 23:10, 3 February 2007 (UTC)
The elegance (in my opinion) of this template is its versatility. Anyone who wants to could add any other bible they wanted to (Septuagint, Vulgate, Spanish Bible, etc) as long as they came up with a new character to refer to it within the template. In addition to that, the template does ordering, so that each page (Protestant Bible, Catholic Bible, etc.) will have the correct ordering, even if it's not noted on the book entry itself. However, I'm no good with format, so if anyone wants to clean up the template or look at how the coding works, it's at {{grc-test}} (my general purpose test template). Cerealkiller13 23:27, 3 February 2007 (UTC)

Please vote!

This is a notice to all members of the community that there are currently five active votes underway at WT:VOTE, including two nominations for adminship and three formatting policy issues. --EncycloPetey 19:05, 3 February 2007 (UTC)

Non-GFDL compliant mirrors

What are we supposed to do about http://www.ninjawords.com/ - send it to WMFoundation? Or do we have a form letter somewhere that explains why the derived works need to be individually labelled as GFDL with links back to the original contributors?

Technical note: very cool interface. We need that kinda stuff here!

--Connel MacKenzie 06:51, 17 January 2007 (UTC)

(edit: fixed link typo) Actually, it does link back to the Wiktionary entry. So the only problem is that it says (c) 2006... on it, with no mention of GFDL. --Connel MacKenzie 06:55, 17 January 2007 (UTC)
A little birdy on IRC suggested this text:
Dear Mr. Crosby:

I am a sysop on <a href="http://en.wiktionary.org/">Wiktionary</a>.  We like what 
you've done with the place.  But we'd like to address a legal technicality.  It's 
no big deal, just changing the fine print a bit.

While the site is copyright 2006 by you, the content is freely licensed and one of 
the conditions of its reuse is that it remain freely licensed and labeled as such.

We ask that you kindly add to your pages, wherever our content is displayed, a 
clarification of your copyright, to cover the software and website <i>only</i>, but 
that the Wiktionary content is covered under the GFDL.  The message at the bottom 
of <i>every</i> page on en.wiktionary.org is sufficient: "Content is available 
under <a href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation License</a>."

You may contact me/us via mailto:someWiktAdmin@gmail.com or ask in 
irc://irc.freenode.net/wiktionary or 
<a href="http://en.wiktionary.org/wiki/Wiktionary:Beer_Parlour">Beer Parlour</a> if 
you have any questions.  We're happy to help our data be reused.  Of course, we're 
always on the lookout for tech-savvy volunteers, so if ever you'd like to jump in, 
please don't hesitate to join us.

Sincerely,
~~~~

...and as I was typing this, someone else suggested w:Wikipedia:Standard GFDL violation letter.

So, who/what/which should we send, then? --Connel MacKenzie 06:04, 18 January 2007 (UTC)
Seems to me that they both cover the same topics and request the same thing, except that your version is specifically tailored to Wiktionary. I would use the former version, but perhaps with some of the links which appear in the "standard" version. Cerealkiller13 18:53, 18 January 2007 (UTC)
Would you be kind enough to make the corrections, above, here, please? --Connel MacKenzie 19:01, 18 January 2007 (UTC)
Dear Mr. Crosby:

I am a sysop on <a href="http://en.wiktionary.org/">Wiktionary</a>.  We like what
you've done with the place.  We greatly appreciate you citing Wiktionary as a
source, but we'd like to address a legal technicality.  It's no big deal, just
changing the fine print a bit.

While the site is copyright 2006 by you, the content is freely licensed and one of 
the conditions of its reuse is that it remain freely licensed and labeled as such.

We ask that you kindly add to your pages, wherever our content is displayed, a 
clarification of your copyright, to cover the software and website <i>only</i>, but 
that the Wiktionary content is covered under the GFDL.  The message at the bottom 
of <i>every</i> page on en.wiktionary.org is sufficient: "Content is available 
under <a href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation License</a>."  
For further information on the GFDL please follow the following links:
http://en.wikipedia.org/wiki/WP:GFDL &  http://en.wikipedia.org/wiki/Wikipedia:Copyrights 
(please note that these policies hold for all Wikimedia projects, including Wiktionary, 
even if not specifically stated).

You may contact me/us via mailto:someWiktAdmin@gmail.com or ask in 
irc://irc.freenode.net/wiktionary or 
<a href="http://en.wiktionary.org/wiki/Wiktionary:Beer_Parlour">Beer Parlour</a> if 
you have any questions.  We're happy to help our data be reused.  Of course, we're 
always on the lookout for tech-savvy volunteers, so if ever you'd like to jump in, 
please don't hesitate to join us.

Sincerely,
~~~~

I realize this eats up a lot of space to replicate the thing in its entirety, but I thought it would be useful to have the different versions easily viewable by all. The changes are only a clause in the first paragraph and the second half of the second paragraph, starting with "For further." Please look it over before sending, as I have a terrible personal history of incorrect punctuation and run-on sentences, among other things. Cerealkiller13 19:44, 18 January 2007 (UTC)

Whoops, I never sent this. Did anyone? Is it ready to go? --Connel MacKenzie 08:32, 5 February 2007 (UTC)
Wow, that got a good laugh. I thought you were going to send it fast, like a ninja. I think it's ready, although I don't know if anyone else looked it over. Atelaes 08:45, 5 February 2007 (UTC)
Hehe. If Wiktionary were "fast" enough, the world wouldn't need a ninjawords.com! --Connel MacKenzie 03:22, 6 February 2007 (UTC)
One quick thing, though. You're not going to actually sign it with four tildes, are you? I don't think that works in email. Atelaes 03:26, 6 February 2007 (UTC)

Just for the record, I sent an email following the second model here to Mr. Crosby, copied to Connel. We've had a response already, and he offers to add a footnote that should serve the purpose nicely. Dvortygirl 05:36, 6 February 2007 (UTC)

Alternate for AHD

This issue is now up for a vote, until 17 February 2007.

I think it was Hippietrail who coined AHD for the "dictionary-style" pronunciation system we use, in the belief that it was a standard consistent among various dictionaries. It turned out that this was not the case, and that each dictionary uses its own system. As a result, we have a template {{AHD}} mistakenly named after a particular dictionary, but which in fact is our own system. A suggestion was made some time ago (again by Hippietrail, if I remember correctly) to change the name of the template and system, but no alternative names were put forward.

Proposal: We replace "AHD" with WPR (Wiktionary Phonemic Representation). I have looked and found no major usages of this initialism; Wisconsin Public Radio and the Western Pacific branch of WHO were the most significant entities I found with this designation.

Is there someone with a better proposal, or is this acceptable? I'll wait for further suggestions before launching into a vote. In any case, we shouldn't use AHD (American Heritage Dictionary), since that term is misleading and belongs to another dictionary. --EncycloPetey 03:13, 28 January 2007 (UTC)

Does Wiktionary have anything like Wikipedia's policy of avoiding self-references (w:WP:ASR)? Maybe something like ELPR (Extended-Latin Phonemic Representation) or PREL (Phonemic Representation in Extended Latin) would be better? By the way, whatever we name this, it should probably link to an appendix (Appendix:Extended-Latin Phonemic Representation or whatnot) rather than to a Wikipedia article on the American Heritage Dictionary. —RuakhTALK 03:41, 28 January 2007 (UTC)
Since the target audience is English readers, I think labelling it "Latin" would be misleading. ASCII perhaps? But I'm not sure that any consistent scheme is followed for these. The ones I've seen have been random, not restricted to a specific format that Wiktionary has provided. How about "unspecified fonetic, other?" {{UFO}}?  :-)   --Connel MacKenzie 08:16, 28 January 2007 (UTC)
I think EPR (English Phonemic Representation) is a viable alternative. I agree that using "Latin" in the name could be confusing. --EncycloPetey 16:22, 28 January 2007 (UTC)
How about abandoning it altogether? It's not as if it's easier to read or learn than the IPA; it's not used in the entries for words in languages other than English; and getting rid of it would allow cross-project consistency with Wikipedia, which uses only IPA. Angr 09:15, 28 January 2007 (UTC)
I agree with Angr. The IPA is more precise, internationally accepted, and not difficult to learn (it also looks like one consistent font, rather than an awkward blend of two or three). † Raifʻhār Doremítzwr 15:20, 28 January 2007 (UTC)
Displays consistently in one font for me. You probably have some interesting fonts set up ;-) Robert Ullmann 15:44, 28 January 2007 (UTC)
The one major shortcoming of IPA is that (properly used), it is phonetic and mercilessly represents the specific sounds of speech. It is therefore prone to regionalism, which is why we have to give the IPA representations separately so often for the UK, US, Australia, etc. The one major advantage of AHD (or whatever we're going to call it) is that it can be made phonemic, which is more flexible in representing the pronunciation of a word. It can also be made to more closely resemble the sorts of pronunciation systems users are familiar with from print dictionaries. It would (of course) be used only for English entries, since the phonemes in other languages will necessarily be quite different.
I am not proposing we abandon IPA, and in fact I am much more enamored of that system because it was intended and is designed to be consistent and as universal as possible. However, I see the need to disentangle our phonemic and phonetic representations of pronunciation, which is one key reason why we have two systems anyway. If you want to propose abandoning our second system of pronunciation, that would be a separate discussion, and one that has gone round and round before without success. The point of this discussion is to "fix" the name of a system currently in use. --EncycloPetey 16:22, 28 January 2007 (UTC)
Consistency with WP is not the objective; they use one system because they are not a dictionary, and pronunciation is a bit out of scope anyway. Our objective is to describe words as fully as is reasonably possible, and the "AHD" style pronunciation is a lot more accessible to people familiar with English dictionaries (esp. US) than IPA. There is no reason whatsoever to limit pronunciations to IPA. Robert Ullmann 15:44, 28 January 2007 (UTC)
Nice thing about the use of the template is that it is easy to rename. Is the key to this system somewhere? I'd say it should be WPR. (The answer to the question about self-reference supra is no, we don't worry about that the way the 'pedia does.) Robert Ullmann 15:44, 28 January 2007 (UTC)
That will be the second major step: setting up a key to the system. Right now, the system is being used more-or-less consistently with the exception of the way stress and syllable breaks are marked. Before setting up such a key, though, we need to know what the system will be called in order to name the page. This is part of a long-term effort I've been working on to solidify our policy and practices as applied to the Pronunciation section of entries. The key word in that sentence is long-term, since I've found in my work that there is very little actually cemented and much of the information is scattered around in various discussion fora and user pages. --EncycloPetey 16:22, 28 January 2007 (UTC)

I'm all for changing away from the name of a proprietary dictionary. It would be nice to have a system where we can write for arm "Pronounced 'arm'" and know that it's RP /ɑːm/ US /ɑɹm/ AU /aːm/ Similarly for not, soon, tow, etc. Of course we can have IPA too. Cynewulf 16:36, 28 January 2007 (UTC)

I think it's good to have a phonemic representation. Sometimes we'll need regional versions anyway (as for privacy), but generally I think it will simplify matters. The reason I suggested "Extended Latin" is that that's the standard term for ASCII plus non-ASCII Latin-like letters (mostly Latin letters with diacritics, but also a few ligatures — ß, Æ, Œ — and runes Ð, Þ); the term "ASCII" would not be appropriate for our phonemic representation, as it does not restrict itself to ASCII characters. EncycloPetey's "English phonemic representation" suggestion is better, though, as it also leaves the door open for similar systems to be devised for any other languages that we think it could be useful for. (Maybe the abbreviation should be {{enPR}} instead of {{EPR}}, though? "EPR" gets 6.8 MGhits.) —RuakhTALK 19:17, 28 January 2007 (UTC)

Sounds like another feasible choice. (EPR gets so many hits because it has so many uses and meanings). --EncycloPetey 21:05, 28 January 2007 (UTC)
I like "WPR", where "W" = "Webster-ish". "English Phonemic Representation" is a bad choice, since the Webster-ish systems typically aren't phonemic, while the IPA transcriptions here are phonemic (or at least should be). -- Keffy 21:13, 28 January 2007 (UTC)
The IPA transcriptions here can't be phonemic since we use it for every language included on Wiktionary. Phonemes by their very definition are limited to a single language or group of closely related languages. A phonemic system requires a limited set of phonemes so that the symbols will correspond properly, as our "AHD" does for English. By contrast, IPA is intended to cover all the sounds in all the languages all over the world. That's hardly phonemic. It is true, though, that modern print dictionaries have adopted a bastardized version of IPA in which symbols intended to represent one sound are used to represent a different sound. This works in those dictionaries because the modified symbols are being applied to a single language (with a small set of phonemes) and because the idiosyncratic (mis)use of the symbols is carefully explained at the outset of the dictionary. Such a use is not possible once another language is added, and here on Wiktionary we have nearly 400 languages already. Were we to try such an approach, we would need a separate page explaining the use of our IPA symbols for each and every language because the symbols would not be consistent from one language to another. Down that road lies madness, since such an approach negates the very purpose and power of IPA. We should keep our phonemic and phonetic representations separate with one system dedicated for each approach, rather than random use of each system for each purpose.
Consider Cynewulf's example of UK /ɑːm/; US /ɑɹm/; AU /aːm/. The three representations all differ because the phonetics differ, but it's the same phonemes in each word. A phonemic system would look the same for all three because the phonemes are the same, even though the initial phoneme is pronounced differently in each region. comment continues below
Ummmm, not quite sure what you're getting at here. I don't even count the same number of phonemes. -- Keffy 06:14, 29 January 2007 (UTC)
comment continued from above And I'm sorry but I don't understand what you're saying about WPR/EPR. If Webster-ish systems "aren't phonemic" (they are), then why do you prefer calling the phonemic system "Websterish"? Did you make an error in typing? --EncycloPetey 21:36, 28 January 2007 (UTC)
Actually, I said Websterish systems typically aren't phonemic. I was thinking specifically of the versions that stick pretty closely to the orthography, for example, happily using three different vowel-plus-diacritic combos for exactly the same phoneme, just so they can keep using the same vowel that's in the word's spelling. Perhaps our "AHD" hasn't been doing that. If so, great.
I was going to put a mini-rant here about IPA and broad/phonemic transcriptions. I'll put that on your user talk page instead and stick to the question of renaming "AHD". Which, hmmmm, I actually have nothing to say about, since I'm completely indifferent to AHD in the first place. -- Keffy 06:14, 29 January 2007 (UTC)
Thanks for making the mini-rant easy to find ;) I'm glad to get as much input at this stage as possible because the next step is to hammer out a draft for the "AHD" system (where your help would be much appreciated) and then a proper page for Wiktionary:Pronunciation, which is little more than a set of links right now. But, as you noted, the discussion of the moment is what to call the Wiktionary "AHD" system. --EncycloPetey 03:16, 30 January 2007 (UTC)
Yes, as we have figured out, the purpose I see in having "AHD" at all is that it may serve as a system to bridge across dialects, without all the detail that enters into IPA. Both the Americans and British know how to pronounce the a in watch. They do it differently, but that difference is consistent, so it is possible to have a relative pronunciation system that takes advantage of that. One can give the "AHD" pronunciation as /wätch/, but the IPA would be (UK) IPA: /wɒtʃ/ and (US) IPA: /wɑːtʃ/. When the vowels differ, IPA requires different transcriptions because it isn't relative. Our "AHD" system can be. --EncycloPetey 18:09, 4 February 2007 (UTC)
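To make the "relative" idea concrete, here is a minimal sketch assuming a simple per-symbol lookup table. The table name, the way /wätch/ is split into segments, and the function are all hypothetical; only the realizations come from the watch example (UK /wɒtʃ/, US /wɑːtʃ/).

```python
from __future__ import annotations

# Hypothetical table: how each symbol of a relative (phonemic) key is realized
# in each region. The values are taken from the "watch" example above.
REALIZATIONS: dict[str, dict[str, str]] = {
    "w": {"UK": "w", "US": "w"},
    "ä": {"UK": "ɒ", "US": "ɑː"},
    "tch": {"UK": "tʃ", "US": "tʃ"},
}


def to_ipa(segments: list[str], region: str) -> str:
    """Expand one relative transcription into a regional IPA transcription.

    Each segment maps to exactly one regional realization, so this only
    works when accents differ symbol by symbol.
    """
    return "/" + "".join(REALIZATIONS[s][region] for s in segments) + "/"


# One relative transcription, /wätch/, yields two regional IPA forms.
for region in ("UK", "US"):
    print(region, to_ipa(["w", "ä", "tch"], region))
```

The one-to-one table is exactly where Keffy's objection bites: for arm, RP /ɑːm/ and US /ɑɹm/ don't even have the same number of segments, so a relative key needs extra rules (e.g. for rhoticity) beyond a simple symbol substitution.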

The wider issue of pronunciation

I've long thought two things:

  • We should have a pronunciation key and a layout key for readers linked to from the sidebar.
  • Wiktionary:Entry layout explained should be split in twain: Information on how to interpret the layout for readers, the aforementioned layout key, and a "manual of style", explaining the style of layout and how to actually construct articles, for writers.

In addition, and similarly:

How the entire issue here breaks down seems relatively simple:

  • If we do pronunciations in IPA or SAMPA, we should do them to the extent that they specify the phonetics fairly narrowly. That means (taking English as an example) that an article's "Pronunciation" section may well need to have different bullet points for IPA and SAMPA pronunciations in General American, Received Pronunciation, Australian English, and so forth. That this makes Pronunciation sections longer than 1 line is not a bad thing. Wiktionary isn't paper.

    Why should we do it this way? Because it's the only way to be consistent with IPA across all languages. We shouldn't expect readers to read IPA one way for English words and another way for all other languages. For all other languages, we expect readers to understand IPA as specifying a fairly narrow transcription — the exact sounds that should be made. Making IPA phonemic for English is both anomalous and confusing.

  • If we use non-standard phonemic systems, where the symbology is chosen so that accent variations (such as, say, the splits and mergers in English phonology) line up with specific symbols, we will require a different and separate phonemic system for each broad family of accents in each language. There's no single phonemic system in existence, to my knowledge, that properly covers all of the accents of English around the world, for example.
  • For writers, narrow use of IPA to make phonetic transcriptions is easy, as long as we encourage the idea of having a list of pronunciations, one for each (major) accent (e.g. General American, RP, and so forth). They just transcribe the generally attested pronunciation for each major accent into IPA or SAMPA.

    Does it matter that we can potentially end up with a number of accents? No. Once again: Wiktionary isn't paper. The fact that we can do better than paper dictionaries, by addressing pronunciations in different accents individually if we need to, is one of the good things about Wiktionary.

  • If we use phonemic systems, it is far better to use ones that have already been invented than to spend years creating and discussing a new one of our own. For writers, phonemic systems are only easy if we aren't using one that we are inventing and modifying as we go along.
  • As a corollary of this, picking the name is a doddle. It's already invented and documented, so just use the name that it already has.

    If the "AHD system" here is what The American Heritage Dictionary of the English Language uses, that is what it should be called. Moreover, we should ensure that we use it as-is, without introducing our own variations into the system. Yes, it's not used by other proprietary dictionaries, many of which have phonemic systems of their own invention. That's a reason to question whether we should be using it, or any other idiosyncratic phonemic scheme from a proprietary dictionary, at all, not a reason to rename it.

  • And on that note, it is worth considering whether we should be using the AHD's (or the OED's, or Merriam-Webster's) system at all. At the very least, we should consider whether there are copyright issues. After all, whilst information is not copyrighted, the form of the expression of information can be. (See idea-expression divide.) And the individual phonemic pronunciation systems of proprietary dictionaries may well be considered to be individual forms of expression of the pronunciations of words, copyrighted by the publishers of those dictionaries.

Uncle G 13:57, 4 February 2007 (UTC)

Brief reply to some of the important thoughts raised. No, our system does not quite match what the American Heritage Dictionary uses. Here is the comment made by Hippietrail, who originally named the system:
I've said this in many places and I'll say it again here. This is not AHD, it should not be called AHD. It's my fault because at the time I naively thought all American dictionaries shared a standard system and I needed a name. It should be called something like "American dictionary style" which is a mouthful. If somebody can come up with a better name that would be ideal. In reality it is a Wiktionary-specific system based on AHD and other American dictionaries so as to appear familiar to people used to looking up pronunciations in American dictionaries. WAP for Wiktionary American Pronunciation or something like that, but I think WAP is already taken (-: -- Hippietrail 17:33, 22 February 2006 (UTC)
The problem with the name, then, is twofold: (1) the name is a misnomer, and (2) it is based on an American pronunciation, hence the "A" in "AHD". Since our system isn't the AHD system, and since Wiktionary is not exclusively American, the name should change. If we decide later as a community to junk the system, then the discussion becomes moot. However, my general sense is that the community wants such a system to exist, and we can discuss the merits of such a system once the name issue is well in hand. We just have to decide next how we want to use it and to what purpose it will be used. --EncycloPetey 18:02, 4 February 2007 (UTC)

I agree with Uncle G above. I'd like to expand on a few points.

IPA transcriptions should be phonetic -- describing the precise sounds of words. This is because it shouldn't be necessary to know anything about a language to be able to read the IPA pronunciation section for one of its words and understand how native speakers pronounce it. Using IPA for all languages lets people learn only one system that is applicable in the same way across all languages. For instance, we could write that 風呂敷 is pronounced "hurosiki" and that would be unambiguous in Japanese -- but one would have to know something about Japanese to know that the sounds are closer to [ɸuɺoʃiki], not [həɹoʊsəkiː] as an (American) English-speaker might be tempted to say. Similarly, of course everybody knows that the "r" in root isn't an alveolar trill -- everybody, that is, who already knows English phonology. For the same reason that articles about non-English words should be accessible to people who aren't fluent in languages other than English, articles about English words should be accessible to people who aren't fluent in English.

  • Using the simpler of two symbols in IPA because there's no ambiguity in English is English-language POV. Wiktionary covers all languages. If we write [r] in both your and perro, somebody will get confused.
  • Entering a broad transcription is not "wrong" but people should enter (change to) a narrow transcription if they can.
  • If you don't like or can't type IPA's fancy symbols, use SAMPA (or X-SAMPA).
  • Some people have voiced concerns that a narrow transcription would result in an explosion of accents; this can be avoided simply by sticking to a small number of widely-known accents such as Received Pronunciation and General American.
  • Since IPA should use a phonetic transcription, this transcription should be enclosed in square brackets [], not slashes //, because that is the generally-accepted standard representation.

This is not to say that a phonetic representation should be used to the exclusion of everything else. A phonemic representation such as the current "AHD" system can be simpler and easier to use for those who only want to know how to pronounce the second "c" in concerto. It would be nice to have a well-known standard phonemic system that can acceptably describe with one representation the sound of English words in the major accents, but this may not be possible.

Cynewulf 18:45, 4 February 2007 (UTC)

I generally agree. Some minor differences in my viewpoint:
  • Yes, we should aim eventually for phonetic pronunciation for the main regional accents, but I don't feel the need, certainly at this stage, to limit the number of accents for which phonetic IPA pronunciations can be given -- if, eg, WilliamSayers79 wants to add some of the more unusual Geordie pronunciations, that enriches the dictionary.
  • However, phonemic pronunciations for English words are a useful addition not only because the systems tend to be simpler to read, but also, whatever accent they are based on, can be used (by someone who knows both accents) to take a fairly good guess at the pronunciation in another accent.
  • A further reason for changing AHD to something specific to us is that the AHD (or any other dictionary) might just decide to change to a different system, which would cause confusion for us later. Better to decide on what suits us, and name it. If it draws significantly on stuff used only by one other publication, we can credit them with it, but I suspect that, as Hippietrail originally thought, many dictionaries use generally similar systems. --Enginear 20:55, 4 February 2007 (UTC)