Wiktionary:Beer parlour/2007/July

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Kanji Special Readings

Is there a category for Japanese ateji (当て字, kanji used for sound only and not meaning) and gikun (義訓, kanji related only by meaning and using non-standard pronunciation)? If not, I think it would be a good thing to include. —This unsigned comment was added by Hikui87 (talkcontribs) at 02:41, 2 July 2007 (UTC).

As far as I know we don't have such categories yet. I agree with you that those two classes deserve readers' attention, and think that it is worth trying to make up those categories :). --Tohru 01:43, 13 July 2007 (UTC)
I've recently encountered the same properties with Mandarin Hanzi - some are onomatopoetic. bd2412 T 15:02, 28 July 2007 (UTC)

Numbers on Wiktionary

I was looking at this page, and I noticed that there are a lot of number pages.

Many of these are quite redundant, for example, eighty-one through eighty-nine, and all of these have translations.

Now what I was thinking is that, since numbers go on indefinitely, isn't it kind of silly to have each and every number written out?

When we get to something like "forty-five million two thousand and sixty-six", the entries are way too large and redundant.

I think it would be a lot smarter to just have 0-9, and then ten, twenty, etc. And then on from there have simply 100, 1000, 10000. What does everyone else think about this? Bearingbreaker92 18:48, 3 July 2007 (UTC)

The numbers up to a hundred are not redundant in French, where you have to consider special cases for the 70s and 90s and sometimes exceptions if there's a one in the ones place. We do not include all numbers until forever. I don't know where the policy is written, but we actually already have a pretty reasonable system in place much as the one you're proposing. DAVilla 23:23, 3 July 2007 (UTC)
And some languages are not base ten, so they will not be repetitive. The most common example is Roman numerals, which use a subtractive system based on powers of ten and multiples of those by five. So 81=LXXXI, but 91=XCI and 99=XCIX. This requires entries for all the numbers 1-100, 500, and 900 at a minimum (in addition to what you've noted). Stranger, the names of Latin numerals (the words) don't inflect for the numbers 4-100, but they do inflect for 200-900. There are ancient numbering systems based on multiples of 60, such as the Akkadian, Babylonian, and possibly the Hittite. As a result, numbers like 90, 180, 270, 360, and 720 are important in describing figure skating, skiing, skateboarding, and other rotational activities because circular measurement is based on their number systems. There are stranger numbering systems in use in some languages used in non-European countries. Yoruba and Maya are both base 20. Nahuatl (the language of the Aztecs, still spoken today) has basic numbers of 1, 20, 400, and 8000, with special names for these numbers rather than the ones we consider basic (1, 10, 100, etc). The Ancient Greeks had special symbols for 100, 200, 300, up to 900, which are not repeated in any other number form. That is, they had 9 symbols to stand for 1-9; a different set of 9 symbols to represent 10-90 (multiples of 10); and a third set of 9 symbols to represent 100-900 (multiples of 100). And this doesn't even consider important numbers that turn up regularly in standard English measures. 12 is a dozen, 144 is a gross. Some numbers are also more than numbers. A drunk can be eighty-sixed from a bar; and sixty-nine has meanings beyond its numeric value. The result is that we can't make a nice neat little list of basic building block numbers because the list of basic blocks differs in different cultures and circumstances. --EncycloPetey 06:01, 4 July 2007 (UTC)
Note there are languages that have multiple counting schemes as well, e.g. Welsh uses both a base 10 and a base 20 system, so twelve can be deuddeg (base 20) or un deg dau (m) / un deg dwy (f) (base 10). Then you get complications of mutations - filiwn is the soft mutation of both miliwn (million) and biliwn (billion), but miliwn is feminine and biliwn is masculine, so two million is therefore dwy filiwn, and two billion is dau filiwn. Thryduulf 08:54, 5 July 2007 (UTC)
On the other hand, Bearingbreaker92 has a good motive since CFI does not say anything about translations being a pertinent part of our consideration for inclusion. Technically they should be Phrasebook entries or something, but I'm fine with making an inclusionist exception to CFI. DAVilla 19:23, 4 July 2007 (UTC)
CFI allows words if we can find three durably archived uses. I expect that can be done with most numbers under 100. --EncycloPetey 19:53, 4 July 2007 (UTC)
I just ran across fifty-one and two hundred and twenty-nine while reading a novel, when the hero states:
  • The best thing about books...is that you can always tell when you're getting to the end. No matter how tricky the situation the hero's in, you hold the book in your hand and think, "Hang on, I'm two hundred and twenty-nine pages in, with only another fifty-one to go. It started slow, but it's building to a climax."
Amusingly, this is on page 229, with 51 pages to go. --EncycloPetey 22:05, 4 July 2007 (UTC)
As to CFI, I'm not sure that, technically speaking, two hundred would pass the idiomatic criteria. So a clarification might be in order. DAVilla 11:22, 5 July 2007 (UTC)
What does idiomaticity have to do with anything? A word doesn't have to be idiomatic to be included on Wiktionary. --EncycloPetey 19:00, 5 July 2007 (UTC)
Well, technically, “two hundred” is idiomatic — whereas “two hundreds” isn’t. † Raifʻhār Doremítzwr 19:14, 5 July 2007 (UTC)
Kind of like "two gazillion" is idiomatic? Or "two year" in the phrase "two year old child"? DAVilla 19:34, 5 July 2007 (UTC)
Strictly speaking yes (although isn’t the idiom in “two year old child” “two year old”?), despite the meaning being so obvious. † Raifʻhār Doremítzwr 19:50, 5 July 2007 (UTC)
Um... "A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."
Are you arguing that the formal guidelines don't apply to this case, that someone who ran across "two hundred" might want to know what it means?
No, I am saying that your reasoning is faulty. The criterion you have quoted is an if, then; it is not an if and only if. It says only that a word will be included if it is attested and idiomatic. You are assuming the contrapositive must be true, which is a logical fallacy. The criterion says nothing about excluding words that are not both attested and idiomatic. If we did that, we'd be excluding a significant chunk of the English language. So, please do not erroneously assert that a word must be idiomatic to meet CFI. CFI makes no such stipulation. --EncycloPetey 21:22, 5 July 2007 (UTC)
I think EncycloPetey meant to say "inverse" or "converse" because the contrapositive of a true statement is always true. Rod (A. Smith) 21:54, 5 July 2007 (UTC)
If in English can have either meaning, by the way. Why do you keep trying to apply mathematical principles to language? Anyways I won't argue your point since the statement does appear in a section called "General rule" meaning that there could be exceptions. So I was wrong to imply that the "formal guidelines" are the complete picture. You're right, it doesn't have to be idiomatic. However, the statement also appears on a page called "Criteria" which means that the general rule or one of the exceptions must apply. It would have been much more thorough of me to have said that two hundred is neither a typographic variant nor a name. That's a level of specificity I am incapable of as a non-silica based lifeform. All I'm really arguing is that the CFI should really mention something about numbers. The arguments about non-trivial translations appear nowhere on CFI, so they don't hold water. DAVilla 22:51, 5 July 2007 (UTC)
As I see it, these numbers are exceptions to CFI. They should be mentioned somewhere official for the sake of clarity if nothing else. DAVilla 19:34, 5 July 2007 (UTC)
I wasn’t arguing anything; I was just being pédantique. ;-) † Raifʻhār Doremítzwr 19:50, 5 July 2007 (UTC)
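The logical point in the exchange above (an "if P then Q" rule is equivalent to its contrapositive but not to its converse or inverse) can be checked mechanically with a truth table. The following is only an illustrative sketch: the helper names `implies` and `equivalent` are made up for this example, and P and Q stand loosely for "the term is attested and idiomatic" and "the term is included".

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def equivalent(f, g):
    """Return True if two two-variable statements agree on every row of the truth table."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))


# The four related forms of "if P then Q".
original = lambda p, q: implies(p, q)                # if P then Q
contrapositive = lambda p, q: implies(not q, not p)  # if not Q then not P
converse = lambda p, q: implies(q, p)                # if Q then P
inverse = lambda p, q: implies(not p, not q)         # if not P then not Q

print("original == contrapositive:", equivalent(original, contrapositive))
print("original == converse:      ", equivalent(original, converse))
print("original == inverse:       ", equivalent(original, inverse))
print("converse == inverse:       ", equivalent(converse, inverse))
```

Reading "included if attested and idiomatic" as "excluded if not attested and idiomatic" is the inverse, which the table shows is not implied by the original statement.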

Once we're past 100, I would argue that most numbers (including translations of same) are sum-of-parts. bd2412 T 06:11, 5 July 2007 (UTC)

Then please go back and read the several examples I gave above where this is flatly not true. A blanket statement like that swamps out the ones that aren't sum of parts. --EncycloPetey 07:40, 5 July 2007 (UTC)
Ah everyone here has good points, so is there a page that describes this policy? Bearingbreaker92 19:34, 5 July 2007 (UTC)
Even base 20 numbers and Roman numerals are sum-of-parts once you have laid down the components of each number system. bd2412 T 19:39, 5 July 2007 (UTC)
No, and that's one of my points. For those that are sum-of-parts, the way in which they are summed varies between languages. But Roman numerals are not always done as a sum; they are sometimes additive and sometimes subtractive. Also, since the numbers 200 to 900 inflect in Latin and lower numbers don't, we need to include those entries so that users will have access to the information. --EncycloPetey 21:12, 5 July 2007 (UTC)
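To make the "sometimes additive and sometimes subtractive" point concrete, here is a minimal sketch of the conventional modern Roman-numeral notation, in which subtractive pairs such as IV, XC and CM are treated as units alongside the plain symbols. The table `ROMAN_PAIRS` and the function `to_roman` are names chosen for this illustration, and the 1 to 3999 range is an assumption of the sketch, not a claim about historical usage.

```python
# Value/symbol pairs in descending order. The subtractive pairs (CM, CD, XC,
# XL, IX, IV) are listed explicitly, so the loop below stays purely greedy.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert an integer in 1..3999 to conventional Roman-numeral notation."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1..3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)  # how many times this symbol fits
        parts.append(symbol * count)
    return "".join(parts)


for number in (81, 91, 99, 500, 900):
    print(number, to_roman(number))
```

Under this convention 81 is built additively (LXXXI), while 91 and 99 each need subtractive pairs (XCI, XCIX), which is why the notation cannot be described as a simple sum of symbol values.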

Proposal. Would everyone be O.K. with having appendices like Appendix:Hindu-Arabic numerals, Appendix:Roman numerals, Appendix:English numbering, Appendix:French numbering, etc. that cover these? And if so, would everyone be O.K. with including entries for the basic component words/symbols (one, twelve, thousand, etc.; I, V, X, etc.) and any numbers that have CFI-meeting idiomatic meanings (sixty-nine, etc.), but otherwise to exclude them unless there's a specific reason to include them? (Feel free to add clarification to this proposal, if there are any cases or major points I haven't covered.) —RuakhTALK 05:27, 6 July 2007 (UTC)

No, for multiple reasons. One of the reasons is that we run into a previous stalemate problem of defining number and numeral. There is a lexical difference between the cipher/symbol used to represent a count and the word used to represent that same count. I use number for the former and numeral for the latter (following current grammars such as the CGEL), but there is a group here that takes the opposite view of the definitions, using numeral for the ciphers and number for the words. Secondly, the two are not coupled language by language. If you have an appendix on Hindu-Arabic numerals, it has to cover not only how to construct the ciphers from basic elements; it also has to cover how to form the names of those ciphers in every language that uses them. That is an enormous project to undertake and we're not ready for that. Third, the appendix has to cover the grammatical functions of such words in all the languages where they occur. The way in which the words function in most languages does not follow any one traditional part of speech, so we have to explain that. Different languages treat their different numerals grammatically in different ways. Fourth, the appendix has to cover more than just the cardinal numbers. There are also ordinal numbers, fractional numbers, and indefinite numbers. Fifth, the names (numerals) inflect in various languages, and not always the same as the parts of speech that they mimic. Sixth, keeping track of every possible number word in every language and managing a list of which ones we allow and which ones we don't becomes a nightmare morass of petty policing that I would hate to see happen here. In short, this is a colossal task with a small return of value for enormous questionable work put in. --EncycloPetey 07:50, 6 July 2007 (UTC)
Nah, we don't actually want to change anything. Everyone likes the current system. This is the biggest non-argument on Wiktionary ever. My apologies for starting and for continuing it. I really only want to see a modest change to WT:CFI to reflect current practice. DAVilla 18:56, 7 July 2007 (UTC)

CFI for romanizations/transliterations

OK, maybe this should go on RFD or RFV or wherever, but I'd like to get more general input on this subject as a whole... what do people think about entries like hanja#Korean? Note that the romanized word "hanja" is not attested in Korean at all, to my knowledge; the hangul (or perhaps occasionally the hanja) would uniformly be used instead. Questions:

  • Is it safe to simply remove such sub-entries, merging anything relevant to the English header or the actual Korean entry?
  • Is our acceptance of pinyin syllables relevant to this matter? added: probably not, per R.U.'s comments above
  • Given that the word hanja is also attested in Korean studies literature in other languages (notably German), does this strengthen the case for including a "romanized Korean" entry? Or would it be better to have German, Spanish, etc. headers? -- Visviva 01:07, 4 July 2007 (UTC)
It's probably better to have separate sections for the various languages, firstly because we'd want to include language-specific pronunciation information, and secondly because we'd want to include language-specific grammatical information (noun gender/class, declension, mass/count/proper status, etc.). I see no reason for hanja to have a "Korean" section. —RuakhTALK 01:49, 4 July 2007 (UTC)
I agree. Our practice has been to not include romanizations that are not a natural part of the language. I don't see any reason to change that. --EncycloPetey 05:38, 4 July 2007 (UTC)
Yes, as it pertains to automatic inclusion of transliterations. I don't see any reason why this couldn't be included if it were attested in Korean text. DAVilla 19:39, 12 July 2007 (UTC)
They are included for Japanese words, why should Korean be any different? From WT:AJA: "A romaji entry satisfies the criteria for inclusion if any of its hiragana, katakana, or kanji transliterations satisfy the standard criteria." The reason is that there are students of the language who aren't familiar with the script yet, and the Japanese sub-entries are there to direct them to the main article(s) (which use attested forms). If you look at kanji, it has a Japanese section, and e.g. miru has only a Japanese section. -- Coffee2theorems 02:09, 25 July 2007 (UTC)
Well, I don't really understand why unattested romanizations are permitted for Japanese words either. These entries seem to seriously violate the CFI, and don't really add any informational value; in fact, their presence is misleading since such entries give the false impression that a given language can actually be written in the Roman alphabet.
Further questions: 1. If we include romanizations of Japanese and Korean, why not also Cyrillic, Arabic, Greek, Hebrew, et al.? 2. Should we include all possible romanizations; if not, where would we draw the line? (it can't be on the basis of the canonical Wiktionary romanization; surely we don't want to be in the position of deleting/moving hundreds or thousands of entries whenever we change from one romanization system to another). -- Visviva 02:33, 25 July 2007 (UTC)
This old discussion was about linking to romanization entries but is very relevant to the matter at hand. In particular, Hippietrail's comment in the middle of it is what determined the overall direction after that and may still be worth reading. --Tohru 08:42, 25 July 2007 (UTC)
Visviva, I would be willing to bet that people do use the alphabet to write all of the languages that you mentioned. Here is an example for Mandarin. Granted, Native speakers may take issue with this practice as being unorthodox, and most of these Romanized texts are for the purpose of language learning, but they can also stem from other things such as not being able to display non-ascii symbols on older computers. Another use might be to include non-English terms in English language materials for an audience which is not accustomed to reading non-alphabetic scripts (ex: List of Kodokan Judo techniques). -- A-cai 09:18, 25 July 2007 (UTC)

Codepoints without lexical significance

I don't know if this has widespread significance outside of Korean... but a question which has come up repeatedly is whether syllable entries like belong here, simply by virtue of being Unicode codepoints. Such entries could potentially contain: compositional data, encoding/technical data (keystrokes, etc.), and (limited) romanization data; however, they could never contain an actual meaning .

  • Is it worthwhile for us to have such entries (potentially several thousand in number)? Or would it be better to explicitly restrict these to line-items in an appendix?
    • If an appendix is better, would it be reasonable for us to redirect such entries to the specific appendix line/section, for the benefit of those who a) actually need this information, or b) may interpret the absence of such information as a request for an entry?
  • Is this the sort of thing that has to be VOTEd on, or can we just hash it out like reasonable people hashing something out? -- Visviva 01:19, 4 July 2007 (UTC)
If we're including English and Greek letters (and we are), then already we don't have a policy of including only symbols that have meaning, and I think we might as well include the Hangul syllables, as well as any other normal Unicode character. —RuakhTALK 01:54, 4 July 2007 (UTC)
NB: There's a discussion going on somewhere (I cannot remember where off the top of my head) about whether to only include Korean 'letters' (not 'letters' per se, but the equivalent) or to include syllables. The argument in favour of syllables is that you can't look up individual letters in Korean (for those of you unfamiliar with Korean, it is similar to encountering the word 'fœtus' — if you're familiar with the Latin script, you can decompose that 'œ' into 'oe' and look up each one separately; if, however, you are unfamiliar, you would likely copy-and-paste the 'œ' into our search box). — Beobach972 03:04, 4 July 2007 (UTC)
Why, it's further up on this page... I'm sorry, I'm not all here today... — Beobach972 03:08, 4 July 2007 (UTC)
However, it may be a good idea on your part to split that (admittedly off-topic) discussion off to this separate section... — Beobach972 03:09, 4 July 2007 (UTC)
For reference of others, the above discussion is at #Addition of headers to ELE (and thanks for reminding me of it, I knew we'd been talking about this recently somewhere).
Perhaps a source of some confusion here is the "Syllable" heading, which has a fully legitimate (though somewhat non-intuitive) use for the listing of hanja homophones. But then there is the (IMO illegitimate) use of "Syllable" headings to simply describe a syllable which lacks lexical or graphological significance. I can sympathize with the idea (raised above) that this information might be needed, although I have a hard time imagining a real-world situation where someone would need to look up an isolated Korean syllable... I guess I'm wondering if a redirect to a line in an appendix wouldn't be enough to keep everyone happy. -- Visviva 04:14, 4 July 2007 (UTC)
One aspect of this is that there is often a lot to be said about individual letters (or jamo or what have you), but there is very little that can be said about most syllable glyphs, outside of their composition and a smattering of technical info. There is little more that could be said about (an arbitrary hangul syllable I just made up) than there is to be said about, for example, . -- Visviva 04:20, 4 July 2007 (UTC)
Aren't most hangul syllables used at some point or other? In any event, there are a total of 11,172 (here's a page that has them all), and I see no harm to including them all, if not at least those that can be shown to be used. Some of our "most wanted" on wantedpages are hangul as well. Cheers! bd2412 T 10:15, 25 July 2007 (UTC)
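The figure of 11,172 and the "compositional data" mentioned earlier in this thread both follow from how Unicode lays out the precomposed Hangul syllable block: 19 initial consonants times 21 vowels times 28 finals (counting "no final" as one option), starting at U+AC00. The sketch below shows that arithmetic; the function names `compose` and `decompose` are hypothetical helpers for this illustration, and the indices are positions in Unicode's jamo ordering rather than anything defined on Wiktionary.

```python
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
LEADS, VOWELS, TAILS = 19, 21, 28  # initial consonants, vowels, finals (0 = no final)
SYLLABLE_COUNT = LEADS * VOWELS * TAILS


def compose(lead, vowel, tail=0):
    """Build a precomposed syllable from jamo indices (lead 0-18, vowel 0-20, tail 0-27)."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)


def decompose(syllable):
    """Split a precomposed syllable back into (lead, vowel, tail) indices."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail


print(SYLLABLE_COUNT)                 # total number of precomposed syllables
print(hex(ord(compose(18, 20, 27))))  # last code point in the block
print(decompose("\ud55c"))            # indices for the syllable at U+D55C
```

Because every syllable's composition can be derived from its code point in this way, the purely compositional part of such entries could in principle be generated rather than written by hand; whether that argues for individual entries or for an appendix is the open question in the thread.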

Eleccom Invitation to vote

The Wikimedia Board Election Steering Committee invites all community members to vote for candidates they support. Voting is open until July 7, 23:59 (UTC).

We appreciate all community members who have voted. So far, about 2000 votes have been cast from a total of around 100 different communities. As you know, qualified contributors from all Wikimedia Foundation projects, in all languages, may vote. We hope to see turnout from as many different communities as possible. Reports about the Election are available from https://wikimedia.spi-inc.org/index.php/Reports.

All information is available on meta at:
On each candidate: http://meta.wikimedia.org/wiki/Board_elections/2007/Candidates
General information about the Election: http://meta.wikimedia.org/wiki/Board_elections/2007
FAQ: http://meta.wikimedia.org/wiki/Board_elections/2007/FAQ

Questions about election are welcome at:

Thanks to devoted volunteer translators, those pages are also available in many languages other than English. Some translations, including German, French, Spanish, and Chinese, are fully available for those documents.

Thank you for your attention, we look forward to your participation. --Aphaia 14:48, 4 July 2007 (UTC) For the Wikimedia Board Election Steering Committee


A poll to exclude from Wiktionary all possessive case forms of words formed with the enclitic ’s has been started here. † Raifʻhār Doremítzwr 17:02, 4 July 2007 (UTC)

That's not what the vote says. It is a vote on "Excluding possessive case forms of nouns". While the examples given may be possessives of that form, the vote does not limit coverage to those forms. As worded, it would eliminate all possessive case nouns from every language. --EncycloPetey 19:50, 4 July 2007 (UTC)
That is true. Nevertheless, I’m pretty certain that the intention of the proposal was that majority approval would lead to the exclusion from Wiktionary of all Modern English possessive case forms of nouns formed with the enclitic “’s”, or its elided variant, “’”. Does the proposal need to be rewritten (presumably also mentioning where information as to how to form a word’s possessive form(s) should go) and the vote restarted, or can the changes be made to the present poll which has already commenced? † Raifʻhār Doremítzwr 01:46, 5 July 2007 (UTC)
It's early enough in the process. I would advise rewording it to be specific, and extending the voting period a bit so that people can have time to react and rethink. --EncycloPetey 02:11, 5 July 2007 (UTC)
Yes, giving the vote a full period to run from the last change in wording is an excellent idea. It should also be very clear that the vote has been altered. Also, given that the change is done early and there are not too many votes anyway, it would be best to advise everyone who has placed a vote of the clarification, or at the very least those whose opinion does not clearly indicate that they would still support their old choice. DAVilla 11:05, 5 July 2007 (UTC)
I have rewritten and restarted the vote, having attempted to reword the proposal to address the issues that people have raised. I shall now inform all those who have voted thus far of the changes. † Raifʻhār Doremítzwr 19:46, 5 July 2007 (UTC)
That vote still does not describe the exact wording you are proposing (to add? Precisely where?) to WT:CFI. --Connel MacKenzie 16:38, 17 July 2007 (UTC)

Brand names proposal

EncycloPetey wrote in the RfV for Ford:

"If a brand is so well known that a novel or national magazine or major newspaper article uses the brand name in place of the name of the product, without providing context on the assumption that readers will understand, then it is worthy of inclusion."

I wholeheartedly agree with this sentiment, and would like to clarify the CFI to recognize such uses as 'attributive' with respect to brand names. In short, if the name of a product can be found in three sources spanning more than a year, with each of those uses of the name being without an explanation of what the product is (i.e. "Joe drives a Mazda" as opposed to "Joe drives a Mazda automobile" or "Frank was munching on a Milky Way bar" as opposed to "Frank was munching on a Milky Way candy bar"), then it should be defined as a word. I would add the caveat that the source cannot be a publication of, or sponsored by, or about, the manufacturer of the product. Following are some examples of use that I think should be acceptable (note that in many instances the mention of the brand is intended to imply something about the person who uses it):

Mike Marcoe, Still Smiling At Twilight (2006) p. 79:
  • He drives a BMW. A restaurant host who drives a BMW. And a 5-series at that. The head chef only drives a Toyota.
Tom Drury, The End of Vandalism (2006) p. 26:
  • Louise helped the girl up, took her into the bathroom, and got her some Alka-Seltzer.
Judith Van Gieson, The Wolf Path (2006) p. 93:
  • George showed up on time, he has a good job, he drives a Toyota with a normal radio.
William Braxton Irvine, On Desire: Why We Want What We Want (2005) p. 26:
  • He is disturbed not by the crass materialism of his life but by the fact that he is still driving a Ford when he could and should be driving a Porsche.
Meg Cabot, Darkest Hour (2005) p. 86:
  • That and a Butterfinger bar soon had me feeling like myself again, and it wasn't long before Jack and I were frolicking in the waves...
Andrew Klavan, Dynamite Road (2004) p. 234:
  • Eating Cheez Doodles out of a plastic bowl, drinking Budweiser out of the bottle. Lighting Marlboros and snuffing them halfway, through in the ashtray she held loosely at her waist.
Drew Pinsky, Cracked: Putting Broken Lives Together Again (2004) p. 68:
  • Responding to the intense sugar cravings of early withdrawal, Amber has littered the nightstand with half-eaten Butterfinger and Snickers bars.
Tama Janowitz, Peyton Amberg (2004) p. 160:
  • Would you call down and find out if they have any Alka-Seltzer and Pepto-Bismol?”
Gaby Triana, Backstage Pass (2004) p. 42:
  • Mom drives an Accord, which is quite a surprise if you think about it. I guess that says something about her response to fame. Anyone can drive a Porsche once they have the cash.
Mary Roach, Stiff: The Curious Lives of Human Cadavers (2003) p. 285:
  • The brain had been flash-frozen and did not slice cleanly. It sliced as does a Butterfinger, with little shards crumbling off.
Mike Daisey, 21 Dog Years: Doing Time at Amazon.com (2002) p. 42:
  • Jeff is worth billions but rents an apartment and drives a Toyota hatchback.
Harry Lee Kraus, Could I Have This Dance? (2002) p. 293:
  • But the Heineken would cool him down enough to sleep, even if the peanuts tortured him.
Bernard Goldberg, Bias: A CBS Insider Exposes how the Media Distort the News (2001) p. 174:
  • And they can also live in a bigger house and drive something a little fancier than a Chevy or a Ford.
Dennis Lehane, Prayers for Rain (2000) p. 38:
  • Our mechanic, we'd probably guess, drinks Budweiser.
Jon A. Jackson, The Diehard (2000) p. 54:
  • He had to have an Alka-Seltzer, or something, some coffee, maybe.
Robert Siegel, The NPR Interviews (1995) p. 299:
  • I mean, we could be the gangsta gun of choice just like every gangster drives a Lamborghini too.
Chuck Wachtel, The Gates (1994) p. 68:
  • Primo and Manny are sitting on the concrete bank of the East River, eating sandwiches from D's, drinking Heineken from cans, waiting for the sun to finish setting and the Macy's fireworks display to begin.
John Grisham, The Pelican Brief (1992) p. 103:
  • She drove an Accord and lived modestly.
Howard J. Hefley, Way Back in the Ozarks (1992) p. 43:
  • Fesser was standing at the counter munching on a Milky Way bar.
Robert Olmstead, A Trail of Heart's Blood Wherever We Go (1990) p. 42:
  • Eddie and Cody sit at the kitchen table drinking Budweiser, a bottle of Jim Beam between them.
Patrick F. McManus, The Grasshopper Trap (1986) p. 187:
  • The Old Man laughed sardonically, which was not easy while gulping Alka-Seltzer.

Thoughts? bd2412 T 21:56, 4 July 2007 (UTC)

Strong oppose. That turns en.wiktionary.org into an advertising outlet; nothing more. Did you learn nothing from the opening of the Internet to commercial exploitation in the early 90s? Are you not getting enough spam?
Furthermore, there is no linguistic value in a brand name. When a word is used generically to represent similar items, then it can be included here. E.g. aspirin, kleenex, band-aid. NOT when it simply promotes a corporation (or a corporation's product.)
--Connel MacKenzie 01:02, 5 July 2007 (UTC)
This does no advertising. Are we going to say that a Ford is a high quality automobile while a Mazda is an over-priced piece of garbage? No, that would be POV. To include Ford and indicate that a Ford is a brand of car does nothing to advertise Ford, anymore than including Romania and indicating that Romania is a country advertises Romania (or including piano and identifying it as a musical instrument promotes pianos). Our mission is to define all words in all languages. Someone reading a newspaper or a novel may come across a statement that a certain person was in a Honda, and may honestly not know what a Honda is - and that reader may get no guidance from the piece they are reading, because authors tend to presume that everyone else knows the meanings of the words they use. We should define the word to the extent that we indicate that it is a brand of automobile, and provide a link to the Wikipedia article if the reader needs more information.
link-farming isn't advertising? WTF? --Connel MacKenzie 06:26, 5 July 2007 (UTC)
Link farming to Wikipedia, our sister project? No, unless you want to call that advertising Wikipedia. bd2412 T 21:52, 5 July 2007 (UTC)
Link farming is the use of other websites to promote your product's name. It skews search engine results in favor of a particular product name. It is advertising, nothing more. --Connel MacKenzie 05:17, 6 July 2007 (UTC)
We won't be skewing any results "in favor" of anyone if we include all terms that meet the proposed requirement. A little-known brand name will not be found in three independent sources (note: I could not even find three such references to "Boeing"). As for particularly well-known brands, well they're already well known, and if we include (for example) both Lamborghini and Ferrari (both of which have numerous stand-alone references in literature), then we're really not favoring one product over the other. bd2412 T 19:01, 6 July 2007 (UTC)
I know this is old, but that's rather wrong. Including the word "Ford" isn't going to skew search engine results for Ford's website at all; I don't know how you could think that. Linking to Ford's website would, except that Wikimedia sites are nofollow, which means links from them don't affect search at all, not that we were talking about links in the first place. Atropos 08:25, 6 August 2007 (UTC)
Thanks for your comment. But while WMF sites are set "nofollow" (the Apache directive) very few mirrors are, which is precisely why WMF projects remain an enormous target for advertisers. They are very well aware that their rankings improve based on WMF project promotion; Wikipedians and Wiktionarians alike seem not to realize this. Like any other general-purpose dictionary, there is simply no excuse for promoting trademarks, brands or products here. In the end, it is still link-farming, off highly popular sites. Even if there were some conceivable way of having a NPOV (by including all competing product names) it would still amount to link-farming for all of them. --Connel MacKenzie 19:57, 6 August 2007 (UTC)
Furthermore, you are quite incorrect in discounting the linguistic value of brand names. Many brands have interesting stories about their names. Some are family names (or variations thereof such as Oldsmobile) while others, such as Acura, are invented by ad-men with the intent of sounding vaguely like they mean something, and then focus-group perfected. You are smudging a perfectly fine line between informing and promoting. I have yet to see a Wiktionary entry promoting a product. bd2412 T 01:15, 5 July 2007 (UTC)
Those "interesting stories" belong on those corporation's home pages, not here! There is no fine line to be smudged! --Connel MacKenzie 06:26, 5 July 2007 (UTC)
I'm tempted to agree with you (bd2412) that these should be included, because... well... what does "The Old Man laughed sardonically, which was not easy while gulping Alka-Seltzer" mean? Is that gulping acid? Gulping liquid cocaine? Gulping air, for that matter? One can't tell. To that end, however, I'd think that the entry for each must contain more than "a trademark owned by X company"; it must provide a link to Wikipedia, or, if WP has no article, some basic information: in this case, is Alka-Seltzer a poison, an acid, a drug, or what? Eh... — Beobach972 02:49, 5 July 2007 (UTC)
Well, I would hope all such entries would include an actual definition. Saying "It's a trademark" isn't a definition any more than saying "It's a noun" would be. As I'm the one being quoted here, I do agree with including these. I'm not sure that I would want all of the possibilities to be included, but for the simplicity of evaluating potential entries, I think it's better to include a small percentage beyond the desirable than to exclude a whole category of words that ought to be here. --EncycloPetey 03:00, 5 July 2007 (UTC)
I think that existing entries for brand names (e.g. Microsoft) provide an excellent guide. For an automobile or a candy bar, I'd say no more than that it is the brand name for an automobile or a candy bar, and (if it is not also the company name) who the manufacturer is. That is in no way promotional. For Alka-Seltzer and other medications, I'd be inclined to include formulaic information, to match the content typically provided in a Physicians' Desk Reference. For example, Alka-Seltzer should be defined as:
A brand name owned by the German Bayer Corporation for a line of medications sold over the counter and taken by means of rapidly dissolving tablets that form an effervescent solution in water; the original formula consisted of a combination of acetylsalicylic acid, sodium bicarbonate, and citric acid.
Again, this is purely informational. bd2412 T 03:34, 5 July 2007 (UTC)
That's not a great definition. Is it used "in place of the name of the product" or not? If it is generic, then the brand information is irrelevant, certainly not part of the meaning. It is etymology if anything. Dmcdevit·t 03:53, 5 July 2007 (UTC)
I do think Alka-Seltzer should be included, but I don't think I like your definition; while it's true that Alka-Seltzer is a brand name, that's not part of its definition, any more than "A noun referring to […]" is part of the definition of piano. In this case, I think the word has two distinct senses, the first being "Any of a number of over-the-counter medications sold in the form of tablets that dissolve rapidly in water, producing an effervescent solution", and the second being "The effervescent solution so formed." (The etymology would be "A brand name owned by Bayer; apparently from alkaline and seltzer"; and the main part of the entry would provide a link to w:Alka-Seltzer.) —RuakhTALK 05:04, 5 July 2007 (UTC)
I have no objection to indicating the manufacturer as the source of the word in an etymology section (and therefore not in any definition lines). This may have been a bad example for me to use, but I think that for medications in particular, we should most definitely list the chemical composition. Even today, the basic formula for Alka-Seltzer is acetylsalicylic acid (aspirin), sodium bicarbonate (baking soda), and citric acid. bd2412 T 06:03, 5 July 2007 (UTC)
I'm not quite sure I see what you're getting at. "Us[ing] the brand name in place of the name of the product" sounds an awful lot like "A name should be included if it has become a generic term. For example: Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner." It's not a very radical idea. Your examples don't even seem to correspond to the criterion, though, and it would be a bit radical if they merited inclusion under the criterion. Are you asserting that "She drove an Accord and lived modestly." is using the brand name Accord in place of "car" and not, say, describing an actual Accord? There are millions of brand names and, while some of these make sense, many of your examples are just mentions of a brand name in print, not the brand name used as a generic name. Think kleenex in place of tissue, frisbee instead of flying disc, band-aid for bandage, novocaine, aspirin, speedo, photoshop, etc. It is common for someone to say an image was photoshopped, just because it was altered by software regardless of whether it was Adobe Photoshop or not; men wearing tight bathing suits are often said to be wearing speedos, regardless of whether they are wearing Speedo brand swimsuits or not. These are different from saying "munching on a Milky Way bar" when the author literally means that the person was munching on a Milky Way bar. That could be said about any of millions of brands, even ones that no one would recognize. Dmcdevit·t 03:53, 5 July 2007 (UTC)
Yes, if to say someone is driving an Accord means that particular car, as opposed to a car generally. But if I say they are driving a Mack or a Ram, what kind of car am I referring to? None, because one is a truck, the other a pickup. But a reader who comes across those words might not know what kinds of things those are. Requiring three independent uses not from the manufacturer, and which do not identify the class of product spoken of, limits inclusion to those terms which are likely to be used by a writer in place of a description of the product, because the writer is assuming (probably with good reason) that everyone knows an Accord is a car, that Alka-Seltzer is a medication, and that a Milky Way is a kind of candy. If the brand is one "that no one would recognize", it is doubtful that it would meet the CFI in terms of number of independent uses spanning over a year (it might be good to further specify that use must come from the works of three different authors, to avoid a single author pushing a favored obscure brand). bd2412 T 04:13, 5 July 2007 (UTC)
You (Dmcdevit) have very good points as well. This is why I'm undecided. — Beobach972 04:09, 5 July 2007 (UTC)
What do you (bd2412 et al) say to the contention that somebody encountering the term can look up 'Milky Way' in Wikipedia, it isn't needed here in Wiktionary? — Beobach972 04:09, 5 July 2007 (UTC)
Well, somebody encountering the term can look up piano in Wikipedia as well[1], or Poland[2], paleontology[3], pandeism[4], panic[5], paradise[6], parasite[7], patriarchy[8], etc. The fact that words reflect concepts which can be discussed in an encyclopedia does not detract from the fact that they are words which should be defined in a dictionary, particularly where we may offer information (translations, pronunciation, definitions or links to definitions of the term other than as a brand name, even etymology) which an encyclopedia might not. bd2412 T 04:21, 5 July 2007 (UTC)
Hmm. On the one hand, I think: trademarks, shouldn't change (thus, they shouldn't have translations, they should be 'Translingual') — Milky Way should be Milky Way in Poland, too. If it's run under a different brand name there, that would (or should) be mentioned in the Wikipedia article: Milky Way is sold in Poland under the brand name / trademark ‘Yaw Yaklim’. Wikipedia articles are also able to, and often do, provide pronunciation information. If the term has a non-brand name definition, we can give that here, and Wikipedia could link to it. On the other hand, I can see the benefit to having it all in Wiktionary (you could look up the whole sentence here, rather than having to switch to Wikipedia for 'Milky Way'). On the one hand, again, I think: anybody with a basic knowledge of English could figure out from context that Milky Way was the specific name of some product, and if it wasn't in a dictionary, it would be in an encyclopedia. Somebody who spoke no English might not know that — but wait (switching now to the other hand, again), wouldn't their native language, whatever it was, have trademarks? They'd be familiar with the concept. — Beobach972 04:40, 5 July 2007 (UTC)
(PS- I'm not sure the last half of this comment is relevant... sorry... — Beobach972 04:40, 5 July 2007 (UTC))
Trademarks are generally translated into languages that do not use the Roman alphabet (see Coca-Cola for several examples). If we do no more than direct visitors to the Wikipedia article, that's fine with me, but we should at least do that much. bd2412 T 04:48, 5 July 2007 (UTC)
No, we should not. That is search-engine skewing, and allowing any to be included on non-linguistic basis guarantees that all many-many millions will enter their crap here. As it is, even with our rules clearly prohibiting it, we get several per day (probably a lot more than that - I haven't checked recently.) This proposal would flip the orientation of en.wiktionary.org from that of some linguistic nature, to 99.9% advertising, overnight. --Connel MacKenzie 06:20, 5 July 2007 (UTC)
Connel, in this instance your hysteria is unsupported by reality. Just how many brand names do you suppose can be supported by three independent cites spanning a year which do not identify the product to which the brand name applies? bd2412 T 18:42, 5 July 2007 (UTC)
Um, what hysteria, and what lack of support? The current CFI allows for those things; you are suggesting a blanket exemption of rules for trademarks and brands. In each language, there are many millions of current trademarks. --Connel MacKenzie 21:31, 5 July 2007 (UTC)
It would seem, then, that you're saying that the current CFI already allows for the inclusion of brand names according to exactly the criteria that I have proposed above. That is, the word should be included if three independent cites (and not connected with the manufacturer) spanning a year use the word without identifying the product to which the brand name applies. bd2412 T 21:52, 5 July 2007 (UTC)
No, I am saying that they should not be given an exemption. I said "you are suggesting a blanket exemption of rules for trademarks and brands." The current CFI allows for "those sorts of things" if used attributively (i.e. of some linguistic value.) Despite your examples given so far, the broad majority of entries resulting from this will be of much more dubious content, of which there is no linguistic value. --Connel MacKenzie 05:17, 6 July 2007 (UTC)
If a widely-read novelist would be inclined to write that someone "drove an Accord and lived modestly" instead of saying that someone "drove a car" or even "drove an Accord car", then the word has as much linguistic value as if that author had written that the subject "drove a midsize" or "drove a hatchback". How will the "broad majority of entries" be "of much more dubious content" if they all have to show three sources independently using the term without identifying the product? And to repeat an earlier query, how many brand names do you think will meet that requirement? bd2412 T 12:57, 6 July 2007 (UTC)
I think this has ceased making sense. The original proposal was "uses the brand name in place of the name of the product." You are saying that any common use of "drove an Accord" is a generic use of "Accord" to mean "car," instead of the more obvious interpretation: that the author literally meant that the person drove an Accord? It is simply not common terminology to ever say "Accord car," or such for any car name, or many other brands. So yes, this odd redefinition of genericized trademarks, which are allowed, seems like a backdoor proposal for indiscriminately including all brand names regardless of linguistic value. Dmcdevit·t 08:39, 7 July 2007 (UTC)
By the way, use of a brand name such as Accord in a generic sense such as car is NOT what's under consideration here. We already have an acceptance rule for genericized brands. What's under consideration are uses of such names when they exactly mean, e.g., "a car of the Accord brand". True, you would rarely see Accord used in such an unnecessarily long statement; however, you would normally see brand names used within some sort of context that indicates what they are.
If the context was "we switched highway lanes to pass an X" it's not clear if X is even a normal highway vehicle (as opposed to a tractor or something), which might be pertinent to the understanding of the narrative. In my opinion, in the context "we were driving at the speed limit, and an X behind us switched lanes to pass" then I would think it's pretty clear that X is a type of car, or at least a vehicle designed for road use. I have to disagree on a minor point in that one would naturally assume, from a pragmatic standpoint, that if "Joe drives an X" then X is a type of car. Of course, if X weren't a type of car, a Caterpillar for instance, then that could be a valid out-of-context quotation for it, depending on the surrounding text. DAVilla 12:25, 7 July 2007 (UTC)
An "Accord car" is a ridiculous word combination that only makes sense with respect to our flawed attributive-use rule. A comparison to that, beyond the initial explanation, is hardly beneficial in my opinion, and certainly not the point of the proposed guideline. DAVilla 12:32, 7 July 2007 (UTC) Evidence: "Accord car is" gets a whopping 8 Google hits, and "Accord cars are" only 7. DAVilla 19:36, 8 July 2007 (UTC)
The Honda Accord happens to be one of the most widely owned automobile brands in the U.S. Now, compare that to a relatively little-known brand, the Mercury Brougham. The word "brougham" turns up thousands of Google Books hits, but none appear to relate to that brand name. The phrase "drove a Brougham" gets nine hits - of which eight are from the 1800s (obviously a different meaning of the word) and the ninth is unviewable; the phrase "drives a Brougham" does no better, and "Mercury Brougham" gets none at all. The Yugo (which you may have heard of) would not get in either. So, a brand of car has to be at least popular enough that an author would think that the reader needs no explanation that the thing mentioned is, in fact, a car. I happen to think that Accord is a particularly useful case because an accord is an agreement, and one can "drive" (i.e. be the directive force behind) an agreement, so this should be defined to avoid that particular kind of confusion. But my point is that not all brands would meet the proposed interpretation of the CFI, and in fact only a relative few would meet all the conditions imposed. And those few should be defined here, because they are used in the real world to represent things without further explanation of what the thing represented is. bd2412 T 05:36, 8 July 2007 (UTC)
Note that a source saying "Joe bought a new car - it was an Accord" or "I went down to the car rental place and all they had was an Accord" would also not qualify as references, as each still identifies the brand name as that of a car. bd2412 T 20:06, 8 July 2007 (UTC)
Strong support. In fact I would apply this to all proper nouns, not just brand names. "Athens" refers to a specific city in the United States only if it can be shown to do so outside of any other context, such as newspapers in Athens and in neighboring cities, in which case the definition of Athens for that sense would warrant a regional label. DAVilla 11:00, 5 July 2007 (UTC)
Cautious support... the criterion of use-without-explanation is reasonable, but only if the lines are drawn extremely tightly, which in this case means that (I think) there must be no explanation whatsoever (and also the cites must be completely independent from the company, i.e. not advertisements, crypto-promotional material of any kind, or part of a discussion of a particular company's brands). Thus I would not consider the above cites for "Snickers bars," "Milky Way bar," "Toyota hatchback" to be adequate in themselves (although I expect more adequate cites could be found for each of those). -- Visviva 11:35, 5 July 2007 (UTC)
I agree that cites must be completely independent of the brand owner. I can see how "hatchback" would be disqualifying, since anyone could look up hatchback and see that it is a kind of car. Not sold on "bar" because there are so many meanings of the word - a bar shape, a tavern, a prohibition, a level to be raised and overcome - and even explicit reference to the shape would not make it clear that it is candy. bd2412 T 18:47, 5 July 2007 (UTC)
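
For anyone trying to picture how the criterion debated in this thread would be applied to a candidate entry, here is a rough sketch in Python. It is only an illustration under stated assumptions, not any agreed CFI procedure: the `Citation` record, its field names, and the function `meets_proposed_criterion` are all hypothetical, and "spanning a year" is read as at least 365 days between the earliest and latest usable cite. Whether a cite "identifies the product" is a human judgment that the sketch simply takes as an input flag.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    """Hypothetical record for one quotation of a brand name."""
    author: str
    published: date
    from_brand_owner: bool    # advertisement or other material tied to the manufacturer
    identifies_product: bool  # e.g. "Toyota hatchback" names the class of product


def meets_proposed_criterion(citations, min_authors=3, min_span_days=365):
    """Return True if the cites satisfy the rule as discussed in this thread.

    Assumed reading of the proposal: at least three different authors,
    none connected with the brand owner, none identifying the type of
    product, and the usable cites spanning at least a year.
    """
    # Discard cites that are disqualified outright.
    usable = [
        c for c in citations
        if not c.from_brand_owner and not c.identifies_product
    ]
    # Independence is approximated here by counting distinct authors.
    if len({c.author for c in usable}) < min_authors:
        return False
    dates = [c.published for c in usable]
    return (max(dates) - min(dates)).days >= min_span_days
```

A call such as `meets_proposed_criterion([cite1, cite2, cite3])` returns `False` as soon as one of only three cites is disqualified, which is the "use without explanation" test in miniature.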

Brand names proposal - second vote

A vote on this proposal is now open. I have edited the proposal to address some of the concerns raised in this discussion. Cheers! bd2412 T 22:47, 10 July 2007 (UTC)

Most of this is good, if a bit legaleseish, but I have problems with some of it.
  • Specifically I don't understand the bullet points following "Furthermore, the citations must not identify the type of product to which the brand name applies, whether by stating explicitly or implicitly some feature or use of the product from which its type may be surmised. For example:". The CFI needs to be understandable with ease, so these probably need rewriting into simpler English.
  • The final "furthermore", "Furthermore, the citations must make reference to some quality that is characteristic of the product but not implicated by its type." would seem to invalidate sentences like the following:
    She quickly hoovered the house in preparation for her guests' arrival.
    Despite his fraught emotional state, his voice rang out firm and clear through the tannoy.
    Finally they finished loading the transit and bid a tearful goodbye to the family.
    It was a stressful drive through rush-hour traffic, but fortunately Jake and Susan were plugged into their iPods and there hadn't been a single whine of "Are we there yet?" from the back-seat all trip.
This will mostly impact genericised trademarks (if these criteria aren't meant to apply, then the proposal needs to say so), but will also hit products that are in the public lexicon not for being something special but for being first to achieve significant penetration of the market, or for simply being (or having been) significantly the most common or most typical. Thryduulf 19:45, 13 August 2007 (UTC)

Well, since the vote solved nothing (despite the support of a clear majority of voters), I will note for the record that the word "attributively" is still a murky mess. Since existence is an "attribute", I see no option but to find that a word is used "attributively" if it tends to show that the thing designated by the word exists. Cheers! bd2412 T 22:29, 14 August 2007 (UTC)

English to Arabic wordlist relicensed to GFDL

Bumping this thread to the surface again: English_to_Arabic_wordlist_relicensed_to_GFDL. --Versageek 14:29, 5 July 2007 (UTC)

Language-dependent see-alsos

Is there any opinion on using {{see}} at the top of language sections, immediately under the language header, when the see-also is not an exact character mapping, but one would be expected to look for the word in a standard dictionary under that spelling? For instance, {{see|trunks}} at trunk since the definition for "swimming trunks" is on the other page? For all inflections at the least, I would link to the other page if there is a definition that is not a form-of. It is not possible to determine that looking at the inflection line itself, and it is quite common for contributors to add "plural only" definitions already provided at the -s page, indicating that the distinction is not prominent enough. DAVilla 10:47, 7 July 2007 (UTC)

That makes a lot of sense. Alternatively, or additionally, maybe we should use the last definition in a section for this:
  1. For uses found only in plural form, see trunks.
RuakhTALK 15:52, 7 July 2007 (UTC)
Personally, I'd like to have {{see}} used only at the top of a page. Anywhere else and people will think it's been misplaced. Ruakh's idea sounds possible, but I've got a nagging feeling it could lead to new difficult issues. I couldn't say what those are, or I would, so they may be only imaginary worries. --EncycloPetey 19:23, 7 July 2007 (UTC)
A definition line "# {{context|in plural|lang=en}} See [[trunks]]." (or with a gloss) seems like a 'better' approach. I think the {{see}} line at the top is already overburdened. It is meant as a navigation aid; moving it around randomly would be a mistake. --Connel MacKenzie 03:11, 9 July 2007 (UTC)
Well I don't think this would amount to moving it around randomly, but I could support your idea too. I just don't think our system of navigation for these inflected/variant forms is sufficient given how radically different we are to traditional dictionaries in English (or any particular language) in this regard.
A different but similar problem to consider is a word that's usually plural, or more frequently that can be either lower/upper-case. I usually do something like '''universe''' # The [[Universe]]. in those cases, analogous to the more complex scenario where compound word = spaced as noun or hyphenated as adjective. DAVilla 19:48, 9 July 2007 (UTC)
How would this apply to WT:AJ#Verb forms of nouns, or WT:AK#Verb forms of nouns (the latter still under discussion)? These forms do call for some kind of special treatment, since they are far more organically linked to the root than most derived forms are; on the other hand, they are not inflected forms in any meaningful sense (which distinguishes them from cases like "trunks").
There are limits to what can be done in the name of the one-size-fits-all approach here, I think. But perhaps there is an alternative solution for the ja/ko situation? -- Visviva 16:18, 9 July 2007 (UTC)
Yeah, it's really the root, as the standard dictionary listing, that I'm after. DAVilla 19:48, 9 July 2007 (UTC)

Sign Languages on Wiktionary

There is a desperate need for a Wiktionary for sign languages, such as American Sign Language and British Sign Language. Obviously, doing this will require storage and bandwidth for video uploads/downloads, but as these are very small languages, the demands would not be massive.

The need is clear: many sign languages do not have easily accessible, up to date dictionaries available. Many of the books that are out there are developed without widespread community involvement. There is considerable variation within sign languages, and most published books on sign language barely scratch the surface of the variety of signs in use. The books that are out there are very expensive and rarely updated.

There is currently a private project running on www.dictionaryofsign.com - but this is poorly developed and utilised. This project is an ideal wiki candidate: language belongs to all the people (not to academics, random enthusiasts or private companies) and sign languages need to be reclaimed by all those who use them.

Discussion and proposals on getting this started are most welcome - details of relevant technical and Wiktionary issues need to be explored in much more detail. Apologies if this issue is already being addressed, but I couldn't see any reference to this being the case in my searches.

Zctyp18 21:33, 8 July 2007 (UTC)

Perhaps you could start by adding ASL images of a-z to the entries a - z, to demonstrate what you think would be needed. (Especially j.) Video .ogg files are welcome here, linked from commons. The files themselves need to be uploaded to commons, so other projects can share them. (See commons:COM:FUS for assistance.) How many sign languages are there? How many words are defined in each sign language (and is there a public-domain resource we can tap, for any of them?) While we don't currently have any methods worked out for these yet, it is only because no one has volunteered their time and effort towards it. The concept is welcome, the details (language heading, image placement, video placement) need to be worked out. Would sub-headings under ===Pronunciation=== be appropriate? --Connel MacKenzie 21:52, 8 July 2007 (UTC)
Thanks for this fantastically helpful and rapid response Connel! First, I must state that I do not know ASL beyond the alphabet and am only an enthusiastic, intermediate-level user of British Sign Language (BSL), so apologies for the gaps in my knowledge.

A number of issues spring to mind re adding ASL images to the A-Z list, primarily, which language would we choose. As a non-American, I wouldn't be happy to see only (one-handed) ASL alphabet signs on these pages and would expect (two-handed) BSL, NZSL and AUSLAN signs too, but then where would we stop? Also there is a problem with confusing sign and spoken languages. Sign languages are not visual representations of spoken languages, but complete languages in their own right (although they do draw heavily from spoken languages), so there is a question as to whether proceeding in this way would be appropriate.

With regard to how many languages there are, there tends to be one main sign language in use in most nations (each with regional and community variations - interestingly there are 'gay' signs, and signs particular to certain schools or other social groups). Whilst they link to the predominant spoken languages, they are unique languages in their own right - nb American and British sign languages are very different from each other, whereas French and American sign languages have much in common. There is a growing interest in and use of International Sign Language, although ISL is still in its infancy and as far as I know, not sufficiently documented.

There are fewer signs in sign languages than words in spoken languages, but each sign is open to considerable modification (for example the BSL sign for 'week' - meaning a seven day period - can be modified to mean 'two weeks' or 'three weeks' instead of using the sign for 'two' or 'three' then the sign for 'week'). Also, for example the sign for 'physically large' can be adapted to mean 'very obese' and the accompanying non-manual features (facial expression, etc) can change the sign from one that is a socially-acceptable statement of fact to a term of abuse. Regarding the sub-headings under ===Pronunciation===, that might be a good idea, but to be honest, I feel that those with better knowledge of sign language linguistics than me should comment on that.

I'm not sure of any public-domain sign resources out there - most are run by charities and private companies and to be honest aren't that great (especially the BSL ones). This is why sign language wiktionaries are vital. I can't tell you how frustrating it is learning a sign language - you are continually faced with multiple signs that have the same meaning, without being able to easily establish if a sign is a regional one, if its use depends on the context, if its use is out-dated, etc, etc. I definitely believe that a wiktionary will be able to provide so much more richness than any book, DVD or private website ever could and help these rapidly-developing languages to flourish.

It's great to read that it is already possible to do the relevant video stuff that's necessary, I'm sure with the efforts of fellow wiki-peeps, we can get this up and running. I'm more than happy to do my bit and I'm sure many others are too. If we can get some of the basics sorted, then we could put notices up on relevant sites to encourage sign language users to get involved - or maybe we should be doing that now?! I work in a deaf school at the moment, so will try to get some staff and pupils interested ASAP! More comments, questions, opinions, etc most welcome. Zctyp18 22:26, 8 July 2007 (UTC)

Our mandate/motto is "all words in all languages" (even though Main Page no longer emphasizes that.) You mentioned four languages above - could you begin an experiment on just the alphabet(s) for them? I'm sure we'll all be able to discuss the different layout possibilities more easily, once we've seen a tiny (26) number of example entries. For the various sign languages, those should be the least problematic - addressing them should pave the way for actual word entries. I imagine we'll eventually need instructions on acceptable photography guidelines as the experiment progresses, but I really don't think there are any problems that can't be overcome. --Connel MacKenzie 03:26, 9 July 2007 (UTC)
Where would sign language entries be located? I don't think there is a standard for transcribing SignWriting in Unicode. One option for ASL (ISO 639-3 code: ase), at least, would be to use w:Stokoe notation, but we should probably build Wiktionary:About American Sign Language to make that explicit. Rod (A. Smith) 06:15, 9 July 2007 (UTC)
All sign languages are visual. Many of the uploaded "words" and "phrases" will have to be video because of motions necessary to communicate the idea. And while most signs correspond roughly to words in another language, the overall grammar is different. As I see it, we have two options: (1) tie all sign language entries to an existing English word (for languages like American SL) and others similarly. However, this has the disadvantage of implying a one-to-one correspondence between the two, which isn't the case. (2) Create a subspace for these visual languages that is organized visually, much as Chinese characters are arranged by stroke in print dictionaries. Of course, all the images would be housed on Commons, and we'd want some consistent naming scheme for these. None of the Sign Languages have ISO codes, of course, because they're not written, so we'd have to develop some coding system or find a standard one we can adapt. For now, I think the logical way to start would be to develop an Index page or an Appendix for a particular sign language, in order to see what is needed. --EncycloPetey 19:56, 9 July 2007 (UTC)
See http://www.ethnologue.com/14/show_iso639.asp?code=sgn for a list of ISO codes for sign languages. Rod (A. Smith) 20:09, 9 July 2007 (UTC)

Infix -ING-. New grammar?

Given the results of the discussion about strikethrough verb in RFV, are we witnessing the development of a new irregular verb form? Most verbs form the 3rd person present with -s or -es at the end, but here we are looking at a verb with -s- in the middle. The same goes for -ing-. And as for the past and participle, this is yet to be demonstrated.

Are we going to see more similar developments? Phrasal verb → noun → new one-word verb with irregular infixed inflexions? To log on → (noun) logon → to logon logson loggingon loggedon. To work out → (noun) workout → to workout worksout workingout workedout. I am a lookout. Today I am lookingout for any grammatical problems on the horizon.

I understand that we are here to report what is happening, not to impose. That is why this question. What proposals do we have for this possible new generation of irregular verbs. Are any of these forms out there and being used now? Algrif 10:50, 9 July 2007 (UTC)

In the short term, I think it's reasonable to treat these as variant spacings rather than infixes (i.e., as two words that just happen to be written as one). If the linguistic community (heh) comes to a different conclusion, then we should of course be guided by that. -- Visviva 12:50, 9 July 2007 (UTC)
This is interesting, but I would be wary of jumping to any hasty conclusions due to the irregular inflexion of just one verb. Even if this becomes a more regular grammatical feature, I still don’t think that it constitutes an infix. The respective etymologies of the strikethrough verb forms (which I wrote) state that they were formed by verb form of “strike + “through”; however, this isn’t quite the whole story. People see and hear words on different “levels” — for those who just see a word, “strikethrough” would conjugate as strikethroughs, strikethroughing, strikethroughed; however, for those who see the separate morphemes which compose the word (namely a verb followed by a preposition), “strikethrough” would conjugate as strikesthrough, strikingthrough, struckthrough, strickenthrough (for prepositions do not conjugate). (This awareness is explicitly exemplified in the second citation for technopoleis.) How a word is inflected is based largely upon how its structure is perceived. It just so happens that because “strike” and “through” are such basic and commonly recognised English words, the vast majority of Anglophones recognise them as the morphemes which compose the compound word “strikethrough”, so they conjugate it accordingly. But how do we explain this in etymologies? † Raifʻhār Doremítzwr 14:43, 9 July 2007 (UTC)
Not really jumping to conclusions. More like preparing the path for the near future. In the case of strikethrough we are seeing something that is neither a normal verb (is there such a thing?) nor a phrasal verb, which as you know, was my POV on this word. There will be more. logon and login immediately spring to mind. And if this phenomenon spreads outside of computer terminology, which it probably will, then we need to be ready for the new grammar and how to explain the differences. Imagine that we have workout verb and work out. The phrasal verb has more meanings than the single word verb, but they will have at least one meaning that coincides. Also, the grammar aspect of having -ing- in the middle of a verb horrifies me, but we need to be ready to include the concept. Sub-category of English irregular verbs, perhaps? Algrif 16:18, 9 July 2007 (UTC)
At the moment, no sub-category is necessary (as there is only one verified example of this phenomenon). I’d wait until there are at least three examples of this curiosity before doing anything too drastic. At the moment, all this deserves is a note in an appendix. If this process becomes more common, then we can cross that bridge when we come to it. † Raifʻhār Doremítzwr 16:28, 9 July 2007 (UTC)
Absolutely agree. It's just that I suspect we might well be contemplating the bridge from a point much closer than you think. :-) Algrif 17:28, 9 July 2007 (UTC)
No Original Research does have some reach here. I don't mind speculation, but if the linguistic community has nothing to say on it as yet, then it doesn't deserve consideration. DAVilla 19:56, 9 July 2007 (UTC)
Just for interest. loggingin Algrif 16:04, 11 July 2007 (UTC)

Category:Place names under Category:Names

This was not implemented as it should be according to the definition of name. As this entails some structural change in the topical hierarchies I will mention it here, in case there might be some reason for keeping person names and other proper nouns segregated. __meco 21:02, 9 July 2007 (UTC)

OK, but if you're going to make this change, you need to make it to all of the Category:xx:Place names (where "xx" is an ISO language code). You can't just change the English language category; you have to change all of them. --EncycloPetey 04:05, 10 July 2007 (UTC)
OK, I did just the English one to make a would-be revert easy. I'll make the rest of the changes. __meco 06:21, 10 July 2007 (UTC)
I think listing it under Category:Names is a good idea. --EncycloPetey 17:01, 10 July 2007 (UTC)

New bot vote

I am currently in the process of writing a bot program, called User:Keenesbot. I am writing under the wing of User:SemperBlotto, and the bot's planned task is to do everything that User:SemperBlottoBot does, namely (for a start): Auto-generate entries for conjugated French verbs. I have, to be honest, never learnt how to write programs with any sort of software, but so far SemperBlotto has given me simple instructions on how to do it. I've downloaded all the necessary files, and would like to give the bot a trial run. There is currently a vote at Wiktionary:Votes/bt-2007-07/User:Keenesbot for bot status to accept or refuse permission as a bot. --Keene 21:49, 9 July 2007 (UTC)

In my opinion you should withdraw your vote until the bot has demonstrated itself. This is exactly what other bot coders have done, and it's basically a requirement. Regardless, debugging is a HUGE part of programming. DAVilla 00:58, 10 July 2007 (UTC)

Consensus and multiple options

I've been thinking about voting on multiple options a lot recently and had an idea about its use here in any form, approval voting or otherwise. EncycloPetey's original objection was that he could not cast a negative vote, a restriction which is fair (I mean that in a very technical sense) because neither can anybody else. However, EncycloPetey wasn't trying to oppose any individual option, he was trying to oppose them all. That's a pretty significant point. It might be possible to consider an opposition to the entire vote that can't be made for any individual options. If that ability were available to everyone, that would also be fair. I'm going to consider the implications numerically. Although votes here should not be taken so seriously as to count hanging chads, or even half as seriously really, the detail in the math is purely for sake of analysis.

First let me make it clear that the total range of expression on the ballot is a minor detail. For a regular yes/no vote, you could tally supporting and opposing votes as +1 and -1, respectively, or you could tally them as 22/7 and the smaller 223/71, and the result would still be the same. A more subtle point is the increments in between, such as what we call abstention. An even more critical question is whether a vote allows just a plurality or has a stricter requirement for the outcome to count. The general rule as adopted from Wikipedia is that decisions require consensus. I have always stated as much, in that form or earlier in more specific language, on the approval votes that I created. For regular votes consensus is generally presumed to be a supermajority of two thirds. In order to apply this to votes with multiple options, I'm going to examine that requirement in a different way.

A simple majority is usually framed as considering each vote as +1 or 0 and requiring a sum of more than half of the number of voters to pass. Although we usually think of a supermajority on the same scale, an equivalent way to think of a two-thirds majority is to consider each vote as +1 or -½ (negative one half), and requiring a sum of more than half (yes, still half) of the number of voters for approval. For instance, consider 12 voters, 7 in favor of a proposition and 5 opposed. Under simple majority they would vote +1 and 0 respectively, coming to 7 + 0 = 7 > 6, so the proposition passes. Under a two-thirds majority they could vote +1 and -½, coming to 7 - 2.5 = 4.5 < 6, so the proposition fails. It's easy to see that the borderline condition would be met if one voter switched over, to oppose in the first case or to support in the second.
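
The arithmetic in the paragraph above can be checked with a minimal sketch. The function names `weighted_tally` and `passes` are hypothetical, chosen only for illustration; the weights and the "more than half of the voters" threshold are the ones described in the text.

```python
from fractions import Fraction


def weighted_tally(supports, opposes, oppose_weight):
    """Sum the ballots: each support counts +1, each oppose counts oppose_weight."""
    return supports + opposes * Fraction(oppose_weight)


def passes(supports, opposes, oppose_weight):
    """A proposition passes when the tally exceeds half the number of voters."""
    voters = supports + opposes
    return weighted_tally(supports, opposes, oppose_weight) > Fraction(voters, 2)


# The 12-voter example from the text: 7 in favor, 5 opposed.
simple_majority = passes(7, 5, 0)                # oppose counts 0
two_thirds = passes(7, 5, Fraction(-1, 2))       # oppose counts -1/2
```

With these inputs `simple_majority` is true (7 > 6) and `two_thirds` is false (4.5 < 6). Moving one voter, to 6 against 6 in the first case or 8 against 4 in the second, gives a tally of exactly 6, which is the borderline and does not pass under a strict "more than half" rule.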

We generally think of the simple majority and a supermajority in the same terms of support and oppose, but what this suggests is that in the supermajority case the option to oppose, because it's stronger, is more like half a veto. In other words, a supermajority vote is like a simple majority vote with the option to cast a more negative ballot. How does that translate when there are several options instead of just two?

Approval voting is traditionally thought of over a range of +1 and 0. Just as with the regular vote, it's easy to think of a majority or a supermajority based on this scale. It's also possible to use other ranges for each option, such as +2, +1 and 0 as in the aquarium pet demonstration, or +1, 0, -1 as in the ice cream flavor demonstration. However, neither of these addresses the more critical point of supermajority, which is a basic element of consensus gathering.

One option is to allow for a veto of the entire vote, instead of or in addition to participation in the approval of individual items. For traditional approval voting, with support counted as +1 and no support as 0, a veto would count as -½. If an individual were to veto the vote, then ½ would be subtracted from each option, making it more difficult to reach half the number of voters. This is analogous to the simple and supermajority case above. If the individual who vetoed the vote were also allowed to participate, then in subtracting ½ from each option, his or her range of expression would be the same, just skewed in the negative direction. Each option that was given support would tally as 1 - ½ = ½, which is equivalent to an abstention, and each option that was given no support would tally as 0 - ½ = -½.

In the case of a range of expression from +2 to 0, as in the aquarium pet example, a veto of the overall vote would be counted as -1, and each option would need as many points as there are voters to be considered a supermajority. The more natural range from +1 to -1, as in the ice cream flavor example, is a more complicated scenario because opposing votes are not implicit. One idea would be to count a veto as -1, and still require the voter to place a negative vote on every option. Another idea would be to consider votes of opposition implicit only in the case that a veto were cast, but that oddly has different effects on previously cast votes: abstaining votes would still count as 0, but where no votes were cast, where abstention was not stated explicitly, it would now implicitly count as negative. A third option is to not count any votes on individual options in the case of a veto, and simply count the veto as -2 for each option, but that shuts out those who object to the overall vote from participating in a potential decision.

The simplest solution, I think, is to use +1 and 0 as in traditional approval voting, but encouraging opposition to be stated explicitly using {{nosupport|}} so that the tally is not messed up, and in addition giving a {{veto|}} option for the overall vote that would count as -½ for every option. The fact that it's a fraction doesn't make the progress of the vote too obscure since its computation would only count with respect to constitution of a proper majority. DAVilla 03:51, 10 July 2007 (UTC)
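
As a rough illustration of the proposal above, the following sketch tallies an approval vote in which a veto subtracts ½ from every option. The ballot representation (a set of approved options plus a veto flag) and the function names are assumptions made for this example, not an existing tool; only the +1, 0 and -½ weights and the half-of-voters threshold come from the discussion.

```python
from fractions import Fraction

VETO_WEIGHT = Fraction(-1, 2)  # weight of a veto, applied to every option


def tally_with_veto(ballots, options):
    """ballots is a list of (approved_options, vetoed) pairs.

    Each approved option gains +1; a veto adds VETO_WEIGHT to every option,
    whether or not the same voter also approved it.
    """
    totals = {opt: Fraction(0) for opt in options}
    for approved, vetoed in ballots:
        for opt in options:
            if opt in approved:
                totals[opt] += 1
            if vetoed:
                totals[opt] += VETO_WEIGHT
    return totals


def passing_options(ballots, options):
    """Options whose total exceeds half the number of voters."""
    threshold = Fraction(len(ballots), 2)
    totals = tally_with_veto(ballots, options)
    return [opt for opt in options if totals[opt] > threshold]


# Hypothetical six-voter example: four approve A, one approves B, one vetoes.
one_veto = [({"A"}, False)] * 4 + [({"B"}, False), (set(), True)]
# Same support for A, but two voters veto the whole vote.
two_vetoes = [({"A"}, False)] * 4 + [(set(), True)] * 2
```

In the first case A totals 3.5 against a threshold of 3 and passes; in the second it totals exactly 3 and does not, which matches four supporters out of six being exactly two thirds rather than more than two thirds.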

Or you could just set up the voting options carefully. In the case of the vote you refer to in the opening paragraph, I had twice in discussion before the vote started noted that none of the potential options listed would be what I would want to see and suggested an alternative. That alternative was ignored when the vote began. Seriously, it does no good whatsoever to put so much work into tallying votes, when the setup of the options is poorly constructed. GIGO. --EncycloPetey 04:02, 10 July 2007 (UTC)
I've been considering supporting the idea that votes should be co-authored, or at least seconded. To date no one has started a vote that someone else has created, although I was trying to encourage it for a few days before giving up and starting that flawed vote. I apologize again for missing your June 29 comment, which covers a lot of different topics by the way. Topics have to be separated out or you end up with a mess, but I really should have gone back and reconsidered the vote under all of the opinions offered in discussion.
Now, you say there was another relevant comment somewhere? Do you really blame me for missing your June 28 comment in a discussion that wasn't even linked and, before you made it, had more to do with how bots could find Wikipedia pages? Even now I can't tell if that comment was directed at Dmcdevit's vote or mine. Please, if you're not going to address the vote directly, like on its talk page, please at least grant me some leniency in taking out the crinkles.
I don't know how to prove to you that it was never my intention to omit any possibilities. My response to you on option 1 was most interested in knowing how your opinion didn't fall under any of the options. We've had another flawed vote put up since, and it was resolved with correction and extension and notification, and do you not think the same could have been done? Why have you persisted in lambasting me ever since I've agreed, after having understood your opinion, on the same day the vote was started, that it was in fact flawed?
Or if it's the limitation of approval voting that has got you irritated, why can you not even address my question here directly? I would have to presume, given that your expressed objection is not having your option available, that you are in favor of having votes with multiple options. Like this one, which is ready for comment, and which does address your points, I think. But don't let me put words in your mouth. Having already offered to nominate me for de-sysopping, incredibly, I have full confidence in your ability to criticize what I've written, constructively or otherwise. DAVilla 05:34, 10 July 2007 (UTC)
I think it would be appropriate for every multi-option poll to have a "global opposition" heading, which would be counted in the same way as the others; if "global opposition" gets the highest number of supports, then the poll is rejected. Don't see the reason for counting these as half-votes; but perhaps I don't understand? -- Visviva 05:46, 10 July 2007 (UTC)
On the [1, 0] scale, a true abstention would be ½ (one half), so counting a veto as -½ wouldn't be a "half vote" in that sense. The reason that -½ is used instead of -1 is that the latter would correspond with a supermajority of three quarters, which is a very strong requirement.
This is actually a stronger proposal than the "global opposition" option you propose. If the vote is contentious, it wouldn't take a sizeable group to block the vote. It would be possible for just a handful of voters to keep the resolution from passing. The stronger the support for an option, though, the more sizeable the opposition would have to be. DAVilla 04:56, 12 July 2007 (UTC)
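
The correspondence between the weight of an opposing vote and the implied supermajority can be worked out in general. If each support counts +1, each opposition counts -p, and the sum must exceed half the number of voters, then s - p·o > (s + o)/2 rearranges to s > (2p + 1)·o, so supporters must make up more than (2p + 1)/(2p + 2) of the voters. A minimal sketch follows; the function name is hypothetical.

```python
from fractions import Fraction


def required_support_share(oppose_penalty):
    """Share of voters that supporters must exceed when each opposing
    vote counts as -oppose_penalty and the sum must exceed half the voters."""
    p = Fraction(oppose_penalty)
    return (2 * p + 1) / (2 * p + 2)


simple = required_support_share(0)                  # Fraction(1, 2)
two_thirds = required_support_share(Fraction(1, 2))  # Fraction(2, 3)
three_quarters = required_support_share(1)          # Fraction(3, 4)
```

This reproduces the figures used in the discussion: a penalty of 0 is a simple majority, ½ is a two-thirds supermajority, and 1 is a three-quarters supermajority.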

At the risk of stepping into a discussion I seem to have missed a good chunk of … approval voting is not the problem. There's no need to draw a three-way distinction of any sort; for a given voter and a given option, either the voter approves of the option, or he does not. We seek consensus and compromise here, which means that we want to go with the option that the most voters approve of — and approval voting is perfect for determining this.

I think there are two problems here: firstly, votes being set up without all plausible preferred options (which the new pre-vote discussion fad should help with, though of course human error is inevitable), and secondly, people responding non-constructively to others' mistakes. If there's a problem with a vote — it's not listed in the right places, or its time-frame is too short, or it's missing some necessary options — then the simplest solution is to fix the problem, rather than abstain angrily, or oppose all options and start a fruitless argument.

(By the way, another advantage of approval voting is that it allows additional options to be added late without causing a big ruckus. It's not ideal, obviously, but it's not a big deal, as it would be to add additional options to a voting system that allowed explicit negative votes.)

RuakhTALK 06:25, 10 July 2007 (UTC)

DAVilla, I am glad I picked "ice cream" for a few reasons. One reason is that it seemed to be a more naturally concrete example. (Both your "pet" example and my "ice cream" example would be much more illuminating if more people participated; but that is just as true for all votes.) Another was that people probably have strong opinions. Another is that no consensus (i.e. what is the "official" Wiktionary ice cream flavor) is likely to be reached...and that is a Good Thing (tm) IMHO. Another is that it shows what approval votes are good for: weeding out unlikely or implausible options. With "opposes" thrown in, more discussion is garnered. For policy WT:VOTEs, each should start with the exact wording change(s) proposed. So I don't see how approval voting helps. OTOH, for straw-polls to narrow down discussions, it is optimal. --Connel MacKenzie 03:41, 12 July 2007 (UTC)
Re: "For policy WT:VOTEs, each should start with the exact wording change(s) proposed.": I agree. This is true whether it's a simple yes/no vote (in which case it should be at the top of the vote), or an approval vote (in which case it should be at the top of each option, perhaps with visual formatting such as bolding, italics, and/or color-coding to show what's different from option to option). Of course, even if we do this, it's possible that people will qualify their votes in such a way that the listed wording change isn't exactly what passes, but it should help a lot. —RuakhTALK 04:03, 12 July 2007 (UTC)
Note that the above isn't just about approval voting, it's about all votes with multiple options.
EncycloPetey has said the same thing as you, that approval voting is only good for straw polling. Neither of you seems to accept the voting method as legitimate in its own right. Of course, in life there is no such thing as a yes/no vote. Every vote must choose among a myriad of options. In restricting the space to one option, simple votes can be just as flawed.
On my talk page I explain why a run-off vote isn't necessary with approval voting providing the process is open and votes can be changed, as is the case with us. Essentially, approval voting whittles the choices down to a contest between the two most supported options. On the other hand, if you wanted to do run-off votes then I would be more than happy to see it illustrate that exact point. DAVilla 04:56, 12 July 2007 (UTC)
Well, but see, that's just it. There shouldn't be an "official" en.wiktionary.org ice cream flavor. That was a primary thing I had hoped would be apparent. Apparently not.  :-)
Now, OTOH, the ice-cream vote does demonstrate that the "leading" option can vary quite a bit throughout the course of a vote's lifetime. I disagree though, that a proper "approval vote" demonstrates consensus. The counter-intuitive nature of the approval vote will cause an unlikely option to seem to have support, that it doesn't really have. We're not trying to pick a winner here, we are trying to gauge community opinion, while encouraging debate. The desired end-result isn't to just pick one, it is to reach an agreement that everyone can live with. No matter what three or four ice cream flavors are popular, it would be unreasonable to call any one of them The en.wikt flavor.
To Ruakh: do we need a vote for "explicit wording changes required" or do you think that can be handled by rewriting the vote creation instructions and auto-opposing votes that don't conform? --Connel MacKenzie 21:11, 12 July 2007 (UTC)
The varying of the leading option is a direct consequence of two aspects of your voting method that I have opposed, the most important of which is (edit:) my insistence that all options be present at the start of the vote. Any option that is added after the first vote has been cast could not be counted as having been given full consideration by the voters. DAVilla 21:39, 12 July 2007 (UTC)
Didn't your demonstration vote have the same limitation? Isn't that feature touted as one of the primary benefits of that style voting? --Connel MacKenzie 16:44, 13 July 2007 (UTC)
I can't speak for how the rest of the wiki community does it, but I have never touted approval voting as having this feature. If this is what is claimed in other parts of the community then I can see how you might feel the method is only useful as a straw poll. In its true form it is intended to be fully legitimate, and whatever discussion or informal straw poll are necessary should take place beforehand. None of the votes that I have created allowed the wording of the vote to be changed after it had started—which includes, for the first few explicitly, prohibiting the addition of new options—without restarting the vote. The reason is to give all options full consideration by the body of voters. The best time to make modifications is before the vote has commenced. If after a proposal has been put to vote it is realized that new options are needed, then the vote needs to be scrapped, or however we can agree to rewriting them. This is not a means to pushing my point of view, it is an assurance that the outcome represents the true opinion of the community. Objections such as the one EncycloPetey raised are precisely the kind of discourse that is needed to stop a vote in its tracks. The proposal discussed in this section is a way to measure, objectively, if such action should be taken. I do not wish for others who start flawed votes to be threatened by de-sysopping, or those who find votes in their disfavor to feel that they need to resort to as much. Regardless of how reasonable or unreasonable the creator, a vote that is vetoed by enough members of the community would not be able to pass. The numeric value of a veto is derived directly from the concept of a two-thirds supermajority. The alternative is to simply require a supermajority, but the option to veto at least gives those who object to the vote the ability to make their opposition explicit and of demonstrable weight.
Incidentally, the only vote to date that has not received a supermajority is the contested "Renaming AHD", both the runoff vote and its predecessor, although voters seem to have bound their hands to the outcome in the 10-day "Replace AHD". DAVilla 19:24, 13 July 2007 (UTC)
Please review what you wrote above and note the following error. No one threatened to desysop you or anyone else for starting a flawed vote. What happened was that you edited my vote without my permission. You were told that a desysop vote would be started if you ever edited someone else's vote again. Editing other users' votes is seriously wrong, and should never happen. I had assumed that the original comment "Do not ever edit another person's vote. If you do this again, I will move to De-sysop," was clear about the reasons the comment was made, but apparently not. If you could explain to me where the misunderstanding occurred, I can be more careful in how I phrase my comments. I am sorry you did not understand that point when I first warned you about your misbehavior. I hope that I have cleared up this point for you now. --EncycloPetey 20:34, 16 July 2007 (UTC)
Changing someone's vote is unacceptable. The chilling effect of such an action is enormous. No clear distinction can be made between vandalism, vote tampering, rigged voting and "administrative corrections." Changing someone else's vote must always be rolled back immediately; otherwise the integrity of the vote itself is compromised. I propose that we start a policy vote to that effect: in the future, anyone engaging in such vote tampering should be immediately blocked indefinitely. --Connel MacKenzie 16:56, 19 July 2007 (UTC)

Russian pronunciation

If you haven't already heard - Tsca has uploaded about 5 000 files with Russian pronunciation to Wikimedia Commons. With such a big number it's very likely that when you create a new entry in Russian, audio file for it is already on Commons - so remember, try adding {{audio|ru-{{PAGENAME}}.ogg|audio}} to your new entries to see if there is an audio file for the word you write. All files have simple names: болеть has pronunciation in ru-болеть.ogg file.

You can consider adding all pronunciation using a bot, as we have done on Polish Wiktionary. By the way, Tsca has a bot which regularly adds audio files from Commons to our entries, and currently we have 4 500 pronunciation examples with 75 000 entries total. Haven't you thought of running a similar bot here on English Wiktionary? Apart from English, you have very few pronunciation examples. --Derbeth talk 09:43, 10 July 2007 (UTC)

Sheesh, and I was writing my own bot code for User:Dvortybot? Oh, did Tsca write his only recently? How does he handle formatting for existing pronunciation sections? One pie-in-the-sky project is to link all the audio pronunciation files from commons in the appropriate places here, but English alone has been in some ways easier, and in others harder, than I'd expected. I imagine with the wide variety of formats used on other Wiktionaries, the various bots won't be compatible (but will be comparable) from language to language. --Connel MacKenzie 21:43, 10 July 2007 (UTC)

Tsca has written his bot only for Polish Wiktionary, which has much different entry scheme than English Wiktionary. Later, the bot has been adapted to German Wiktionary, but now Tsca doesn't have time to operate outside pl.wikt. The bot works very well and we don't have any problems with it on pl.wikt.

I don't see any reason why it would be harder to add pronunciation to languages other than English on English Wiktionary. On Polish Wiktionary, we have automatically added pronunciation for English, German, Polish, Russian, Dutch, Swedish, Romanian, Italian, Danish and Spanish. All these languages are easy to parse, because they follow standard naming scheme two_letter_language_code-word.ogg. Also Farsi and Turkish are well-formed and ready to be bot-scanned. --Derbeth talk 23:13, 10 July 2007 (UTC)
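
The naming scheme described above is simple enough to sketch. The helper names below are hypothetical, and the sketch only builds the expected file name and the {{audio}} call quoted earlier in this thread; it does not check whether the file actually exists on Commons, which a real bot would have to do.

```python
def audio_filename(lang_code, word):
    """Build the Commons file name under the scheme language_code-word.ogg."""
    return f"{lang_code}-{word}.ogg"


def audio_template(lang_code, word):
    """Build the {{audio}} template call for an entry, in the form quoted above."""
    return "{{audio|" + audio_filename(lang_code, word) + "|audio}}"


# The example from this thread: the Russian entry for the word болеть.
example_file = audio_filename("ru", "болеть")
example_call = audio_template("ru", "болеть")
```

For the Russian example this gives `ru-болеть.ogg` and `{{audio|ru-болеть.ogg|audio}}`.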

In the case where a language has no pronunciation section, it is easy. It just uses {{audio}} nicely. But in cases where IPA, SAMPA, enPR, homophones, hyphenations, etc., has been added, placement is sometimes tricky. --Connel MacKenzie 20:53, 11 July 2007 (UTC)
When you automatically upload English pronunciations, how do you make sure they are paired with the right word? That is, many English words are heteronyms - two different meanings, with different pronunciations, but that are spelled the same. How do you deal with these situations? Russian has the same problem. --EncycloPetey 07:52, 11 July 2007 (UTC)
For now, Dvortygirl (currently very limited activity) links them manually (from the bot's exception list.) --Connel MacKenzie 20:53, 11 July 2007 (UTC)

We do not upload any pronunciation; these 5 000 Russian files were a single case. In particular, as we are not native English speakers, we don't record any English pronunciation. Our bot just adds the {{audio}} template to our entries. When it comes to heteronyms - well, they have to be fixed later manually. But as far as I know, heteronyms are mostly an English problem. I don't remember any heteronyms in Polish, of which I am a native speaker, so I can't imagine any heteronyms in Russian, which is also a Slavic language. I also don't remember any heteronyms in, for example, German. Nonetheless, I think that even if there is one heteronym in 100 words in Russian, it's better to have 99 entries with correct pronunciation and 1 with partly wrong pronunciation, than not to have any pronunciation. English Wiktionary is very poor with pronunciation in languages other than English. I think that if we don't count English, we have more pronunciation examples than you. In my opinion, pronunciation examples are an important advantage of Wiktionary over other dictionaries, which don't have them. --Derbeth talk 11:16, 11 July 2007 (UTC)

Agreed. --Connel MacKenzie 20:53, 11 July 2007 (UTC)
Nothing groundbreaking to say here, just a couple observations.  First, I agree that we should have more pronunciation files.  Second, English wiktionary probably doesn't have lots of non-English audio because we aren't native speakers of those other languages (like you mentioned for your reason you don't have English files on the Polish project).  Third, after only a couple seconds of thinking, I came up with the Russian heteronyms писaть - to write and писать - to pee, to urinate.  There are others. — V-ball 03:01, 26 July 2007 (UTC)

Category:Political subdivisions – bastard category

This category is ambiguous with respect to whether it is supposed to contain proper names or the generic terms of county, municipality, etc. Any suggestions on how to separate the two? __meco 15:47, 10 July 2007 (UTC)

a(e)sthesias and the {{alternative spelling of}} template

I don't like the way this template provides preference to alternative spellings of equal stature. Could we not solve this by using transclusion instead? __meco 20:43, 10 July 2007 (UTC)

No, transclusion is not acceptable. No, anesthesia (an actual word) is not "on par" with the non-word (or British word) anaesthesia. etc. --Connel MacKenzie 21:27, 10 July 2007 (UTC)
You are illegible to me in your acute brevity. __meco 07:41, 11 July 2007 (UTC)
The point is that we don't use transclusion on pages like these. Do a Google search for the first two. The spelling anesthesia is preferred on the internet 2-to-1. It's likely that anaesthesia is a Commonwealth spelling, but if it is, then it still shouldn't be transcluded. In cases like that we maintain two separate entries. We've been through this whole discussion before. --EncycloPetey 07:49, 11 July 2007 (UTC)

I think I agree with what everyone's saying here, but what does transclusion actually mean? Widsith 07:59, 11 July 2007 (UTC)

See "How much does it cost?". Here the translations section is transcluded from "How much is it?" for easier maintenance. It was my suggestion above (perhaps not in the particular case of anaesthesia/anesthesia, an alert person would, I had hoped, see beyond one example to the principle I am invoking) that the application of transclusion to major parts of an article when another article exists for a mere spelling variant and no discernible (or significant rather) preference between them exists, would be preferable to the current practice of applying the {{alternative spelling of}} template which establishes a preference, whether intended or not. __meco 09:22, 11 July 2007 (UTC)
Please don't assume that we're not seeing that bigger picture. We have discussed that issue, as I said above. You've not offered any new information that wasn't considered and dismissed in previous discussion. --EncycloPetey 18:19, 11 July 2007 (UTC)
Where can I look to update myself on this previous discussion? __meco 18:46, 11 July 2007 (UTC)
The color/colour topic has been raised on pretty much every discussion page on en.wiktionary. I don't like mentioning it at all, as it often evokes the wrong sentiment. But you've raised several separate issues. One topic is transclusion (the use of templates or subpages to commingle content of two separate pages.) That topic is painful; one method is to transclude the "translations" section onto both pages (in a manner that doesn't destroy section editing...trickier than it sounds.) The broad majority of conversations (if not all?) have concluded that transclusion has too many problems to be used effectively. While there are a handful of exceptions, their acceptance has been mixed; grudgingly accepted for the exception cases.
The other topic is the labeling of various spellings: alternative, regional, rare, misspellings, misconstructions, etc. I think that that topic is far from being adequately addressed, and is worthy of further discussion. Currently, we use a combination of {{see}}, alternative spellings, alternative forms, usage notes, regional tags on inflection lines, regional tags in the context tags of definitions, synonyms, related terms and derived terms as appropriate. But narrower guidelines of which to use in which situations are worth discussing. Trying to address the notion of having only one entry for multiple spellings is out, though. This could not exist as a multi-lingual dictionary if the alternates were not given explicitly. --Connel MacKenzie 21:06, 11 July 2007 (UTC)
I appreciate the education you provide. How would you characterize the way transclusion is applied in the example I provided? __meco 16:44, 13 July 2007 (UTC)
The use of <onlyinclude> only adds a level of complexity that I don't think simplifies the situation. If you are editing the target page, you are likely to be unaware the page is "transcluded." Newbies would be hopelessly lost when, after editing a section, they find themselves at a different page. --Connel MacKenzie 17:07, 19 July 2007 (UTC)

Request for bot flag - VolkovBot

I'd like to request a flag for this interwiki bot. It uses standard pywikipedia framework and has flags on many wikipedias and also on Russian wikiquote. Botmaster is w:ru:User:Volkov. Thanks. --Volkov 21:01, 10 July 2007 (UTC)

Oppose. One 'all wiktionaries' interwikibot is more than enough, thanks. --Connel MacKenzie 21:25, 10 July 2007 (UTC)
It wasn't even adding the correct iwikis: [9] [10] Cynewulf 21:38, 10 July 2007 (UTC)
The bot can prove to be harmless. Capitalization issue has been fixed. It works OK now and is intended to check new articles at ru.wiktionary and add/modify interwiki links if necessary. It has flag on ru.wiktionary, flags on other wiktionaries have also been requested. --Volkov 07:13, 19 July 2007 (UTC)
But what is it for? We already have User:RobotGMwikt for this. H. (talk) 14:55, 19 July 2007 (UTC)
This was already blocked for running unauthorized. This task is handled by User:RobotGMwikt on all wikts, no other bot should be doing iwikis on any wikt. NO. Robert Ullmann 15:07, 19 July 2007 (UTC)
Apparently RobotGMwikt doesn't have enough time/resources to add all necessary links. Cf. e.g. fr or it or ru --Volkov 19:59, 19 July 2007 (UTC)

Update: The bot got flags on Italian, French, Portuguese and Russian wiktionaries now. Please consider approving here as well. It seems like RobotGMwikt doesn't have enough time/resources to crawl all the wiktionaries and add/update all necessary links in a timely manner. (cf. e.g. bot's contributions on uk.wiktionary) --Volkov 12:00, 5 August 2007 (UTC)

Proposing minor changes to topical hierarchy

I propose that Category:Computer Science be changed to Category:Computer science, and I would also like to change one of its current parent categories, Category:Science, to Category:Applied sciences. __meco 17:49, 11 July 2007 (UTC)

Those both sound like reasonable changes. There shouldn't be a capital letter in the middle of "Computer science" anyway. --EncycloPetey 18:17, 11 July 2007 (UTC)
I think it depends on the variety of English. I've seen subjects like Math and Geography capitalized as spelling words. And would you be okay with EE as Category:Electrical engineering? DAVilla 03:28, 12 July 2007 (UTC)
I have also noticed such use of capitalization in English, more liberal than in other Latin or Germanic languages I am used to seeing. However, I have seen little information elaborating on this, laying out an explanation or some ground rules. For instance, Math and Geography that you mention have no mention of such spelling practice at all. Shouldn't they? __meco 09:17, 12 July 2007 (UTC)
I can't guarantee this is how most people do it, but for me "math/mathematics" and "geography" are the common nouns, and "Math/Mathematics" and "Geography" are proper nouns referring to a major, a degree, or a course. —RuakhTALK 17:01, 12 July 2007 (UTC)
I think the first consideration has to be our own standard in this case. We typically capitalize only the first letter after the colon, unless there is a proper noun or acronym involved that is always capitalized. So, I'd be content with both Category:Computer science and Category:Electrical engineering. In my experience, the noun would not be capitalized at all unless (as Ruakh noted) the word is used to refer to a particular university department, major, or course. Since that isn't the case here on Wiktionary, it shouldn't be a problem. --EncycloPetey 02:39, 13 July 2007 (UTC)
I'm keen for it. --Keene 23:06, 11 July 2007 (UTC)

Main namespace redirects


Is there some non-obvious place these redirects were discussed? Should we be encouraging redirects like these? I've put a one-day block on (account creation enabled, so that an explanation can be given). TIA. --Connel MacKenzie 20:45, 11 July 2007 (UTC)

Those redirects look excellent to me. Unicode has two different code points for each individual jamo, and the redirects just send users who search for one code point to the lemma entry. Please assume good faith and unblock that user. Rod (A. Smith) 20:50, 11 July 2007 (UTC)
(I unblocked. I hope that is not viewed as a hostile act. Rod (A. Smith) 20:53, 11 July 2007 (UTC))
No, not hostile; thanks for the explanation. --Connel MacKenzie 21:08, 11 July 2007 (UTC)
I've added what you wrote to Wiktionary:About Korean. Please revise and correct. DAVilla 03:11, 12 July 2007 (UTC)
Based on what vote? --Connel MacKenzie 20:47, 12 July 2007 (UTC) Oh, I see - it isn't an official page yet. Still, as per comments below, I think a vote is needed. --Connel MacKenzie 20:57, 12 July 2007 (UTC)
You blocked him? You must not have Korean fonts installed. The characters look exactly the same to me.
Yes, I blocked due to the high rate of additions that all looked erroneous, for one day (which was quickly undone.) --Connel MacKenzie 17:20, 12 July 2007 (UTC)
On the other hand, we don't have a policy regarding characters that look exactly the same, do we? Although there had been a discussion about it at one point.
Glad to see you've apologized already. Maybe someone could do the same in Korean? DAVilla 03:19, 12 July 2007 (UTC)
Never mind, speaks English. But it wouldn't be wrong to. DAVilla 03:21, 12 July 2007 (UTC)
It's good that the user was unblocked, but I don't think such redirects are always OK. For example (now redirected) is the syllable-initial form of the consonant , while is the syllable-final form. There's no difference between the final and initial forms in my browser (although the unmarked ㄱ form is rendered differently), but I think there is ample reason for us to have separate entries for each, at least of the "alternative form of" variety... This will be especially important if anyone ever bothers to write a font that displays archaic hangul properly, since at that point the final/initial distinction will be critical to accurately parsing the syllable. The distinction between these forms might be more apparent if we expand these entries to include basic technical information such as Unicode number. -- Visviva 04:39, 12 July 2007 (UTC)
To consider this in a different way: would we redirect to n? (I'm not sure of the answer, myself.) -- Visviva 05:11, 12 July 2007 (UTC)
I think the difference is that the Korean characters are both in the same language. Like I said, I'd like to see where it was discussed. --Connel MacKenzie 17:20, 12 July 2007 (UTC)
Wiktionary:Beer parlour/2007/April#Much ado about Graphemes and if you go back far enough you'll find discussions of automatically redirecting searches.
Unicode includes different codepoints for symbols that are exactly identical only because they are considered to be in different scripts. I don't think we should be Unicode's bitch. I think if the glyphs are identical, they should be on the same page. How valuable would it be to have all of the character information in every script of a symbol that looks like А? Or u (which is italicized u in Roman script and italicized и in Cyrillic script)? DAVilla 20:02, 12 July 2007 (UTC)
Presumably of some moderate value. I don't have any objection if someone wants to enter all that information (but I don't know offhand what section would be appropriate.) I agree that in cases where such characters are identical in appearance, redirecting one and explaining both on the target page seems reasonable. Thank you for the link - I'll read up on that one now. --Connel MacKenzie 20:30, 12 July 2007 (UTC)
That discussion never reached what I would call "consensus" to change, well, anything. The convention we have in place is to have separate entries. I don't see Korean mentioned there at all. What I do see, is the unresolved debate about the L2 heading "==Translingual==". It certainly seems simpler offhand to continue with separate entries, rather than allowing an exception only for Korean, but I suppose a vote for such an exception might pass. --Connel MacKenzie 20:45, 12 July 2007 (UTC)
Yes, that's what I said: we don't have a policy... other than "no redirects in the main namespace" with exceptions such as idioms, apparently Latin, and why not this? At the very least, it does little to no harm to have the redirects in place before a true consensus can be reached. DAVilla 21:28, 12 July 2007 (UTC)
For the record, I'm unaware of any exception made for Latin. If Latin redirects exist, then they probably should be removed. This issue does sound related to the issue of IPA that has been discussed recently in RFDO (where we have to worry about the difference between dʒ and ʤ; or the difference between regular g and IPA ɡ; which affect links to the Rhymes pages) and like the discussion somewhere up this page on the issue of digraphs in Dutch and Central European languages. As for the current case of redirecting Korean, I have no opinion, since I don't know enough about the script to make a meaningful contribution to the discussion. --EncycloPetey 02:46, 13 July 2007 (UTC)
I think the exception for Latin involves forms with and without macrons; this seems reasonable to me, but perhaps should be revisited? -- Visviva 01:59, 14 July 2007 (UTC)
There is no such exception. I know because I drafted the current Latin policy page. Latin policy allows macrons only within pages as a display form only, not for page names or redirects. The Latin wiktionary has the same policy. --EncycloPetey 02:12, 14 July 2007 (UTC)
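Editorial note on the look-alike characters raised in this thread: the following is a minimal Python sketch, not part of the original discussion, that lists the Unicode code point and name of each character in a string. It uses only the standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]

# Pairs mentioned in the thread: Latin A vs. Cyrillic А, ordinary g vs. IPA ɡ,
# and the two-character sequence dʒ vs. the single ligature ʤ.
for sample in ("A", "\u0410", "g", "\u0261", "d\u0292", "\u02a4"):
    print(repr(sample), describe(sample))
```

Each pair may look the same on screen, yet the page title (and therefore the entry or redirect) is determined by the code points, which is why the choice between separate entries and redirects has to be made explicitly.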

I think it makes some sense to have the redirects (though generally I'd prefer a separate page for each Unicode codepoint that the MediaWiki software will allow), but it seems like they should be going the other way: why are the Hangul Jamo redirecting to the Hangul Compatibility Jamo? —RuakhTALK 03:14, 13 July 2007 (UTC)

Because the HCJ are what you get when you type a single Korean jamo into a keyboard (as a general rule), so that's where the entries have been. Also the HJ have two values for each consonant, one for when it is in syllable-initial position and one for syllable-final position, so it wouldn't be clear which of those should house the main entry. The HJ don't really get much use, but they are part of the official standard for entering archaic Hangul, promulgated in 2006 by the ROK Character Codes Research Center (however, the CCRC doesn't even follow this standard for its own database!), and are used for example in the Wikisource text of the Hunmin jeongeum. The initial/final distinction is quite important for the accurate use of these characters, but of course it doesn't correspond to any obvious difference in form. I'm not sure what the best course of action is here, but I think if we are going to allow these redirects the main entry needs to include the technical information for each redirected codepoint. -- Visviva 08:27, 13 July 2007 (UTC)
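Editorial note: the three code points discussed above for a single consonant can be inspected with a short Python sketch. This is only an illustration under the assumption that the consonant in question is kiyeok (ㄱ); the constant and function names are invented here, and the result of `compat_target` reflects the Unicode character data bundled with the Python interpreter rather than anything decided in this discussion.

```python
import unicodedata

COMPAT_KIYEOK = "\u3131"   # Hangul Compatibility Jamo (what a keyboard usually produces)
INITIAL_KIYEOK = "\u1100"  # Hangul Jamo, syllable-initial form
FINAL_KIYEOK = "\u11a8"    # Hangul Jamo, syllable-final form

def jamo_names(chars):
    """Map each character's code point to its Unicode name."""
    return {"U+%04X" % ord(ch): unicodedata.name(ch) for ch in chars}

def compat_target(ch):
    """Apply compatibility decomposition (NFKD) to a single character."""
    return unicodedata.normalize("NFKD", ch)

print(jamo_names(COMPAT_KIYEOK + INITIAL_KIYEOK + FINAL_KIYEOK))
print("U+%04X" % ord(compat_target(COMPAT_KIYEOK)))
```

Unicode's own compatibility mapping points from the compatibility jamo to the syllable-initial jamo, the opposite direction from the redirects described above, and it has no way to select the syllable-final form. That is consistent with the point that the main entry would need to carry the technical information for every redirected code point.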

Word of the Day

Recently there have been quite a few e-mails to OTRS suggesting that we get an e-mail version of the Word of the Day (most recently Ticket#: 2007070210010067 and Ticket#: 2007070510014165). I myself think this is a wonderful idea. We could just add it to the Daily Article mailing list. It already posts the Quote of the Day from Wikiquote in addition to the Article of the Day from Wikipedia, so it only makes sense to include the Wiktionary Word of the Day in it. Please reply with feedback. :) Cbrown1023 23:29, 11 July 2007 (UTC)

OK I've fixed http://tools.wikimedia.de/~cmackenzie/wotd-rss.php and added http://tools.wikimedia.de/~cmackenzie/wotd-rss-b.php (for people who prefer that format). I've e-mailed that list-operator and cc'd you...now we wait. --Connel MacKenzie 17:16, 12 July 2007 (UTC)
It is all done. Tomorrow's e-mail will have WOTD just before the WikiquoteOfTheDay. --Connel MacKenzie 20:52, 12 July 2007 (UTC)
Now, anyone feel like writing up instructions for "how to include en.wiktionary.org's WOTD on your website"? (Rule #1: mention GFDL, Rule #2: do anything else you like.) --Connel MacKenzie 20:54, 12 July 2007 (UTC)
And rule #1.5, attribute Wiktionary, to fulfill the GFDL? :-) Dmcdevit·t 06:29, 15 July 2007 (UTC)
Well, ok. Rule #1: comply with the GFDL.  :-) (The RSS gives links back to Wiktionary, so they don't need to do anything to comply in that regard. They just need to not mangle those links.) --Connel MacKenzie 06:43, 15 July 2007 (UTC)
  • By the way, I did get an e-mail from the happy user who got his WOTD by e-mail (from the link at the top of this section). I'll put the reminder on WT:AN. Perhaps the Main Page's template should link to that as well. I seem to recall more people requesting WOTD by e-mail, than by RSS feed. --Connel MacKenzie 01:00, 20 July 2007 (UTC)
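Editorial note on the "how to include WOTD on your website" request above: a rough Python sketch of the idea might look like the following. Everything in it is an assumption for illustration: the sample feed is made up (the real feed's layout is not shown in this discussion), the function name `wotd_html` is hypothetical, and the only thing taken from the thread is the rule that the link back to Wiktionary must be kept intact.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up stand-in for an RSS 2.0 feed; NOT the actual output of the feed above.
SAMPLE_FEED = (
    '<rss version="2.0"><channel>'
    "<title>Sample word of the day feed</title>"
    "<item><title>example</title>"
    "<link>https://en.wiktionary.org/wiki/example</link></item>"
    "</channel></rss>"
)

def wotd_html(feed_xml):
    """Build an HTML snippet for the first feed item, keeping its link unchanged."""
    item = ET.fromstring(feed_xml).find("./channel/item")
    title = item.findtext("title", default="")
    link = item.findtext("link", default="")
    return '<a href="%s">%s</a> (from Wiktionary, GFDL)' % (
        escape(link, quote=True),
        escape(title),
    )

print(wotd_html(SAMPLE_FEED))
```

The snippet escapes the title and link but never rewrites the link target, so the attribution carried by the feed survives on the embedding page.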

Listing of ttbc categories

This has been discussed before, on more than the one [occasion] which I have found. I just visited island, the host of ttbc-categories listed defeats at least one of the reasons for having categories - all other categories are extremely hard to find. Cannot the display of this class of category be suppressed? —Saltmarsh 06:25, 14 July 2007 (UTC)

The only alternative that immediately comes to my mind is to have one ttbc template for each language (e.g. {{ttbc-Welsh}}, {{ttbc-Afrikaans}}, etc) and to use the what links here on the template rather than the category. The template itself would link to its own what links here to make it easy to find. Perhaps also the {{checktrans}} could be modified to link to a page that links to all the ttbc what links heres. Thryduulf 09:56, 14 July 2007 (UTC)
It wouldn't be so hard to hide these with some Javascript and CSS. In fact I think I did it ages ago for Eclecticology. Connel could even add it then as an option to his extended preferences. Put a request on the grease pit. — Hippietrail 12:48, 14 July 2007 (UTC)
I agree this is better suited for WT:GP. I apologize for letting the Javascript reengineering (Monobook to Common) stagnate. I think I have put that task off long enough, as well as the preferences incorporation and refactoring. --Connel MacKenzie 04:41, 15 July 2007 (UTC)

Wikibooks Glossaries

There are a couple of Wikibooks that IMHO really don't belong there and are much more fitting for this project. Because of the nature of Wikibooks, they don't slide over as short definitions, as many of the projects like this tend to recreate Wiktionary itself sometimes.

One book is a no-brainer, and should be moved to Wiktionary immediately:

This is a multi-lingual word comparison of technical computer words used in computer science, but it has shown nearly zero activity since it was added to Wikibooks nearly 3 years ago. It may take some work, however, to get it worked into a typical Wiktionary format.

A list of words with definitions regarding archaic word usage as would be common for a researcher examining older records. This is simply an outstanding resource, but something that IMHO doesn't really fit the mission of Wikibooks and is much better done here. This particular project is (unfortunately) very active with some individuals who are resisting any attempt to move it elsewhere. A warm and friendly invitation from Wiktionary regulars might be very useful here in terms of redirecting their efforts, and making some sort of accommodation for a specialized project of this nature. Wikibooks admins are trying very hard not to turn this into a VfD fight and piss off the contributors, but at the same time noting that it really doesn't belong on Wikibooks either.

Due to the activity level I'm seeing in this particular project, I have a gut feeling that there are many external links to Wikibooks on this particular project, although I haven't been able to identify what outside groups may be pointing to this resource (aka mailing lists, researcher websites, newsletters, etc.) I also don't want to shut down the positive and valuable contributions that are being made here, but I feel in the spirit of cooperation with Wiktionary that content like this belongs here. If regular Wiktionary participants have a substantial opinion on this topic, I would like to hear about it in terms of if this content even should be moved to this project. --Robert Horning 13:56, 14 July 2007 (UTC)

With regards to the Computer Science Dictionary, I think this would fit as an appendix (appendix:Comparison of computer science terminology in European languages perhaps). The individual words and their translations should get standard Wiktionary entries.
The local history terminology (is this exclusively British?) is laid out like a traditional dictionary. Each of the words should have an entry here, the extended notes about the words seem to fit as etymology and usage notes. We will have many of the words here already, any that have a specific meaning can be tagged and categorised as necessary. Thryduulf 14:22, 14 July 2007 (UTC)
I've now copied the local history glossary to User:Thryduulf/local history gloassary (note the edit history has not been copied at all) so we can easily see which words we do and don't already have and to see if we have the definitions. Thryduulf 14:44, 14 July 2007 (UTC)
Two minds with but a single thought - ALL the words used in this document are wikified at User:SemperBlotto/sandbox (though initial capital letters have been lowercaseified) - we may be able to improve many of the definitions. SemperBlotto 14:56, 14 July 2007 (UTC)
I've just gone through and decapitalised my version as well! Thryduulf 15:39, 14 July 2007 (UTC)
p.s. Both Wikibooks and Wikisource are rich sources of words. I've been wikifying random pages from it.wikisource lately and building lists of words to add. SemperBlotto 14:58, 14 July 2007 (UTC)
I'm also keeping a list of random words that I need to investigate for inclusion (i.e. do they meet the CFI) at user:Thryduulf/words. Thryduulf 15:39, 14 July 2007 (UTC)
But be careful - there are many spelling mistakes in this document. SemperBlotto 12:50, 18 July 2007 (UTC)
I know very little about Wikibooks, but I don't understand why such texts are discouraged there. As far as importing them here into the Appendix: namespace, I don't see any problems. Gradually, the entries there can be re-duplicated into the main namespace. Perhaps even bot-loaded (for entries that don't already exist.) But I am a tiny bit shocked to hear that such content is not welcome also on Wikibooks. --Connel MacKenzie 04:36, 15 July 2007 (UTC)
This is something that has been long established on b:Wikibooks: What is Wikibooks#Wikibooks is not a dictionary, and was mainly seen as a way to not completely duplicate Wiktionary all over again on Wikibooks. If you are shocked about this kind of content being culled from Wikibooks, much more dubious (in terms of not being a book and on much shakier grounds) has been culled explicitly because "Jimbo says" and other such nonsense. Especially over the past year.
There has been a tendency in the past to use Wikibooks as a sort of catch-all dumping ground for nearly any and every crazy concept using a Wiki that you could possibly imagine, and it still is a problem to a smaller extent even now. The new projects page on Meta (until I changed it...with support of several WMF board members) openly encouraged people to explicitly use Wikibooks as a sort of project incubator, and even now many Wikipedia editors encourage some users to do the same thing. Wikibooks is about expanding a topic into much more considerable depth than would be typical on a Wikipedia entry, with an emphasis on trying to develop that content into a textbook format if possible. Considerable latitude is given for new participants on Wikibooks, with a general "wait and see" approach to much of what is done. Once a topic or idea is explored in some detail, Wikibooks participants usually try to strongly suggest another wiki (such as Wiktionary in this case) that would be much more appropriate for the content if it seems out of scope to the project.
I look upon those working this particular project (Wiktionary) as more or less specialists for working with dictionaries and glossaries, as participants here tend to have a strong love of words in and of themselves. Otherwise there is no reason why a project like Wiktionary simply wouldn't be recreated completely on Wikibooks, as the Local History Terminology page is demonstrating.
Wikibooks does permit glossaries and other similar kinds of contents, but they really should be something of a sort of appendix to a much larger work, where the word lists are something to help explain in depth the contents of the book to a learner, and contain specialized terms that relate specifically to the topic covered in the book. It is these more general dictionaries that are more of the issue on Wikibooks, and trying to know when to draw the line on what would go into them. --Robert Horning 09:19, 16 July 2007 (UTC)

French <r>

I have mentioned this on a couple of people's talk pages, but I've started a discussion on which IPA symbol we should use to represent the French <r>, since there has never been a consensus on the subject and it would be desirable to hammer down a policy. Please see Wiktionary talk:About French and leave your thoughts... Widsith 08:50, 15 July 2007 (UTC)

OMG, not this again. Have you mentioned this on Hippietrail's talk page yet? I suppose we'll need a bot to switch all the "ɹ"s back to "ɾ"s first? (Or is it the upside-down and backwards one?) --Connel MacKenzie 00:56, 20 July 2007 (UTC)
For the English <r>, it’s [ɹ] for UK English and [ɻ] for US English. The [ɾ] phone is an alveolar tap found chiefly in US English — an allophone of [t] and [d] when they occur in unstressed syllables, such as in water ([ˈwɔːɾɚ] or [ˈwɑːɾɚ] instead of UK [ˈwɔːtə]). † Raifʻhār Doremítzwr 02:39, 20 July 2007 (UTC)
FWIW, over at the French wikt they use [ ʁ ].
As to the English <r>, I concur that [ɹ] is the most adequate choice. Hippietrail's contention that no English dictionary uses it is mostly irrelevant here; as the Wiktionary spans several (ideally, all) languages, one should go as close to phonetics as practically possible, to avoid unnecessary confusion across phonemic systems.

Order of parts of speech

Hm, I remember reading somewhere here a suggestion that parts of speech should be ordered alphabetically. I think this is a very bad idea. Consider mint, which looked something like this (with simplified definitions):

Etymology 1

Adjective

In near-perfect condition.

Noun

Place where money is made.

Etymology 2

Adjective

Of a green colour, like the mint plant.

Noun

Aromatic plant.

Putting adjective definitions before nouns simply because "A" comes before "N" is very bad - it leads to "use before definition", a concept from computing that applies just as well to lexicography: that is, you can't use a word in some part of speech A in a definition for another part of speech B until it has been defined in part of speech A. This happens in the adjective under the second etymology. The problem is that both adjectives are derived from their respective nouns, and so the nouns should be given first. — Paul G 14:49, 16 July 2007 (UTC)

Then at least offer an alternative system. There may be a few exceptions where alphabetical is a bad way...but in general it works fine. RJFJR 15:54, 16 July 2007 (UTC)
I did. See my final paragraph. — Paul G 14:10, 6 August 2007 (UTC)
I agree 100%. And the alternative system is obvious: "Part-of-speech sections should be ordered in the way that makes most logical sense. If it is not clear what order is most logical, alphabetical order should be used. (In practice, this means that most entries use alphabetical order.)" I've heard it claimed that this approach could lead to disputes about what order makes most logical sense, but I've yet to see any entries where I could imagine such a dispute. (In the abstract, I could imagine a dispute between etymological order and frequency order — if POS X derives from POS Y and is more common than it, then which gets precedence? — but firstly, I haven't seen any specific cases of this, and secondly, this is a special case of a more general problem that we already need to resolve.) So far, all disputes have been over whether to use logical order or alphabetical order. (Put another way, alphabetical order is the problem's cause, not its solution.) —RuakhTALK 16:41, 16 July 2007 (UTC)
Hear, hear. Widsith 08:32, 20 July 2007 (UTC)
Personally, I would want to see evidence that adopting such a system would yield a net improvement rather than a net problem. To date, I've seen no such evidence. The alphabetical order is more generally understood, both by users and bots, so it has one significant advantage over a system that leaves sequence up to individual choice that varies on a page by page basis. --EncycloPetey 20:17, 16 July 2007 (UTC)
Are there any bots that take advantage of our current (putative) alphabetical ordering? And are you sure that most of our users know that's the ordering we use? (Or do you just mean that our users know what alphabetical ordering is?) It goes without saying that logical ordering would be an improvement in many cases; for it to be a net problem, there would need to be some disadvantage to it, and I don't see that individual choice and variability is really a disadvantage; after all, we already have much more serious page-by-page layout variations (e.g., some language sections have etymologies at the same level as part-of-speech sections, while others have multiple etymologies, each containing one or more part-of-speech sections). This isn't to say that evidence-gathering is a bad idea, though; what sort of evidence do you have in mind? —RuakhTALK 20:47, 16 July 2007 (UTC)
I agree with EncycloPetey. I don't see a universal benefit of randomizing the order they appear. For my brain, it doesn't make a lot of sense to allow them to appear in random order. For bots, as long as there is an order (whatever that consistently works out to be,) the bots will be fine. (E.g., if we state somewhere: 1. Noun, 2. Verb, 3. Adjective, 4. Interjection, ...) But putting POS headings in a random subjective order (in 2005) led to pretty strange disputes. Returning to the alphabetic sequence for POS headings has been an enormous improvement, on that front. On most multiple-etymology "long" entries, it is hard enough to find a sense you are looking for. Randomizing the sequence will only make that harder. --Connel MacKenzie 16:27, 19 July 2007 (UTC)
It's not random though, it's logical. Bots may not be able to follow the logic but users certainly can. To me an alphabetical order is what looks random. It also makes our etymologies look weird if they are applying to a part of speech which is halfway down the entry. Widsith 08:32, 20 July 2007 (UTC)
Of course I understood what was meant; but it is random, from one entry to the next. (Or would be, under that scheme.) Even, as you point out, from one etymology to the next, it would be random. --Connel MacKenzie 10:11, 20 July 2007 (UTC)
It is not random at all. "Random" means "not having any discernible order or pattern"; putting a noun after a verb because the noun is derived from the verb gives them in etymological order (and so is ordered, not random), and makes much more sense than alphabetical order, which, ironically, introduces randomness because there is then no order to the etymological progression (if any) of the parts of speech, as illustrated by the example I gave.
No other credible dictionary uses alphabetical order for its parts of speech, as far as I am aware. — Paul G 14:15, 6 August 2007 (UTC)
Paul, you seem to have misunderstood; it is random, in that no entry has a set POS order, with any consistency. We already over-emphasize etymology in the heading levels (instead of homonyms.) But you are suggesting the POS order be randomized further to give "etymology" an even more-inflated role (when compared to other dictionaries). Webster's 1913 certainly used alphabetic order for its POS ordering (m-w.com returns separate results for each POS, so it isn't exactly comparable.) Look at a table of contents of one of our longer entries; there is currently no way to discern where a given meaning might be hiding, given the over-emphasis on etymology. But to randomize the POS ordering below that would make it even harder. --Connel MacKenzie 20:48, 6 August 2007 (UTC)
How should we order the senses of green according to this proposal? Consider these four senses:
  • (n) the color of plants, etc.
  • (adj) having green as its color
  • (adj) environmentally friendly
  • (n) a member of a green party, an environmentalist
Clearly (I think it's clear, at least), the second derives logically from the first and is best understood after reading it; likewise for the fourth and the third. So how should we order them, according to this proposal? —msh210 21:38, 6 August 2007 (UTC)
I don't think it's at all "clear" that the first or second definition is the original. I rather expect that both definitions are "original", since color words are routinely used as both noun and adjective in a wide range of IE languages. --EncycloPetey 21:42, 6 August 2007 (UTC)
But the question isn't which one is the original, in the sense of "came first". Paul G's original comment was "you can't use a word in some part of speech A in a definition for another part of speech B until it has been defined in part of speech A", which applies here, no? —msh210 18:22, 13 August 2007 (UTC)
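Editorial note: the "use before definition" rule debated in this section amounts to a dependency ordering, which can be sketched in Python as below. This is only an illustration of the idea, not a tool anyone in the discussion proposed; the function name `order_sections` and the way dependencies are written down are assumptions. Headings with no unmet dependencies are emitted alphabetically, which matches the "logical order first, alphabetical otherwise" wording quoted above.

```python
def order_sections(depends_on):
    """Order part-of-speech headings so each follows the headings it relies on.

    depends_on maps a heading to the set of headings whose senses its
    definitions use. Ties are broken alphabetically. Raises ValueError when
    the headings depend on each other in a circle.
    """
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes |= set(deps)
    remaining = {n: set(depends_on.get(n, ())) for n in nodes}
    ordered = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("circular dependency: " + ", ".join(sorted(remaining)))
        first = ready[0]
        ordered.append(first)
        del remaining[first]
        for deps in remaining.values():
            deps.discard(first)
    return ordered

# "mint", Etymology 2: the adjective is defined in terms of the noun.
print(order_sections({"Adjective": {"Noun"}, "Noun": set()}))
```

For the four senses of green listed above, the adjective relies on a noun sense and a noun sense relies on an adjective sense, so passing `{"Adjective": {"Noun"}, "Noun": {"Adjective"}}` raises `ValueError`. At the level of whole part-of-speech sections there is no order that satisfies the rule for that entry, which is the difficulty msh210's question points at.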

More than just a form!

{{form of}} and its variants, as well as {{alternative spelling of}} and whatnot, all start with a capital letter and end with a period. This isn't always ideal; for example, the word paparazzi is so much more common than the word paparazzo (google:paparazzi, google books:paparazzi, google news:paparazzi, and google scholar:paparazzi get 10.5, 1.4, 28.2, and 2.8 times the hits as their counterparts for "paparazzo", respectively), and has so many senses that derive from its simple "Plural of paparazzo." sense, that it would make sense to actually define it; but to do so would require either not using {{plural of}}, or resorting to a sloppy two-sentence sense line. Ideally, I think the various {{form of}}-like templates should each take an optional |nocap=1 parameter that would de-capitalize the first word (allowing another part of the definition to precede it) and an optional |nodot=1 parameter that would remove the final period (allowing another part of the definition to follow it). Does this sound reasonable? (The templates are so widely transcluded that I don't want to make modifications to them before being sure that other editors support those modifications.) —RuakhTALK 20:20, 16 July 2007 (UTC)

That sounds reasonable. I've seen some instances where I would have liked to eliminate the final period and add additional information to the definition line, such as when an inflected form is tied to a particular sense of a word and a parenthetic gloss needs to be added. —This unsigned comment was added by EncycloPetey (talkcontribs) at 20:39, 16 July 2007 (UTC).
I totally agree. I had intended to raise the issue of optional omission of periods at some point too. † Raifʻhār Doremítzwr 13:14, 17 July 2007 (UTC)
I do not even understand the sentiment you expressed. What is "sloppy" about multiple sentence definitions? Should we remove every sentence (after the first sentence,) from all definitions? --Connel MacKenzie 16:16, 19 July 2007 (UTC)
I guess I misspoke a bit. It's not that multiple-​sentence definitions are sloppy (though I can't admit to being a huge fan of those, either), but that (IMHO) multiple-​sentence-​fragment-​formatted-​as-​sentences definitions are sloppy and potentially confusing. And obviously I'm not suggesting that information should be removed, but I do think that such definitions should be cleaned up. If a definition really can't be re-written in single-​sentence format, then probably either (1) it's encyclopedic, (2) it's covering multiple senses that should be listed separately, or (3) it's containing information that's better suited to another section (usually usage notes or derived terms). If none of those is the case for a given definition, then I guess we can't dispense with the multiple-​sentence (non-)​format for that definition; but I don't see why that means we should use that format in other cases. —RuakhTALK 18:14, 19 July 2007 (UTC)
Note also that, so far as I know, we never resolved the issue of whether we prefer sentence-style (beginning with a capital letter and ending in a period) for definitions with gloss semantics, i.e. definitions that are meant to be able to replace the defined entry. The definitions of the lemma of most nouns, verbs, adjectives, and adverbs are written with gloss semantics. Some definitions cannot reasonably be written as a gloss. Most articles, particles, prepositions, and non-lemma require such sentence definitions. When I write definitions, I reserve the initial capital and trailing period for non-gloss definitions to distinguish them from gloss definitions (except, of course, for glosses of sentences, like proverbs and most phrasebook entries).
The reason I mention this is that multiple-sentence definitions are probably seen as "sloppy" only in the context of gloss-style definitions, yet definitions of non-lemma are not easily written as glosses, so I doubt such criteria apply. Rod (A. Smith) 17:03, 19 July 2007 (UTC)
Personally I use sentence-style for all definitions (though this should be discussed separately and a consensus reached on that point), but it's a good point that non-gloss-style definitions aren't so sloppy when written as multiple sentences. My goal here, though, was to cover half-gloss definitions: a definition that contains both a gloss and an actual definition (useful for various kinds of entries). —RuakhTALK 18:14, 19 July 2007 (UTC)
But that only raises another issue: a gloss such as "plural of ----" is not a definition at all. --Connel MacKenzie 10:18, 20 July 2007 (UTC)

What do vule vu mean " Vule Vu and i do mean you"

What does vule vu mean?

Probably voulez-vous, which means "do you want to". —Stephen 01:37, 17 July 2007 (UTC)

two senses or one

Is it reasonable to say that if a word is translated into two different words in a certain foreign language, and those two foreign words have different meanings, then the original word has two senses (definitions) and we should write its entry to reflect that, even if speakers of its language merge them in their minds? —msh210 05:47, 17 July 2007 (UTC)

No, not as a general rule. However, when such cases do arise it is a good idea to take a close look at the definitions because there might be two different senses in there. --EncycloPetey 05:57, 17 July 2007 (UTC)
I agree with EncycloPetey. This is one thing you see a lot with kinship terminology; different languages divide up the kinship space differently. For one example that's fairly radically different from English, Hawaiian basic kinship terminology has no words for "brother" or "sister", instead having words that mean "older same-gender sibling", "younger same-gender sibling", "opposite-gender brother", and "opposite-gender sister". This doesn't mean that brother should have three senses, one for each of "older same-gender brother", "younger same-gender brother", and "opposite-gender brother" — though, since Hawaiian is not the only language that tends to distinguish age in basic kinship words, it might be useful for brother to say something like "A male sibling; a male with the same parents; used regardless of age." —RuakhTALK 15:54, 17 July 2007 (UTC)
Thanks, EncycloPetey, Ruakh. I was thinking specifically about garbage, which is the usual translation (gloss) for two different American Sign Language words, one of which refers to clean garbage (e.g., junk mail) and the other to dirty garbage (e.g., food scraps). But I guess we shouldn't split garbage's definition up into two, then. Thanks again for the clarification. —msh210 13:31, 18 July 2007 (UTC)
In the case of garbage, there may be two separate senses to define...unless both senses you're thinking of are specifically physical items to be thrown away. If, on the other hand, you're just looking for a good match for the sign language words, I'd recommend junk for "clean garbage". --EncycloPetey 19:24, 18 July 2007 (UTC)


new to this but a fast learner...tell me what you think about the word pariah...seems to mean many things to many people and I am hoping to put it all together... —This unsigned comment was added by Paloma (talkcontribs) at 18:03, 17 July 2007 (UTC).

Restart Wiktionary:Votes/pl-2007-07/exclusion of possessive case

Looking at the current "support" votes, it seems very clear to me that the broad majority of voters have misinterpreted the vote and more importantly, the clarifications of it.

The vote is proposed as a change to existing practice with the intent of nominating a few dozen useful entries on WT:RFD (which previously have passed) while also excluding future forms from being bot-entered.

The proponent's wording has used at least a dozen non-existent words for circumlocution. The qualification comments of most of the support votes indicate that the voters (in spirit) oppose the change suggested, apparently not realizing that the current half-wording is directly a change to current practice. The proposed template changes are technically not simplistic, but are portrayed as such; no prototype has been offered, nor have comments been solicited on such a change. It certainly hasn't been tested in box vs. no-box mode.

But most importantly, the vote itself is invalid. It does not provide the actual wording being proposed as a change to WT:CFI, instead only hinting at what is meant. In a less controversial vote, that might be OK? Considering the voluminous delusory qualifications, it seems untenable to continue this vote.

If any validity is to be given to the proposed change to policy, it would be enormously beneficial for the proponents to withdraw the current vote and restart it with the exact proposed wording changes.

--Connel MacKenzie 19:22, 20 July 2007 (UTC)

Grouping types of categories

Looking at the categories for entry "smelt" the mix of categories there is not particularly useful, e.g. category:Fish is sandwiched between category:Translations to be checked (Spanish) and category:Middle English derivations. Thinking about it a bit more, there are five different types of category on this page

Also on main namespace pages, (but not "smelt") are

Outside the main namespace, we also have

My thought is that to make all of these categories more useful is to divide the categories display into these distinct sections, so that the placement on a page would order the categories only within the relevant section. Possibly with a WT:PREFS preference to show or hide each type by default (the default being to show all)

The way I initially thought to implement this would be for each category page to have a type attribute set by some markup on the category description page (perhaps __type:subject__). This would (I'm almost certain) require the developers to add this functionality to MediaWiki. The functionality would be enabled/disabled, and (if enabled) the types and their display order would be set, on a per-project basis by a page in the MediaWiki: namespace (perhaps MediaWiki:Category types). If it was enabled on a project, a per-user preferences option to disable it would probably be welcomed. As might a preference to override the default order the types are displayed in, in favour of a custom order (although this is less important).

Any category would have to have a type statement before it can be saved (this currently happens for urls appearing on the spam blacklist, so should be technically possible). Obviously some arrangement would need to be made for existing categories upon implementation so things don't break, perhaps assuming that all categories are of one type (perhaps the first defined, possibly "subject" in Wiktionary's case) unless explicitly defined. A bot could easily add an appropriate definition to some of them, for example marking all categories with the word "templates" in the name as template categories.

A possible extension to this would be to only permit certain types of category in certain namespaces (e.g. subject categories should only be in the main namespace; project pages should not be in the main namespace). A non-hardlinked category appearing on a wrong-namespace page would prevent the page being saved until it was hardlinked or removed. Again the permitted namespaces would be set on the same MediaWiki namespace page.
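To make the namespace restriction concrete, here is a minimal sketch in Python, not MediaWiki code. Everything in it is an assumption made for illustration: the type names, the two lookup tables (standing in for the type statements on category pages and for the settings page in the MediaWiki: namespace), the category names "Wiktionary policies" and "Form-of templates", and the choice to let untyped categories through.

```python
# Hypothetical stand-in for the per-project settings page in the
# MediaWiki: namespace: which namespaces each category type may appear in.
ALLOWED_NAMESPACES = {
    "subject": {"main"},
    "project": {"Wiktionary"},
    "template": {"Template"},
}

# Hypothetical stand-in for the type statement on each category page.
CATEGORY_TYPES = {
    "Fish": "subject",
    "Wiktionary policies": "project",
    "Form-of templates": "template",
}


def blocked_categories(namespace, categories,
                       category_types=CATEGORY_TYPES,
                       allowed=ALLOWED_NAMESPACES):
    """Return the categories whose type is not permitted in `namespace`.

    `categories` holds only real category memberships; hardlinked
    categories are ordinary links and are assumed not to be passed in.
    """
    blocked = []
    for name in categories:
        ctype = category_types.get(name)
        if ctype is None:
            continue  # assumption: untyped categories are not blocked
        if namespace not in allowed.get(ctype, set()):
            blocked.append(name)
    return blocked


def can_save(namespace, categories):
    """A page may be saved only if no category is in the wrong namespace."""
    return not blocked_categories(namespace, categories)
```

With these made-up tables, a main-namespace page carrying "Fish" could be saved, while one carrying "Wiktionary policies" would be held back until that category was hardlinked or removed.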

An alternative would possibly be to do something with JavaScript. This would not be intrusive for categories added via template but would definitely be so to categories added explicitly. I have no idea how difficult either of these might be to code. I also don't know where the category types or their order would be defined.

If a change to MediaWiki is needed the devs will not even consider adding it without evidence that there is community support for it. This is also obviously just my idea and likely needs tweaking even if there is universal support for the idea (something I'm obviously hoping for, but not expecting). I've started this discussion here as the first question that needs to be asked is "Do we want (something like) this?" Only if the answer to that is "Yes" is it worth spending any effort on details of implementation (something possibly better suited to the Grease Pit). If the consensus is that we do want this, or something like it, then it would (most likely) have to be submitted to the devs as a feature request, and some of these can take months or years (*cough* single log-on *cough*) to be implemented (although simpler ones have been implemented in a couple of days, this is not a simple change to two or three lines of code).

See also these possibly relevant other discussions/comments: [11] [12] [13] (add any other relevant discussions you know about).

So, do we want (something like) this? Discuss. Thryduulf 15:23, 22 July 2007 (UTC)

No. We just voted to put all the categories at the end of the relevant language section, so opening a discussion to overturn that vote would be asking everyone to start over from scratch. The option of placing category tags within specific sections was voted down. --EncycloPetey 23:42, 22 July 2007 (UTC) --EncycloPetey 00:16, 23 July 2007 (UTC)
I think you're misunderstanding Thryduulf's suggestion? By my reading, he's not suggesting that categories be placed differently in editing, only that category display be collated; I guess that instead of a single "Categories:" line, there'd be a "Cleanup categories:" line, a "Derivations:" line, etc., or maybe that each of these would be labeled within the general "Categories:" line. —RuakhTALK 23:57, 22 July 2007 (UTC)
You're right, I had misunderstood. The idea might be worthwhile, but I'm not sure it's worthwhile yet. Most of our entries have only one category on them. The related technical question is how much computer effort would it add (slowing down page display) for those exceptional pages that would have, say, 50 categories because of multiple languages and multiple TTBC links? --EncycloPetey 00:16, 23 July 2007 (UTC)
(after edit conflict): I don't think pages need to have 50 categories on them for this to be worthwhile - it will benefit pages with as few as 3 where two of the same type are not consecutive. Regarding the computational effort - *shrug* I have no idea. Thryduulf 00:50, 23 July 2007 (UTC)
Indeed, I am not suggesting any changes to where categories are placed when editing at all - no changes will be made to the source of any page outside the Category: namespace. The current placement of categories in the source is logical for the source, but when combined with templates that add categories, the display order is not logical. See for example "smelt", a word with three etymologies. Currently the categories display as:
Categories: Old English derivations | English nouns | Check translations | Translations to be checked (Spanish) | Fish | Middle English derivations | English simple past forms | English past participles | Requests for etymology | English verbs | Translations to be checked (Portuguese)

With my suggestion, these would be displayed something like:

Categories: Parts of speech: English nouns | English simple past forms | English past participles | English verbs
Etymologies: Old English derivations | Middle English derivations
Items to be checked: Check translations | Translations to be checked (Spanish) | Translations to be checked (Portuguese)
Requests: Requests for etymology
Subjects: Fish
The specifics of how it would be displayed can be worked out later, but this is the general idea to make the categories more useful - for example you will be able to see at a glance that there is a use of the word relating to fish; this category is currently lost in the middle.
The different types of category group pages for different purposes - e.g.
My suggestion is to group the display of categories by the function they serve - a function that, if everything is implemented as I propose above, can be tweaked or turned off as users prefer. Regardless of where in the source you add an explicit category, a word used as a noun in computing and as a verb in archaeology will not display the two subject categories adjacent to each other. Equally, the two POS categories will also not be adjacent. Thryduulf 00:50, 23 July 2007 (UTC)

Separately from the above, I've realised that there are two other types of main namespace categories:

Although the second of these might fit with the POS categories. Thryduulf 00:50, 23 July 2007 (UTC)

And there are more, such as:
There are probably more types of categories besides. With so many different possibilities, is your system going to be overkill? Or should we leave it to individual editors to group categories as they come across them? Remember that some categories are added by templates, and so their position in the listing may be difficult to place in sequence. --EncycloPetey 02:58, 23 July 2007 (UTC)
The "colloquial" and "slang" categories are usage categories like "informal" and "derogatory" (see my comment immediately above yours). "English nouns with irregular plurals", it could be classed as a part of speech, but it is categorising words by form so it probably should be grouped with others that do that. Statistical ranking probably is separate again, but the point here is not to allocate every single category to a type. If this proposal is accepted and the devs implement it, that will come later. At that time we (the Wiktionary community) will need to decide how we want to group categories - the vast majority of individual categories will be easy to allocate and we can discuss the others then.
Yes, categories are added by template - this is precisely why the current display order of categories is not logical and not possible to rearrange. At its most basic level this proposal is to make the order categories are displayed independent from the order they appear in the page source. In order to make categories transcluded from templates display, the processing of the category display will have to take place after the processing of transclusion/substitution of templates - but this is a technical detail that is of no consequence to the end result. Thryduulf 07:39, 23 July 2007 (UTC)

Pragmatic question: How would all this work on a page like mil, where there are multiple languages but only one or two categories per language? --EncycloPetey 03:08, 23 July 2007 (UTC)

The way I envisage sorting the template display is to separate by function, not by language, so Category:Portuguese nouns and Category:Estonian nouns would appear in the same section as category:English nouns. However if this is ever implemented it will be up to the community how we want to group categories, so it could be split by language instead/as well if that is what people want. Which types are used, what they are called, and what order they appear in will be settable by a sysop-editable page in the MediaWiki: namespace. It is possible (I hope) users will be able to override the default order for a custom one if they prefer. Thryduulf 07:39, 23 July 2007 (UTC)
I’m all pro. Go ahead, and show us what it looks like, then we can discuss details like order and splitting for languages and such. H. (talk) 11:14, 24 July 2007 (UTC)

This seems like a great idea. Categorical messiness has never really bothered me, but thinking about it, I see that Thryduulf’s idea for sorting them would be a great improvement. Whilst the details still need to be clarified, I think the general idea is a fine and unobjectionable one. † Raifʻhār Doremítzwr 11:22, 24 July 2007 (UTC)

  • I like Thryduulf's idea. I guess the question is: will the benefits outweigh the amount of work required to implement it? A question only the devs can answer. I agree that the categories are messy but I rarely use them directly, hence the "is it worth it" question. --Kylemew 11:20, 25 July 2007 (UTC)

This is all very interesting. I can't imagine any non-nightmarish way to implement it, though. The groupings themselves would need to come from a MediaWiki: (dynamic) page, which would then have category lists from other MediaWiki: (dynamic) pages, all of which would have to be checked each time a page is rendered? That is a lot of extra page hits, for every page rendered. (I.e., after each edit.) I think if we had Hippietrail's software loaded here (demo at http://wiktionarydev.leuksman.com/index.php/mil) we could ask him to think about some of this fancy stuff. That already does language masking; with only a small effort, he can probably apply his filter by language to the categories as well. I just imported en.wikt:'s mil page over there for testing & demonstration purposes. Ask me on my talk page for any other individual pages you'd like to test there. --Connel MacKenzie 03:21, 26 July 2007 (UTC)

I take the point about multiple page hits, but I was thinking of a way that, I think, would involve fewer than how you were thinking:
  1. On saving/viewing the page the list of categories on the page, including those transcluded/substituted from templates, is generated
  2. The category type statement is read from the category namespace page for each of these categories, and a list of category types generated
  3. The order the category types are to be displayed in is read from MediaWiki:Category types.
  4. The page is rendered in this order.
There are no lists of categories anywhere, and only the one MediaWiki: namespace page. I presume that each Category namespace page must already be looked for to determine existence (for the purposes of a red or blue link). If it is possible to do a "if this category exists, read the type statement" as a single page read (and I have no idea whether it is or isn't) then this would not cause any additional page hits. I don't envisage the MediaWiki: namespace page defining the category type display order to change at all frequently, at least after the initial set-up, so it could be cached without problems- I don't know whether this would reduce any of the needed page hits. Thryduulf 08:54, 26 July 2007 (UTC)
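The four steps above can be sketched roughly as follows. This is plain Python rather than MediaWiki code, and every name in it is an assumption: the CATEGORY_TYPES table stands in for the type statement read from each category page, TYPE_ORDER stands in for the contents of MediaWiki:Category types, and the fallback to "Subjects" for untyped categories follows the default suggested earlier in this thread.

```python
from collections import defaultdict

# Hypothetical stand-in for the type statement on each category page (step 2).
CATEGORY_TYPES = {
    "English nouns": "Parts of speech",
    "English verbs": "Parts of speech",
    "English simple past forms": "Parts of speech",
    "English past participles": "Parts of speech",
    "Old English derivations": "Etymologies",
    "Middle English derivations": "Etymologies",
    "Check translations": "Items to be checked",
    "Translations to be checked (Spanish)": "Items to be checked",
    "Translations to be checked (Portuguese)": "Items to be checked",
    "Requests for etymology": "Requests",
    "Fish": "Subjects",
}

# Hypothetical stand-in for MediaWiki:Category types (step 3).
TYPE_ORDER = ["Parts of speech", "Etymologies", "Items to be checked",
              "Requests", "Subjects"]

# Assumed fallback for categories without a type statement.
DEFAULT_TYPE = "Subjects"

# Step 1: the categories on "smelt", in their current display order.
SMELT = [
    "Old English derivations", "English nouns", "Check translations",
    "Translations to be checked (Spanish)", "Fish",
    "Middle English derivations", "English simple past forms",
    "English past participles", "Requests for etymology", "English verbs",
    "Translations to be checked (Portuguese)",
]


def group_categories(page_categories, category_types=CATEGORY_TYPES,
                     type_order=TYPE_ORDER, default_type=DEFAULT_TYPE):
    """Group a page's categories by type, keeping source order within a type."""
    grouped = defaultdict(list)
    for name in page_categories:
        grouped[category_types.get(name, default_type)].append(name)
    # Step 4: emit the groups in the configured order, skipping empty types.
    return [(ctype, grouped[ctype]) for ctype in type_order if ctype in grouped]


for ctype, names in group_categories(SMELT):
    print(f"{ctype}: {' | '.join(names)}")
```

Only one lookup per category and one read of the ordering are involved, which matches the point that no lists of categories need to be kept anywhere. Whether the type lookup can share the existing page read is not something this sketch can show.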

Multiple ways to tackle this

  • To group categories one straightforward way would be for each wiki to have multiple configurable category namespaces. Instead of just Category: we could set up Wiktionary to have Language: and Part of speech: etc. Each would work just like Category: but the links would appear in each group instead of glommed together.
  • To make per-user filterable categories for people that only want to see certain categories, it would be possible to add multiple CSS classes to the HTML span of each category: (span class='english noun')foo(/span)

For the former there is apparently a new namespace manager in the works and multiple category namespaces have already been brought up along with the idea of multiple talk namespaces. This needs to be done in the PHP codebase.

For the latter I've looked at the code and it's definitely easier to solve using Javascript than PHP. — Hippietrail 13:33, 26 July 2007 (UTC)
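As a rough illustration of the CSS-class approach, here is a small Python sketch (Python only to keep it self-contained; the real thing would presumably be Javascript acting on the rendered page). The function names and the class names "english", "noun" and "portuguese" are made up for the example. The one firm point is that HTML takes several classes as a space-separated list inside a single class attribute.

```python
from html import escape


def category_span(name, classes):
    """Render one category link as a span carrying several CSS classes."""
    class_attr = " ".join(classes)
    return f'<span class="{escape(class_attr)}">{escape(name)}</span>'


def visible_categories(categories, hidden_classes):
    """Keep the categories that carry none of the classes a user has hidden.

    `categories` is a list of (name, classes) pairs.
    """
    hidden = set(hidden_classes)
    return [name for name, classes in categories if not hidden & set(classes)]


# Hypothetical categories for a multi-language page.
categories = [
    ("English nouns", ["english", "noun"]),
    ("Portuguese nouns", ["portuguese", "noun"]),
]

print(category_span("foo", ["english", "noun"]))
print(visible_categories(categories, ["portuguese"]))
```

A per-user stylesheet rule such as `.portuguese { display: none; }` would give the same filtering without any script, assuming the spans carried those classes.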

Language templates

Moved to WT:GP --Williamsayers79 07:51, 26 July 2007 (UTC)

Possible loanwords

Terms and phrases that are clearly foreign in origin, but that are used in English contexts, have come up a few times recently at RFV and at RFD, and I think it would be nice for the CFI to say more about them. (Right now, all they say is, "Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti does not imply that ricordati is next (though it is of course fine as an Italian entry).") So, below are a few of my thoughts; I'm signing each one individually to make it easier for them to be discussed separately, and I hope that people will post their thoughts as well (and, just as important, any reasoning they can articulate) so we can set about forming a consensus that everyone is mostly O.K. with. —RuakhTALK 21:24, 26 July 2007 (UTC)

Firstly, I think that all else being equal, the more frequently a word is used in English contexts, the more strongly it merits an English-language section. —RuakhTALK 21:24, 26 July 2007 (UTC)

Agree. Widsith 21:29, 26 July 2007 (UTC)
Agree. Thryduulf 22:41, 26 July 2007 (UTC)
I disagree. The more scholarly or academic a writing is, the more likely it is to borrow foreign terms. That doesn't make it standard English, it makes it a frequent borrowing in academic-only contexts. The point at which it crosses over to colloquial understanding is a better measure of when it has been absorbed into English. --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Secondly, I think that if a word is generally italicized to mark it as foreign, that's an argument against keeping an English entry for it — not a decisive argument, mind, but an argument. (One complication: more formal/literary/&c. works are more likely to italicize loanwords. This might work out O.K., though, because they're also more likely to use foreign words that really aren't loanwords.) —RuakhTALK 21:24, 26 July 2007 (UTC)

Yes, though I think if a word or phrase is often used in English sentences - whether in italics or not -- we should have an entry for it. The meaning is often different from the meaning in source language, and the pronunciation almost always is. Sometimes the phrase does not even exist in source (this may be the case with lapsus linguae?). Widsith 21:29, 26 July 2007 (UTC)
Yes. Do note the dates of sources however - the frequency of italicisation will generally decrease as frequency of use increases. When first borrowed, most if not all uses will be italicised, but after several years of use it will, generally, be italicised less frequently. For example where a 1990 cite has the word in italics but a 2005 one does not, this should be taken as an indication of naturalisation of the word; the usage in newer sources should be treated as more significant than uses in older ones. Thryduulf 22:41, 26 July 2007 (UTC)
Generally, I agree with this. The long-standing convention of not allowing italics to count for verification purposes is simple and concrete, eliminating dispute (when adhered to.) Diluting that rule to say that it is no longer a decisive argument seems subjective (therefore increasing dispute.) If we were to say that "in the last three years, the term was more often italicized than it was used in plain-text, therefore should not be considered English," (even despite three citations in plain-text) then I agree. But is that what you were suggesting? --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Thirdly, I think we should have a higher standard for words that are the same in English as in their source language than for words that are not. This is a practical matter; I'm not sure we necessarily need an English entry for lapsus linguae, for example, since anyone visiting that page will see the Latin entry, but we definitely need English entries for Shabbat and Shabbos, since otherwise people encountering those terms won't be able to look them up. (Those aren't the best examples, because there are other complicating factors, such as frequency of use in English contexts; but you see what I mean.) —RuakhTALK 21:24, 26 July 2007 (UTC)

Don't understand this one. Widsith 21:29, 26 July 2007 (UTC)
Well, if jalfouillé is a French word that sometimes appears in English contexts as jalfouillé (i.e., without modification), then an entry for jalfouillé will obviously exist, and if there's no English language section, there'll still be the helpful French language section. (At least, it's helpful insofar as the French and English uses of the word are the same.) However, if Template:HEchar is an Ancient Hebrew word that sometimes appears in English contexts as giduah, then either we have an entry for giduah with an English language section, or we have no entry for giduah and we're useless to someone trying to look it up. —RuakhTALK 22:02, 26 July 2007 (UTC)
I get you. I think I disagree: I think we should only have a giduah entry if it's been used consistently in English as per CFI. Widsith 22:16, 26 July 2007 (UTC)
I think we should have standardized romanized entries for all words in foreign scripts. I think our failure to provide all words in all languages, in this regard, is a mistake. Our target audience will most often type the word in ASCII characters (perhaps as they heard it.) But such an entry should be a hard (or soft) redirect to the proper Hebrew entry. --Connel MacKenzie 23:47, 28 July 2007 (UTC)
That would be tantamount to inventing spellings that have never existed. It would totally go against CFI. And we couldn't label such an entry with "Hebrew" because it would never have been spelled that way in Hebrew. This information is better coded as a transliteration within a page, which can still be searched for. --EncycloPetey 05:44, 29 July 2007 (UTC)
The problem with that is that giduah does not fit our current Hebrew romanization scheme, nor any I could imagine us adopting (using "g" for Template:HEchar is very old-fashioned). Unless we want Hebrew entries to list all possible romanizations just for searching purposes, but personally I don't feel very comfortable with that idea? —RuakhTALK 06:01, 29 July 2007 (UTC)
I don't feel comfortable with that either. The core problem is still that there are many possible ways to transcribe any given word into Roman script. Sometimes, a writer will choose a standard published transliteration system. Other times, a publication will even develop its own system just for the one book. But even for standard systems there are many competing systems. I've got a chart next to my computer for dealing with the three most common Russian-into-Roman-script transliteration schemes that are used in the US. The Germans use completely different systems, so do the Poles, the Czechs, the Italians, etc. We can't possibly hope to verify and support all the possible Romanizations of every Russian, Greek, Hebrew, Arabic, Chinese, ... word. And having entries for each possible transliteration is just silly. There won't be any additional information carried on such a page except that so-and-so uses this spelling to transcribe word X for use in language Y. And we can't mark them as either language X (which wouldn't spell it that way) or language Y (because it's a transcription). --EncycloPetey 06:32, 29 July 2007 (UTC)

Fourthly, I think that quotations that demonstrate English-ly inflected forms or derived terms should necessarily be considered English. Assuming we have three such quotations that otherwise pass CFI (not-just-mentions, independent, etc.), the word passes CFI. Examples include:

  • writing curriculum vitaes instead of curricula vitae.
  • treating kudos as a plurale tantum, or as the regularly-formed plural of a singular noun kudo.
  • writing octopuses instead of octopodes or octopi. (I realize the latter is incorrect Greek, but I think it nonetheless reflects an attempt to pluralize it correctly for the source language, so shouldn't be decisive for counting octopus as English.)

I think these should count toward CFI even if said regular form is considered non-standard in English. —RuakhTALK 21:24, 26 July 2007 (UTC)

Yep. Widsith 21:29, 26 July 2007 (UTC)
Agree. Thryduulf 22:41, 26 July 2007 (UTC)
Occurrence of "octopuses" implies "octopus" is an English word? I think I agree. Not sure what you mean to say about attestation though. Three occurrences of "octopuses" can be used to "prove" that "octopus" is an English word? That is only true in some limited cases, I think. ("Octopus" is a bad example, as it has been undeniably English for a very, very long time.) "Kudos" is a better example, but tainted by a single talk-show's popularity. The cv example is more complicated, as it is a term in widespread use in Europe, therefore merits an entry (and has for a very long time - longer than Wiktionary has been around, anyhow.) --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Fifthly, I think that while some weird letters are used sometimes in normal English words in some contexts (to wit: è, vowels with diereses, æ, and œ), generally the use of these or others in a loanword suggests that the word hasn't really been adopted into English. (Exceptions include café and piñata; clearly, this criterion isn't decisive.) —RuakhTALK 21:24, 26 July 2007 (UTC)

Would rather not make it a rule since there are so many exceptions. Widsith 21:29, 26 July 2007 (UTC)
The presence or absence of diacritics and ligatures etc. depends largely on the formality, context, pronunciation, similarity to other existing words, similarity of the source-language orthography to English orthography, region (if Connel is correct American English is far less tolerant of these than British English), the individual and even what point they are trying to make (for example the spelling "encyclopædia" is far more common on Wikipedia talk pages when the subject is perceived American bias than otherwise). This makes it too difficult to come up with a rule imho. Thryduulf 22:41, 26 July 2007 (UTC)
I agree that if we are suggesting the "proper" spelling is accented or contains a ligature, it is a very strong indication that the term has not been absorbed into standard English. In that case, the loanword should normally appear under a foreign language heading. --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Sixthly, I think that if there's a distinctive English pronunciation of a word, this is an argument for having an English language section, so we can give the pronunciation. (This one is difficult for languages that we borrow productively from — for example, there's a distinctive English pronunciation of lapsus linguae, but only because there are fairly regular rules for how we pronounce Latin words in English contexts — but might be helpful in other cases.) —RuakhTALK 21:24, 26 July 2007 (UTC)

Yes as above, problem is this is almost always the case. Widsith 21:29, 26 July 2007 (UTC)
imho, if there is a consistent English pronunciation that is significantly different from the native pronunciation, this should be an argument in favour of an English Entry. If the English pronunciation is different to what one would expect for a word from the source language in question this is a stronger argument, although neither, imho, should be decisive on their own. Thryduulf 22:41, 26 July 2007 (UTC)
I think I agree, but the definition in the English section will usually be a soft-link to the source language in these cases. Some more examples here would help me decide this one. --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Seventhly, I think that if a word's meaning in English is not quite that in its source language, this is a strong argument for having an English language section. This especially affects terms of art (especially legal and scientific terminology), but might also (awkwardly) affect many words that English still treats as foreign — we're not known for being precise in our borrowings. —RuakhTALK 21:24, 26 July 2007 (UTC)

Definitely. Widsith 21:29, 26 July 2007 (UTC)
Absolutely. Thryduulf 22:41, 26 July 2007 (UTC)
This seems to be too subjective a criterion. Anything can be assigned a definition that is "not quite" the same as the source language. I think I disagree with this, but specific examples would help me decide. --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Eighthly, I think that if a sum-of-parts phrase from a foreign language is used in English contexts with any sort of frequency, then it needs some sort of entry in one language or the other. —RuakhTALK 21:24, 26 July 2007 (UTC)

Yes. Widsith 21:29, 26 July 2007 (UTC)
Yes. If the phrase is not used with any frequency (or a significantly lower frequency) in the source language, this should definitely be an English entry. Thryduulf 22:41, 26 July 2007 (UTC)
Isn't this off-topic? For lapsus linguae I think I've said all along it should have a ==Latin== section? --Connel MacKenzie 23:47, 28 July 2007 (UTC)
I don't think this criterion affects lapsus linguae, because if this phrase did exist in Latin, it was an idiom. (My thought here pertains to something like honi soit qui mal y pense, which is a straightforward phrase in Middle French, not an idiom at all, but likely to be opaque to a speaker of Modern English who's not already familiar with it. I'm not sure if it warrants inclusion as an English entry — it was used in the English translation I read of Anna Karenina, and as the characters do not otherwise break into Middle French, I assume the translator thought that was the appropriate phrase for an English context — but our current CFI don't allow it to be included as a Middle French entry. So, the idea here was to override the current CFI in this regard: if a sum-of-parts phrase of foreign origin is found with any sort of frequency in English contexts, then it needs some sort of entry in one language or the other.) —RuakhTALK 02:41, 29 July 2007 (UTC)

Ninthly, I think that where reasonable, similar rules should apply to terms from foreign language X used in foreign-language-Y contexts. (After all, English isn't the only language with loanwords!) —RuakhTALK 21:24, 26 July 2007 (UTC)

Yes. Widsith 21:29, 26 July 2007 (UTC)
Yes. Thryduulf 22:41, 26 July 2007 (UTC)
When in doubt, we customarily defer to other language Wiktionaries. I would not like to see that custom overturned only for borrowed phrases (nor overturned at all.) --Connel MacKenzie 23:47, 28 July 2007 (UTC)

Tenthly and lastly, there may be major things I haven't covered. If so, please add your own points for people to discuss. :-) —RuakhTALK 21:24, 26 July 2007 (UTC)

Why are userboxes not allowed?

Why are userboxes not permitted on Wiktionary? When I created a userbox and put it on my userpage, it was removed, and I was blocked for re-adding it (the blocking admin subsequently unblocked me and apologised for over-reacting). I have little experience here, but quite a lot on Wikipedia (as w:User:Walton One) and I don't understand what's so offensive about the concept of userboxes and of personalisation of userspace. Also, I couldn't see anything in the Wiktionary policies and guidelines about userboxes, or indeed any kind of guideline about userspace content (we have one on Wikipedia at w:WP:USERPAGE). If userboxes are banned here, it would be helpful if this could be explicitly stated in a guideline for new users, and if someone could explain why, as I can't think of a single justification for this rule. Eric the Gnome 14:16, 27 July 2007 (UTC)

There has been considerable controversy about userboxes in other projects. See w:Wikipedia:Jimbo on Userboxes and w:Wikipedia:Userbox policy poll for some background. To avoid such counterproductive squabbles, the only userboxes we have here are Babel templates. So far as I know, though, userbox avoidance is de facto but not de jure. Rod (A. Smith) 15:25, 27 July 2007 (UTC)
I'm familiar with the situation on Wikipedia, since I'm an administrator there and have edited for a year and a half. But if userbox avoidance here is de facto but not de jure, then why was I blocked for re-adding a userbox to my Wiktionary userpage? I don't know about here, but on Wikipedia we almost never issue blocks for a first offence (unless it's blatant vandalism), and we certainly don't encourage sysops to block people in order to enforce policies that aren't even policies. Eric the Gnome 15:38, 27 July 2007 (UTC)
While I would agree that you ought not have been blocked, Wiktionary runs a much tighter ship than Wikipedia in part because the mission of writing a dictionary is much more tightly focused than the rather free-wheeling mission of writing an encyclopedia. While it might make sense for a Wikipedian to advertise personal characteristics and interests for the sake of alerting others to that Wikipedian's knowledge of an area in which lengthy and involved articles may be written, all a Wiktionarian really needs to advertise is what languages he or she speaks and how well. bd2412 T 15:44, 27 July 2007 (UTC)
Fair enough, but I still don't see how personalisation of one's userspace is actively detrimental to the mission of writing a dictionary. Furthermore, if it's not allowed, then there needs to be a coherent policy or guideline about it, rather than just letting sysops run around biting the newbies and not explaining their actions with a basis in policy. Eric the Gnome 15:49, 27 July 2007 (UTC)
I'm sorry, but I don't see how you can assert that "pedophile userboxes" was not actively detrimental. I'd estimate at least a gigabyte of edit history to Wikipedia policy discussion pages was wasted toward that controversy alone. (Multiplied by a million readers per day, that is a lot of wasted bandwidth.) --Connel MacKenzie 17:25, 27 July 2007 (UTC) For example: the ten edits to this thread each "cost" the database almost 300KB of storage each. The similar (but larger) Wikipedia pages had thousands of similar edits. --Connel MacKenzie 17:28, 27 July 2007 (UTC)
I agree that we need a policy. However, apart from the Babel boxes, it may also be worth allowing the userboxes that indicate administratorship and the editor’s (or, indeed, the editrix’s) sex. † Raifʻhār Doremítzwr 16:46, 27 July 2007 (UTC)
Note also that the ratio of new accounts here whose owners are intentionally disruptive to those whose owners intend to help is perhaps greater here than on Wikipedia. With that knowledge, your revert to add a controversial userbox, and your username's unfortunate trolling connotations, the blocking admin probably assumed that you intended more disruption than help. Please know that the blocking admin acknowledged the no-newbie-biting guideline when he unblocked you. Sorry you became a casualty of the counter-vandal battle. From your edit history, I gather you are actually here to help. So, I apologize for any inconvenience. You are welcome here. Rod (A. Smith) 16:54, 27 July 2007 (UTC)
I propose the following addition to WT:NPOV:


Due primarily to the considerable time and effort wasted on our sister project Wikipedia, controversial "userboxes" are simply prohibited on the English Wiktionary. The only general exception (allowed without question) are {{Babel}} templates that correspond to an ISO-639 language code.

In conjunction with that, I'd also change {{welcome}} et al. to add a similar reminder (those don't need a WT:VOTE to be changed.)
Does this seem like an acceptable way to formalize our convention, to everyone? Shall I start a vote, or does anyone have a better wording for the above? (As a minor issue, I think it needs to be kept quite brief.) --Connel MacKenzie 17:20, 27 July 2007 (UTC)
That seems fine to me, although I don’t think that userboxes needs quotation marks. Also, what about allowing an exception for sex and administratorship userboxes? –Both would be utile here. † Raifʻhār Doremítzwr 17:31, 27 July 2007 (UTC)
I'm OK with that wording, although I accept that my opinion probably carries little weight here (as opposed to on Wikipedia). As to my username, I didn't realise it carried "trolling connotations"; I don't know what the rules on usernames are here (on Wikipedia, we don't allow those which are profane, highly controversial, or random and confusing). Mine is, I admit, rather flippant (it was an experiment), but I've applied for a change of username to match my Wikipedia name (it hasn't been done yet, perhaps Wiktionary is rather understaffed on the bureaucrat front). Eric the Gnome 17:39, 27 July 2007 (UTC)
To avoid any potential disputes about the definition and interpretation of "controversial" (cf the vastly differing interpretations of "divisive and inflammatory" at WP) I suggest that we word it as "...Wikipedia, the only userboxes always allowed are {{Babel}} templates ... [and userboxes denoting the users' gender and administratorship]." Perhaps say to bring any specific requests here, but note that there must be a good reason for them. Otherwise the wording is good. 19:34, 27 July 2007 (UTC)
I don't see gender as relevant here. Allowing for adminship boxes is fine, but not really needed. We have only 50 admins. --EncycloPetey 00:13, 28 July 2007 (UTC)
It would be useful if new editors could discover immediately who is and who isn’t an administrator (I often don’t know myself); confusion can arise — an anon. recently seemed to mistake me for the sysop who blocked him. As for the gender boxes, they would of course be optional — they would, however, be useful, inasmuch as having them would allow others to know which pronoun to use (at present, I use he, him, and his in all cases) when referring to another editor. † Raifʻhār Doremítzwr 03:24, 28 July 2007 (UTC)
Well, there's no prohibition on providing such information on your user page; however, I don't see what's accomplished by using the equivalent of a bumper sticker to convey the information; plain text should do just fine.
Then again, I don't even see the merit of the Babel templates, which convey no useful concrete information about language proficiency, and are at best tied to a naive unitary understanding of language proficiency which is decades out of date... Personally, I would support userboxes which would provide detailed information about self-assessed task-type competence ("this user can read non-technical materials in language X") or specific objective measures ("this user has a TORFL score of Y"). These might be messier than the Babel templates, but they would at least be meaningful. But I stray from the topic at hand... -- Visviva 03:54, 28 July 2007 (UTC)
I think the reason to encourage Babel templates is that people wouldn't necessarily add themselves to the respective category. As the largest Wiktionary, we tend to be the most multilingual. (Gasp - did I say that?) When it came time to implement "Translations to be checked" / {{ttbc}}, User:Paul G used the Babel categories to very great effect, recruiting translators. For this project, Babel identification (of any sort) is useful, despite the level inaccuracy. On an encyclopedia, specific topics will need special attention frequently; here only very rarely. The Tea Room does an admirable job of addressing those concerns as they crop up; having to search obscure categories would not help (in fact, would only limit the relevant conversations.) While I agree that plain text would be better than Babel templates for some reasons, their popularity encourages new users to correctly identify themselves. The only other meaningful distinction here might be region/dialect. But going beyond ISO-639 is too open-ended to even consider doing that. --Connel MacKenzie 06:31, 28 July 2007 (UTC)
I think it's overly strict to allow only babel and admin level boxes. The original proposal (banning all controversial boxes) is much more to my liking. (I say this even though I only use babel myself.) If people want to indulge in a little harmless self-expression, then why not? If we need more language describing what "controversial" is, fine, let's add it. ArielGlenn 05:36, 28 July 2007 (UTC)
I think it was a mistake for WMF to not sue the 'pedophile userbox' proponents for damages; loss of use of significant bandwidth, storage, response time, etc. GB x millions of users = a lot of bandwidth wasted. AFAIK, there was never the proposal to ban "all controversial boxes" here. It was a significant effort for Gerard to get just the Babel's approved, as I recall. --Connel MacKenzie 06:31, 28 July 2007 (UTC)
The accepted orthodoxy on Wikipedia is that editors don't need to worry about server size/bandwidth/technical constraints, as per m:Wiki is not paper. This situation might be changing with the growing size of WMF projects, but to my recollection (although I wasn't involved in the pedophilia userbox war) no one ever made an issue about the bandwidth. Mainly it was an issue over what is and isn't appropriate in userspace, and whether self-confessed pedophiles should be allowed to edit at all (I personally think anyone who admits to being a pedophile should be indef-blocked instantly, considering that we have plenty of preteen editors). I can, however, understand your desire to avoid a similar mess here - but I don't think it would be likely to happen. I would advise Wiktionary to follow the same general precedent as Wikipedia and most other projects; on Wikipedia we prohibit "divisive and inflammatory" templates (see w:WP:CSD), although there's some debate over whether this applies to userspace. Generally, if the userbox says "This user likes cheese" or something of that nature, then it's allowed; if it says "This user is a devoted supporter of Adolf Hitler", then it's deleted, and quite rightly so. Walton One 14:59, 28 July 2007 (UTC)
I more or less agree with Connel's proposed limitation language. Actually, I think it would be somewhat useful to allow userboxes (along the line of babel boxes) that say, for example, This user is familiar with legal terminology (or medical, or religious or other such areas), but only to the extent that there is a possibility that a word might fall into a specialized area of terminology. Templates that indicate hobbies or preferences or political/ideological/social positions are right out. By the way, with respect to Eric the Gnome's pending name change request, I believe we need another 'crat. If no one else is interested, I am. Cheers! bd2412 T 15:13, 28 July 2007 (UTC)
We seem to have a number of bureaucrats who are currently active, so I'm not sure what the story is; but if we do need another one, I'd vote for you. —RuakhTALK 15:48, 28 July 2007 (UTC)
Hmm, looking at the requests to change usernames, Dvortygirl is on top of things. Of course, it never hurts to have another backup. I suppose whether we need another 'crat is up to the community. bd2412 T 17:24, 28 July 2007 (UTC)

Would it be fair to say that there's strong consensus that controversial userboxes should be banned, a rough consensus that non-helpful userboxes should be banned, and a strong consensus that the Babel templates should be kept (albeit with a preference that people include more detailed language information in the text of their user pages)? (Unfortunately, we don't seem to have any sort of consensus yet on what userboxes are helpful. Personally, I'm O.K. with people deciding for themselves what userboxes are helpful to include or not, as long as the Babel boxes precede any others, and as long as when asked, they can give some sort of reason for why each of their userboxes is helpful. Non-Babel boxes that seem potentially helpful to me include: kinds of specialized terminology; preferred pronoun to be referred to with, usually either he or she; if they're skilled with developing complicated templates and are O.K. with people harassing them for help; if it's a bot account, and if so, what human user it belongs to; whether the user is more active on Wikipedia than here, and if so, whether he/she is O.K. with us bringing discussion there if there's no reply here; etc.) —RuakhTALK 15:48, 28 July 2007 (UTC)

I think that's about right. Don't know that we need a userbox to identify the owner of a bot account, but I would not object to one. bd2412 T 16:00, 28 July 2007 (UTC)
I as well agree that any userboxes except babel boxes should not pollute Wiktionary. People can write what they fancy in plain text on their user pages, if they must, but this is a dictionary, nothing else. Dmcdevit·t 16:10, 28 July 2007 (UTC)
How about a babel box for legalese? ;-) bd2412 T 17:24, 28 July 2007 (UTC)
Despite your smiley, I think you are being sincere that you think that might be helpful. So I will reply seriously. I cannot disagree more. A typical lawyer would likely wish to add that to their userpage, even though they may be awful at writing dictionary style definitions. (Thankfully, you yourself are not in that group!) Is there an advantage to having them identify themselves as a lawyer? I think not. It simply is not relevant to building a dictionary. --Connel MacKenzie 17:40, 28 July 2007 (UTC)
I am somewhat sincere. Of course, a person could have a babel box rightly stating that they are fluent in Japanese or Armenian or Mende, and still be lousy at writing definitions. I do happen to be fluent in legalese:
The party of the first part, for the sum in hand paid and other consideration provided by the party of the second part, the receipt whereof is hereby acknowledged, does hereby remise, release and quit-claim unto the said party of the second part forever, all the right, title, interest, claim and demand which the said party of the first part has in and to the described lot, piece or parcel of land, situate, lying and being in the County identified, to Have and to Hold the same together with all and singular the appurtenances thereunto belonging or in anywise appertaining, and all the estate, right, title, interest, lien, equity and claim whatsoever of the said party of the first part, either in law or equity, to the only proper use, benefit and behoof of the said party of the second part forever.
bd2412 T 03:38, 30 July 2007 (UTC)
The main reason I think the "prohibit all non-Babel" templates is good, is a little obtuse, so please bear with me. The Babel templates themselves are popular, and encourage new users to identify their approximate language proficiency. Their popularity, in turn, populates the Babel categories (which are undeniably useful here.) But they also do something else: by being cute and flashy, encourage other new users to add them. The simple creation of a userbox immediately encourages reuse by others. Diluting their effectiveness by allowing "preferential" userboxes would mean we'd see new users adding userboxes, but not Babels. (E.g. Walton One's original page.) Furthermore there is nothing to prevent users from categorizing their user pages in similar categories textually, without userboxen. (If the categories are not divisive, blah, blah, blah.) We could even be explicit that such text categorizations are permitted. But the added "advertising" of a flashy userbox remains inappropriate for dictionary building. "This user likes cheese" does not help to build a dictionary. Those subdivisions have no lexical relevance (apparently they must be quasi-useful for encyclopedia writing?) Actually, I am curious to know how "this user likes cheese" helps the writing of an encyclopedia. That seems more than a little incredible. The fact that Wikipedia administrators have given up trying to administer the content they provide, is unfortunate, but certainly not a precedent we should follow. Giving undue status to userboxes is one way of begging for trouble, with no possible benefit in sight. Walton has a semi-valid complaint that we aren't explicit and up-front about that. So the wording change seeks to clarify that, without adding any actual change in practice. FWIW, I agree with User:Doremítzwr's formatting suggestion...the "userbox" does not need quotation marks. Is there any wording that is missed now?
Do people think this is about ready to set up a tentative vote, or is more discussion warranted? --17:33, 28 July 2007 (UTC) (re-sign with four tildes instead of five.) --Connel MacKenzie 23:58, 28 July 2007 (UTC)
I'm convinced. —RuakhTALK 19:36, 28 July 2007 (UTC)

There is one other sort of Userbox found on Wikipedia that would be useful here. Specifically, those userboxes which indicate familiarity with certain scripts or systems of writing, such as Cyrillic, Greek, IPA, etc. It is useful to know that a person can deal with a particular script, even if that person has little or no knowledge of a particular language in which it is used. --EncycloPetey 19:27, 29 July 2007 (UTC)

That's funny. I was just wondering where I could find Template:user ipa-3 when I was moving my Babel boxes over from Wikipedia. Mike Dillon 19:36, 29 July 2007 (UTC)
It's not entirely coincidence, because it was following your userpage link to Wikipedia that reminded me of them. I had recently updated the Milestones listing, and you got credit for adding entry number 750,000. --EncycloPetey 19:38, 29 July 2007 (UTC)
How about userboxes along the lines of "This user is interested in legal terminology" or "This user is interested in military ranks"? Those would, surely, be extremely relevant to writing a dictionary; I myself have contributed a number of entries on UK and US military ranks in the last few weeks. And, to pre-empt an obvious objection, I know it's possible to write such things in plain text, but userboxes are (IMO) an effective form of presentation; it makes it instantly clear what someone is interested in/an expert on, without having to read the page in detail. Walton One 20:29, 29 July 2007 (UTC)
Military ranks is a bit narrow. Military terminology would be better, but not merely to state that a user is interested in it. I am interested in lots of things about which I have no knowledge useful to a dictionary. "This user is familiar with military terminology" would be acceptable to me, perhaps even with a babel-box style levels of proficiency. Also, as EncycloPetey notes above, familiarity with scripts should be permitted as well (I know people who speak Russian or Hebrew but can't read the scripts, and vice versa). bd2412 T 03:45, 30 July 2007 (UTC)
  • Now that this has settled down, I've started WT:VOTE#Babel userboxes. It is scheduled to start in about a week, at midnight on Halloween. Please use that vote's talk page for wording changes/demands/suggestions before that time. --Connel MacKenzie 16:49, 25 August 2007 (UTC)
I'd be fine with the proposed text, but Halloween isn't for another two months. --EncycloPetey 19:03, 25 August 2007 (UTC)

Exclusion of possessive case WT:VOTE (again)

Since my writing is obviously far too pompous for any sane person to understand, here’s my attempt to explain what exactly we’re voting on, in the clearest possible terms:

  • The easiest way to understand this is to take ’s as a separate word which makes whatever it’s written after a possessor (in the now forbidden technical terms, it marks it for the genitive case). If you take it like this, then any occurrence of [word] + ’s which just means the possessive form of that [word] is not idiomatic (or sum-of-parts in our defunct, ol’ style talkin’); if a phrase is not idiomatic, it fails WT:CFI, simple as that. Also, remember that ’s isn’t just added to words, but to whole phrases too (which is why we have the possessive case entry for ’s, which by itself can’t possess anything).
  • BUT! –This is not as simple, it seems, as it first sounds. This vote does not mean that:
    1. entries like Hobson’s choice and none of your bee’s wax, which are idiomatic, are out; in neither of those cases can their definitions be reduced to the sum of their parts:
      “The choice of taking either the primary option or nothing” ≠ “Hobson” + “’s” + “choice”; and,
      “I’m not telling you; stop asking questions!” ≠ “none” + “of” + “your” + “bee” + “’s” + “wax” + “!”.
    2. possessive forms which don’t use ’s (the pronouns, like whose and its) are out. (Some people suggested keeping an entry for one’s too. I don’t think that it’s idiomatic, but if the vote needs to be restarted, I’d be OK with stating that one’s is an exception. –That would be easier than excluding it by using the etymological argument, which probably applies to a lot of other words too[14].)
  • Other issues have been brought up since then:
    1. Some possessive forms of words are not made as simply as by adding ’s — what do we do about them?
      • Plurals which are made by adding -s are an obvious example. A lot of them add just an ’ (no ‘s’). For example, the possessive form of the singular brother is brother’s, but the possessive form of the plural brothers is brothers’. They are out.
      • Some plurals are not made by adding -s, but end in s anyway. Words like hypotheses and ephemerides are examples of this — how do we make them into possessives? Usage seems to be inconsistent, and there also seem to be regional differences. This means that it is very likely certain that we’ll have multiple entries for possessive forms (especially for plural possessives). These are out.
      • Some singulars ending in s also add just ’ too, especially proper nouns like Jesus — for this one, the Nazarene gets ’, but the Spaniard gets ’s. Let’s not think about plural possessives here. These are out.
      • Of course, by “out”, I mean “don’t have their own entries”. There were a couple of ideas for how we would give this information (but neither of these is a part of the vote itself):
        1. In usage notes, like for Jesus; or,
        2. Emboldened but unlinked in inflexion lines, like in the example I gave; either:
          1. by having only the singular possessive form(s) in the singular entry (again, like in the example I gave), whilst having the plural possessive form(s) in the plural entr(y/ies) like this:
            brothers pl (possessive brothers’ or brothers’s)
            brethren pl (possessive brethren’s)
        3. or:
          1. by having both the singular possessive form(s) and the plural possessive form(s) in the inflexion line of the singular entry, like (again, for brother) this (or something similar):
            brother (plural brothers or brethren; singular possessive brother’s; plural possessive brothers’ or brothers’s or brethren’s)
    2. Where do we give pronunciation information for possessive forms?
      • I suggested that we give pronunciation information for possessive forms in the pronunciation section of the base form’s entry (similarly to the way that pronunciation information is given for both the singular and the plural forms for faux pas). Noöne offered a better solution.
    3. Remember that neither of these problems is a particularly big issue. There’s no pressing need to give the possessive forms of most words (that is, the vast majority of singulars and sizeable minority of plurals which don’t end in s), and the pronunciation of most possessive forms is sum-of-parts (that is, pronunciation of [word] + pronunciation of ’s, or something similar) — formation and pronunciation information will only need to be given in a minority of cases.
  • As Connel MacKenzie correctly pointed out, the exact wording changes to the CFI were not given. Also, it seems that we will need to edit {{en-noun}} (and possibly create {{en-plural}}, or some such template) in the wake of this vote passing.
    1. Suggested wording to be added to Wiktionary:Criteria for inclusion; §4:Exclusions; §§5:Modern English possessive forms (new section):
      It is community consensus not to provide entries for Modern English possessive forms which are formed by adding the enclitics ’s or ’, and which are otherwise not idiomatic (with the single exception of the pronoun one’s). However, they are welcome as emboldened but unlinked words in inflexion lines. Pronunciation transcriptions for possessive forms of words, if necessary, can be given in the pronunciation sections of the words’ entries.[LINKS TO THE VOTE, THIS EXPLANATION, A RATIONALE, AND WHATEVER ELSE IS CONSIDERED SUITABLE AS BACKGROUND READING TO STRENGTHEN THE “WHY” BEHIND THIS SECTION]
    2. Thryduulf has asked for people to make comments about editing {{en-noun}} here. Noöne has yet done so. The idea of creating {{en-plural}} is new, so I’d like people to make comments about the idea.

I think that’s everything. Are the wording and template changes OK for everyone? Does the vote need to be restarted? Does anyone want to change his mind? † Raifʻhār Doremítzwr 16:36, 27 July 2007 (UTC)

Some comments:
  1. Thank you for rehashing this more reasonably. I still object to the notion of using a WT:VOTE to impel practice, rather than reflect it. As WT:VOTE matures, I think we should be striving harder towards the latter. I do not think it is fair to restart (nor start) the vote without some prototype/experimental templates in place.
  2. Regarding your brothers example above: very nice. The layout seems to reflect the concerns about identifying the possessives, but doesn't (to my reading) explain precisely when they need to (or must) be used. (That would be the obvious next area of dispute, that this proposal raises, rather than addresses.)
  3. Is there room in your proposed templates for regional notes? I don't know that there is broad agreement as to when "-s's" is considered acceptable (with all British references allowing it, but many US references proscribing against it.)
  4. Lastly, what is the basis of prohibiting the soft-link entries? The only plausible way these would be entered is by bot; they would amount to navigation aids for external links. Allowing them is (as Mike pointed out) beneficial to English learners, particularly when coming from languages that have different rules for the formation of regular possessives. For native English speakers, the pronunciation itself can be quite tricky. Having that pronunciation information stored somewhere other than at the spelling it applies to, is quite an alien convention here.
--Connel MacKenzie 18:12, 27 July 2007 (UTC)
For my part, this is pretty much the way I had interpreted the vote all along, so I don't see the need to restart. -- Visviva 20:26, 27 July 2007 (UTC)
Likewise, I don't see anything to indicate I had misinterpreted the vote. The wording of the vote proposal may not have been mainstream English, but it was understandable. --EncycloPetey 00:09, 28 July 2007 (UTC)
  • I thought the proposal was to prohibit noun possessives, not all possessives (such as pronoun possessives.) The explicit exclusion of "one's" isn't needed if that correction is made to the wording above. (Although stating that one's is allowed is illustrative.)
  • Why use the word enclitics when you mean suffix? Is that just to confuse readers?
  • Why use the rare form inflexion instead of inflection? Is that just to confuse readers? Why is it wikilinked? Would the resulting wikilink point instead to the relevant section of WT:ELE?
  • Your explanation states that plural possessives are out, without discussion, but that class of words was the only controversial class to begin with.
  • You state that "noöne" (WTF?) has offered an alternate place to provide pronunciation. But you ignore the fact that such pronunciations right now are given on the page the spelling applies to; the target possessive's page. The absence of them is due to my personal honoring of your objections, otherwise User:Dvortybot would be loading pronunciations for very (or very, very) common forms like "men's."
  • Where did he ask about the template changes? Why is that conversation tucked away out of sight, and why is this the first mention of it?
  • --Connel MacKenzie 18:00, 28 July 2007 (UTC)
The point — at least for me — is that English doesn't actually have noun possessives anymore; it has -'s (which is usually considered a clitic, but sometimes considered a phrasal affix — this is really a tomāto-tomäto thing, though), which attaches to the end of a non-pronoun nominal (a noun phrase, or proper noun, or noun clause, or gerund phrase, or substantive adjective, or any other such) and which sometimes reduces to just -' (depending what's on the end of the nominal). Hence, while often -'s does attach to a noun, it also very often doesn't (in "the woman I met's son", it attaches to a finite verb!), and even when it does attach to a noun, it very often isn't a meaningful attachment (in "the son of the teacher's cat", it attaches to teacher, even though the actual possessor is the son).
(There's another point to be made as well — that while we can have usage notes and pronunciation notes telling people how to spell and pronounce these, ultimately we have to admit that just about any conceivable spelling and pronunciation does exist, and usually more than one of them is standard, depending on the region and time period — but to me that's a secondary point, because if we included such entries, there'd be nothing stopping us from having a long-winded usage note at each one.)
Inasmuch as this is the case, I almost don't feel that we need an explicit policy about excluding noun possessives — they don't really exist, and certainly can't be defined meaningfully, so clearly don't merit entries — except that English is written in a way that makes it look like they exist, and students are sometimes led to believe that they exist, so it's worthwhile to explicitly state in a policy page that they don't.
RuakhTALK 19:50, 28 July 2007 (UTC)
Connel, regarding the invitation to comment about template changes, I made the comment at template talk:en-noun with the reasoning that the people who have that page watch listed and the people who have an interest in changes to the {{en-noun}} template are likely to be groups with considerable overlap. To further increase awareness of this, I left a note on the talk page of everybody who had ever edited template:en-noun and/or edited template talk:en-noun in the past 13 months - including your talk page [15] - that explicitly requested comments. I also noted on the vote page that I was about to request comments [16] and that I had done so [17]. These were in direct response to comments you made about possible technical limitations. I am honestly not certain how I could have made these requests for comments less hidden away. I cannot be held responsible for nobody else choosing to comment. Thryduulf 19:55, 29 July 2007 (UTC)

Reply to Connel (I’m addressing your enumerated points in order, followed by your bulleted points, also in order, in points 5–10):

  1. I think this WT:VOTE has shown us how not to do things. I am in full agreement that WT:VOTEs, in future, should only ever follow thorough and conclusive Beer Parlour discussions. Furthermore, said conclusions must be seen to be practicable by empirical tests.
  2. Sorry, I don’t understand what you mean here — could you rephrase it please?
  3. I don’t see why not… Connel, please nota bene that I have not prepared any templates (what you’ve written seems to imply that you think I have) — I have almost no knowledge of the inner workings of templates, so I can’t yet prepare templates (though I’d very much like to learn…). By the way, prescribe and proscribe are vitally different — proscribe means “forbid (by law)”, therefore “proscribe against” is redundant.
  4. Again, I’m afraid that I don’t understand what you mean here — what is a “soft-link entry”? I don’t know if it would address your concerns, but we could, I suppose, have redirect entries for possessive forms (however, that is very different from what was proposed in the WT:VOTE, so it would almost certainly require that the WT:VOTE be restarted). I doubt that the pronunciation would be given for most possessive forms, but even in the cases of those that are — why not? –It may be different from what we’re used to, but that in itself is not a particularly good reason to oppose the change.
  5. As Ruakh has already explained, we can have possessive forms of virtually any word — such as the verb met, and the preposition of (see of’s). We can’t even specifically exclude possessive pronouns, as that would be an ambiguous statement, for it would seem to be excluding “pronoun + ’s” (you yourself once wrote “that may be both of our’s opinions” — I could dredge up the link for you if you really want…).
  6. Again, as Ruakh has explained, clitics and affixes are different (prefix ≠ proclitic and suffix ≠ enclitic) — to call ’s a suffix would simply be incorrect. “Is that just to confuse readers?” — come now, why would I want to confuse readers? –Our interactions have recently become a lot more productive; please continue to assume good faith.
  7. As with many words, I use the etymologically correct form — the etymological form of inflect being inflexion. Apart from being the etymologically consistent form, inflexion also has the benefit of being the same spelling as its French cognate. Arguments about the pros and cons of alternative spellings aside, I didn’t think that the use of the -xion spellings was an issue, due to the fact that they are far more widely accepted than most of the other non-mainstream forms whose use I advocate, and are still the primary forms for flexion and complexion. It was wikilinked because, like enclitic and idiomatic, it is a word whose meaning may be unfamiliar to some readers (thinking about it, I would also have linked transcription).
  8. My explanation of what the WT:VOTE is about was chiefly intended to clarify, not justify. However, if you want a discussion on plural possessive forms, let’s have one here.
  9. I meant an alternative place for transcriptions which would be compatible with a scenario without entries for possessive forms. By the way, thank you for your “personal honoring of [my] objections”.
  10. The request for comments concerning template changes was made here (which you’ve already been referred to by Thryduulf).

† Raifʻhār Doremítzwr 02:46, 30 July 2007 (UTC)

We need an alternative to “neologism”

The {{neologism}} template, I have just learnt, is for words that do not appear in any other major dictionaries, irrespective of whether those words are old or new. It is presently in the entry for embiggen, despite the first citation we have for it being from 1884 — 123 years ago. I can easily imagine this template appearing in the entries for archaïc terms (and now they do; see of’s & upon’t). I suggest that a new template be created for these words which do not appear in any other major dictionaries, but which are certainly not neologisms. † Raifʻhār Doremítzwr 17:09, 27 July 2007 (UTC)

I don't see why we need a new term. All we need is a {{rare}} tag, and perhaps a =Usage note= explaining that it is not recognised by other dictionaries. This is assuming we've got adequate citations for it, obviously. Widsith 17:28, 27 July 2007 (UTC)
I don’t think we need this template at all. However, if we’re going to have it, its use is misleading for anything other than new terms. † Raifʻhār Doremítzwr 17:35, 27 July 2007 (UTC)
I believe the example you gave is misleading. The (Middle English?) term had fallen completely out of use until it was "re-invented" for w:The Simpsons. I'm not sure the historical quotes given can seriously be considered to be "Modern English" even though it bears an enormous resemblance to the version The Simpsons popularized. I don't see how this merits a policy discussion, rather than a tea room (or cleanup) discussion on an individual basis. --18:24, 27 July 2007 (UTC) (four tildes not five.) --Connel MacKenzie 00:02, 30 July 2007 (UTC)
It may help to get some clear dates for the demarcation of the historical forms of English:
  • Old English = 5th–12th centuries (that is, from the time of the Anglo-Saxon migrations to England to shortly after the Norman Conquest of 1066);
  • Middle English = c1066–c1470 (shortly after the Norman Conquest to around the time of the introduction of the printing press to England);
  • Early Modern English = c1470–1650;
  • Modern English = 1650–today.
Which means that the 1884 citation is without a doubt written in Modern English. However, I agree with you that there is no link in usage between the occurrence of William John Thomas’s embiggen and the embiggen of The Simpsons (that is, the producers of The Simpsons were not aware of the word’s prior use by William John Thomas). Nonetheless, both coinages share an etymology, a spelling, and a meaning — for all intents and purposes, they are the same word.
The crux of my argument is that the use of a “neologism” tag to mean something wholly unrelated to a word being a new coinage is very misleading. Its existence in the entries for the archaïsms of’s and upon’t are better examples than embiggen of why the use of this tag in its present state is misguided. A “neologism” tag should mean that the word is new, and its use should be disallowed in the entries for words whose oldest citation is 10–15 years old (we could even build in some kind of “self-destruct mechanism” — a date parameter which flags the template up for deletion by AutoFormat or some other ‛bot at some future point in time). We can and should have separate tags for illiteracies and “contemptible piece[s] of business jargon” (to quote my recently misguided self). † Raifʻhār Doremítzwr 03:19, 30 July 2007 (UTC)
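The "self-destruct" date check suggested above could be sketched roughly as follows. This is only an illustration in Python, not how any existing template or bot works: the function name, its parameters, and the 15-year cutoff (the upper end of the 10–15 year range mentioned) are all assumptions made for the example.

```python
from datetime import date

def neologism_tag_expired(oldest_citation_year, today=None, max_age_years=15):
    """Return True when a "neologism" tag should be flagged for removal.

    Hypothetical helper: the tag counts as expired once the oldest known
    citation is more than max_age_years old (15 is an assumed cutoff).
    """
    today = today or date.today()
    return today.year - oldest_citation_year > max_age_years

# embiggen's earliest citation is from 1884, so the tag would be flagged
print(neologism_tag_expired(1884, today=date(2007, 7, 30)))
```

A bot could run a check of this kind over every entry carrying the tag and list the expired ones for human review rather than removing the tag automatically.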
I don't think that we should use the {{neologism}} template at all. It means that we have to continually reassess the word to see if it has entered other dictionaries, or has gained a wider use. I haven't got the time to do that. SemperBlotto 19:20, 27 July 2007 (UTC)

I think it's a good idea to have some sort of "if you use this word, people will think you're an idiot" box, not because it's actually a useful box (sense labels and usage notes can express this better), but because there are some editors who really can't set aside their hatred of certain words, and I fear that if they can't express it in box form, they'll instead express it in the form of frivolous requests for verification and/or deletion. I'd especially like it if this box (and any others like it) came with a "don't show me these condescending boxes" link that used JavaScript to set a preferences cookie, for those of us who can't set aside our hatred of unreasoned prescriptivism. —RuakhTALK 00:31, 28 July 2007 (UTC)

If the word is cited to have been used in 1884, it is definitely not Middle English. Even if the word was used in the Simpsons, that's no reason for it to not be considered a real word. Maybe there should be a note to the effect of "This word is not generally considered to be used in proper English." We have entries for words like ain't.--Hikui87 02:09, 30 July 2007 (UTC)

Greek inflection lines

Two of us have been trying to finalise the format to be used in the inflection lines for Greek entries. The examples below (coloured only in this forum) demonstrate the general principles to be followed. Please draw our attention to any conflicts with, or departures from, formats used elsewhere, particularly in other inflected languages.


 πρόβλημα n (próvlima)     pl:προβλήματα


 γράφω (grápho)     perfective past:έγραψα


 ζεστός m (zestós)     f:ζεστή, n:ζεστό

simple POS

 βαθιά (vathiá)

The intention is to show a clean, simple line; the full range of non-lemma forms will eventually be shown in a separate Inflections section.
Saltmarsh 05:43, 29 July 2007 (UTC)

Two things: (1) Rather than a colon, most languages use a space between the gender/form and the inflection. (2) For the inflection line, most languages that mark gender spell out masculine, feminine, etc. But the overall format looks good. --EncycloPetey 06:34, 29 July 2007 (UTC)
Thanks for the comments: (1) I probably agree about the colons   (2) using the f format seems to be cleaner ((a) most visitors will be aware of gender, (b) the full meaning of the f is shown if you hover over it, (c) this format falls between Latin (albus) which shows no gender and Spanish (cansado) with gender in full)
Please don't use the Latin adjective inflection lines as a model. Medellia and I have been overhauling Latin adjective format. The Latin adjective inflection line templates haven't been updated yet, and may follow the Spanish model soon. Only the Latin templates for noun and verb are current with formatting standards. --EncycloPetey 07:32, 29 July 2007 (UTC)
This discussion seems like it would be relevant for Irish as well. The current inflection templates for nouns show all noun forms in two boxes. However, this seems in many cases to be excessive and I'm not sure it's terribly useful. All variation is entirely predictable save for genitive singular and plural and nominative plural. In some less common cases, the dative singular also varies. It seems to me that the full-blown inflection boxes (can be seen at focal) are unnecessary. Similar single-line templates could be added for adjectives and verbs as well. I haven't played with the templates before and so wouldn't know how to implement this, but advice would be appreciated. Additionally, is there any sort of discussion anywhere as to standards for stylistic considerations? Leftmostcat 09:00, 29 July 2007 (UTC)
That's a separate issue. Many Latin and Greek entries have both an inflection line after the section header and an inflection table. Each and every entry is expected to have some form of inflection line, even if it's just the word in bold, though it helps to have the key summary of grammatical information that a print dictionary would provide. The specifics of what gets included in an inflection line varies by language and part of speech. --EncycloPetey 09:04, 29 July 2007 (UTC)
I agree - the inflection line for inflected languages should be restricted to the sort of info you get in a standard bilingual dictionary and all other forms shown in a separate section as described above. A bit more than the emboldened head word is necessary because (certainly for Greek) the inflection table is a goal for the future! —Saltmarsh 11:28, 29 July 2007 (UTC)
If it's the case that both should be used, how would they be named? The inflection lines seemed to be named along the lines of "ga-noun-m1", so in what way should inflection tables be distinguished? And should there be some form of stylistic standard to keep inflection lines and tables appearing similarly across languages? It seems like this sort of consistency would be a positive thing. –Leftmostcat 23:09, 29 July 2007 (UTC)
Yes, the inflection lines all begin xx-POS, where the xx is the ISO code and the POS is the part of speech (or abbreviation) to which the template applies. The inflection table templates vary more in their names, but most that I've seen use the ISO code at the beginning. In Latin, we follow that with -decl- for adjectives and nouns (which decline) or -conj- for verbs (which conjugate). I've seen some other languages do the same. It's possible to use -infl- as well based on the same pattern. The template name usually follows this at the end with some code to specify which group of words it applies to. This is the most variable part of the template name, since the categories and how they're named vary a great deal in the grammars written about each language. You might take a look at the lists of Wiktionary:Finnish inflection types, Category:Latin templates, and Appendix:Swahili noun classes for some of the greatest variability. As you can see, there is no consistent way these things are organized for reference. --EncycloPetey 23:21, 29 July 2007 (UTC)
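The naming pattern described above can be illustrated with a small Python sketch. It assumes every name follows the "code-kind-group" shape, which the discussion itself says is not always true. The function name and the table-template name "la-decl-first" are hypothetical, used only for illustration; "ga-noun-m1" is the name quoted earlier in this thread.

```python
TABLE_MARKERS = ("decl", "conj", "infl")  # markers mentioned for inflection tables

def parse_template_name(name):
    """Split a template name into language code, kind, and word-group suffix."""
    parts = name.split("-", 2)
    lang = parts[0]
    kind = parts[1] if len(parts) > 1 else ""
    group = parts[2] if len(parts) > 2 else ""
    # A decl/conj/infl marker suggests a table; anything else is treated as a POS
    role = "inflection table" if kind in TABLE_MARKERS else "inflection line"
    return {"lang": lang, "role": role, "kind": kind, "group": group}

print(parse_template_name("ga-noun-m1"))
print(parse_template_name("la-decl-first"))  # hypothetical table template name
```

Because the final group code varies so much between languages, the sketch leaves it as an uninterpreted string.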

Protection policy, unprotection request

Hi there. I'm pretty new here and I came across a protected entry that I would like to edit. The entry is -polis. It's been protected since January of this year and the reasoning in the protection log didn't point to any specific protection policy. So, I have two questions:

  1. Is there a protection policy? I looked in Category:Wiktionary policies and couldn't find one.
  2. Can an admin unprotect that page?

I don't see any discussion at Talk:-polis and I can't imagine that whatever cause there was for protection six months ago is still an issue. I didn't see much in the history that seemed like unmanageable vandalism, but then again I'm used to Wikipedia. Mike Dillon 18:01, 29 July 2007 (UTC)

We don't have an ArbCom to permablock disruptive users. So that has simply continued elsewhere. --Connel MacKenzie 23:57, 29 July 2007 (UTC)

Category sorting guidelines

Can someone point me toward any existing guidelines for how entries should be sorted in categories? I've been trying to glean the guidelines by looking at existing categories, but if the guidelines have been written up, I'd rather look there.

As far as I can tell, most categories are sorted by the case-folded version of the entry, except for the proper noun categories which are sorted by the capitalized name. It also looks like the entries are generally normalized to remove diacritical marks and initial hyphens in the case of suffix entries. Are these observations in line with the actual conventions?

Assuming that my idea of the sorting rules is correct, I'm interested in working on this, but I've run into a few issues. I'm generally trying to use {{DEFAULTSORT:...}} to provide sort keys across all categories (both explicit categories and those added by templates). The issues I'm having are:

  1. Some of the templates seem to explicitly sort by {{PAGENAME}}. Most of them also have a parameter that can override this, but it would be nice if they all did.
  2. Some templates don't allow for a sort key for their categories, such as Template:en-proper noun. This makes it hard to override a default sort for just one of the categories on the page (i.e. Category:English proper nouns).

The first issue seems pretty easy to deal with by simply making sure that all templates that specify category sort keys can also be overridden with a parameter. The second issue could also be dealt with by changing the templates, but it seems like it would be better to just make all normal categories sort in a case-insensitive, normalized way. That would allow {{DEFAULTSORT:...}} to be used in nearly all cases. Can someone point to any cases where it's actually useful to separate upper-case and lower-case entries in a single category? Mike Dillon 18:17, 29 July 2007 (UTC)
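The normalization described in this post (case folding, removal of diacritical marks, removal of initial hyphens, with capitalization kept for proper nouns) might look roughly like this in Python. It is a sketch of the observed conventions only, not an agreed policy, and the function names and the proper_noun flag are assumptions made for the example.

```python
import unicodedata

def category_sort_key(title, proper_noun=False):
    """Build a normalized sort key for an entry title."""
    # Decompose accented letters, then drop the combining marks
    decomposed = unicodedata.normalize("NFD", title)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Suffix entries such as "-polis" sort under their first letter
    plain = plain.lstrip("-")
    # Proper noun categories keep capitalization; everything else is case-folded
    return plain if proper_noun else plain.casefold()

def defaultsort_line(title, proper_noun=False):
    """Return the wikitext line that would set this key for the whole page."""
    return "{{DEFAULTSORT:" + category_sort_key(title, proper_noun) + "}}"

print(defaultsort_line("-polis"))
print(defaultsort_line("Café"))
```

Stripping every diacritic is the bluntest possible rule; letters that a language treats as distinct (such as "ñ" in Spanish) would need an explicit sort key instead.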

The short answer is that you are into new territory for en.wikt. I don't think any serious effort has been made to normalize categories categorically. As to your last question, I don't know of anyplace it is useful, but I do know that case-sensitivity is used to make abbreviation, acronym and initialism sorting confusing.  :-)   So thank you for starting this discussion. I am interested in hearing what others think about your plan of action. meta:Magic words#Parser functions' "DEFAULTSORT" is new to me...when was that added? --Connel MacKenzie 20:25, 29 July 2007 (UTC)
DEFAULTSORT was added some time around January.[18] Mike Dillon 20:53, 29 July 2007 (UTC)
What happens when the defaultsort is changed more than once within a page? Can it be used separately in each L2 language heading section? --Connel MacKenzie 22:17, 29 July 2007 (UTC)
The last defaultsort is the one that is used for the whole page. I seem to remember reading that you can use multiple defaultsorts per page to achieve the effect you're suggesting, but it turns out that isn't the case. The way that I see it is that the overall Unicode sort order has no way to deal with the different alphabet orderings imposed by different languages, so the goal of category sorting on English Wiktionary should be to do what is most natural for English-speaking users. I don't think that any language that has "a" and "á" sorts the "á" after "z" the way that Unicode does, but I'd like to hear about it if that's the case.
On a related topic, the DEFAULTSORT stuff has no way to deal gracefully with the fact that Unicode sorts "ñ" after "z", so Spanish language listings will always have the few words that start with "ñ" at the end. In this case, it doesn't make any sense to "normalize" the "ñ" character to "n". There are also edge cases around normalization of some other characters, e.g. "ü" which can be normalized as "ue" for German and just plain "u" for Spanish, but this is generally a very small portion of the things we want to sort correctly and can be dealt with by using old-fashioned sort keys. Mike Dillon 22:33, 29 July 2007 (UTC)
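The "ñ" after "z" effect is easy to reproduce. The Python sketch below compares plain code-point ordering, which is what the category index is described as using here, with a hand-built key that places "ñ" between "n" and "o". The key function and the word list are illustrative assumptions, not anything MediaWiki provides.

```python
import unicodedata

SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(SPANISH_ALPHABET)}

def spanish_sort_key(word):
    """Rank letters by the Spanish alphabet, keeping ñ distinct from n."""
    ranks = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch != "ñ":
            # Strip accents from other letters (á -> a, ü -> u)
            ch = unicodedata.normalize("NFD", ch)[0]
        # Anything outside the alphabet sorts after "z"
        ranks.append(RANK.get(ch, len(SPANISH_ALPHABET)))
    return ranks

words = ["zorro", "ñu", "oso", "nube", "niño"]
print(sorted(words))                        # code-point order: "ñu" lands last
print(sorted(words, key=spanish_sort_key))  # "ñu" lands between "nube" and "oso"
```

A key like this only helps where the sort is done outside the wiki software, for example when generating an index page with a bot.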
Are you sure about "ñ" vs. "n"? Our target audience here is English speakers. I presume you mean it wouldn't make any sense on the Spanish Wiktionary, right? --Connel MacKenzie 23:26, 29 July 2007 (UTC)
It doesn't make sense anywhere. All Spanish-English dictionaries treat ñ as a separate letter because it's treated that way universally in Spanish. By contrast, Spanish considers ch and ll to be separate letters, but they are not separated for purposes of indexing in modern dictionaries, even in Spanish-only dictionaries. --EncycloPetey 23:33, 29 July 2007 (UTC)
All I was saying was that if "ñ" is retained as "ñ" and it is the first character of a word, then MediaWiki will build the category index to have a "ñ" entry that comes after "z". I wasn't addressing whether or not that makes sense in English Wiktionary. Since you've asked, I would think that a dictionary with a Spanish audience would unquestionably want to sort "ñ" as a separate letter immediately after "n". I personally would want to do the same here, but it isn't a foregone conclusion. The other alternative I could see gaining consensus for English Wiktionary would be to treat "ñ" as "n", but I think it would make our Spanish coverage look amateurish. Either way, I don't think that anyone would think it is sensible or usable to put "ñ" after "z", but we don't have any choice there if we decide not to "normalize" the letter "ñ" to "n"; the Unicode ordering is hard-wired into the sorting used by MediaWiki. Mike Dillon 23:45, 29 July 2007 (UTC)
However, we can list it anywhere we like in the Category's TOC. The category may sort them after z, but we don't have to set up the TOC that way. --EncycloPetey 23:57, 29 July 2007 (UTC)
That's true, but the user experience is a little weird if the TOC order doesn't match the actual order used for the category. If you put "ñ" between "n" and "o" in a Spanish TOC, clicking on it will generally result in a page with fewer than 200 entries where there is no "next 200" to go to "o". Clicking "Prev 200" will go back to "z"... For what it's worth, I don't see any easy solution to this problem since different languages may sort the same characters in different ways. Mike Dillon 00:01, 30 July 2007 (UTC)
In Welsh ch, dd, ff, ng, ll, ph, rh and th are considered separate letters (following "c", "d", "f", "g", "l", "p", "r" and "t" respectively), including for sorting purposes; accents are ignored for sorting, except where this is all that distinguishes two words, where the un-accented letter precedes the accented one. Hence the correct sorting order is: agor, angel, alarch, almaeneg, allan, am, cwrw, chwech, dysgu, ddim, felen, ffair, llety, llethr, pryd, phen, rŵan, rheg, tan, tân. Thryduulf 00:18, 30 July 2007 (UTC)
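The Welsh ordering above can be sketched in Python by splitting each word into Welsh letters before comparing. The sketch assumes that every occurrence of a digraph spelling is the single letter, which is not always true in practice (an "n" followed by "g" is sometimes two letters), so it illustrates the rule rather than providing a complete collation.

```python
import unicodedata

WELSH_ALPHABET = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng",
                  "h", "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r",
                  "rh", "s", "t", "th", "u", "w", "y"]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}

def strip_accents(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def welsh_sort_key(word):
    """Rank by Welsh letters; accents only break ties between equal words."""
    plain = strip_accents(word.lower())
    ranks = []
    i = 0
    while i < len(plain):
        pair = plain[i:i + 2]
        if pair in DIGRAPHS:  # greedy: always read a digraph as one letter
            ranks.append(RANK[pair])
            i += 2
        else:
            ranks.append(RANK.get(plain[i], len(WELSH_ALPHABET)))
            i += 1
    # The original spelling as a tiebreaker puts "tan" before "tân"
    return (ranks, word)

words = ["tân", "tan", "rheg", "rŵan", "phen", "pryd", "llethr", "llety",
         "ffair", "felen", "ddim", "dysgu", "chwech", "cwrw", "am", "allan",
         "almaeneg", "alarch", "angel", "agor"]
print(sorted(words, key=welsh_sort_key))
```

Sorting the reversed list reproduces the sequence given above, from "agor" to "tân".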
Oh, boohoo. Try Hungarian. They consider singly-accented vowels to be equivalent to unaccented vowels. However, doubly accented vowels and vowels with umlauts are separate (though considered equivalent to each other for indexing). So you get alphabetical sequences like: fúvóka, fuvola, fűevő, függ. They have separate letters for cs, dz, dzs, gy, ly, ny, sz, ty, and zs. These are indexed separately, both as primary and internal letters. Worse, there are things like ggy which is considered equivalent to gy for indexing because it is simply pronounced longer. As a result you get alphabetical sequences like: faggat, fagyás, faggyús, fagykár. It's a nightmare to remember how to look things up for the first year that you're working with the language. --EncycloPetey 00:38, 30 July 2007 (UTC)
For the record, note that MediaWiki doesn't perform Unicode collation properly. Hopefully, MediaWiki will attain collation awareness some day, and a tag in each Wiktionary category will specify a collation based on the language whose terms appear in that category. DEFAULTSORT is an inadequate long-term solution because a given character sequence can have multiple collation schemes if multiple languages have terms so spelled. Rod (A. Smith) 02:10, 30 July 2007 (UTC)
My understanding is that MediaWiki currently relies on the underlying database to perform the sort in order to have a performant sorting implementation for categories. I agree that the long-term goal should be to have the software that runs Wiktionary deal with differing collations properly, but for the time being all we have is sort keys and the default Unicode sort order. It's my opinion that doing something to try to deal with what we have is better than doing nothing, since I don't see proper collation support in MediaWiki coming any time soon. Mike Dillon 03:27, 30 July 2007 (UTC)
Understandable. Let's be sure, though, not to direct much effort toward sorting since the better long-term solution is to fix the underlying software. I.e., let's not create tedious or error-prone rules for constructing category sort keys. Rod (A. Smith) 04:59, 30 July 2007 (UTC)

I've started taking some notes at User:Mike Dillon/Sorting. I probably won't have a lot of time to devote to this during the week because of my day job, but others should feel free to add anything pertinent to my notes. Mike Dillon 05:58, 30 July 2007 (UTC)

This applies to more than just categories, and I've tried to start a collection of the language issues at Wiktionary:Alphabetical order. DAVilla 08:41, 3 August 2007 (UTC)

Announcement of policy discussion re: interwiki links and the interwiki bot (Meta)

Please see m:Interwiki.py/Wiktionary_functionality_discussion for more information. The discussion page on Meta is intended to provide a space for folks from every wiktionary project to discuss and decide upon the behavior and features of the interwiki links bot (RobotGMwikt); until now, there has been no such mechanism. Besides your participation, we also need folks who are able to help translate the discussion on an ongoing basis as appropriate. Thanks. ArielGlenn 22:53, 29 July 2007 (UTC)

Copying Tea Room discussions to entry talk pages

Is there a {{tearoom}} equivalent of {{rfvpassed}} and {{rfvfailed}}? –If not, why not? (I believe that there ought to be one.) † Raifʻhār Doremítzwr 04:32, 30 July 2007 (UTC)

A Tea Room discussion is supposed to be about a fine point of a particular cleanup task. There shouldn't be any "TeaRoomFailed" equivalent; such issues are moved to RFV if needed. Tea room conversations, where applicable can be copied to talk pages with a single line ": ''From [[Wiktionary:Tea Room]]:'' " given as a section introduction. I don't know that any sysop is actively archiving these (according to WT:DW,) at this time. Despite interruptions, I've been making moderate progress on automating RFD and RFV archiving. If I get that working, RFDO and TR will be natural extensions. --Connel MacKenzie 20:12, 31 July 2007 (UTC)

Treatment of katharevousa

The way that Katharevousa entries should be treated is being discussed at Wiktionary talk:About Greek#Katharevousa - if anyone has experience of the treatment of a recent purist form in another language it would be good to hear. —Saltmarsh 05:53, 30 July 2007 (UTC)

Creation of User and Talk pages

Whose responsibility is it to create the user page and talk page for a new user? Should anyone except the user create his/her own user page? In what circumstances should we create such pages for anon IP addresses? SemperBlotto 10:40, 30 July 2007 (UTC)

Is this a response to my recent welcoming of a number of new named and anonymous users and my subsequent creation of user page redirects to their respective talk pages? –Is that not OK to do? † Raifʻhār Doremítzwr 10:53, 30 July 2007 (UTC)
Yes. My own view is that a user's User page is his/her/its own responsibility and should never be edited by anyone else. A welcome on the talk page is useful. I don't think that anon User pages should ever be created, and a welcome only given if repeated good edits have been made and we want the user to stay and create an account. IP addresses are frequently shared and old messages can be confusing to new users. SemperBlotto 10:58, 30 July 2007 (UTC)
I agree with SemperBlotto; the only occasions when you should be editing another user's userpage or user subpages are:
  • by explicit invitation
  • vandalism clean-up (often it is polite to leave a notice on the user's talk page announcing you have done this)
  • removal of copyvios, personal attacks or other explicitly prohibited material (it is usually most appropriate to ask the user to do this themselves, and for someone else to do it only if they refuse).
  • to correct your own error (e.g. you meant to edit their talk page, but actually edited their userpage by mistake)
I don't think that a user page should redirect to the user talk page unless the user themselves wants it to.
User talk pages, and the talk pages of user subpages on the other hand are open to anyone to create and edit. Thryduulf 11:55, 30 July 2007 (UTC)
OK, noted. FWIW, I remember reading somewhere “if you don’t want a user page, consider making a redirect to your talk page” (or something similar), but I guess that isn’t an invitation for other users to do so on their behalves… † Raifʻhār Doremítzwr 13:07, 31 July 2007 (UTC)

Temporary bot operation

As explained and discussed in Wiktionary talk:About Japanese#Template:japdef, maintenance of template usage in Japanese entries will start soon. It will be performed with replace.py via my newly created bot account User:TohruBot. The template to be replaced is, as far as I know, completely peculiar to Japanese, and so the change has no implication for other languages. I will have my bot run without requesting the bot flag as the number of the entries to be modified is only about 1,200. --Tohru 15:01, 30 July 2007 (UTC)

  1. Support Connel MacKenzie 20:32, 30 July 2007 (UTC) :-) that is, as stated: without the bot flag is very good for a small (1200) task like this. --Connel MacKenzie 20:33, 30 July 2007 (UTC)
Done. Thank you for your cooperation [19] :). --Tohru 08:15, 3 August 2007 (UTC)

Place names, once and for all.

Enough messing around, let's come up with a policy, here and now, regarding inclusion of place names. It is pretty clear that some place names should be included, at least names of planets in our solar system, oceans, continents and countries, perhaps states and big or famous cities. I believe we should have other geographic features (mountains and rivers, gulfs, bays, seas, and certain man-made landmarks) and perhaps other political features (names of counties or moderately sized cities, commonly used city names). Does anyone have ideas for a simple formula for inclusion/exclusion of such features? Mine have failed miserably to this point. bd2412 T 15:23, 30 July 2007 (UTC)

Does the place actually exist, and can we find a reference in literature that uses it (rather than mentioning it)? What could be simpler? SemperBlotto 15:26, 30 July 2007 (UTC)
Even a minor street name? A small town of, say, 300 people? A block-square public park? We have to have limits. I fear, however, that imposition of reasonable limits will require a Wikipedia-type notability evaluation. bd2412 T 15:40, 30 July 2007 (UTC)
I'm O.K. with a Wikipedia-type notability evaluation. The key difference is that Wikipedia is interested in the notability of a topic, while we're interested in the notability of a term. There's probably a high correlation between the two, but "1600 Pennsylvania Avenue" and "10 Downing Street" are probably-non-notable names for definitely-notable places, and Utopia is a probably-notable name for a probably non-notable place. (We should probably find a different word besides "notable", though. How about "nomable"?) —RuakhTALK 16:19, 30 July 2007 (UTC)
It is easy to see '1600 Pennsylvania Avenue' being used figuratively. Are you suggesting the name's notability be inversely proportional to the referent's notability? I am not sure about that. --Connel MacKenzie 21:07, 30 July 2007 (UTC)
Well, it's hard to think of a good example of a non-notable name; that's what makes it non-notable. It's possible that 1600 Pennsylvania Avenue does actually warrant an entry, I don't know. I'm certainly not suggesting that a name's notability is inversely proportional to the referent's notability (and, BTW, I'm definitely not suggesting that it be anything — I think notability is something that exists in the world outside Wiktionary, and we can try to gauge it, but not to create it); when I said there's a high correlation, I meant there was a high positive correlation. —RuakhTALK 21:27, 30 July 2007 (UTC)

I would suggest including the names of all of

  • Planets in our solar system
  • Countries (present and past)
  • top-level subdivisions of countries (ditto)
  • lower-level subdivisions of countries that are regularly nationally or internationally used as a primary location reference, and that are not a sum-of-parts of a higher area or just the sum of two or more areas (ditto)
    e.g. in the UK this would include counties (e.g. Somerset, Greater Manchester) and official regions (e.g. Midlands, East Midlands, Scottish Highlands), but not South West England, North Wales, or Yorkshire and Humberside. Tyne and Wear would be allowed, as the parts it is a sum of are not areas but rivers.
  • Real settlements, of any size, provided that
    • there are at least three independent uses (not mentions) spanning at least a year
    • these uses are not in the context of works about an area smaller than a top-level country subdivision
    • the uses are in works that are not tourist brochures/travel guides (or similar), gazetteers (or other lists of places), maps, or any work authored or published by a public authority that operates in an area less than a top-level country subdivision of that country (i.e. a work by a third-level authority in the USA about a place in the USA is excluded, but a work by them about a place in France is not).
    • they are, or were in the past, separate settlements and not just an area of a larger one.
  • Real rivers, seas, oceans, lakes and (ranges of) hills/mountains (subject to the same criteria as real settlements)
  • Other real geographical areas that are not a sum of a location and generic description (ditto)
    "Somerset Levels", "Welsh Marches", "Niger Delta" and "Russian Steppes" are sum of parts, "Lake District", "American Midwest", "Top End" and "Costa Del Sol" are not.
  • Generic names for places (with the usual three citations, etc)
    e.g. "Anytown, USA"
  • Street names with derived/generic/attributive uses (ditto citations)
    e.g. "Broadway", "High Street"
  • City parks, etc provided that either the name is not a sum of parts including the city/area it is in, or it is used generically/attributively/derivationally (same citation requirements as real places)
    New York's "Central Park" and London's "Hyde Park" and "Green Park" probably qualify; "Wimbledon Common" and "Highgate Hill" don't.
  • Fictional places/geographical areas/etc. that have at least three independent uses spanning at least a year, all of which are outside the context of both the fictional work they are from and fictional places in general.

Where settlements/other geographical features that don't meet these criteria have attributive, generic or derived uses (that satisfy the existing CFI) then the origin should be explained in the etymology or definition with a link to Wikipedia. (For example, "Cheddar cheese: from w:Cheddar (a village in Somerset, England where the cheese was first made) + cheese")

Where a real place that merits inclusion here shares a name with a fictional place that doesn't (or vice versa), then a see also-style link to Wikipedia should be used. Thryduulf 17:50, 30 July 2007 (UTC)
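
As a rough illustration only, the settlement test proposed above (at least three independent uses spanning at least a year, none from local-scope works or from excluded kinds of work) could be sketched as follows. The `Citation` fields, the `EXCLUDED_KINDS` labels and the `qualifies` function are hypothetical names invented for this sketch, and treating "independent" as "by different authors" is a simplifying assumption, not something the proposal states.

```python
from dataclasses import dataclass
from datetime import date

# Kinds of work the proposal excludes (labels are assumptions for this sketch).
EXCLUDED_KINDS = {"travel guide", "gazetteer", "map", "local authority publication"}


@dataclass(frozen=True)
class Citation:
    author: str        # used here as a stand-in for "independent source"
    published: date
    kind: str          # e.g. "novel", "newspaper", "gazetteer"
    is_use: bool       # True for a use, False for a mere mention
    local_scope: bool  # True if the work is about an area smaller than
                       # a top-level country subdivision


def qualifies(citations, min_sources=3, min_span_days=365):
    """Return True if the citations satisfy the sketched settlement rule."""
    eligible = [
        c for c in citations
        if c.is_use and not c.local_scope and c.kind not in EXCLUDED_KINDS
    ]
    # Assumption: citations by the same author are not independent of each other.
    authors = {c.author for c in eligible}
    if len(authors) < min_sources:
        return False
    dates = [c.published for c in eligible]
    return (max(dates) - min(dates)).days >= min_span_days
```

Whether a given work counts as a "use", or as local in scope, is still a human judgment; the sketch only combines those judgments once they have been made.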

The above looks good to me. Would your prohibition on citations from "works about an area smaller than a top-level country subdivision" bar local newspapers? bd2412 T 18:22, 30 July 2007 (UTC)
Hmm, I honestly hadn't thought about local newspapers. I can only speak from a UK perspective, as that is the only country whose newspapers I'm familiar with, but papers like The Scotsman (Scotland) and The Western Mail (Wales) qualify as good sources as they cover the top-level subdivisions of the UK. Large regional papers like the Western Daily Press (mainly Somerset, Wiltshire, Gloucestershire, Bath and Bristol), although not covering the whole of England, probably ought to qualify, as their catchment area is not hugely smaller than Wales and there isn't a single England-only newspaper. London-area papers (e.g. The London Paper) are similar (I think) to the English regional papers. Smaller papers probably oughtn't be considered broad enough in scope, as their focus is still to a large extent the local fête. My impression (and this could be wrong) is that US newspapers that are named for large cities (e.g. New York Times and Chicago Tribune) are more national in scope than similarly titled papers in the UK, and as such should be good sources. I'm not 100% on all this though. My reasoning is that the name of every little hamlet will be in a local paper and we probably don't want to include all of them. Thryduulf 20:48, 30 July 2007 (UTC)
Of all the things you've mentioned on this topic, newspapers seem the most relevant. To clarify: non-local references in local newspapers. If a local newspaper in Vancouver talks about a place in Florida without explicitly saying it is in Florida, USA, we should have a mechanism for counting that citation. (e.g. Key West, Pensacola, Miami, Daytona Beach, Tampa or Orlando.) But do you know of a search mechanism we can use that will limit it to durably-archived news outlets (while ignoring the prevalent Internet-news sites, most of which do not meet our "durably archived" criteria)? --Connel MacKenzie 20:57, 30 July 2007 (UTC)
Yes, non-local references from local newspapers probably should count. Google News archives [20] is the only online search that I know of, but I don't recall seeing anything definitive on Wiktionary about whether we consider it durably archived or not (I'll start a new section below). The Internet Archive might have some as well. BBC news articles can stick around for a long time (the oldest I've found is [21] from 22 October 1997), but which ones stay and which go seems random. Thryduulf 21:37, 30 July 2007 (UTC)
Then go read WT:CFI. "Durably archived" is a primary aspect of it. Online-only references are always invalid (and never last even for three years, anyhow.) --Connel MacKenzie 21:42, 30 July 2007 (UTC)

Celestial objects

  • If we'd like to stop messing around, then please let's stop commingling different types. They each have different issues. I hereby propose a vote for all named celestial objects. (Arguably, the simplest on that list!) I propose the following additional paragraph to WT:CFI#Names of actual people, places and things:

Celestial objects can be included if named by recognized international astronomy association. While Red Rectangle merits an entry, HD 44179 should remain listed only on Wikipedia. The sun, the moon, the original nine planets and the three dwarf planets as well as their 169 moons all merit entries; the billions of other cataloged solar system objects do not. Named stars, star systems or galaxies such as Polaris, Alpha Centauri or The Milky Way can be included, but catalog entries such as HD 188753 should not be.

No decision (nor mention) would be added regarding other planetary systems at this time. To my knowledge, this addition would accurately reflect current practice, clarify to newcomers our position, and provide a starting point for other "name" discussions and votes. --Connel MacKenzie 21:39, 30 July 2007 (UTC)

I like this, but I'm not certain whether objects like Halley's comet would be includable or not? Thryduulf 21:55, 30 July 2007 (UTC)
I presume that "celestial objects" is not intended to apply to, for example, individual named craters on the moon (or even on other planets, or moons of other planets)? I remember when the Mars Rover started finding interesting rocks on Mars, the astronomers at NASA started naming them "Scooby Doo" and the like. Otherwise, support! bd2412 T 23:11, 30 July 2007 (UTC)
Field-specific jargon (astronomy, in this case) code names are not covered. We don't, for example, include 'internal' project names of major software releases. I also don't think those items (rocks and craters) could reasonably be considered "celestial objects" in and of themselves; they don't have independent orbits, etc. I do think we should have Halley's Comet, but I'm not certain I wish to broach the issue within this proposal. --Connel MacKenzie 23:26, 30 July 2007 (UTC)
Fair enough - although there are certain features (i.e. the Sea of Tranquility) that we may have to deal with eventually. Celestial geology is a topic for another day. Perhaps we should tackle the more contentious issue of cities first, as I think a solution to that conundrum will likely solve states and countries as well. bd2412 T 02:12, 31 July 2007 (UTC)
I was thinking just the opposite; given the rubric that "hard cases make bad law", why not start with the easiest categories where there is the broadest consensus, and work our way incrementally to the harder ones? It might be slow (but it can hardly be any slower than it's been so far ;-)). ArielGlenn 03:24, 31 July 2007 (UTC)
That's certainly what I'm aiming for here, by starting with the only non-contentious one in the lot.
Procedural question: our WT:VOTE mechanism is getting convoluted. Is a conversation supposed to be quiescent here for a week, or simply discussed for a week before the preliminary vote is created? (Assuming that apparent general consensus has been reached in both cases, before any vote is started, of course.) --Connel MacKenzie 04:28, 31 July 2007 (UTC)

Well now I'm thinking we can do a sort of instant runoff election for a menu of possible rules.

There are a number of distinct areas for which we need, perhaps, distinct rules:

  1. Celestial bodies (for which Connel has made an excellent proposal).
  2. Continents and oceans (Earthly, of course).
  3. Countries, current and historic (and perhaps aspirational, e.g. Kurdistan and Palestine).
  4. Top level subdivisions of countries (states in the U.S. and Mexico, provinces in Canada, counties in the U.K., departments in France, cantons in Switzerland, Oblasts in Russia, prefectures in Japan, etc. We may want a rule to avoid having to include whatever administrative divisions tiny places like Tonga and Monaco are divided into, if any - those will be more like neighborhoods).
  5. Lower level subdivisions, like U.S. counties. Some U.S. counties should probably be included - Brooklyn, the Bronx, Orange County (I know of at least two, one being the home of Disney World in Florida, the other being the subject of a television show in California), Cook County in Illinois.
  6. Megacities - New York, Tokyo, Berlin, Beijing, Chicago, Hong Kong, Houston, Moscow, Sydney, Rio de Janeiro, etc.
  7. National capital cities which are not Megacities - Canberra, Brasília, Rabat, Bern.
  8. State/regional capitals.
  9. Cities that are neither megacities nor capitals, but are famous for other reasons - Venice, Nice, Dresden, Gettysburg, Casablanca.
  10. Cities that are "large" but not "mega".
  11. Cities or counties with commonly used names (Springfield, Jackson, Jefferson).
  12. Cities that are not particularly "large".
  13. Man-made landmarks.
  14. Major geological features - mountains, seas, plateaus, deserts, jungles, peninsulae.
  15. Rivers and lakes.
  16. Smaller but historically important geographic features?

Anything else? bd2412 T 05:53, 31 July 2007 (UTC)

At the risk of making a long catalogue even longer, street names (Broadway), parks (Central Park), town squares and the like (Times Square... well not exactly, but you get the idea). Or do these come under man-made landmarks? ArielGlenn 07:15, 31 July 2007 (UTC)
While that outline is a good plan for separate votes, the discussions (directions) of each are enormously different. If they are kept separate, they have chances of passing; commingled, the noise-to-signal ratio will be overwhelming. --Connel MacKenzie 07:15, 31 July 2007 (UTC)
Right, what I am proposing is an individual discussion/development of criteria/vote for each. When all is said and done, we'll probably need a separate CFI sub-page for place names. bd2412 T 07:17, 31 July 2007 (UTC)
I'm sure we can address that if it becomes an issue. So, do you like the wording above for the first of those votes? I envision these staggered by a week (or maybe just a day or two each) so the majority of concerns can be raised in the first day of each vote without directly affecting other votes (so much.) --Connel MacKenzie 19:53, 31 July 2007 (UTC)
I tweaked it a bit (in my lawyerly way) but it's good:
Names of celestial objects shall be deemed to meet the CFI where such names have been assigned by recognized international astronomy association, and do not constitute a mere catalogue designation. While Red Rectangle merits an entry, HD 44179 should remain listed only on Wikipedia. The sun, the moon, the original nine planets and the three dwarf planets as well as their 169 moons all merit entries; the billions of other catalogued solar system objects do not. Named stars, star systems or galaxies such as Polaris, Alpha Centauri or The Milky Way can be included, but catalogue entries such as HD 188753 should not be. This criteria does not apply to names of comets, asteroids, or similar bodies, which shall be considered individually.
I like the idea of firing up a proposal for a new category of place names every 2-3 days. Shall we declare a moratorium on RfDs of place names (except for obvious junk like "the corner behind Joe's house") pending the completion of this process? Cheers! bd2412 T 00:56, 1 August 2007 (UTC)
I support the above wording proposal, the proposal for frequent similar proposals, and the proposal to forestall deletion proposals in the interim. Rod (A. Smith) 02:44, 1 August 2007 (UTC)
I don't see how a moratorium even could be stuck to. People will still nominate what they nominate. If a deletion discussion helps iron out the policy discussion, so much the better. If the policy changes with regard to a previously deleted entry, one of us will restore it. It doesn't seem like that big of an issue, and this is a wiki. --Connel MacKenzie 06:34, 1 August 2007 (UTC)
Just hoping to avoid things like the RfD of France. bd2412 T 14:39, 1 August 2007 (UTC)
I don't like the comets and asteroids sentence. If that is going to be dealt with in the future, then deal with it in the future. It should at least imply that the same general principles apply. First sentence still has my typo/rewrite error..."assigned by recognized" vs. "assigned by a recognized"... Also, the part of that sentence you did rewrite is now self-referential and needlessly wordy. Saying "can be included" is much clearer and more accurate. --Connel MacKenzie 06:34, 1 August 2007 (UTC)
I would probably simplify it to say "any celestial object that has a name, rather than a catalogue number" SemperBlotto 07:35, 1 August 2007 (UTC)
I like the simplified version, but I'd add the first example sentence from the long version. Regarding the moratorium, I think this is a good idea. If someone nominates an entry that would be allowed by one of the proposals under discussion (e.g. Brisbane), we just put the nomination on hold (linking to the relevant discussion) until the discussion is concluded, whereupon we can resume the debate about the nominated entry. If someone nominates an entry that would not be allowed by any proposal (e.g. the corner behind Joe's house) then the nomination can proceed as normal. Thryduulf 08:44, 1 August 2007 (UTC)
O.K., but we can still have long-winded and ultimately pointless arguments about whether a given entry is covered by any of the proposals under discussion, right? ;-) —RuakhTALK 13:45, 1 August 2007 (UTC)
If we can produce a novella about whether "usuress" is a misspelling of "usurers", then I have no doubt that not only is this possible but near as dammit a certainty ;) Thryduulf 20:13, 1 August 2007 (UTC)

So do we need further discussion, or is someone going to start a vote on Connel's celestial objects proposal? bd2412 T 02:25, 3 August 2007 (UTC)

I have some issue with the inconsistent italicisation. Furthermore, what is meant by “original nine planets” — Pluto is not a planet, though was officially considered one until recently; nonetheless, that doesn’t mean that it was “originally” a planet. This really needs to be rewritten, or, preferably, removed — the wording can state “…the eight planets and three (four?) dwarf planets…”. Otherwise, it seems like a pretty good start. † Raifʻhār Doremítzwr 02:37, 3 August 2007 (UTC)
See Wiktionary:Tea room/2006#Pluto, planet, dwarf planet, w:Pluto etc. The wording "original nine planets" is consistent. The new classification, while newsworthy, does not change Pluto's historic status as "the ninth planet," particularly in literature. The wording "...the eight planets..." is incorrect; the wording "...nine..." inherently includes Pluto, as very specifically intended. The word "Pluto" didn't cease to exist by an arbitrary reclassification. Possibly, it shouldn't be listed on Wiktionary as a "dwarf planet" at all, until it has citations spanning over a year of use. But when that happens, the older definition as one of the original nine planets should then be tagged as "dated." --Connel MacKenzie 20:14, 15 August 2007 (UTC)
Yes, it's difficult to believe that Doremítzwr or anyone else suddenly fails to understand the meaning of “original nine planets” just because of some new technicality. Rod (A. Smith) 20:36, 15 August 2007 (UTC)
Perhaps "Objects in our solar system officially classified by the International Astronomers Union (or whatever their name is) as planets or dwarf planets" would be better. This also has the benefit of allowing our CFI to remain the same if more of either are discovered. Thryduulf 10:02, 3 August 2007 (UTC)
Ok, so how about: The name of a celestial object shall be deemed to meet the CFI if it has been officially classified by a recognized international astronomy association as a planet, dwarf planet, star, or as a single body including multiple planets or stars, and has been assigned a name by such an association, rather than only a catalogue number. Savvy? bd2412 T 15:34, 3 August 2007 (UTC)
So the Moon would be excluded? the Galilean moons of Jupiter (Io, Ganymede, Europa, Callisto)? The Perseids? The Crab Nebula? Halley's Comet? Oort cloud? Kuiper Belt? None of these fit into the language you've proposed, and all but the Crab Nebula are located within our solar system. --EncycloPetey 18:50, 3 August 2007 (UTC)
  • Oops - I did mean to include moons! bd2412 T 02:44, 4 August 2007 (UTC)
What about: The name of a celestial object shall be deemed to meet the CFI if it has been assigned a name, rather than just a catalogue number, by a recognized international astronomy association, and has been officially classified by such an organisation as one of the following:
  • a planet or dwarf planet
  • the moon of such an object
  • a comet, asteroid or similar object
  • a star
  • a grouping of one or all of the above (e.g. a nebula)
Thryduulf 22:01, 3 August 2007 (UTC)
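
Purely as a sketch of how the wording just above would sort the examples used in this discussion: an object passes only if it has an assigned name (not merely a catalogue number) and falls into one of the listed classes. The function name, the class labels and the convention of passing `None` for "catalogue number only" are assumptions made for this illustration, and "similar object" is not modelled.

```python
# Classes taken from the bulleted list above; the labels are simplified.
ALLOWED_CLASSES = {
    "planet", "dwarf planet", "moon", "comet", "asteroid", "star", "grouping",
}


def meets_celestial_rule(assigned_name, classification):
    """assigned_name is None when the object has only a catalogue number."""
    has_name = assigned_name is not None and assigned_name.strip() != ""
    return has_name and classification in ALLOWED_CLASSES


# Examples drawn from the discussion (classification labels are assumptions):
examples = [
    ("Polaris", "star"),           # named star
    (None, "star"),                # e.g. HD 188753, catalogue number only
    ("Red Rectangle", "grouping"), # named nebula
    ("Halley's Comet", "comet"),
]
results = [meets_celestial_rule(name, cls) for name, cls in examples]
```

Under this reading every named comet and asteroid would pass, which is exactly the concern about sheer numbers raised in the replies.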
Aren't some limitations/reservations about asteroids and comets needed here as mentioned earlier? I've read that there are about 13,000 asteroids and 2,900 comets with assigned names, though the actual number of comet names is somewhat smaller than the number of comets themselves due to considerable duplication. (It seems that a discoverer of an asteroid gets the naming rights, and that a comet is named after the discoverer automatically. In some cases, an observation spacecraft such as SOHO may find hundreds of comets, so...) Or should we wait to get worried about it until someone who tries to complete them determinedly shows up? --Tohru 02:11, 4 August 2007 (UTC)
Very few comets or asteroids would really merit a Dictionary entry. According to Wikipedia, Ceres is "now classified as a dwarf planet". There are, however, asteroid groups such as "the Apollos, Amors, and the Atens" which may be worthy of inclusion. As for comets, besides Halley's Comet and Hale-Bopp (which prompted the suicide of the Heaven's Gate cult) I can think of none that merit an entry. bd2412 T 02:50, 4 August 2007 (UTC)

Taking into account the numerous comments above, I've consolidated the wording I think meets everyone's concerns:

Names of celestial objects can be included if they have been assigned by recognized international astronomy association, and do not constitute a mere catalog designation. While Red Rectangle merits an entry, HD 44179 should remain listed only on Wikipedia. The sun, the moon, the original nine planets (including Pluto) and the three dwarf planets as well as their 169 moons all merit entries; the billions of other cataloged solar system objects do not. Named stars, star systems or galaxies such as Polaris, Alpha Centauri or The Milky Way can be included, but catalog entries such as HD 188753 should not be. Less distinct celestial objects such as Haley's Comet or Comet Hale-Bopp may require regular citations of use.

I believe I've gotten the italicization correct (or at least, consistent.) The nine planets have been qualified to convey more precisely what is meant. Some verbiage has been simplified. The "less distinct" objects have been excluded from this rule, while still being permitted with normal verification. The Red Rectangle example implies that Crab Nebula is also permitted without verification. I think that's everything. I'd like to start this vote today, as 'premature,' with voting on the policy to start one week from today. I intend to use the following preamble:

This vote is for the addition of the paragraph into Wiktionary:Criteria for inclusion#Names, after the section "What Wiktionary is not with respect to names" with the heading "Celestial objects", with the expectation that the section will be moved to a sub-page when additional Names rules are added.

Are there any new objections before I begin this vote, on the only non-controversial sub-category of all the proper noun criteria? --Connel MacKenzie 20:14, 15 August 2007 (UTC)

Aside from the general objection that I don't think it's worthwhile for CFI to contain a laundry list of each type of proper noun that is or is not acceptable, my objections are as follows (in no particular order): (1) "Halley's Comet" should be spelled correctly; (2) I don't understand why "Red Rectangle" automatically warrants inclusion while "Halley's Comet" may require regular citations of use — it seems to me that the latter name is much more worthy of inclusion than the former; (3) It seems a bit w:WP:BEANS-ish for the CFI to contain redlinks that aren't intended to become bluelinks someday (though as they already do contain one — the cat's pajamas — I guess the cat might be out of the bag on this one); (4) "The original nine planets" is an odd phrasing, as there were originally five planets (Mercury, Venus, Mars, Jupiter, and Saturn) — the sense is fairly clear, but I'm sure we can find a better way to put it; (5) "The original nine planets (including Pluto) and the three dwarf planets" is quite odd, as it makes a point of listing Pluto twice; (6) The paragraph blurs the line between names and referents, saying at some points that certain names should or should not be included and at other points that certain objects should or should not be (a distinction that's not a big deal when we're talking about English, but could be a bigger problem when we're worrying about languages wherein astronomical names are not assigned by any international body); (7) "Can be included" is very weak — either it "should be included", or it shouldn't be mentioned; (8) I don't know what "less distinct celestial object" means, and the two examples aren't enough to clarify it for me; (9) The paragraph does not clarify how to decide whether a term like North Star, Morning Star, Comet Halley, Seven Sisters, or the like warrants inclusion, and seems to imply that scientific use (as opposed to, say, literary use) should be a key factor in this decision; (10) The paragraph does not mention constellations, which, while not celestial objects per se, probably fall into the same category and therefore should be mentioned in the same place. —RuakhTALK 21:23, 15 August 2007 (UTC)
As above, taking into account Ruakh's corrections: #1) OK, done, #2) Comets may be addressed in the future; this vote is for named objects, #3) non-existent page, #4) OK, done, #5) OK, done, #6) OK, done, #7) OK, done, #8) That's the result of the conversation above, #9) Yup - those aren't automatic from this vote, #10) OK, done.

Names of celestial objects should be included if they have been assigned by a recognized international astronomy association and do not constitute a mere catalog designation. While Red Rectangle merits an entry, HD 44179 should remain listed only on Wikipedia. The sun, the moon, the original nine planets (including Pluto) and the two other dwarf planets as well as their 169 moons all merit entries; the billions of other cataloged solar system objects do not. Named stars, star systems, constellations or galaxies such as Polaris, Alpha Centauri or The Milky Way should be included, but catalog entries such as HD 188753 should not be. Names of less distinct celestial objects such as Halley's Comet or Comet Hale-Bopp may require regular citations of use.

--Connel MacKenzie 00:09, 16 August 2007 (UTC)

I disagree. There are a handful of celestial objects for which the Messier catalog designation is important and commonly used enough that an entry is important. Specifically, M13, M31, and M33. The reason M13 is used is that it's easier than saying "the Great Globular Cluster in Hercules". The designations M33 and M31 turn up with regularity in astronomical literature because they are the two nearest spiral galaxies (both visible from Earth), and the designations are used rather than their names because their names are the same as the constellation in which they are located. It is therefore time-saving to use the designation rather than the name. These are the only three catalog values I would strongly argue for, though I imagine other users might argue for full inclusion of the Messier Catalog, just as we include the E-numbers for food additives. I would also argue that Halley's Comet is not a "less distinct" object; that phrasing makes no sense to me. --EncycloPetey 00:48, 16 August 2007 (UTC)
For now, on this topic, I give up. The argument that this is the application of encyclopedic notability is still too convincing. I don't see a way this can be acceptably addressed. --Connel MacKenzie 16:55, 25 August 2007 (UTC)

New proposal for Criteria to Include a Celestial Object as a Place

M31,... may be of great encyclopedic significance. Maybe one day, our descendants will visit there. Right now though, if it exists solely in astronomical literature, why include it? I'm gonna try to find some international astronomical conventions that are easy to stick with.
Now, we haven't communicated with anyone outside of our solar system, but there is a convention on which asterisms are official constellations. We navigate by these, name meteor showers for the ones they appear to radiate from, can locate our planets and satellites using them, and they basically comprise all stars significant to us, so they should be included. There are objects out there that aren't stars, like black holes, but all places we've been to so far orbit a star (the Sun), so let's exclude non-stars as places. Let's not include specific stars or globular clusters in this naming convention because they are objects within each of the constellations and the only star system we've contacted is our own.
We're special so let's include places that pertain to us. We are in the Milky Way Galaxy, so we should include that place. We are in the Solar System, so we should include that place. Why waste time enumerating the moons and planets in the Solar System in the preamble? It still means the Sun, all the planets and dwarf planets in our solar system, and all their moons. I'd like to include the Asteroid Belt as one place, because there are some large nearby rocks there we might mine and mathematically it appears right between Mars and Jupiter. The reason for including all our planets and dwarf planets is that each one orbits the Sun and weighs enough to gravitationally force itself to be spherical -- a high threshold that excludes the other known Trans-Neptunian Objects and objects in the Asteroid Belt. New dwarf planets or moons may yet be found, so the number may change again anyway. Other notable things may be discovered in the Solar System like the Van Allen Radiation Belt, Earth's Lagrangian Points, the International Space Station, Halley's Comet, the Hale-Bopp Comet, weather satellites, the Kuiper Belt Objects, and the Oort Cloud, but with such weak gravitation it's hard to imagine us being able to walk around them or for robots to treat them like land. So let's exclude these things -- not land, not a place. Gas planets and the Sun aren't landable but they are useful for referencing moons that may be. I don't know what it takes for a rock going around a planet to be called one of those 136 moons, but many are landable and some may have life so even if they are only referenced in astronomical literature, they merit inclusion.
Please note that different cultures use different asterisms in their belief systems and I don't want to pose one belief system over another, but we need a reliable system of recording where something appears to be from our solar system in order to create coordinates for a realistic 3-dimensional map of our universe. No matter what latitude, longitude, time of year, or time of day we find ourselves in, we can still observe the stars (even in daylight with the right tools) & the official mapping of the constellations will give us sufficient coordinates. Please remember that visually close stars in a constellation may still be quite far apart, because of how their distances from us vary.
I propose we vote for including as places the official constellations, our galaxy, our star system, & certain objects within it. Those objects are the planets, the dwarf planets, the moons of each, the star, and the Asteroid Belt, which would have been one of the planets if it had coalesced and stayed together. I'll try to make up a list of these objects. I hope I gave enough reason to exclude the other celestial bodies as not yet worthy of being places in and of themselves but merely objects within an actual place. The criteria I chose may not be palatable to every user but they will at least form a definitive list, so hopefully a vote can continue. The other objects might later be listed as localities within these places under broader criteria. Speaking of clear criteria, megacities don't have any, but in the spirit of Mega-, how about cities that had at least 1 Megaperson (1 million people) living within their official metropolitan limits on 2005/JAN/01 00:00(UTC)? Thecurran 07:49, 26 August 2007 (UTC)

A-Cai's proposal on place names that appear in a classic work of prose or poetry

How about saying something like:
  • a place name meets CFI criteria if it is real or fictional and no matter how big or small, provided that it appears in a classic work of prose or poetry.
Of course, you then have to define "classic." In English, Chaucer, Shakespeare etc. I have just recently started translating Romance of the Three Kingdoms for Wikisource, which is one of the four classic novels of China. It contains a huge number of obscure people and place names, and each of these should be documented in Wiktionary so that a person reading the original can fully understand the source text. Check out s:Romance of the Three Kingdoms/Chapter 1 to see how Wikisource and Wiktionary can fit together in this regard. The first two paragraphs alone have generated several dozen new entries for Wiktionary! -- A-cai 11:03, 1 August 2007 (UTC)
Undoubtedly however there are instances even in classic works of prose and poetry where an insignificant or fictional place is mentioned one time, without import or even without any means to say where exactly this place is. I would support inclusion of real places based on their mention as an actually identifiable place in such a work, e.g. Venice as mentioned in The Merchant of Venice. bd2412 T 20:36, 1 August 2007 (UTC)
True, but I still think we should include it, because it may not be obvious at first whether it's a significant place. Also, the location might be unknown to the first Wiktionary contributor, but maybe someone else would find out more information after a little research. For example, in Romance:
I was about to argue for keeping an entry for 五原山 (Wuyuan Mountain) when I realized that there was another way to interpret the text (Brewitt-Taylor avoids the issue by generically rendering 五原 as a place "away from the capital"). I saw a reference in Records of Three Kingdoms, which talks about a Wuyuan (五原) county. After some checking, I came across this paper about earthquakes in Inner Mongolia. According to the document (page 96), Bayan Nur (where Wuyuan county is located) has a long history of earthquakes, the first one occurring in 183 (seventh lunar month, sixth year of the Guanghe era). In Romance, it says it happened in the first year of the Guanghe era, which is 178, but it is rather obvious that it is the same event. In fact, at least one of the characters in the novel came from Wuyuan County (ex. Lü Bu). So now my translation changes to:
My point is that we now go from one mistranslated place name, which shows up only one time, to a place of significance in studying the geologic history of Inner Mongolia. Wiktionary has to remain flexible enough for people like me to work through these issues, and not just discount things out of hand because we don't know their significance at first. I mentioned classic prose and poetry in particular, because there are a lot of people who do research on these things that would be interested in seemingly minute details. And yes, I will delete 五原山 and replace it with 五原, now that I have figured this out :) -- A-cai 22:59, 1 August 2007 (UTC)

Google News Archive

I've seen odd mentions of this here and there, but can't find anything definitive about whether Google News Archive search (http://news.google.com/archivesearch) is considered "durably archived" for our purposes or not. Reading the About section, their aim would seem to be compatible with durably archiving the news stories along the lines of their groups and books searches. For me the answer is "yes", but what is the community's opinion? —This unsigned comment was added by Thryduulf (talkcontribs).

These have not been considered "durably archived" automatically in the past. The Google search feature does not actively retain these indefinitely (that I know of.) They don't say they do, either, on http://news.google.com/archivesearch/about.html. Since they don't own the content, they can't make any guarantee about its availability. For example, they can't even guarantee the same results a week from now. Contrast this with the groups.google.com argument, where the archives seem to only exist via Google now. (To my mind, the groups.google.com argument was flawed to begin with, as those aren't editorially reviewed, nor spell-checked, but most importantly, no guarantee can be made that Google won't be bought tomorrow and cease providing that service for free.) Their news collection service seems to be quite different: they collect news items from many sources for a limited time period only, and there is no inherent guarantee that those referents are durably archived, nor ever printed on paper at all.
If a news.google search result points to a newspaper that you know is a print edition, it is reasonable to assume that libraries have copies of the newspaper widely available. But if you don't know the newspaper, no assumption can be made that it isn't an internet-only publication. (FWIW, even a NYT citation could be from an "on-line only" edition, therefore not durably archived. The "TimesSelect" stuff seems to be for internet-only items?)
I also have a problem with pinpointing explicit relationships to individual search services. (I know I personally rely too heavily on b.g.c.; perhaps I should change my "helper sidebar links" to point to the yahoo book-search thing instead of b.g.c., despite its smaller coverage.)
I don't see how we can come out with any sort of blanket approval of results from news.g.c. Overall, it sounds like a bad idea, to me. --Connel MacKenzie 00:49, 31 July 2007 (UTC)
I believe TimesSelect is actually primarily for items that they syndicate to other newspapers; certainly it does include some syndicate items, and equally certainly not all online-only items are TimesSelect, though I'm not sure of the exact criteria. At any rate, it's usually possible to tell whether a given article appeared in the print edition. The bigger problem with using nytimes.com content is that even if a version of the article appeared in the print edition, the online content often gets fixed post-printing. (Factual errors are generally noted in corrections, but factual updates in breaking stories aren't, and at any rate, we're equally affected by the kind of textual update that they wouldn't bother to issue a correction for.) —RuakhTALK 01:19, 31 July 2007 (UTC)
Make a change to Wiktionary:List of searchable archives#News agents then? Honestly I haven't found these to be very useful anyway. DAVilla 08:35, 3 August 2007 (UTC)
They are worlds better than Usenet; they at least have had some editorial review, while covering a much broader range of new terms than b.g.c. does. Additionally, they are particularly helpful when the signal-to-noise ratio is very low on b.g.c.
Whoa! Your list of search-able archives is very interesting. But I think listing cnn.com and theonion.com are obvious mistakes, right? How are either of those durably archived? Youtube should have a more explicit prohibition...we don't need any video media, which can't meet commons' copyright restrictions. AOL and Yahoo should be listed as search portals, not content archivers themselves, right? And it still doesn't address the problem of listing which pointed-to sources we can (or can't) consider to be "durably archived." ("Durably archived," as I understand it, means that either the National Library of Congress has a paper copy, or the online source came from a site that has some very explicit corporate policy/contractual requirements to maintain the archives online, publicly, indefinitely.) The main problem with blogs and websites, is that they disappear after a year (or less.) --Connel MacKenzie 19:11, 15 August 2007 (UTC)
Don't bring copyright into it. We wouldn't be posting videos here, just quoting them, which is fair use. The problem with YouTube et al. is that it's not durably archived. But the Library of Congress doesn't just have paper. There's a collection of 2.7 million sound recordings, for instance. Wikipedia doesn't say how many films.
I would agree with eliminating AOL, Yahoo, and CNN, which as I noted above haven't been very useful anyway. The Onion does have a print version, although it's unclear how well the two correspond. DAVilla 19:26, 15 August 2007 (UTC)

alphabetical listing

I'm absolutely sure this would have been raised before and discussed ad infinitum, but I don't know where to find it. I don't think there's a "Frequently Requested Features" page...

I would like to know if it is possible to implement an alphabetical listing feature in Wiktionary? Since, unlike Wikipedia (et al.), there can be an "order" to pages here, why not place the letters of the alphabet along the top of the main page for quick searching, or something like that. E.g. with every additional letter you type, the drop-down list shortens to show only those remaining words that have a page here. This could even be extended to have the "next" and "previous" pages automatically appear along the bottom/top of every single page.

Cheers, Witty lama 06:31, 31 July 2007 (UTC)

Our Index: pages (e.g. Index:English,) are broken down by language. You can also use Special:Allpages or Special:Prefixindex. I tried using {{rank}} for the top 1,000 English words, but that met a mixed reception (some elated, some cool, some militant objections.) For any number of reasons, I haven't returned my attention to that project in a long time. --Connel MacKenzie 07:13, 31 July 2007 (UTC)
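For illustration only, the narrowing-by-prefix and previous/next ideas raised above can be sketched in a few lines of Python. This is a minimal sketch, not a description of how Special:Prefixindex or any MediaWiki feature is actually implemented: the title list is a made-up sample, the function names `with_prefix` and `neighbours` are hypothetical, and ordering is plain code-point order rather than any language-aware collation.

```python
from bisect import bisect_left

# Hypothetical sample of page titles; a real wiki would query its database.
TITLES = sorted(["lord", "laird", "line", "lama", "word", "work"])


def with_prefix(titles, prefix):
    """Return every title that starts with prefix (titles must be sorted)."""
    start = bisect_left(titles, prefix)  # first title >= prefix
    matches = []
    for title in titles[start:]:
        if not title.startswith(prefix):
            break  # sorted order: once one title fails, the rest fail too
        matches.append(title)
    return matches


def neighbours(titles, title):
    """Return (previous, next) around an existing title; None at either end."""
    i = bisect_left(titles, title)
    if i == len(titles) or titles[i] != title:
        raise KeyError(title)
    previous_title = titles[i - 1] if i > 0 else None
    next_title = titles[i + 1] if i + 1 < len(titles) else None
    return previous_title, next_title
```

Each extra letter typed simply calls `with_prefix` again with a longer prefix, so the list can only shrink; `neighbours` gives the links that would appear at the top or bottom of a page.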


Could someone please point me at the vote that justified this? --Connel MacKenzie 21:24, 31 July 2007 (UTC)

It seems Ptcamn would be the one to ask. bd2412 T 21:47, 31 July 2007 (UTC)
For what it's worth, I agree with the edit. There are many, many natural languages for which we have entries but for which no ISO code exists. I could find half a dozen such Australian Aboriginal languages without breaking a sweat, in part because I've been the one going through and cleaning up the Language categories and setting up ISO templates. --EncycloPetey 04:44, 1 August 2007 (UTC)
Then you should be removing those entries. ~7,000 languages inadequately covered is more than enough to keep track of. --Connel MacKenzie 06:23, 1 August 2007 (UTC)
For artificial languages and languages specifically disallowed by CFI, I have been. But I will not delete entries simply because an indigenous or extinct language was not given an ISO code. There is even one WP in a language which has no ISO code; one had to be invented internally within the Wikimedia projects to support it. The ISO code assigning process is heavily biased towards extant languages. As our mandate is "all words in all languages", we should not be showing that kind of bias against indigenous and extinct languages. --EncycloPetey 16:57, 1 August 2007 (UTC)
The point is that we can't possibly assume to be authoritative on the topic; it is a case where we must delegate such decisions to SIL/ISO-639. To claim we are "the" language authority is absurd and can only lower the general public opinion of the Wiktionary project. --Connel MacKenzie 19:13, 1 August 2007 (UTC)
Authoritative on what topic, exactly? What languages exist? What languages have been spoken by some population of humans somewhere? If you don't think some language exists, I'm sure we could find a reliable source to cite. I don't know what you think having an ISO 639 code proves, but whatever it is I'm sure it can also be demonstrated in other ways. --Ptcamn 07:30, 2 August 2007 (UTC)
But your unilateral change does not reflect this Wiktionary community's consensus. If you wish to make your point, do so. Don't go around vandalizing policy pages, instead of discussing it. --Connel MacKenzie 07:48, 2 August 2007 (UTC)
I assumed that "If the language lacks an ISO 639 language code, it is almost surely not acceptable." was written under the mistaken impression that ISO 639 covers almost all languages that "are (or were) used for everyday communication by some identifiable, natural population of humans". Since there are plenty of languages that fulfill that but do not have an ISO 639 code, the change seemed appropriate. I can understand using ISO 639 as a guide to what languages are/were spoken and should be included, but I very much doubt the community consensus would be that even though a language is real and natural, it should not be included just because it doesn't have an ISO 639 code. --Ptcamn 08:44, 2 August 2007 (UTC)
Looking at Wiktionary talk:Criteria for inclusion#ISO code criterion it seems both you and I were partly incorrect. Yes, natural languages were briefly considered earlier; no they should not be excluded out-of-hand. I still do not think the wording change you made to CFI was appropriate; the wording you have now encourages such dialects as language names (sometimes, not even romanized coherently.) Simply waiting a year, then making a change directly opposite to the discussion, seems quite provocative. The change should be rolled back/undone. --Connel MacKenzie 18:24, 15 August 2007 (UTC)

The wording originally appeared as part of this edit. There is no associated discussion on the CFI talk page. It looks more like the editor's personal opinion to me than a statement of fact. It certainly predates the community's realization of the tremendous shortcomings of the SIL/ISO listings for our purposes (to wit: the frequent absence of codes for extinct languages, and the inclusion of codes for things that are not languages but "meta-languages"). --EncycloPetey 09:06, 2 August 2007 (UTC)

It would be very good here to understand what 639 is, and is not. It is intended to encode all languages, living and dead, natural and artificial. It isn't a list of "good" languages or any such thing. I've worked with ISO WG's, so let me explain a bit.

The original version (639-1) was understood to be very limited. It provided two letter codes for enough languages to be enormously useful for general applications (say localizing versions of computer software, or identifying the language of documents in international trade). Trying to encode 7000 or more languages at the very start was neither possible nor required.

The dash-2 version added a number of three letter codes, including group codes. There isn't any confusion about whether art is a language or not, it is a group code for artificial languages not otherwise coded.

The dash-3 version added a great number of additional 3 letter codes. Let me explain how this works: an ISO WG typically does not directly look for or accept applications for codes from various different people or groups (although it certainly may!). What they do is designate some particular organization as the "normal" channel for contributions. In the case of 639, this is SIL. SIL regularly requests additions of codes to 639. SIL is a Christian organization, interested in publishing religious literature in many languages. Note that they do a very good job of keeping the religion from biasing the language work. (I'll bet you didn't even know that!) This does have one strong effect though: SIL itself generally only initiates coding for living languages; their Ethnologue only covers them. They are, however, happy to channel coding requests for dead languages to the ISO WG.

We have used the existing set of 639 codes (minus group codes and the 639-2 compatibility codes) as a defining set for CFI; we should continue to do this (and that text should be put back). However, it should just be our baseline; we exclude some coded artificial languages, and permit un-coded languages, assigning them codes based on the group. For example fiu-vro is Template:fiu-vro. These will at some point be coded by ISO; we would be a valuable contribution to that process. The other reason we want to use coded languages whenever possible is that they are defined by the standard; otherwise we could get into a lot of debate about what the definition of some language is.

In the fairly near future, there will be 639-4 and 639-5, defining 4 and 5-letter codes for many thousands of additional languages; the work on this is going on now. We can, as I noted, be a valuable contribution to that process. Robert Ullmann 13:51, 3 August 2007 (UTC)
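The baseline-plus-exceptions rule described above can be made concrete with a rough Python sketch. Everything here is an assumption for illustration: the code sets are tiny samples rather than the real ISO 639 tables, the function name `status` is hypothetical, and the exclusion entry (`tlh`, the ISO 639-2 code for Klingon) is only a stand-in for "a coded constructed language that CFI might disallow", not a statement of actual policy. The `fiu-vro` case is the one mentioned in the comment above.

```python
# Tiny illustrative samples only; the real ISO 639 tables are far larger.
INDIVIDUAL_CODES = {"en", "fr", "la", "yo"}  # codes for individual languages
GROUP_CODES = {"art", "fiu"}                 # ISO 639-2 group (collective) codes
EXCLUDED_BY_CFI = {"tlh"}                    # hypothetical exclusion, for illustration


def status(code):
    """Classify a language code under the baseline-plus-exceptions idea."""
    if code in EXCLUDED_BY_CFI:
        return "excluded"
    if code in GROUP_CODES:
        return "group code, not a single language"
    if code in INDIVIDUAL_CODES:
        return "baseline"
    # An un-coded language may be given an interim code built on its group,
    # e.g. fiu-vro: group "fiu" plus a project-chosen suffix.
    group, _, suffix = code.partition("-")
    if suffix and group in GROUP_CODES:
        return "interim code under group " + group
    return "unknown, decide case by case"
```

The order of the checks matters: exclusions are tested first, so a coded language can still be rejected, and interim codes are only recognized when their prefix is a known group code.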

I would agree that an explicit mention of ISO codes is a good thing here, but the wording previously used was inappropriate, as evidenced by Connel's interpretation of it to mean "no ISO, no entries". The exclusion of constructed languages is already treated within that section of CFI, and underwent recent revision by community vote, so that portion need not be covered here (except perhaps a see also in the text). We would also need more explicit language regarding how we expect to deal with languages for which there is evidence, but for which there is no ISO code. The text should note both the fiu-vro example as well as the fact that many extinct languages do not have an ISO code yet. --EncycloPetey 18:40, 3 August 2007 (UTC)
A couple things to say here, just to clarify. First, each currently-released part of ISO-639 has had a separate registration authority. Infoterm was RA for part 1 and the US Library of Congress for part 2. Part 4 doesn't appear to be designed with defining any new codes in mind and part 5 is primarily for language families.
However, a part 6 is also currently in the works with the goal of creating 4-letter codes for "comprehensive" coverage of languages and all variations. This includes both language families and dialectal variations. However, there is no attempt made to distinguish between them. This neatly sidesteps political issues with calling one thing a language and another a dialect, but will create the problem of forcing every group using it which needs that distinction to deal with it themselves. This is something we'll have to deal with at some point and I have no doubt that it will complicate this entire process. As such, we must seek to define our own criteria for this, though ISO codes may be a good base for now.
As it stands, I agree that ISO-639 is limited in the languages it can represent. I think it's essential for our stated purpose to allow further languages. It may be that a case-by-case decision should be made. Certainly we should have information on dialectal variation, thus the matter is simply what to call a language and what a dialect. (As noted above, constructed languages seem to have their own criteria.) As to the language of the CFI, I would suggest a note to the effect of "An ISO-639 code is one indicator of an acceptable language, but other languages may be acceptable as well. Extinct languages as well as smaller languages may be accepted based on evidence of their use at some period." Perhaps it would be worthwhile to discuss theoretical constructs, such as Proto-Germanic, etc. —Leftmostcat 07:14, 4 August 2007 (UTC)


I need to know how the modern English word "lord" comes from the Old English word "hlaford", i.e. which phonological processes led to this word in modern English. Thanks —This comment was unsigned.

Move to WT:ID. --Connel MacKenzie 13:42, 2 June 2007 (UTC)
Better, move to the Tea Room. --EncycloPetey 17:40, 2 June 2007 (UTC)
The hl is reduced to l as in most Old English nouns > Modern English nouns, the f (which is pronounced as v) is lost altogether, again this is common (compare over being reduced to o'er). See laird, the Scots version of lord which is still pronounced with two vowels. 17:29, 18 July 2007 (UTC).

Layout of example sentences

There is no real consensus about this, and I have collided with several persons undoing my changes etc. So I’d like to have this settled and put into WT:ELE.

The point is: how to format example sentences? I know we actually want to have quotes for each and every single word sense, but that is a long way to go, and in the meantime, it is very useful, especially for non-native English speakers like me, who mostly contribute translations of English words in their own language.

The format for quotations is rather fixed, as discussed in WT:" (although not all the templates on the Templates subpage follow it!), and I propose to handle example sentences similarly, just leaving out the date line:

'''word''' (inflection)

# first def
#* ''example sentence with '''word''' in bold''
# second def
#* ''another sentence with the '''word'''''

I.e.: having them start by a bullet, with the sentence in italics and the head word in bold. Of course this has to be adapted for other scripts etc., but I’ll leave that to those who are knowledgeable in that.

It would look like this:


word (inflection)

  1. first def
    • example sentence with word in bold
  2. second def
    • another sentence with the word

If enough people react, I’ll start a vote about this. H. (talk) 15:33, 11 June 2007 (UTC)

I agree that example sentences are an important element of an entry for dictionary users, and I make a special effort to add them. In fact, it seems to me that Wiktionary is particularly able to distinguish itself from traditional dictionaries in this respect, since we have the room to include examples for every definition. As for format, here's my view:
Example sentences should:
  • be grammatically complete sentences, beginning with a capital letter and ending with a period, question mark, or exclamation point.
  • be placed immediately after the applicable numbered definition, and before any quotations associated with that specific definition.
  • be italicized, with the defined term boldfaced.
  • be as brief as possible while still clarifying the sense of the term. (In rare cases, examples consisting of two brief sentences may work best.)
  • be indented using the ":" command placed at the start of the line.
  • not be bulleted, since a single example per definition is ordinarily all that is required, and in the rare cases where multiple examples are provided for a single definition they are separated by line breaks and are readily recognizable as multiple examples.
-- WikiPedant 15:56, 11 June 2007 (UTC)

So far, all my entries have been as stated by WikiPedant. I would prefer to keep it that way. But if the general consensus is to put bullet points, then bullet points it shall be.Algrif 16:05, 11 June 2007 (UTC)

I agree with WikiPedant and Algrif. In particular, I like that we make artificial example sentences clearly distinct from quotations by italicizing the former and bulleting the latter. (I guess this isn't strictly necessary — you can always see whether an author and date are mentioned — but I still like it.) —RuakhTALK 16:10, 11 June 2007 (UTC)

Yes, it's a good point that bulleting quotations and not bulleting example sentences contributes to keeping them very distinct visually. -- WikiPedant 16:20, 11 June 2007 (UTC)

I agree with Ruakh, WikiPedant, and Algrif. A suggestion: place example sentences before quotations, so that the example doesn't mix with the quoted text:

thiotimoline (uncountable)

  1. a chemical substance
    A crate of thiotimoline arrived today.
    • 1948, I. Asimov, The Endochronic Properties of Resublimated Thiotimoline, p42
      Observation of the sample of thiotimoline was etc etc

just to make up an example. Cynewulf 16:42, 11 June 2007 (UTC)

Indeed. I have just expanded my list above to include this point re placement. -- WikiPedant 17:27, 11 June 2007 (UTC)
I am very happy to see this discussion. I will abstain from the bullet and italics debate and abide by the outcome. However, I would like to propose that the examples be twice indented ( as #:: or #:* ) to set them off more clearly from the definitions and to horizontally align them closer to the actual quotation of any quotations that may be present. I would also like to make it very clear that example sentences are not to be linked (wikified) under any circumstances.
Finally I would like to address WikiPedant's list of format guidelines for examples with an alternate list of goals:
  1. To place the term in a context in which it is likely to appear, addressing level of formality, dialect, etc.
  2. To provide notable collocations, particularly those that are not idiomatic. The hope is that they may register with the user. (An idiomatic term is better exemplified on the respective page.)
  3. To select scenarios in which the meaning of the example itself is clear.
  4. To illustrate the meaning of the term to the extent that a definition is obtuse.
  5. To exemplify varying grammatical frames that are well understood, especially those that may not be obvious, for instance relying on collocation with a preposition.
Although the list may seem technical and specific, we tend to do these naturally without really analyzing it. But it might help to have the purposes spelled out, for instance to support my claim that the point of brevity can sometimes be counter-productive. DAVilla 19:01, 11 June 2007 (UTC)
Could you please explain what you mean by "grammatical frames" in #5, a little more clearly for me? I think I agree, but I'd like to be certain. --Connel MacKenzie 01:15, 15 June 2007 (UTC)
I apologize to making you wait in a response. On consideration by your request, I have put together this itself-referential example. 19:45, 28 June 2007 (UTC)
I also believe that we ought to use ‘:’ and not ‘*’ to indent the example sentences — generally, I agree with the points suggested by WikiPedant et al, although I do not think that complete sentences should be required. — Beobach972 20:10, 11 June 2007 (UTC)
I think this is becoming quite a productive discussion. Concerning DAVilla's and Beobach972's points:
DAVilla's points about (a) links (internal and external) being disallowed in examples and (b) outlining the purposes which good examples should fulfill strike me as terrific. It is very helpful to get this sort of thing down in writing. As for double indenting examples, I see pluses and minuses. On the one hand, it probably will be an aesthetic improvement if the starts of examples and quotations are horizontally aligned. On the other hand, every time you push text to the right, you increase the likelihood of needing to wrap a line and increase page depth and you also create a screen which may not look so good for users who still have low-res monitors (although the latter is a problem which will diminish with time).
Concerning Beobach972's view that complete sentences should not be mandatory, I'm not so sure. I think complete, grammatically sound sentences should be the norm and should, at the minimum, be strongly encouraged. In a world where the sentence seems to be at increased risk of becoming an endangered species, a dictionary should be one of the bastions of proper English usage. -- WikiPedant 20:39, 11 June 2007 (UTC)
Foreign language entries (at least ja and ko ones) use "*" instead of ":" for each example to distinguish between (a) the example ("*" followed by the non-italicized example), (b) the transliteration of the example ("*:" followed by italicized transliteration), and (c) the English translation of the example ("*:" followed by standard translation, sometimes followed by a parenthesized and quoted literal translation). If we care to make English examples and foreign language examples consistent, we should consider the formatting challenge of examples with transliterations and translations. Note that multiple examples are often required to show the proper usage of the term, especially to highlight syntax or grammar patterns. Rod (A. Smith) 22:35, 11 June 2007 (UTC)
WikiPedant: complete sentences are generally alright, it's just that fragments can work just as well in many cases... giving ‘a fishing line’ for one of the senses of line, for example, works well — making it a complete sentence, ‘That is a fishing line’, doesn't increase the illustrative quality of the sentence. — Beobach972 00:46, 12 June 2007 (UTC)
Rod: we could make the style consistent (although what we'd do to reconcile languages that used ':' instead of '*', if we encountered any, I don't know :-p ... ), but it does not necessarily have to be. We can decide the entry layout for English words, and individual languages are free to create Wiktionary:About Xyzese pages to outline how the format for them differs from WT:ELE (I know that you know this, Rod, but I clarify it in case others do not). — Beobach972 00:46, 12 June 2007 (UTC)

The only comment I'd add is that example sentences are most useful when we haven't got a good citation for a particular sense. When we do have a good citation, we don't really need a sentence invented by us as well. That's because on a fundamental level a good dictionary works from citations to definitions and not the other way round. Although thinking about it, if we do get this new Citations namespace then maybe it'll be best to have all citations there and leave an entry with only example sentences. That would probably make it less cluttered as well. Widsith 08:15, 12 June 2007 (UTC)

Algrif comment removed. Misunderstood what was meant above. Algrif 12:17, 12 June 2007 (UTC)

I very much like this discussion. I use single indentation for example sentences, bullets for quotations. For foreign entries, example sentences are usually indented once, translations of those examples double indented. I feel quite strongly that all example sentences should be complete sentences, never sentence fragments. I think DAVilla's list of "goals" for an example sentence are fantastic. Like Widsith, I thought that quotations could be used to replace any example sentence indiscriminately; perhaps (with DAVilla's list) we should review that choice. I think there is something to be said for having both. Also, with DAVilla's list, we can have objective discussions about particular replacements. --Connel MacKenzie 21:08, 12 June 2007 (UTC)
Widsith: I don't think we'll ever remove all quotations from the entries — as you say, a dictionary is supposed to define words based on usage, so having examples of real usage is the best policy. Additionally, it's nice for entries to show the oldest print uses of the word (or, for no-longer-used words, the most recent), famous uses of it, etc. — Beobach972 03:32, 13 June 2007 (UTC)
Yep, that sounds good. My main worry was just that when you invent sentences, there is always a temptation to make something up to fit your definition, rather than word the definition to fit the citations. But there are always going to be cases where you need to construct your own examples, even the sOED resorts to it sometimes. Widsith 08:33, 13 June 2007 (UTC)


Since we largely seem to agree on this, I do not think it is necessary to start a vote on this. I propose that I add the following part to WT:ELE, between the headers ‘Definitions’ and ‘Abbreviations’ (or should it come after the latter?):


Example sentences

Generally, every definition should be accompanied by a quotation illustrating the definition. If no quotation can be found, it is strongly encouraged to create an example sentence. Example sentences should:

  • be grammatically complete sentences, beginning with a capital letter and ending with a period, question mark, or exclamation point.
  • be placed immediately after the applicable numbered definition, and before any quotations associated with that specific definition.
  • be italicized, with the defined term boldfaced.
  • be as brief as possible while still clarifying the sense of the term. (In rare cases, examples consisting of two brief sentences may work best.)
  • be indented using the "#:" command placed at the start of the line.
  • for languages in non-Latin scripts, a transcription is to be given in the line below, with the same indentation.
  • for languages other than English, a translation is to be given in the line below (i.e. below the sentence or below the transcription), with an additional level of indentation: "#::".

The goal of the example sentences is the following, which is to be kept in mind when making one up:

  1. To place the term in a context in which it is likely to appear, addressing level of formality, dialect, etc.
  2. To provide notable collocations, particularly those that are not idiomatic.
  3. To select scenarios in which the meaning of the example itself is clear.
  4. To illustrate the meaning of the term to the extent that a definition is obtuse.
  5. To exemplify varying grammatical frames that are well understood, especially those that may not be obvious, for instance relying on collocation with a preposition.


I’ll edit the example lay-out accordingly. (Note that I made the indent be #:, since it is not to break up the numbering, of course.) I also added a few lines about languages other than English.
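To make the proposed markup concrete, here is a small Python sketch that builds the wikitext lines for one example sentence following the bullet points above. It is only an illustration under stated assumptions: the function name `example_lines` is hypothetical, the transcription is emitted without italics (the proposal does not say either way), and every occurrence of the term in the sentence is boldfaced by plain string replacement, which would not cope with inflected forms.

```python
def example_lines(sentence, term, transcription=None, translation=None):
    """Build wikitext lines for one example sentence under a numbered definition."""
    # Italicize the whole sentence and boldface the defined term.
    bolded = sentence.replace(term, "'''" + term + "'''")
    lines = ["#: ''" + bolded + "''"]
    if transcription is not None:
        # Non-Latin scripts: transcription on the next line, same indentation.
        lines.append("#: " + transcription)
    if translation is not None:
        # Languages other than English: translation one level deeper.
        lines.append("#:: " + translation)
    return lines


# Usage: prints the single line to place under the relevant "# definition".
for line in example_lines("A crate of thiotimoline arrived today.", "thiotimoline"):
    print(line)
```

Because the lines start with "#:" rather than ":" or "*", they continue the numbered definition list instead of restarting the numbering.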

I’d also add the following sentence to the header ‘Quotations’: ‘Quotations are preferred over example sentences. However, nothing stops you from providing both. In some cases, it might be reasonable to remove the example sentence, if the quotation exemplifies the same use. Quotations are generally put under the definition which they illustrate. If there is both an example sentence and a quotation, the quotation follows the example sentence.’

Any objections/changes? Suggestions for better wordings are wholeheartedly welcomed, I still feel constrained when having to write in English, not finding the words with the connotation I want… H. (talk) 16:41, 25 June 2007 (UTC)

Looks good, although it is a bit wordy. That might be helped by a couple of clean examples, perhaps one without a direct citation following, and one example with a citation following in order to make it clear that the examples and citations can work together. --EncycloPetey 16:45, 25 June 2007 (UTC)
I'd put a vote up anyway -- people are a bit tetchy about changing ELE, though w:WP:SNOW may apply. Also, I'd like to suggest putting transliteration at the same added indentation as translation:

その (romaji sono)

  1. that
    sono hito wa monkii desu.
    That person is a monkey.
... but I guess it's fine either way. Looks good! Cynewulf 19:28, 25 June 2007 (UTC)
It would also be nice to italicize transliterations consistently:
  1. that
    sono hito wa monkii desu.
    That person is a monkey.
Rod (A. Smith) 20:41, 25 June 2007 (UTC)
Er -- right. Oops. Cynewulf 20:43, 25 June 2007 (UTC)
  • If it's such a shoo-in, why bother skipping the vote? Just to give naysayers ammunition? It is rare to have a completely non-controversial vote - frankly I think I'd enjoy that change from tradition. :-)   --Connel MacKenzie 17:42, 26 June 2007 (UTC)

Ok, I’ll make it a vote then, but I think a little more discussion is needed first, since…

I do not approve of the boldening of the word in the transcription/translation. That is not done with citations either. I am unsure about the extra indenting for transliteration. For many scripts, it is probably nicer if they are outlined (e.g. Cyrillic, where the correspondence is almost 1/1). So the example would look like this:

その (romaji sono)

  1. that
    sono hito wa monkii desu.
    That person is a monkey.

OTOH, in ideogrammatic scripts, it might be useful, since bold doesn’t really work there… I’ll leave that to the specialists. What do they say?

If it’s too wordy, please go ahead and rewrite. As I said, I am not a native speaker and often have to use longer sentences to express what I mean (like now).

Ah, the following point is missing:

  • not be wikified

H. (talk) 13:42, 27 June 2007 (UTC)

I think it's very important to bold words in a transliteration/transcription and in a translation, so that it's clear what matches what; if an example sentence or quotation for a word meaning "home" actually uses a definite locative form meaning "in the home", then all of "in the home" should be in bold for clarity's sake. —RuakhTALK 16:25, 27 June 2007 (UTC)
So how would this format adapt to support two example sentences with translation if there was no transliteration required? Specifically, how would we format:
  1. To feel, to make feel.
    Tengo frío. - I am cold.
    Eso nos tiene lastimos. - That makes us sad.
--EncycloPetey 20:23, 7 July 2007 (UTC)
I've been separating loose translations from literal ones for such examples, like this:
  1. To feel, to make feel.
    Tengo frío.
    I am cold. (literally, "I feel cold.")
    Eso nos tiene lastimos.
    That makes us sad. (literally, "That makes us feel sad.")
Rod (A. Smith) 20:58, 7 July 2007 (UTC)
Then for example sentences in English, we use italics, but for examples and translations in all other languages, we don't? Also, for languages that do not need to be transliterated, this adds lots of unnecessary space between successive definition lines. For tener, we currently have only 7 Spanish definitions (there are potentially more than 30, judging by the RAE).
Following above guidelines (left; first list below) | Compressed, with italics (right; second list below)
  1. (transitive) To have, possess an object.
    Ella tiene seis hermanos.
    She has six brothers.
    Tengo una pluma.
    I have a pen.
  2. (transitive) To possess a condition or quality.
    Usted tiene suerte.
    You are lucky. (literally: You have luck.)
    ¡Ten cuidado!
    Be careful! (literally: Have care!)
    ¿Quién tiene razón?
    Who is right?
  3. (transitive) To hold, grasp.
    Ten esto.
    Hold this.
  4. (transitive) To contain.
    Este tarro tiene las cenizas.
    This jar contains the ashes.
  5. (intransitive) To feel (internally).
    Él tiene mucho cariño para ella.
    He feels much admiration for her.
    (with calor, frío): Tengo frío.
    I feel cold.
    (with hambre, sed): Tenemos hambre.
    We are hungry.
  6. (transitive) To make to feel.
    Eso nos tiene lastimos.
    That makes us sad.
  7. (transitive) To be of a measure or age.
    Tiene tres metros de ancho.
    It is three metres wide.
    Tengo veinte años.
    I am twenty years (old).
  1. (transitive) To have, possess an object.
    Ella tiene seis hermanos. - She has six brothers.
    Tengo una pluma. - I have a pen.
  2. (transitive) To possess a condition or quality.
    Usted tiene suerte. - You are lucky. (literally: You have luck.)
    ¡Ten cuidado! - Be careful! (literally: Have care!)
    ¿Quién tiene razón? - Who is right?
  3. (transitive) To hold, grasp.
    Ten esto. - Hold this.
  4. (transitive) To contain.
    Este tarro tiene las cenizas. - This jar contains the ashes.
  5. (transitive) To feel (internally).
    Él tiene mucho cariño para ella. - He feels much admiration for her.
    (with calor, frío): Tengo frío. - I feel cold.
    (with hambre, sed): Tenemos hambre. - We are hungry.
  6. (transitive) To make to feel.
    Eso nos tiene lastimos. - That makes us sad.
  7. (transitive) To be of a measure or age.
    Tiene tres metros de ancho. - It is three metres wide.
    Tengo veinte años. - I am twenty years (old).
As a real example, I'd like to hear what people think about the readability, screen display, and visual scanning of both formats. I prefer the one on the right because of screen size limitations on some monitors, but that might just be me. --EncycloPetey 21:36, 7 July 2007 (UTC)
The version on the right looks much better, at least for short sentences (though personally I'd prefer em dashes to hyphens). —RuakhTALK 23:40, 7 July 2007 (UTC)
Agreed. The compressed version on the right is more readable, more clear, and more aesthetically pleasing. Algrif 12:06, 8 July 2007 (UTC)
I like the one on the right as well, but am not clear how it would apply to languages requiring romanization. Would we put the real text on a separate line, and the romanization and translation together; or the reverse? -- Visviva 13:28, 8 July 2007 (UTC)

Note that the left side of the above table does not follow my proposed guidelines, in that the example sentences are not italicised. I copied the table over below to make the comparison reasonable. H. (talk) 15:52, 19 July 2007 (UTC)

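Following above guidelines (left) | Compressed (right); each example below appears first in the left-hand layout, then in the right-hand one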


  1. A tactic, strategy or piece of legislation.
    He took drastic measures to halt inflation.
  2. An indicator; Something used to assess some property.
    The average price of basic household goods is a measure for inflation.
    Honesty is the true measure of a man.


  1. A tactic, strategy or piece of legislation.
    He took drastic measures to halt inflation.
  2. An indicator; Something used to assess some property.
    The average price of basic household goods is a measure for inflation.
    Honesty is the true measure of a man.
Latin-based language
  1. (transitive) To have, possess an object.
    Ella tiene seis hermanos.
    She has six brothers.
    Tengo una pluma.
    I have a pen.
  2. (transitive) To possess a condition or quality.
    Usted tiene suerte.
    You are lucky. (literally: You have luck.)
    ¡Ten cuidado!
    Be careful! (literally: Have care!)
    ¿Quién tiene razón?
    Who is right?
Latin-based language
  1. (transitive) To have, possess an object.
    Ella tiene seis hermanos. — She has six brothers.
    Tengo una pluma. — I have a pen.
  2. (transitive) To possess a condition or quality.
    Usted tiene suerte. — You are lucky. (literally: You have luck.)
    ¡Ten cuidado! — Be careful! (literally: Have care!)
    ¿Quién tiene razón? — Who is right?
Non-latin-based language
  1. that
    その人はモンキーです。
    sono hito wa monkii desu.
    That person is a monkey.
Non-latin-based language
  1. that
    その人はモンキーです。— sono hito wa monkii desu. — That person is a monkey.

Extended for languages that need romanisations. I do prefer the left option here. One simple reason is that I am not looking forward to replacing an endless number of hyphens by em-dashes. Also note that all of the example sentences in the Spanish example are extremely short. This will not always be the case. And last but not least, for languages which need transcription the compact format does not look good at all. H. (talk) 15:52, 19 July 2007 (UTC)


I just started a vote on this topic: Wiktionary:Votes/2007-07/Layout of example sentences. H. (talk) 15:57, 19 July 2007 (UTC)

Addition of headers to ELE

I'd like to consolidate the discussion that has taken place sporadically in various places about the Part of Speech headers, specifically the lack of several headers. Presently, our POS headers are very Euro-centric; they cover the European languages fairly well, but other languages (Asian ones in particular) are not as well covered. Japanese, for example, makes use of circumfixes, so I propose adding this header to WT:ELE/POS. (Note: this debate is separate from any discussion of whether the phenomenon exists in English. It unequivocally exists in other languages.) It would be possible to make Wiktionary:About Xyzese pages for each language, and detail the use of the header in each, however, it is used in enough languages (Japanese, Guaraní, Hebrew, Dutch — note that in this case we aren't even adequately covering the European languages — Indonesian, etc) that I feel its addition to WT:ELE/POS is a simpler and more appropriate action. Additionally, I am under the impression that languages like Malay, Indonesian, etc could use either a new header or two or three, or clarification for certain troublesome words. Stephen and others can hopefully explain that, since I don't remember what headers were suggested. At any rate, I assume such additions require votes, so I'd like this discussion to precede a vote on the addition of each header (a separate vote for each, to be safe?). — Beobach972 14:22, 21 June 2007 (UTC)

Very interesting. "Headings for non-English sections" would have what requirement, then? Use for 10 or more languages? Should each of these be format-enhanced with templates (confer: {{acronym}},) so that the English-language near-equivalent can be presented? For example, I'd like to see "Noun" or "Noun-like" displayed for the foreign-language (FL) heading "Nominal". Likewise, I'd like to see "Conjugation" displayed for "Mutations". Or "Prefix and suffix" in place of "circumfix". With these in templates (restricted to a finite list) and some judicious CSS, I'd think both factions could be accommodated. The templates might make it easier for off-line XML parsing as well (or at least more obvious that an approved exception-case has been encountered.) Is this idea worth experimenting/toying with? --Connel MacKenzie 17:45, 21 June 2007 (UTC)
How should circumfixes work, ideally? I mean, I assume users aren't going to type "a- -ing" into the search box; is the idea that a- and -ing would both link to a- -ing? What happens if a prefix or suffix's only use is in circumfixes? Would its definition lines just say "Used in the circumfix [[___]]"? (By the way, I'm not sure it's so unambiguous that Hebrew has circumfixes. I do think we should include the putative Hebrew circumfixes, for the same reason that we should include the putative English ones: they clearly exist, whatever they are, and a lack of consensus that they're circumfixes isn't the same as a consensus that they're not. If we left things out because there's no consensus about what they are, then we wouldn't have any of the English possessive pronouns/possessive adjectives/possessive determiners/genitive pronouns.) —RuakhTALK 17:50, 21 June 2007 (UTC)
While I do think that "English circumfixes" is obviously incorrect, the topic that has been raised in the past is about splitting them correctly into prefix and suffix combinations. There is a need to use accepted terminology describing these things. They can be described without using FL terminology, and our readers are much better served when we stick to English descriptions. The topic at hand is making exemptions for certain FLs that use specific terminology, where in that limited context (entry sections for that language) such terminology is not only acceptable, but expected. For English entries, "Pronoun" and "Adjective" are valid headings; the examples you listed are not. That doesn't imply that we don't describe those functions, but it does imply that we limit the headings we use to group them. --Connel MacKenzie 19:50, 21 June 2007 (UTC)
I am 100% O.K. with a header like "Prefix and suffix" or "Prefix-suffix combination" in describing any language's putative circumfixes. I think it's generally a bad idea to use different POS headers in reference to the same POS in different languages, because it makes languages seem exotic, complicated, and different when those aren't necessarily the case. —RuakhTALK 00:00, 22 June 2007 (UTC)
Regarding the broader question of what new headers to add to support other languages: we're already using a "Root" POS header and a "Forms" subheader for Hebrew words (at least conceptually — so far I think there are less than a dozen such). A "Pattern" POS header might be nice as well, but the patterns might be better described in an appendix, and it hasn't been discussed yet at Wiktionary:About Hebrew. (Incidentally, Arabic has root+pattern→form inflectional and derivational paradigms as well, but apparently the Arabic-speaking editors here have chosen not to go that route. I don't know Arabic at all, so can't judge why.) A "Participle" POS header would be useful for languages where participles are declined like adjectives and/or nouns — they're still verb forms, but it makes little sense to include participles' declensions in the conjugation tables at the entries for lemmata. (It might also be useful for English, but that's less necessary.) A "Circumposition" header would be good as a theoretical matter; there are certain languages that are claimed to have circumpositions, and while none of these claims has universal acceptance, there's no better place to discuss them than at the entries for these putative circumpositions. A "Copula" header would make sense for languages where not all copulas are verbs. A "Stative verb" header might make sense for languages like Japanese, though since right now we have decent Japanese coverage without using that header, I guess it must not be totally necessary. (I don't actually speak Japanese, so can't tell whether we're misleadingly calling them "adjectives", or correctly-but-with-disregard-for-common-pedagogical-distinctions calling them simply "verbs", or if we've adopted our own term for them.)
A "Grammatical word" or "Non-lexical item" header would be good as a catch-all for weird grammatical words that don't easily fit into any category, such as French voici or Hebrew יש (yesh) (which dictionaries have somehow decided are adverbs, presumably through some sort of lottery process, or perhaps consultation with a Magic 8-Ball). —RuakhTALK 17:50, 21 June 2007 (UTC)
Well, circumfixes occur in Indonesian (see w:Indonesian language#Grammar), but the Wikipedia article mostly calls them confixes. So, there's another language that will need the header. I can't imagine the best way to enter them, though. I imagine that a-ing, a-...-ing, a- -ing, and more could work and would be possibilities for entering as a search. According to the Wikipedia article on w:Circumfix, there is at least one example in Japanese and one in Guaraní.
As to the broader issue, Pattern is not a part of speech or a function, so I don't think that works well as a header. I agree that Participle is a very useful header for Latin, Ancient Greek, (and for Russian as Stephen has said elsewhere). Couldn't Stative verb be handled the same way we handle transitive, intransitive, and reflexive verbs? That is, use ===Verb=== together with (stative) at the head of the definition line? French voici is an odd construction, but it's still best described as an adverb (or determiner), like Hungarian az (which also appears without a verb, as in "Az taska" That is a bag). We need not proliferate headers for that word. Any new header ought to have a sufficient number of uses required before being seriously considered. A single oddball word or two doesn't make a full discussion worth the effort. Participle is a real possibility that occurs in multiple languages for multiple words. Circumfix still seems tenuous, and could be covered under Affix if there aren't enough cases found to make the effort worthwhile. --EncycloPetey 19:17, 21 June 2007 (UTC)
I agree that lookup of "circumfixes" is thorny. For languages that have them, an appendix: entry would seem the most logical way to group them. That way, peculiarities of naming conventions can be unified in a fairly consistent manner. --Connel MacKenzie 19:54, 21 June 2007 (UTC)
Re: "Pattern is not a part of speech or a function": I'm not sure I follow. Patterns serve many of the same roles in Semitic languages (and probably other Afro-Asiatic languages) that affixes do in English. (This is not to suggest that Semitic languages don't have affixes, just that they also have patterns.) The only reason I can think of to exclude patterns is that it's hard to figure out what article titles to use. (For those not familiar with Semitic languages: words are typically formed from "roots", which are a series of consonants that pertain to the meaning, and "patterns", which are a series of vowels, possibly plus a prefix and/or suffix, that pertain primarily to the grammar. For example, the root L-M-D pertains to learning or studying; use it with the pattern -o-é-et, and you get lomédet "am/is/are learning (fem. sing.)", but use it with the pattern -i-é-, and you get liméd "[he] taught".)   Re: using the "Verb" header with a "stative" tag: Yeah, that sounds reasonable.   Re: "Any new header ought to have a sufficient number of uses required before being seriously considered.": As long as there's a catch-all header, that's fine. I don't think it's necessary to give (say) circumfixes a "circumfix" header, but I do think it's necessary to include circumfixes and give them a header of some sort. —RuakhTALK 00:00, 22 June 2007 (UTC)
Ah, but the question is, if you use ‘-o-é-et’ with the root *BDG (which I've made up for the occasion), which pertains to talking and conversations, do you get bodéget, ‘am/is/are talking (FEM SING)’? Id est, (I'm assuming) the pattern has the meaning ‘am/is/are doing (FEM SING)’ — does it? — Beobach972 04:45, 22 June 2007 (UTC)
It depends; not every root+pattern combination necessarily produces a word, like how fingerism and typistry aren't words in English. Generally a given pattern will produce many different words; for example, every single verb lemma in the language is produced using a root and one of only seven verb patterns. (The noun patterns are much more hit-and-miss; some are extremely common, such as the participle patterns, and others are too rare to warrant entries. If we do decide to include patterns, we'll have to figure out some sort of criteria for including them or not, with productiveness probably being the main factor.) To go with your example — "am/is/are talking (fem. sing.)" would ordinarily use the root D-B-R and the pattern m'-a-é-et, producing m'dabéret, but "am/is/are conversing (fem. sing.)" or "am/is/are saying (fem. sing.)" would each use the pattern -o-é-et (the former with the root S-KH-KH, producing sokhákhat due to vowel mutations, and the latter with the root '-M-R, producing oméret). —RuakhTALK 06:57, 22 June 2007 (UTC)
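The root+pattern mechanics described in this exchange can be sketched in Python. This is only a toy illustration: the function name `apply_pattern` is made up, the romanisation is the one used in the comments above, and the sketch simply drops each root consonant into the next "-" slot of the pattern. It knows nothing about vowel mutations (so it would not produce sokhákhat for S-KH-KH), nor about which combinations are actual words.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Interleave the consonants of a root with a pattern (hypothetical helper).

    root    -- consonants separated by hyphens, e.g. "L-M-D"
    pattern -- vowels/affixes with one "-" per consonant slot, e.g. "-o-é-et"
    """
    consonants = [c.lower() for c in root.split("-")]
    pieces = pattern.split("-")  # text before, between and after the slots
    if len(pieces) - 1 != len(consonants):
        raise ValueError("pattern needs exactly one slot per root consonant")
    result = [pieces[0]]
    for consonant, piece in zip(consonants, pieces[1:]):
        result.append(consonant)
        result.append(piece)
    return "".join(result)


print(apply_pattern("L-M-D", "-o-é-et"))    # lomédet
print(apply_pattern("L-M-D", "-i-é-"))      # liméd
print(apply_pattern("D-B-R", "m'-a-é-et"))  # m'dabéret
```

The made-up root B-D-G from the question above would come out as bodéget under the same pattern, which is exactly why a real treatment would also need a list of attested root+pattern combinations.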
Alright, that's what I was asking : the patterns have (or, some of them have) both a finite number of meanings/results and a significant number of applications/uses (ie, they're definable). — Beobach972 03:12, 23 June 2007 (UTC)
Japanese circumfixes? The one that w:Circumfix mentions -- お読みになる o-yomi ni naru -- is the only thing even close to a circumfix that I'm aware of in Japanese. I don't even think it is a circumfix: the になる ni naru part is two separate words, and the irregular form ご覧になる goran ni naru (instead of *お見になる) uses it; similarly the お o- prefix is used on all sorts of words to make them "prettier". Is it so hard to describe this sort of thing as prefix + stem + suffix, or in this case as prefix + verb-form + particle + other verb? In any event, please don't take the existence of this pattern as evidence in support of needing a Circumfix header here. Cynewulf 20:02, 21 June 2007 (UTC)
Alright, it seems that the Wikipedia article needs to be corrected. At any rate, the list of headers under discussion so far (to have them all in one place) is :
  1. Root (already in use, according to my recollection and some of the comments above)
    Forms (or Form?)
  2. Participle (I'm surprised we don't use this one already)
  3. Circumfix (and/or Confix?)
  4. Copula (again, I'm surprised)
  5. Verbal noun (?)
  6. Particle
  7. Determiner
  8. Predicative
  9. Impersonal verb
— Beobach972 04:45, 22 June 2007 (UTC)
Actually, we are already using Participle; it just hasn't been officially accepted yet, as is the case for most POS headers we don't use in English. I do hope that one of the outcomes of this discussion will be to come up with a list of headers to be voted in officially. (see WT:POS). I've never heard of Copula as a part of speech, but it may only exist in languages I know little or nothing about (there are many of those!) Ideally, each language eventually will have its own complete official list of acceptable POS headers on a page like Wiktionary:About Greek or Wiktionary:About Japanese. --EncycloPetey 05:01, 22 June 2007 (UTC)
As for how circumfixes would work, I think a person would search for whatever word they wanted (eg xbdgy) and find the circumfix in the etymology (x- + bdg + -y), as Raifʻhār Doremítzwr has suggested. — Beobach972 04:45, 22 June 2007 (UTC)
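As a rough sketch of that lookup idea, the etymology split for the made-up word xbdgy (circumfix x- ... -y around the stem bdg) could be computed as below. The function name `split_circumfix` is hypothetical, and the sketch assumes the two halves of the circumfix are already known; it does not try to guess them.

```python
from typing import Optional, Tuple


def split_circumfix(word: str, prefix: str, suffix: str) -> Optional[Tuple[str, str, str]]:
    """Split a word into (prefix-, stem, -suffix), or return None if it does not fit."""
    fits = (
        len(word) > len(prefix) + len(suffix)  # a non-empty stem must remain
        and word.startswith(prefix)
        and word.endswith(suffix)
    )
    if not fits:
        return None
    stem = word[len(prefix):len(word) - len(suffix)]
    return (prefix + "-", stem, "-" + suffix)


parts = split_circumfix("xbdgy", "x", "y")
if parts:
    print(" + ".join(parts))  # x- + bdg + -y
```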
"Forms" actually wouldn't be a POS header; each "Root" section has a "Forms" subsection for listing the various lemmata using that root. (It's essentially the "Root" equivalent of a "Derived terms" section, but with a few peculiarities. It's essentially a bridge between the old dictionary-listing style where words appeared in alphabetical order by root, and the modern dictionary-listing style where words appear in alphabetical order by their final root+pattern form.) —RuakhTALK 06:57, 22 June 2007 (UTC)
We may also need a 'Verbal noun' header. See [this very recent discussion]. However, we may not need to add it to ELE/POS, if it is only used in two languages; we'll have to determine how many languages use it. — Beobach972 22:44, 23 June 2007 (UTC)
The idea behind WT:POS was to list all the headers, though not necessarily arrange them by language. If the Verbal noun is a standard feature describing many words in Arabic and Russian, then it ought to merit inclusion. Those are two rather widely-spoken well-known world languages. --EncycloPetey 23:41, 23 June 2007 (UTC)
So, we all seem to agree on Root and Participle, both already in use. Do we agree on Verbal noun? — Beobach972 19:44, 25 June 2007 (UTC)
Are there any other headers we are currently using (in more than one language) that are unofficial? — Beobach972 01:32, 26 June 2007 (UTC)
Most headers listed at WT:POS have not been investigated to see how many languages use them. So, that list would make a good starting point for checking. --EncycloPetey 01:37, 26 June 2007 (UTC)
Great idea! I see Particle is used in several, is it official yet? I can't really tell. — Beobach972 01:46, 26 June 2007 (UTC)
Nothing described on the page is official. Only the standard English headers are truly official (the separate lists at the top of the page). Everything else is in limbo (or would be if the pope hadn't recently abolished limbo). --EncycloPetey 01:50, 26 June 2007 (UTC)
Have we reached a conclusion on Pattern? It seems useful to me, but reservations were expressed, above. — Beobach972 19:44, 25 June 2007 (UTC)
In addition, we have to reach a decision on Circumfix (and/or Confix) and Copula. Circumfix seems necessary; what is the status of Copula? — Beobach972 19:44, 25 June 2007 (UTC)
Please take a look at Wiktionary:Votes/pl-2007-06/Addition of headers to ELE. Are there any improvements you deem necessary? Is the voting procedure acceptable? — Beobach972 20:15, 25 June 2007 (UTC)
I'm not sure that we're ready to vote on "Pattern" yet; at this point it hasn't even been brought up at Wiktionary talk:About Hebrew (Hebrew being the only language currently using "Root", and hence the only language that would conceivably use "Pattern"). I think it might be best to wait until such time as a "Pattern" header is discussed at Wiktionary talk:About Hebrew and incorporated into Wiktionary:About Hebrew — which means that it can be voted on when we vote on Wiktionary:About Hebrew. —RuakhTALK 20:58, 25 June 2007 (UTC)
Ah. Should Root be postponed to WT:AH, too, or voted on here? My guess is that all of the Semitic languages could use it, but if Hebrew is the only one using it (and there are no objections to that use), then I suppose there is no need to make it official yet. — Beobach972 01:32, 26 June 2007 (UTC)
I think it's fine to vote on "Root" now — the Hebrew speakers at Wiktionary talk:About Hebrew all seem to support it, and the non-Hebrew-speakers who have expressed opposition don't seem to be opposing it on language-specific grounds, so can make their case just as well here. Currently Hebrew is the only one using it (well, and Aramaic, but so far our Aramaic coverage is really just an extension of our coverage of Ancient Hebrew), but only because Arabic isn't using it and Amharic doesn't seem to have gotten very organized yet. (I don't think we have coverage of any other Semitic languages.) —RuakhTALK 04:36, 26 June 2007 (UTC)

I for one would love to have Substantive (or perhaps Substantive adjective). I think as it stands in Latin, we're using the "Adjective" header and (substantive) before a definition, but it would be nice to be able to present inflection tables without feeling like they're being put on non-lemma pages. Medellia 07:00, 26 June 2007 (UTC)

I considered that one for a long time, but eventually decided it wasn't useful. In most cases, the inflection is the same, so having the header would mean a duplication of the inflection tables in a second section. As you pointed out to me, it's only in specialized cases where the substantive sense is limited to a single gender that there is an issue. So, on the one hand there is some use for it. On the other hand, having it opens the floodgates for every substantive sense to be treated this way. As an alternative, shouldn't we use something like (substantive, m) as a header? The inflection would still appear in the adjective declension table, and the user would be alerted to the fact that the sense applies only to a single gender. --EncycloPetey 18:33, 26 June 2007 (UTC)

I didn't phrase my response correctly then. Substantive meanings worth noting (i.e. those that differ greatly from the meanings of the adjectives whence they come) usually pertain to one gender only but sometimes to two. I understand and agree with your reasoning, however. So long as substantives aren't being placed under the "noun" header (which I'm nearly certain I've seen... AG, perhaps?), I can deal with it. Medellia 15:47, 27 June 2007 (UTC)

I note that that still uses L3 headings of Demonstrative adjective, Demonstrative pronoun, and Relative pronoun, which looks good to me. Is there some reason for limiting the number or specificity of headers? I think changing the three headers in that to Adjective and Pronoun would be a net loss. —Stephen 18:33, 26 June 2007 (UTC)
That's why the header Determiner was proposed some time ago. Determiners cover some of the functions traditionally split in grammars between demonstrative adjective and demonstrative pronoun. If we use Determiner and Pronoun for that, we don't lose any of the information. I think a starter list was assembled in Category:English determiners. It's a closed set, so it's easy to manage. I think Determiner should be added to the list of ones to be voted on, as there has already been discussion before on the subject. --EncycloPetey 18:39, 26 June 2007 (UTC)
However, I’m not really concerned with the headers used by English. Some languages need other headers. Russian, for example, needs Participle (Russian participles, which include present active, present passive, past active, past passive, present adverbial, and past adverbial, are not like English participles and words like adjective or verb don’t fit); Predicative, which is a part of speech that shares some traits of adjectives together with verbs, but which are neither adjective nor verb; Impersonal verb is a small but important class of words that are not verbs, but technically are impersonal verbs (i.e., they can be called impersonal verbs, but they are not considered verbs); and there are several others besides these for Russian. It would actually be helpful to use more specific headers in Russian, much the same way that they are used in that, because some participles (for example) are very unlike other participles, both grammatically and semantically: adverbial participles have almost nothing in common with active and passive participles.
Arabic distinguishes many more parts of speech than English does, but they may be combined to a large extent and listed under familiar English terms. Arabic does have some parts of speech that don’t combine well with what English grammar would expect, and special headers would be helpful. For instance, Arabic is said not to have an infinitive, but an infinitive is a kind of verbal noun ("to work" is semantically very close to "working")...each Arabic verb has, in addition to the different finite forms and participles, a Verbal noun, which some grammarians call an infinitive. There are numerous sorts of nouns that may be made from a verb, but there is only one Verbal noun. While the various nouns that are made from verbs should all have their own separate pages, the Verbal noun belongs on the verb page (it needs its own page as well, but it is also an integral part of the verb and is listed with the verb). Arabic has an important class of words called elatives that should have a header. There are a number of others.
If there is some reason that using the correct header in a foreign language creates some sort of problem with bots or statistics, maybe we could just have a miscellaneous header like ===Other===, and then I could put the correct term after it in bold like this: ===Other=== <br/> <font size=3> Adverbial participle </font size=3> <br/> хохоча <br/> # definite. This way, whoever is interested in the statistics will only see Other, and the bots will only have to be aware of Other, but words will still have apparent headers that correctly identify them but don’t show up in places where people don’t like them. —Stephen 18:33, 26 June 2007 (UTC)
Another one that occurs to me is ===Syllable===, which is used in Korean and other South- and Southeast-Asian languages, equivalent to letter in the Roman and Cyrillic alphabets. The header ===Symbol=== should suffice for punctuation. —Stephen 17:42, 28 June 2007 (UTC)
Another one I find is ===Classifier===, as in Template:THchar, a part of speech found in Thai that is missing in English. To create plurals and when counting something in Thai, you have to select a classifier from a long list of classifiers, depending on the noun in question. For paper, the classifier is Template:THchar; Template:THchar; for piles of paper, it’s Template:THchar; for pieces of food, it’s Template:THchar; Template:THchar is for ears of corn; Template:THchar is for bundles of wood. There are many classifiers and they have to agree with the noun in the same way that pronouns agree with nouns in English. —Stephen 22:23, 30 June 2007 (UTC)
If that's the same as the way the Japanese do it, then it could be included as a subcategory of ===Determiner===. Articles, Demonstratives, and Number/Numerals are all subcategories of Determiner. Of course, that doesn't mean we shouldn't consider having Classifier as a header in its own right. After all, we have Article. --EncycloPetey 22:50, 30 June 2007 (UTC)
Minor clarification: Modern Korean is written with an alphabet, so in Korean, 음절 (eumjeol, “syllable”) is a collection of two or more 자모 (jamo, “letters”). Rod (A. Smith) 18:42, 28 June 2007 (UTC)
Yes, I read Korean. While it is a true alphabet, the smallest movable unit of any word is a syllable, not a letter. Individual Korean letters cannot be cut, copied or pasted; you have to work by syllables. —Stephen 19:26, 28 June 2007 (UTC)
Maybe your operating system doesn't allow you to cut, copy, or paste individual jamo, but that seems irrelevant. Individual jamo within Korean words are often replaced with others: 돕다 (dopda) -> 도와 (dowa). Rod (A. Smith) 20:01, 28 June 2007 (UTC)
I don’t understand what you’re saying. Do you mean that you can copy the "d" of the syllable 돕 (leaving nothing but the "_op") and then paste it under the "o" in 도? I can type any Korean I like, but I cannot copy and cut or paste individual letters from or into a syllable. Once I type a syllable, it cannot be deconstructed or changed. —Stephen 23:03, 28 June 2007 (UTC)
And more to the point, even if a dedicated Korean system allows you to do it, most Americans, Britons and Aussies will not be able to, and if they want to find out the pronunciation of 등대지기님, they will have to copy a syllable at a time: 등, 대, 지, 기, 님. If you only have the individual jamo, people cannot cut that out of a syllable in order to search for it. —Stephen 23:10, 28 June 2007 (UTC)
Copy-paste mechanics for looking up pronunciation seem irrelevant. I am just correcting the common misconception that a Korean syllable is the closest equivalent to a Roman or Cyrillic letter. Despite the fact that each Korean syllable has a Unicode entry, the Korean letter is the jamo. That is, "Korean jamo" is to "Korean syllable" as "English letter" is to "English syllable". Rod (A. Smith) 23:56, 28 June 2007 (UTC)
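As a hedged illustration of the jamo/syllable relationship discussed here: Unicode defines a canonical decomposition of each precomposed Hangul syllable into conjoining jamo, so software can take a syllable apart even where an editor's cursor cannot. The following is a minimal Python sketch using the standard `unicodedata` module; the function names `to_jamo` and `to_syllables` are invented for this example and are not part of any Wiktionary tool.

```python
import unicodedata

def to_jamo(text: str) -> str:
    """Decompose precomposed Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)

def to_syllables(text: str) -> str:
    """Recompose conjoining jamo into precomposed syllables (NFC)."""
    return unicodedata.normalize("NFC", text)

# 돕 (dop) splits into three jamo; 도 (do) splits into two.
for syllable in ("돕", "도"):
    jamo = to_jamo(syllable)
    print(syllable, [f"U+{ord(j):04X}" for j in jamo])
```

Whether an ordinary reader can then search for a single jamo is a separate question; the sketch only shows that the syllable block is a composition of letters at the encoding level.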
The copy-paste mechanics are relevant because that was what I was talking about. I was discussing the header ===Syllable=== which is used in many Korean entries, which is not a lexical item but is only for pronunciation. Just as anybody can look up the individual letters in тягач to see what script they are in and approximately how they are pronounced, anyone can look up the individual syllables in 등대지기님 (but cannot look up the individual jamo). This is why the header ===Syllable=== is present in many Korean entries, and ===Syllable=== will also be required for pronunciation purposes in the scripts of most other South Asian and Southeast Asian languages, including Thai, Tamil, Khmer, Lao, Burmese, Sinhala, and so on. Unlike Korean, Thai letters can be copied and pasted individually, but the Thai script works in a very complex and unusual way, and you have to address the pronunciation at the syllable level. —Stephen 00:31, 29 June 2007 (UTC)
Aha. I didn't realize you were talking about users looking up pronunciation of Korean terms we don't include. I wondered why we include ==Korean== ===Syllable=== entries. You suggest that a user who doesn't know how to pronounce Korean letters might actually search for a Korean term here, find no results, and then cut and paste each individual character of that term into a search box in order to find pronunciation hints. Interesting. Rod (A. Smith) 00:45, 29 June 2007 (UTC)
We're straying a little from the topic here, but -- the only legitimate use for Korean:Syllable entries I know of is to collect hanja readings. The problems with using them more widely are that a) Modern hangul isn't nearly as phonetic as one would like, so pronunciation info is likely to be at least as misleading as not, and b) most such isolated syllables are not used in any lexical or attributive way, and thus seem to violate WT:CFI. (one solution might be to redirect such syllables to line items in an appendix, where romanization and codepoint data could be given, but that would require getting over our collective redirectophobia somewhat.) -- Visviva 04:04, 29 June 2007 (UTC)
On the topic of Asian languages: is Hanzi official? Would you like it to be? It's only used in the Sinitic language(s), but considering how many entries we have in those language(s), it's very widely used, and it is used under more than one L2 header. — Beobach972 19:39, 28 June 2007 (UTC)

This is perhaps more of a question than an addition... I've seen the headings "Prepositional article" and "Contraction" used for modern Romance language words such as au and del which are a contraction of a preposition and a form of the definite article. Is there a proper header to use for these terms or ought one be added to the list? Medellia 15:47, 27 June 2007 (UTC)

Contraction is already a standard header, though strangely not for English words like can't. I would expect that Determiner could also be applied to words with au and del. --EncycloPetey 19:07, 27 June 2007 (UTC)
Yes, words such as au, al and del are usually referred to as contractions. —Stephen 17:42, 28 June 2007 (UTC)
  • Trying to wade through this very lengthy section, I'm left wondering if our previous system of making exceptions on a language-by-language basis is really in need of repair. Each additional heading should just be covered under other language "about" pages. Allowing all alternate headings for all non-English languages is just begging for errors. --Connel MacKenzie 09:00, 4 July 2007 (UTC)
I don’t agree with that. I’ve seen a lot of alternate headings and so far I haven’t noticed any errors among them (except for ordinary typos such as typing "Verb" when it should say "Noun"). OTOH, when somebody goes back to reduce and "Anglicize" them, it produces a lot of errors and results in loss of important information. —Stephen 18:14, 4 July 2007 (UTC)

Addition of headers to ELE - more


Another important one for Russian is "Verb prefix". Russian and other Slavic languages have a fixed number of special verb prefixes, and these are very different from regular prefixes as we have in English. Russian has precisely 23 of them and they are equivalent to the so-called separable verb prefixes of German (ein-stellen, dar-stellen, auf-stellen, etc.), except the Russian verb prefixes are not separable. The Russian verb prefix is at the very heart of the language. There is nothing in English that compares with them. Besides the special "Verb prefixes", Russian also has regular prefixes for nouns, adjectives and adverbs just as we have in English. Also, Russian has a form called "Predicate adjective". Predicate adjectives look like adverbs, are indeclinable and do not vary for gender. Predicate adjectives can never be used attributively and their adverb-like appearance is the cause for a lot of the grammatical mistakes Russians tend to make when they try to speak English (they may say, for example, that something should be done as usually...it should be usual, but the Russian calls for обычно, a predicate adjective which looks just like an adverb). If we can’t label обычно as a predicate adjective, then it would be better to label it ===Other=== and then Predicate adjective. Otherwise, it will have to be left as an adverb and people will have to try to piece it together for themselves. —Stephen 19:35, 4 July 2007 (UTC)
The Dutch, Germans and Hungarians treat their "verb prefix" as a Particle in the grammars I've looked at. Thus, there isn't a need for an additional header as the specifics can be handled in examples and Usage notes. Yes? The problem with calling these things "prefixes" is that they're separable; sometimes they are attached to the front of a verb form, but other times they are placed after the verb as a separate unit.
As for the Adverb/Predicate adjective, the Slovene grammars I have (three) consistently just refer to this as an Adverb, since it is indeclinable like an adverb and has the ending of an adverb. They simply regard a word in that position as a verb modifier rather than a noun or adjective. In English, we would say "I speak Slovene" and call Slovene a noun; the Slovenes would say "Govorim slovensko" (I speak slovene-ly) and call it an adverb. Why not simply follow their lead and call it an Adverb? See how I set up the page for slovensko for an example of how I chose to deal with this difficulty. --EncycloPetey 20:01, 4 July 2007 (UTC)
The Russian verb prefix is equivalent to German’s separable verb prefixes, but not identical. Russian verb prefixes are not separable, and they impart not only the adverbial senses that the German prefixes give, but also define aspect and tense for Russian verbs. Иду means "I’m going", and войду means "I will enter" (i.e., not only the sense of going into, but also the future tense, and also the perfective aspect). Russian has words that are called particles, but verb prefixes are not particles. It can’t be done well if mixed in with common prefixes for substantives along with usage notes. The verb prefixes are called verb prefixes and need to be labeled verb prefixes. All other prefixes are just prefixes.
Some grammars that are written by someone who is weak in either English or Russian (or just weak in grammar) will very likely call predicate adjectives adverbs, which is exactly why so many Russians make this particular error when they try to speak English. Predicate adjectives are not adverbs and are not translated by adverbs. Это интересно, while it looks on the face of it like "This is interestingly", is not "interestingly" at all...Это интересно means this is interesting (a predicate adjective, not an adverb).
As for "Govorim slovensko", that IS an adverb. In Russian, that’s "говорю по-словенски", and словенски is the adverb of словенский. —Stephen 20:36, 4 July 2007 (UTC)
OK, so these are genuinely different situations. My concern then shifts to ask (1) Can "verb prefix" be handled by the current Prefix with a usage note specifying they're attached to verbs and modify the meaning? (2) Is this usage akin to the meaning-altering prefixes found in Indonesian (if you know)?
My concern with Predicate adjective is that it could spread into English, where we do have some examples, but for which we would probably rather not call them a different part of speech. However, I can't see a good way to do it offhand besides the new header. Can they be called Adjectives and given a "context" template (say {{ru-predicate}}; the name is immaterial) that would mark them, provide a link to an explanatory Appendix and categorize them separately? --EncycloPetey 20:58, 4 July 2007 (UTC)
I don’t see why these 23 entries should be complicated by mixing with ordinary substantive prefixes, but if they are, I would not like to see any substantive prefixes added...but it is unavoidable. So if everything went under Prefix, a constant effort would be required to keep the verb prefixes clearly separate. Right now, all 23 Russian verb prefixes have entries (e.g., при-, под-, на-, по-, в-, вы-, с-, у-, из-). If substantive prefixes are added directly, it will mean a lot of work trying to separate water from oil, and even more work keeping them separate in the future.
Russian prefixes give lots of meaning. Besides tense and aspect, some common examples that demonstrate meaning are идти (to go), прийти (to come), уйти (to leave), найти (to come upon, to find), подойти (to approach), сойти (to go down from), выйти (to go out, to exit), войти (to go in, to enter), изойти (to go from), дойти (to go until, to reach), зайти (to drop by). (Actually, entire books are devoted to this subject, especially with the verbs of motion such as go, ride, fly, etc.)
Indonesian prefixes are different. Indonesian verbal prefixes can be divided into verb-forming (stative, dynamic, accidental dynamic, transitive), noun-forming (deverbal, etc.), and inflectional (agent-orientation, object-orientation, imperative, etc.). So Russian verb prefixes are much easier for an American to understand, and Indonesian prefixes are relatively alien.
Predicate adjectives can’t be called adjectives without a lot of explanation on every page, because they do not look like adjectives, do not work like adjectives, and have restricted but very common usage. If we don’t call them predicate adjectives, I think it’s better to just ignore that part of speech and mark them only as adverbs. If someone is interested enough or confused enough, they will find explanations somewhere.
These are only a few of the headings that don’t match English. There are numerous other ones. As I said before, if there is some reason why we should not label things correctly, then we should use a dummy header like ===Other=== and put the real label after it without any = signs. That would take care of all of the problem headers. —Stephen 22:49, 4 July 2007 (UTC)
  • Thank you for reiterating my point so clearly. Hiding such linguistic features in a heading leaves an English reader with absolutely no understanding of what you just described. While I consider it to be very bad to mislead our readers with ambiguous headings, it is worse when particular nuances (like you mentioned) are absent. Unifying headings (as I often do) to the most reasonable heading is just that - regrouping terms so bots and English readers alike, can make sense of the entries that are mislabeled. For example, the ===Prefix=== heading sounds (from what you say here) like the appropriate heading. Additional information can be given on the inflection line, on the definition line, or in a generic shared ===Usage notes=== section. But using ambiguous, misleading headings is Not Good. For the second example you gave, yes, ===Adverb=== is a much more useful grouping; clearly more explanation is needed either on the inflection line or in the definition line, or in a ===Usage notes=== section. Very few people here understand Russian - how on Earth are regular English readers (such as myself) supposed to even know there is a finer distinction? Creating a catch-all "===Other===" heading would only worsen the problem.
  • Again, giving blanket exemptions of heading for all languages is my larger gripe. Each language that has a certain feature can and should have its own "Wiktionary:About..." page. This is particularly important for the German/Russian distinction you hinted at above!
  • --Connel MacKenzie 20:07, 4 July 2007 (UTC)
Also, Arabic can have terms that appear to be individual words, but which actually are phrases. This is due to the fact that words that have only a single consonant (for example the conjunctions Template:ARchar and, Template:ARchar then, and the prepositions Template:ARchar for, Template:ARchar in) cannot stand alone the way one-letter English words can. In Arabic, they must always be prefixed to the following word (usually a noun, pronoun, or verb). If you prefix Template:ARchar to Template:ARchar, you get Template:ARchar ("for God", "to God"). The proper term for Template:ARchar is prepositional phrase, but this would not work for Template:ARchar (by God). I can find labels to describe these things, of course (such as "Prepositional phrase"), but Autoformat does not like any choice I make. One thing is certain: they cannot be called simply "Preposition" or "Conjunction" or "Noun". Most of the millions of possibilities do not even need an entry here, but there are some common, set terms such as Template:ARchar, Template:ARchar and Template:ARchar that merit inclusion. Template:ARchar in particular causes confusion for students of the language. —Stephen 20:12, 4 July 2007 (UTC)
But what good is it, to just confuse bots and English readers with nonstandard headings, instead of putting it under ===Phrase=== then explaining it on the inflection line? --Connel MacKenzie 20:20, 4 July 2007 (UTC)
That’s what I said at the start. This whole problem seems to be about the bots. We are dumbing the articles down until they are worthless just to make it easy to write bots that make sure they are appropriately dumbed down. That’s why I proposed that I could label all of these things ===Other===, which the bots could be instructed to avoid (I assume), and then the correct label could be added beneath the dummy header in a large bold type, but without any === that would confuse the bots. Or possibly a code could be added at the start of these headers that would tell the bot to ignore it. If we are going to sacrifice scholarship for bot-ease, we might as well just call every entry ===Word===, ===Symbol===, or ===Phrase===. Making it easy for bots is making it hard to have good articles. —Stephen 20:49, 4 July 2007 (UTC)
Okay, so instruct the bots to ignore POS in languages until the respective layout page (Wiktionary:About language) is mature enough to allow them. Right now English may be the only one for which that's true, or close to being true. DAVilla 21:34, 4 July 2007 (UTC)
Isn't a prepositional phrase functionally an adverb? DAVilla 21:34, 4 July 2007 (UTC)
In some cases, yes. In some cases it's a functional adjective (in English too). Compare:
*We walked in the road. (Prepositional phrase functioning as adverb: where we walked)
*We took the left fork in the road. (Prepositional phrase functioning as adjective: which fork)
But in inflected languages it might not behave like either. This is one reason the CGEL treats phrases differently from the parts of speech. --EncycloPetey 21:42, 4 July 2007 (UTC)
Obviously, I should not have conflated the bot activities; my hope was that you would finally see the perversity of what you are doing. This is the English Wiktionary; its purpose is to provide definitions of all words in all languages to English readers. Yes, we are trying to fit definitions into a mold that an English reader can comprehend. There is absolutely no way you can convince me that Joe-average-user can comprehend what you are trying to convey when you enter "substantive prefix" or "predicate adjective" as a heading. BUT THAT is truly conflating the issue. Giving carte blanche to random headings means that they will never be explained anywhere, on a "Wiktionary:About ..." page. An "Other" heading is just a more severe level of the same shortcut-for-copying-from-other-source's sake. (Note: I am not suggesting copyvio, nor any suspicion of copyvio, but I am suggesting that the assumption that our formats should be bent to other resources' formats, is fundamentally flawed!) I am talking about "explaining things comprehensibly"; you are in turn calling that "dumbing down." But why? It is harder to explain these things clearly. On the other hand, it is "dumb" to simply copy other formats that don't fit here, because others use those conventions. --Connel MacKenzie 02:05, 5 July 2007 (UTC)
Um... I assume you are responding to Stephen above, yes? --EncycloPetey 02:14, 5 July 2007 (UTC)
This discussion seems to be producing a number of headers that are widely used or are widely agreed upon, and an equal number of debated Russian-language headers. Would it be better, perhaps, to put the Russian headers into Wiktionary:About Russian and an associated vote, as we postponed Pattern to WT:AH, and only vote on the other headers now? We ought to vote on, eg, Particle and Root, which seem uncontentious, but we could postpone Verb prefix et al. Alternatively, since this is a 'line-item' / 'itemised' vote, we could just vote on them all... — Beobach972 03:27, 5 July 2007 (UTC)
To clarify, I think it's good that this is producing discussion on the Russian headers. — Beobach972 03:46, 5 July 2007 (UTC)
The discussion of Russian headings is a frumious bandersnatch. It is, for the most part, irrelevant to the fact that none of the proposed headings should be given carte blanche for every language except English. Each language (or language group) that has specific concerns needs a "Wiktionary:About..." page. Bypassing the mechanism will only come back to bite you. --Connel MacKenzie 06:32, 5 July 2007 (UTC)
Well, to some degree, yes. However, it is valuable to raise the issue generally in case the same headers actually will apply across a range of languages. In that case, it is good to discuss and standardize them across languages, if possible, as the discussion for Participle did. --EncycloPetey 07:46, 5 July 2007 (UTC)
That is a different thing - coordinating the "About" pages amongst themselves is a good thing, but that clearly is not the intent here; that one example is only a fortunate by-product of the discussion. But the discussion, as a whole, is seeking a flawed outcome. --Connel MacKenzie 16:49, 5 July 2007 (UTC)
I've removed the Russian headers from the vote, Connel MacKenzie has a very good point that those perhaps should (and easily can) be handled with an About Russian page vote (and other About language votes). Based on the response, I've tried to re-focus the vote to deal only with those headers that are or could be used by many languages. — Beobach972 00:47, 11 July 2007 (UTC)
I do not agree with Connel. I agree that this Wiktionary is meant for English speaking users. However, you seem to forget that there are also people who speak English and that know what ‘Verb prefix’ means, and in what sense it differs from a normal prefix (to name just one example, you can substitute any of the above suggested POSs). Isn’t it most likely that someone who looks for при- (or при-) is someone who is at least interested in the Russian language, probably even learning it? Then the chance is big he will have heard about the difficulties with Russian verbs of movement etc.
Maybe we have to get rid of the taboo to wikify POS-headers and just accept ===Predicative adjective===, provided the appropriate page explains what that is, hopefully having a link to the Wikipedia article explaining more in-depth what it is. There are some border cases that might be handled better like {{transitive}}, though. H. (talk) 15:23, 31 July 2007 (UTC)
Yes, I think Connel is assuming that most people who look up, for example, Russian words are not studying the language and are not interested in grammar or linguistics. My sense is that any such users won’t care at all about parts of speech or what headers say and are only interested in general meaning and perhaps general pronunciation...but that more serious users are those who are studying the language, and the terms that I have used are the ones used when studying that language. I think most languages that are not closely related to English require headers that don’t apply in English. Hungarian has its coverbs, Ojibwe has prenouns, and so on. I think anyone who doesn’t know what a coverb is is not studying Hungarian and won’t care about its part of speech or how headers read. But I know that if I, as a student of the language, ever picked up a book about a language such as Russian that did not call things what they are, but instead referred to Russian gerunds as nouns, I would throw it in the trash. Complaints about using the correct terms such as adverbial participle have only come from people who do not know anything about the language and who aren’t interested in that sort of thing. I don’t think we’ll ever hear such complaints from anyone who is actually interested in the language or in linguistics in general. —Stephen 12:14, 7 August 2007 (UTC)

Addition of headers to ELE - more (2)

I see that the process of reducing non-English parts of speech to nonsense is continuing apace, as this shows. Hungarian ki- is not an adverb. Russian adverbial participles are not adverbs. After all of the discussion above, none of the suggestions by linguists who know what these terms are have made any difference, and editors who do not know them are replacing them with incorrect English. I still think that if we are going to toss out the correct designations to make it easier to write bots, then we should at least replace them with something neutral and meaningless such as Other or Word or Phoneme, rather than changing them to something that is incorrect. —Stephen 16:07, 21 August 2007 (UTC)

Can we (as you say) distinguish between a 'bot tagging a header that is not known/documented (within the wikt!) (E.g. Coverb) and a human munging it to something incorrect? Please?
Certainly. A bot will mark coverb as an unconventional header, then an editor can change coverb to something neutral such as phoneme, or to something wrong such as adverb, or the editor could choose to ignore it until the matter is resolved, as I trust it eventually will be. It shouldn’t be too difficult at a later time to go back and reconstruct the correct terminology for all headers ===Phoneme===, but headers incorrectly changed to ===Adverb=== will be much harder to catch and repair. —Stephen 16:45, 21 August 2007 (UTC)
I'm all for adding more POS headers whenever needed, as long as they are documented somewhere and are part of a finite list of such headers, not an un-bounded set. If a needed header is not recognized, it should be added to WT:POS in the "Other headers in use" section (which as it says, doesn't require a vote or whatever), and added to User:AutoFormat/Headers so it isn't flagged. (That is just telling AF not to worry about it; but also listing it as something to be sorted later.) Other bots reading the database need to have a bounded set to deal with, but additions should not be difficult. Robert Ullmann 16:20, 21 August 2007 (UTC)
Stephen: in re-reading the above, I realize that you are getting frustrated with AF when there is no reason: go to User:AutoFormat/Headers, edit, add a new section for Russian POS if you like, and add them in. AF is supposed to be catching and flagging errors (which includes highlighting things that need some attention), not making your life difficult! The header list isn't wired into the code. (it is protected sysop/sysop, but that isn't an issue here) Robert Ullmann 16:28, 21 August 2007 (UTC)
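To make the mechanism under discussion concrete, here is a rough Python sketch of how a bot might compare an entry's part-of-speech headers against an editable allow-list and flag anything it does not recognise. This is not AutoFormat's actual code: the `KNOWN_HEADERS` set, the `unknown_headers` function and the sample entry are all hypothetical, and the real list is kept on a wiki page rather than in the program.

```python
import re

# Hypothetical allow-list for illustration only; the real list lives on a
# wiki page (User:AutoFormat/Headers, per the comment above).
KNOWN_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Prefix", "Phrase"}

# Matches a level-3 wikitext header such as ===Noun=== on a line of its own.
HEADER_RE = re.compile(r"^===\s*([^=]+?)\s*===\s*$", re.MULTILINE)

def unknown_headers(wikitext, known=KNOWN_HEADERS):
    """Return level-3 headers not in the allow-list, in order of appearance."""
    return [h for h in HEADER_RE.findall(wikitext) if h not in known]

# A made-up entry with one unrecognised header.
entry = "==Hungarian==\n===Coverb===\n# (definition)\n===Noun===\n# (definition)\n"

print(unknown_headers(entry))                               # flagged for attention
print(unknown_headers(entry, KNOWN_HEADERS | {"Coverb"}))   # after adding the header
```

Under this design, adding a header to the list is a data change rather than a code change, which is the point made above about the list not being wired into the code. The sketch only flags headers; it does not rewrite them.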
No, I’m not frustrated with AF, but with the mangled "corrections" done as a result of AF flags by editors who do not know the language and who do not understand the term that they are changing. I will try adding some of the headers to User:AutoFormat/Headers...maybe this will finally solve the problem. —Stephen 16:45, 21 August 2007 (UTC)
Certainly AF has nothing to do with my cleanup lists generated completely independent of AF. Yes, for a very long time I have been working against Stephen's efforts to de-standardize things. Yes, my cleanup lists have been around much longer than AF. Random non-word classifications have absolutely no reason to be mentioned here on en.wikt. I have tried in every way I know of to solicit help with normalizing efforts. Of course, I do not know Hungarian, so when a nonsense heading is used, I, indeed, make my best guess at what might have been intended. Insisting on more and more completely nonstandard headings serves no purpose whatsoever. All it does is create useless random characters that mean nothing to anyone but the person that first typed them in. If instead, the approach is taken that entries should match existing formats and conventions, the "problems" wouldn't crop up. If people knowledgeable about Hungarian helped with entries that have obviously incorrect headings like "Coverb" the "problem" wouldn't crop up. If stubborn insistence (on breaking formatting wherever possible) wasn't encouraged, we wouldn't have copycat vandalism like the "Circumfix" nonsense above. Remember? The start of this whole thread?
Think, just for a moment, about what you are doing, when you use a ridiculous heading. #1) English readers (anyone with less than PhD level linguistic skills in a particular foreign language) cannot comprehend what is intended for an entry. #2) No derivative use is possible for the entry (it can't be classified as, well, anything.) #3) It confounds cleanup efforts by distracting from what should be addressed - i.e. newcomers initial edits that got lost in the shuffle. #4) Generates ill-will all around (such as your comments above, directed at me.)
If anything, we need a small fraction of the headings that AF currently allows for. One by one, we need to eliminate the need for the "questionable" headings by matching our formatting conventions, one entry at a time. Yes, during that phase, the applicable equivalents for something like "Coverb" can be used, and supplemented with usage notes as needed. But leave something ridiculous like "Coverb" as a "part of speech"? Please.
--Connel MacKenzie 21:46, 21 August 2007 (UTC)
"Stephen's efforts to de-standardize things"? I don’t make any effort to destandardize anything. I’m all for standardization as long as it is done in a manner that favors and preserves accuracy and correct information, and does not introduce errors or cause the loss of important data. As for people knowledgeable in a language doing the clean-up, that is my point. There are many, many Russian pages flagged for such clean-up and, although I know the language and know what terms are used by anyone studying the language (not PhD’s, but anyone), I am helpless to do the clean-up because I can’t find adequate "standard" solutions. The clean-up can only be done by someone who does not know the language or understand the terms, and who therefore can turn the current correct terminology into nonsense without a second thought. Our standardization system cannot swallow the proper terms used in the study of a language, yet has no problem at all with replace them with words that mean something completely different.
The problem as I see it is not that terms such as Predicate adjective or Coverb or Adverbial participle are used, but that a system is in place that does not allow the correct terminology, yet happily accepts complete errors. Russian teachers may choose among three terms to use for the adverbial participle, namely adverbial participle, gerund, or deyeprichastiye. Deyeprichastiye is the Russian word for it, and gerund causes confusion because it is completely unlike a gerund in the English sense. Adverbial participle sounds a little more technical than gerund, but it is more accurate and not ambiguous; and unlike deyeprichastiye, adverbial participle is actually English. I experimented here a few years ago using the word gerund for the Russian form, but it resulted in a lot of confusion. The problem is that the current standardization insists on using a word that does not describe an adverbial participle, such as noun, adverb, or verb. I think standardization is fine, but it should be subservient to accuracy and correctness, rather than the other way around. As things are, standardization is everything and accuracy counts for nothing.
Stephen, you know that isn't true. Any time I make the slightest misstep you jump all over me; meanwhile we have, what, about 10,200 or so, entries with completely unrecognizable headings? Yes, out of politeness to you, I often stop what I'm doing to discuss it. Then find later that the problem has still grown. --Connel MacKenzie 02:55, 23 August 2007 (UTC)
Since I am strong in linguistics but weak in computer programming and macros, I could only guess that there was some monstrous problem posed by using correct terms such as Adverbial participle...now, it appears that the only thing needed to make Adverbial participle a "standard" term is entering it just once on a filter page that bots look at to see what is standard. So if it is so easy to have the correct term, then why must we accept having adverbial participle changed to an English POS that does not fit?
This has nothing at all to do with programming. At this point, I have only the most general notion of what Robert is doing with AF. When you update his lists, it doesn't affect me, nor anything I clear out. It is about the standards that this Wiktionary community has agreed on - that's why you had this same issue with others. --Connel MacKenzie 02:55, 23 August 2007 (UTC)
If you study a language such as Russian, Hungarian, or Arabic, even at the high-school level, these are the terms that you will use. It has nothing to do with PhD’s. People who don’t know what an adverbial participle is are not studying Russian, and these people probably do not care what it means and would not care if you called it an infinitive. They will only be interested in the general meaning, and possible the pronunciation. But for those who are studying the language, they know and expect the correct terms. So the correct terminology doesn’t hurt anybody, and it helps those who are really interested.
You assert there is no English equivalent POS? What heading is the translated English word under? Is it so hard to enter something as a ===Noun=== with a "{{adverbial participle}}" qualifier at the start of the definition or after the inflection? Is it formulaic? Are all Russian 'adverbial participles' always translated to English as noun+verb+adverb or some similar pattern? What is wrong with using a heading that English readers might understand? --Connel MacKenzie 02:55, 23 August 2007 (UTC)

when you use a ridiculous heading. #1) English readers (anyone with less than PhD level linguistic skills in a particular foreign language) cannot comprehend what is intended for an entry. #2) No derivative use is possible for the entry (it can't be classified as, well, anything.) #3) It confounds cleanup efforts by distracting from what should be addressed - i.e. newcomers' initial edits that got lost in the shuffle. #4) Generates ill-will all around (such as your comments above, directed at me.)

As for "ridiculous headings": (1) most people studying the language are not PhD’s, but they can certainly comprehend the right terminology used in that language; (2) I have no idea what you mean about derivative use or classifying as, well, anything; (3) I have no idea what you mean about confounding cleanup efforts by distracting, etc. If we can use the correct terminology, I can do the cleanup all by myself. And (4), my comments were NOT directed at you, they were directed at the problem. How could I possibly discuss this problem any more gingerly?
re: #4) Well, how can I enlist your help any more gingerly? I'd like to work on a solution here, not (re: #3) argue till my fingers fall off.
re: #2) That stuff is why we are here, isn't it? Provide FREE content under the GFDL license, right? Things like www.ninjawords.com can only have nightmares when encountering en.wikt's Russian entries.
--Connel MacKenzie 02:55, 23 August 2007 (UTC)
And when I studied languages that were closely related to English, such as German and Spanish, these problems did not appear. It’s only when dealing with more exotic languages, languages not so closely related, that you encounter some terms that are not so usual for English. Every Arabic student is familiar with elatives of adjectives and with the jussive and energetic moods of the verb. If you changed elatives to relatives, you would make it sound more familiar to those of us who don’t care anyway, but it would be utter nonsense to anyone who is actually interested in Arabic. —Stephen 05:07, 22 August 2007 (UTC)
We don't generally enter "moods" here anyway, for English entries. But again, qualifiers after the POS heading's inflection would be a much better way to be both accurate for such students, yet comprehensible and familiar to the rest of us non-students. --Connel MacKenzie 02:55, 23 August 2007 (UTC)
I guess the question is, do we want ki-#Hungarian to have any use at all, or not? If we want it to be useful, then obviously it needs to have something resembling linguistic accuracy so that people who know a bit of Hungarian can make use of it. If we just want it to be a feather in our cap — "see, we even have an entry in this crazy language!", then of course it's more important to be consistent and use terms that feather-loving people can look at and feel like they understand. —RuakhTALK 22:05, 21 August 2007 (UTC)
I agree; entries with even the slightest typo in any section heading should be deleted. --Connel MacKenzie 02:55, 23 August 2007 (UTC)
Look: you intentionally edited an entry to replace accurate information with useless, inaccurate information. If a new editor did something like this, I'd leave him a warning on his talk-page — unless you beat me to the punch by simply blocking him. I'm sorry if my previous comment was an "unrealistic snipe", as you call it, but you can't just vandalize an entry, make it less useful, and then get angry when other editors get annoyed. —RuakhTALK 03:52, 23 August 2007 (UTC)
I replaced useless accurate information in a useless entry, to one that was inaccurate, yet with the sudden ability to be described as some kind of entry with a definition for a valid word. I most certainly can get upset, when it is a rehashing (with a bit of a harsher flavor now,) of a very old topic. --Connel MacKenzie 04:37, 23 August 2007 (UTC)
(after edit conflict) Actually, I think that what we need to do in cases like this is use the correct English terminology used when talking about the foreign language. For example, when discussing the German language in English we discuss "nouns" and "verbs", not "Substantive" and "Zeitwörter". For aspects of the language, or words, that do not exist in English we should use the same practice - i.e. when discussing Latin words we talk about "declension", not "declinatio". If someone doesn't know what "verb" or "declension" mean then they can look the words up in a dictionary. This means that if "coverb" is the correct English term to describe this aspect of the Hungarian language (and the Wiktionary entry and referenced Wikipedia entry strongly suggest to me it is) then we should use it; if you don't think it is the correct term (or indeed believe it is not even an English word) then request verification. If you, or any other editor or reader, do not know the word "coverb", you and they can look it up in the dictionary (I have just done it and it was very easy, and I've even gone to the trouble of providing you with a link so you don't have to go to the huge bother of typing it into the search box).
I agree that we need a finite set of headers, and, if I read his arguments correctly, so does he. Neither of us wants to allow "nonstandard" or "random character" headings. What we want is to use correct headers - coverbs, from what I read, are not adverbs, just as in English pronouns are not nouns. Hungarian is not English, Chinese is not Russian, English is not Latin - why should we impose the structure and terminology on another language if it doesn't fit? If our editors and readers are accidentally educated in this process then good, is it not the goal of the Wikimedia Foundation to develop educational content? Thryduulf 22:33, 21 August 2007 (UTC)
There is no point in using an ever-growing set of headings to describe words that we already have appropriate headings for. If you wish to note that something is a {{coverb}} in Hungarian, why not do so on the "inflection line" or on a definition line? If the translated word behaves as two or more English parts of speech, it is only effective to list those parts of speech, with a "coverb" indication following on the "inflection line." --Connel MacKenzie 02:55, 23 August 2007 (UTC)
Well put. And to be even more helpful, we can linkify such headers, as we already do with "acronym" and "initialism". —RuakhTALK 23:10, 21 August 2007 (UTC)


http://slashdot.org/article.pl?sid=07/06/22/010255&from=rss --Connel MacKenzie 15:43, 22 June 2007 (UTC)

Hear, hear! --Williamsayers79 16:14, 24 June 2007 (UTC)


During a working session last night I made some changes to the topical hierarchy. EncycloPetey disagreed with some of what I did (see discussion on my talk page). There is one particular change that I made that he reverted and that I feel strongly should be discussed here. I find it much too simplistic to simply put Government as a sub-category of Politics. Therefore I had added Category:Organizations and Category:Society as parent categories.

Status for the sister projects is that Commons has the lone Category:Politics as parent and Wikipedia has "Category:Social institutions", "Category:Politics" and "Category:Determinants of health" as parent categories.

Can we reach a consensus here on what is a reasoned choice of parent categories for Category:Government? __meco 11:23, 27 June 2007 (UTC)

There was also no Category:Organizations until you created it. It is an empty category, listed under Category:Society. So, listing Category:Government under both would be redundant, at least. A government is not an "organization" in the usual sense, nor is it a "society" (businesses, corporations, charities, etc.). A google search for the exact phrase "government organization" returns 1.2 million hits, with 475 on Wikipedia alone. If a government is a form of organization, then "government organization" would be redundant. Rather, there are organizations within a government that are termed "government organizations".
Part of the reason I asked you to stop and discuss what you were doing is that you were creating lots of empty categories for the sole purpose of cross-linking to Wikipedia and Commons, even though the categories were empty and duplicated topical areas already in existence. Please do not radically alter the entire category structure for English categories. We have a system laid out that is parallel across all languages that we include here. Changing the English category structure means that all the others (Chinese, Spanish, French, Russian, Arabic, etc.) will not match. Also, some of your hierarchy decisions were odd. Technology is not a subcategory of Human. Our categories were set up similarly to the way in which library categorization is done. A category should only be nested within another if it is truly a subcategory. So Entertainment > Games > Board games > Chess, which makes logical sense. Your arrangement of Apes > Humans > Technology is nonsensical. And we have no need for a Category:Apes if we have a Category:Primates. There simply will never be a large enough collection of terms to populate it and make it worthwhile to have for a dictionary. This is a direct result of the differences between what Wiktionary and Wikipedia are trying to do. On Wikipedia, the category Apes would include all the various articles they have which pertain to apes, including articles on each species, genus, and family of apes as well as articles about fictional apes like King Kong, famous apes like Koko, and so forth. Here on Wiktionary, the category will contain only words pertaining to apes. In most cases, this will result in a list of scientific names and general group names. The scientific names are not considered to be English, so they would reside in a separate category. Thus, only the common names of species and groups would exist in the category, and there just aren't very many ape species in existence, so there won't be many words to include.
Please do not try to force our category structure to match those of other projects. This is a dictionary site; Wikipedia is an encyclopedia; Commons is an image file repository. Each site is using its own category structure to categorize the kinds of information it provides. Because the goals of the other projects are different, their category structures will differ from ours. Wikipedia will never have a category for Category:Latin adverbs, but we do because we need one. Wiktionary will never have a category for w:Category:Computer olympiads because it would serve no purpose. Likewise, there is no need for Category:Academia and Category:Academic disciplines, as you created. Please, do not impose a category structure here simply because another project is using that structure. That's not useful or desirable because it imposes foreign structure into a system for which that structure was not designed. That's not to say that our category structure is ideal, but changes should be made in order to better serve the needs we have for our own category structure, not simply for the purpose of artificially mimicking someone else's structure.
In this particular case, I would put Category:Government under Political science, as shown below. On the left is our current structure, on the right is my proposal:
current structure proposed revision
Category:Social sciences

Category:State 2

1 Category:Anthropology is currently listed under Category:Sciences. I would move it here, and provide links between "Sciences" and "Social sciences" at the top of each category page.
2 "state" can mean the government or a nation or a geographic region. Thus, Category:State duplicates Category:Government on the one hand, and is ambiguous with Category:Countries on the other.
--EncycloPetey 19:06, 27 June 2007 (UTC)
As I attempted to garner your acknowledgment of in the previously referenced discussion thread on my talk page, I have observed in myself, as well as in others, that attempts to initiate specific or general discussions relating to the structure of a topical hierarchy (which here at Wiktionary is merely an added feature, whereas it's the backbone of other projects such as Wikipedia and to a lesser extent Commons) seem to be met with a marked inert sluggishness and innate hesitancy or reluctance. Often these attempts simply run out in the sand for some reason. I attempted a hypothesis as to why this may be the case on my talk page. Getting into this discussion here now gets doubly difficult for me as it appears that my attempt vis-à-vis you failed. At least you didn't acknowledge this perspective.
I don't want this to come across as censure, I just want to present my suggestion that when it comes to the categorizing and re-categorizing of topical categories, the most viable and sustainable route to choose is to give individual contributors a great amount of slack, and discuss each would-be problem on a case by case basis. I believe that simultaneously grasping all the issues pertinent to reaching sound conclusions in these matters is a task that demands an extraordinary availability of mental energy. I will attempt to illustrate what is involved by paraphrasing Robert S. Corrington in his psychobiography of Wilhelm Reich (I am quoting from a talk page entry at Talk:Wilhelm Reich at Wikipedia):
He had an almost unparalleled ability to synthesize knowledge from vastly diverging fields "simultaneously maintaining several seemingly incompatible conceptual horizons in one expanding categorial and phenomenological space, while also making continual reconstruction and reconfigurations that correspond to an expanding phenomenal data field."[1]. Corrington asserts that while Freud at best could work out one or two categorial horizons simultaneously, "Reich [...] could hold a number of horizons in his mind while reshaping each one under the creative pressure of the others, [...]producing a rich skein from the game strategies of (1) transformed psychoanalysis, (2) cultural anthropology, (3) economics, (4) bioenergetics, (5) psychopathology, (6) sociology, and (7) ethics."
I believe that the task at hand for us is somewhat analogous to the phenomenology which Corrington describes, and that the broad discussion you wish to take place may be unfeasible in the manner you are inviting. I am far from certain that I am right in this hypothesis though. __meco 09:46, 28 June 2007 (UTC)
I'm really not able to decipher this conversation at all. Just thought I'd mention... :-) -- Visviva 04:37, 29 June 2007 (UTC)
Much appreciated input. __meco 09:26, 29 June 2007 (UTC)
There are two things I don't understand about your proposal: firstly, what definition of "social science" is being used (since I would define the term in a way that included economics and political science), and secondly, why your proposed scheme (like the existing one) seems to have two different kinds of categories, interspersed seemingly at random: categories like Category:Political science, which contains political science terms (as in, terms that are themselves part of political science, terms that are used by political scientists), and categories like Category:Government, which contains terms about government (as in, terms that are used by people talking about government). For consistent naming, we'd have either something like Category:Political science and Category:The Study of Government (both named after the studies), or something like Category:Government and Politics and Category:Government (both named after the things studied). —RuakhTALK 07:06, 29 June 2007 (UTC)
That said, insofar as I do understand these schemes, your proposal seems much, much better than the status quo. —RuakhTALK 07:09, 29 June 2007 (UTC)
With regard to the names of the categories, they come from a bit of a tradeoff. First, there are existing categories to be dealt with. Changing category names means editing all the links from everywhere they occur, which enormously amplifies the work involved. Second, our {{context}} template likes category names to match the context phrase that is displayed at the head of a definition line. So, "political science" and "government" work well with this template in a way that "The Study of Government" and "Government and Politics" don't. Third, I've tried to consider what short (ideally one-word) category names would be most readily recognizable for what they're likely to contain. The current category names seem to do that rather well, so I don't see a pressing need to fix it (If it ain't broke...). Part of the result is a mix of names for fields of study and names of things studied. But this is a much wider problem than just for the social sciences, and it applies both on Wikipedia and here (I spent several months helping to sort stub articles on Wikipedia, so I've seen the category problems they have to deal with!)
Now, why then aren't Economics and Political science under Social Sciences? That comes from looking at the subcategories. Would someone looking for Category:Law (which is under Political science) think to look in Category:Social sciences? Possibly, but probably not. Would someone looking for Category:Money think to look in Category:Social sciences? Not likely. The result is that I've moved Economics and Political science up one level. One of the tenets of good categorization is that people get lost if the hierarchy has too many nested levels. A second corollary is that a category shouldn't be "hidden" in an odd place. All the category names above it should lead obviously to the subcategories to ease frustration for those looking for information. The result is a bit of a compromise. --EncycloPetey 07:27, 29 June 2007 (UTC)
That all makes sense, and thanks for explaining. :-)   I still think it's confusing to have Category:Social sciences not contain Category:Economics and Category:Political science, though. Is it possible to have it contain them redundantly (i.e., so they're in both Category:Social sciences and Category:Society)? Or, better yet, can we get rid of Category:Social sciences and put Category:Anthropology, Category:Psychology, and Category:Sociology just directly in Category:Society? —RuakhTALK 15:55, 29 June 2007 (UTC)
Redundant categorization is possible; I just preferred to avoid suggesting that at the outset. I also would not want to see the Category:Social sciences collapsed into Category:Society for three reasons. First, it would lead to an overload of Category:Society. Maybe not immediately, but not long in the future; the category Society has the potential to contain numerous subcategories pertaining to many aspects of society. Second, doing so would eliminate a potentially useful category for those terms whose use is too broad to apply to just one discipline in the social sciences. Third, it would prevent us from separating those more general categories and terms (that most users will be looking to find) from those categories that will be populated by highly technical jargon used by specialists. I agree that it's not an ideal scheme, but no practical scheme ever is. It all goes back to tradeoffs.
In the case of Category:Economics, the apparent oddity of placement comes from the choice of category name. I think that Category:Economy would make a good general name for what is included. Were we to use such a name, then it would be clear why the category is under Society rather than Social sciences and likely no one would balk. However, as I noted above, our context template prefers a category name that describes the discipline rather than the subject so that the discipline will appear at the head of the definition line. So, in light of that, Category:Economics is the better name for our purposes. The result is that it looks like that category should be a subcategory of Category:Social sciences. However, that is just where the name Economics would make more sense to appear; its content works better under Category:Society. It's another tradeoff. --EncycloPetey 19:27, 29 June 2007 (UTC)
There's nothing wrong with being redundant! Cyclic categories are a little weird, but even they are possible, and anyways the proposal is not cyclic (since Economics would not contain Social Sciences or Society; that would just be wrong). Economics is a part of society. Economics is a social science. There's nothing strange about that! DAVilla 17:08, 29 June 2007 (UTC)
I agree with DAVilla; my (unpopular) two cents is that Wikipedia's avoidance of "circular" categories is inappropriate and pointless. There is nothing inherently wrong with redundant categorization. There is, however, the benefit of being able to find terms one is looking for, without delving through numerous subcategories. --Connel MacKenzie 17:34, 30 June 2007 (UTC)
To be clear, I'm not in support of circular categories because I feel that the software that provides category listing could some day be much stronger, with the ability to add or exclude terms in subcategories to various depths, and that kind of capability will require a tree structure. However, this is entirely off-topic as the point is that this type of redundant categorization would not even be cyclic. DAVilla 11:35, 1 July 2007 (UTC)
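For anyone who wants to check the "redundant but not cyclic" point mechanically, here is a minimal sketch in Python. It is only an illustration: the `CHILDREN` table is a hypothetical subset of the categories named in this thread (the exact parent and child placements are assumptions, not the live category tree), and `has_cycle` and `parents_of` are made-up helper names, not anything the MediaWiki software provides.

```python
from typing import Dict, List

# Hypothetical parent -> children table, loosely based on the categories
# named in this thread. The placements are assumptions for illustration.
CHILDREN: Dict[str, List[str]] = {
    "Society": ["Social sciences", "Economics", "Political science"],
    "Social sciences": [
        "Anthropology", "Psychology", "Sociology",
        "Economics", "Political science",  # the redundant listings
    ],
    "Political science": ["Government", "Law"],
    "Economics": ["Money"],
}


def has_cycle(children: Dict[str, List[str]]) -> bool:
    """Return True if some category is (indirectly) its own subcategory."""
    in_progress, done = 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = in_progress
        for child in children.get(node, []):
            if state.get(child) == in_progress:
                return True  # walked back into a category still being expanded
            if child not in state and visit(child):
                return True
        state[node] = done
        return False

    return any(node not in state and visit(node) for node in children)


def parents_of(children: Dict[str, List[str]], name: str) -> List[str]:
    """List every category that directly contains the named category."""
    return sorted(p for p, kids in children.items() if name in kids)


print(parents_of(CHILDREN, "Economics"))  # two parents: redundant listing
print(has_cycle(CHILDREN))                # no category contains itself
```

Under these assumptions Economics ends up with two parents while the structure stays free of cycles. Only something like listing Society under Economics would make it cyclic.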

It sounds as though objections have been answered, so I'll proceed with restructuring Category:Society (a fitting way to celebrate American Independence Day). This leaves open the possibility of redundantly listing categories afterwards, such as duplicate listings under Social sciences as Ruakh suggested. --EncycloPetey 00:48, 5 July 2007 (UTC)

Original Research

Just wondering why there's no policy I can find here regarding original research. I know that the most likely response is probably that no one's gotten around to writing the policy for it. I tend to work a lot on Simple English Wiktionary (where I'm an admin and we can use all the help we can get), and we have one user there that has novel uses/meanings for what a conjunction and a preposition are. Since we tend to follow the lead of y'all over here on many of these things, I wish there were a policy page here that clearly stated that Wiktionary doesn't accept original research any more than Wikipedia does. At least then I'd have something to point to besides Wikipedia policy. I couldn't even find anything relevant on Meta. --Cromwellt|Talk|Contribs 04:12, 29 June 2007 (UTC)

Sadly, many of our standard practices are known only from bits of scattered discussion over the past three years. But yes, in principle we follow the same practice regarding original research as Wikipedia. There just doesn't seem to be a policy page for it yet. --EncycloPetey 04:18, 29 June 2007 (UTC)
I know I've seen respected users here saying that Wiktionary doesn't have NOR for good reasons ... and really the whole RfV process would violate the Wikipedia version. I agree that it is needed for certain areas (etymology, grammatical classifications, etc.), but the exact scope of its application will need some thought. -- Visviva 04:35, 29 June 2007 (UTC)
It's not that NOR is incorrect here, it just isn't all that applicable. Avoiding copyvios such as the potential to reproduce various word lists in entirety is much more pressing. We've mentioned the original research proviso with regard to parts of speech, pronunciations, transliterations etc. but depending on which authority you cite there are linguists who favor almost any approach. It really all comes down to just making a decision in the community that will undoubtedly reflect the general practices on the outside, aside from multilingual issues perhaps. For POS that means no circumfixes at least in English; for the IPA of English we're using /r/ which seems to be under periodically unceasing debate, and for transliterations we tend to pick one dominant one and definitely don't invent anything ourselves. RFV is barely original research by the way. It's much more closely related to the references on Wikipedia that help verify notability. There's a lot of work that goes into finding citations, but there isn't a lot of creativity in the verbatim result. DAVilla 05:33, 29 June 2007 (UTC)
Which is probably another reason no one has tackled writing it yet.  ;) --EncycloPetey 04:43, 29 June 2007 (UTC)
I think part of the problem is that our structure requires us to make decisions; we have to choose a single part of speech to use to represent a word (or a given use of a word), where a Wikipedia-esque dictionary might devote an entire section to the debate over what part of speech the word is. (For English, this is a problem with many grammatical words — is my a pronoun, an adjective, a determiner, or a determinative? — and all participles — is seen a (non-finite) verb, a (verbal) adjective, or just a participle? — and many prepositional phrases — is on time both an adjective and an adverb, or just an idiom, or just a prepositional phrase?; for other languages, it can be a problem for many more words.) To some extent this is more of a restriction on NPOV (we have to choose one primary viewpoint, and relegate other viewpoints to usage notes or appendices or the trash bin), but it's also a restriction on NOR, in that we have to decide for ourselves what seems most logical. (We could adopt NOR and decide to take the most widely held viewpoint, but then we have to decide whether that means the most widely held viewpoint among linguists, or the most widely held viewpoint among lexicographers, or what the balance is.) —RuakhTALK 05:08, 29 June 2007 (UTC)
The reason no one has succeeded at adapting NOR here, is that the primary goal of this dictionary is to provide definitions. To use fair-use (quoting a single sentence to show how a word is used) leaves a hole for the definition itself; we can't copy definitions from anything except Webster's 1913 and the Century Dictionary. Most often, doing so leaves unusable definitions. So the primary function we provide, actually is that of a secondary source (O.R. as per 'pedia.) For everything else, we do try to provide references from other secondary sources. So, putting it into a policy is tricky, at best. Status quo, has been to honor w:WP:NOR for everything except the definitions themselves (as they are "verified" by the quotations showing their use.) --Connel MacKenzie 07:28, 29 June 2007 (UTC)

That special milestone

At our current rate of growth, by my calculations, we may reach half-a-million (500,000) entries sometime within the next six weeks. Besides posting a little blurb at Meta, does anyone have suggestions as to how to mark this important milestone? --EncycloPetey 02:25, 1 July 2007 (UTC)

Collective punishment? Mass deletions?
Oh wait, did you mean celebrate? ;-) DAVilla 23:26, 3 July 2007 (UTC)
Hahaha, mass deletions... we're reaching half a million?! What?!?! Noo, we gotta delete some stuff so we can whittle that back down!!
*sigh*... Thank you, you've made me laugh on an otherwise depressing day. — Beobach972 03:11, 4 July 2007 (UTC)
Hmmm. I suppose I could do a practice deletion run now that we are close to 450,000.  :-)   Alas, most of the stuff I've deleted recently doesn't count towards that statistic, anyhow. --Connel MacKenzie 05:42, 6 July 2007 (UTC)
We could unblock & re-sysop Wonderfool, I suppose, so we could keep the number of entries below 1/2 mil.  :-)   --Connel MacKenzie 21:17, 15 August 2007 (UTC)
But really - it would be nice to get rid of Wiktionary:Requested articles:English before that milestone. There are fewer than 2,000 words to go, and many of them are relatively simple definitions. If we all did a few each day (and I cut down on Italian verbs) we could see the end of it. SemperBlotto 21:30, 15 August 2007 (UTC)

  1. ^ Corrington, Robert S., Wilhelm Reich: Psychoanalyst and Radical Naturalist, Farrar, Straus and Giroux, NY, 2003, p. 98
Last modified on 14 April 2014, at 03:07