Last modified on 29 January 2014, at 22:46

Wiktionary:Beer parlour/2008/December

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

We're almost past fr!

As of when I'm writing this, the English Wiktionary is 300 words away from the size of the French one. Soon we'll be in the #1 spot! -Oreo Priest talk 17:46, 20 August 2008 (UTC)

But they'll be back from vacation soon. DCDuring TALK 18:25, 20 August 2008 (UTC)
We've been slowly gaining ground on them for a while now; we just happen to have made a sudden gain as a result of high activity here and reduced activity there. They were ahead of us once before, and we then surpassed them for a long time. "This has all happened before, and it will all happen again." --EncycloPetey 18:53, 20 August 2008 (UTC)

Update: we now trail by 72 entries. --EncycloPetey 23:03, 20 August 2008 (UTC)

Neck and neck now! Congrats, everyone. Conrad.Irwin 23:46, 20 August 2008 (UTC)
...and now we're nearly 500 ahead of them. --EncycloPetey 18:01, 21 August 2008 (UTC)
But according to the homepage of Wiktionary.org at present, they have regained the upper hand and are 3000 words ahead. When exactly did the French Wiktionary surpass the English one anew? (I presume between 21 and 28 August, right?) Bogorm 07:48, 28 August 2008 (UTC)
It's out of date, see Special:Statistics and. The number we compare against is "pages qui sont probablement de véritables articles" and "pages that are probably legitimate content pages.". It's not really important at all, but some competitiveness is inherent in human nature, and it's better to get rid of it on an "external" foe than to bicker among ourselves. :) Conrad.Irwin 08:06, 28 August 2008 (UTC)

Now the French have taken the lead again, by about 1000 (revised from 350). --EncycloPetey 15:35, 31 August 2008 (UTC)

Well, but now I see on wiktionary.org the French Wiktionary behind the English. And you thought that they would resume expanding once their holidays were over. Your prognosis has obviously not been corroborated by these developments... What is the reason? Bogorm 07:06, 6 September 2008 (UTC)
On wiktionary.org, they're definitely ahead now. Teh Rote 23:30, 18 September 2008 (UTC)
Yes! We're ahead again! Teh Rote 14:56, 6 October 2008 (UTC)
??? www.wiktionary.org shows 919 000 for the English W. and 922 000 for the French; how did you conclude that they had been surpassed? On what evidence? Bogorm 16:07, 6 October 2008 (UTC)
Recent changes count. The one on Wiktionary.org isn't updated too often. Teh Rote 14:27, 14 October 2008 (UTC)

Very interesting, I'm sure, but how goes the race between Lilliputian and Blefuscuese? - Pingku 15:20, 13 December 2008 (UTC)


Past participle variants

Since there is a great divergence between entries of past participle variants here in Wiktionary, I ask: for languages where a past participle can be inflected, how should we handle them? Here are some examples:

  • Asturian escribir: escribíu m, escribida f, escribío n, escribíos m pl, n pl, escribíes f pl
  • Catalan escriure: escrit m, escrita f, escrits m pl, escrites f pl
  • French écrire: écrit m, écrite f, écrits m pl, écrites f pl
  • Italian scrivere: scritto m, scritta f, scritti m pl, scritte f pl
  • Portuguese escrever: escrito m, escrita f, escritos m pl, escritas f pl
  • Spanish escribir: escrito m, escrita f, escritos m pl, escritas f pl

If these words exist and are recognized as verb forms (i.e., not as adjectival derivations or other interpretations), I propose that they should be labeled correctly, linked to the lemma form of the verb and included in conjugation tables. Daniel. 05:31, 11 November 2008 (UTC)

Some of how we handle these may have to vary by language. I am not sure what you are asking about: Format of a complete entry, just the inflection line, how such participles appear within the lemma page, how to set up conjugation tables for the lemma, or something else. Even in the examples you've provided, you haven't presented the whole picture: The Spanish past participle escrito is not used in that form in all regions that speak Spanish; Argentina and Uruguay prefer the spelling escripto instead. It also doesn't address the issue of what interlinking (if any) ought to exist between participles of different tenses. Nor have you expressed an opinion about what part of speech you consider these words for the specific languages listed.
In Latin, participles function as a separate part of speech, with characteristics of both adjective and verb. There is also no "past participle" in Latin. There is a "perfect passive participle", so called because Latin has both active and passive senses for most verbs and because Latin has more than one "past" tense. An example of how these are being handled in Latin exists at the entry for amatus. Some of this page's content and formatting will not apply in other languages, but some of it may be helpful.
However, as I said at the outset, I'm not sure we can decide what to do for all languages, or even if a uniform setup is possible for just the Romance languages listed above. We don't even have that kind of consistency with certain groups of words in English. --EncycloPetey 03:54, 12 November 2008 (UTC)
Some of these examples were taken directly from Wiktionary itself, both here on the English version and from other languages; these were commonly labeled "Adjectives" (derived from participles) or "Verbs" (conjugated verb forms) without a clear distinction, and most did not even appear in conjugation tables. They caught my attention. To be honest, I don't have sufficient knowledge to decide how we should handle a word in Asturian or Italian, but these languages seem to have similar rules of conjugation, that is, the part of speech of all participles should be "verb form" and they should link to the infinitive form of the verb; that's why I am asking for other opinions. One language I know in detail is Portuguese, which I speak fluently, and there the answer to my own question is: yes! In Portuguese, there are such variants. Here is an example: A garota foi embelezada por mim. (The girl was embellished by me.) This is clearly a use of the female singular past participle of the verb embelezar, which cannot be substituted by an adjective: A garota foi "bonita" por mim. (The girl was "beautiful" by me.) makes no sense. Daniel. 16:49, 23 November 2008 (UTC)
I fully agree with you. And it's not specific to Romance languages, it's exactly the same in English (compare He prefers sugared coffee and the coffee he had sugared). Lmaltier 20:23, 29 December 2008 (UTC)

Category:Lithuanian male given names and co.

These given name categories must all be renamed. Names are not male or female, they are masculine or feminine. --EncycloPetey 18:53, 21 November 2008 (UTC)

Yikes, between Category:Female given names by language and Category:Male given names by language there's ~100 subcategories written that way. Interestingly, b.g.c. has similar usage statistics between the two forms (when searching w/o "given" as well) while the web clearly prefers the "male/female" usage (~5:1). IANAG (I am not a grammarian) but could common usage be employing "male/female" attributively here? --Bequw¢τ 23:33, 23 November 2008 (UTC)
No, it's not used attributively. If the names themselves actually were male and female, they'd be getting together to produce baby names. --EncycloPetey 19:55, 25 November 2008 (UTC)
This seems to me like a perfectly ordinary use of male#Noun as a noun adjunct: "male given name" = "given name of a male". I can see where "masculine" might be somewhat better, but then again given the various additional meanings of masculine/feminine it could be misleading. I don't see a problem with Makaokalani's system. -- Visviva 13:56, 26 November 2008 (UTC)
You'd have to check all the 8000+ given name entries and move hundreds of names into the new categories.
"Masculine/Feminine given names" are correct for languages with exactly two genders, m and f. For all other languages, it would be "Given names borne by persons of the male/female sex". English names cannot have a grammatical gender. And what about "Cecil is such a feminine name for a man"? "Male/Female given names" sounds like a reasonable compromise, short and easy to understand.
The old names of given name categories (fr:Male given names) sounded fine to me, but Robert Ullmann insisted they must be changed into POS categories (French male given names). There is a rule about it, so I created nearly 200 new categories and now I'm busy cleaning up the old categories and adding templates to hundreds of names. Very boring and quite unproductive work. I refuse to start all over because of a male/masculine controversy. The only thing that really matters is the content of the entries. --Makaokalani 14:07, 24 November 2008 (UTC)
Male/female is not a "compromise"; it is incorrect. Modern English usage places emphasis on the biological traits for "male/female". The terms "masculine/feminine" are applied to gender roles. Yes, it would mean changing all the entries, but they've already been changed before, haven't they? And they're supposed to be done through templates, which makes the task much easier. It is also incorrect to say the English names "cannot have a grammatical gender". Modern English does have vestiges of gender as it existed in Old English, and the separate third-person pronouns for he/she and him/her are plain testament to that fact. These words evoke a connotation of gender, and most English given names do exactly the same.
The fact that you decided to proceed with a major change to category structure on the basis of a private conversation, without seeking input from the community, is your responsibility. If you really believed that "the only thing that really matters is the content of the entries", then you wouldn't have started changing the category names in the first place, would you? --EncycloPetey 19:53, 25 November 2008 (UTC)
"Private conversation?" I discussed this in Wiktionary:Requests for deletion/Others#Category:Armenian names and Wiktionary:Beer parlour#New subcategories for English given names. I take responsibility for the change from topic to POS categories, but not for using the male/female words just like they had been used before. Nobody had ever mentioned that it could be a problem. All the entries haven't been changed yet, and several hundred names don't have templates in the new system either, in categories "--- diminutives of male/female given names", "--- male/female given name parts". If you promise to change them all, that's fine with me.
What about having a vote, if you feel so strongly about this?--Makaokalani 13:34, 26 November 2008 (UTC)
I would be willing to make those changes, although I can't promise to do them quickly. I am currently editing (by hand) the inflection line of more than 3000 Spanish verb entries, in part because I'm the one who proposed a unified template for Spanish verbs, and because the variation in the verb inflection patterns and in the entries themselves necessitates that each one be seen by a person. Sorry that I missed that October discussion. I took a wiki-break during October and did not edit so much that month. It seems I missed that BP discussion, or I'd have commented. --EncycloPetey 03:03, 28 November 2008 (UTC)
Makaokalani is entirely correct, the names are names for males and females. This makes sense for all languages. "Masculine" and "feminine" only make sense for languages where nouns (and proper nouns) have grammatical gender; for other languages "masculine" and "feminine" are utter nonsense. "Robert" is not a "masculine" name in English, it is a male name. Full stop. "Male" and "female" we can and should use for all languages, just as it is. Robert Ullmann 15:18, 1 December 2008 (UTC)
That's not true. "Masculine" and "feminine" don't pertain only to grammatical gender, but also to social gender; a transman (a man who's biologically female) will go by a male/masculine name such as "Robert". But the issue is complicated; fairly few societies have really accepted transgender people, and names are generally assigned long before a transgender person is capable of articulating such … if a transman's birth-name was "Jessica", was that his name because he's biologically female and it's a female name, or was that his name because it's a feminine name and his family assumed he would be a woman? If some people continue to call him "Jessica" after he's transitioned, is that because it's a female name and he's female, or because it's a feminine name and those people are failing to recognize him as a man? Neither male/female nor masculine/feminine seems totally accurate — from a purely descriptivist standpoint, it seems that such names are both male/female and masculine/feminine — and I don't see how you or EP can see this in such black and white. (And that's even ignoring intersex people, genderqueer people, and random people with arguably gender-inappropriate names.) —RuakhTALK 15:33, 1 December 2008 (UTC)

Lithuanian letter

I know this is a bit off-topic, but could someone knowledgeable about it check the Lithuanian word (semens - seed) which I added in Appendix:Proto-Slavic_*sěmę - in my Etymological dictionary it was written with ě, but when I looked up the letters in Lithuanian in the Wikipedia article it turned out that this kind of e is not there, so I wrote simply e. Unfortunately, I understand no whit in this language. Which letter is the correct one? Bogorm 15:48, 1 December 2008 (UTC)

It's probably sė́menys (flaxseed, linseed). You've mistaken the acute tone on <ė> for a caron <ˇ> ^_^. LKZ is an excellent source for finding obscure Lithuanian words, as well as for accentuation paradigms. --Ivan Štambuk 16:34, 3 December 2008 (UTC)
There are also variant forms sė́menės and sė́mens; your source probably referred to the second one. --Ivan Štambuk 16:40, 3 December 2008 (UTC)

Category:English cryptograms: ABBCBB

This is a request for comments on this idea. This can be useful to people trying to solve cryptograms. (I suspect the last space in the category's name should be removed, but that's neither here nor there.)—msh210 20:52, 24 November 2008 (UTC)

I don't see that much point, but there's no reason why not. Conrad.Irwin 21:39, 24 November 2008 (UTC)
Given we have alphagrams, I can't see any reason to object to this. Thryduulf 22:14, 24 November 2008 (UTC)
Yes, there is a reason why not: for this to be useful to someone solving cryptogram puzzles, pretty much all words have to be categorized this way. You want this on every word? It is simple (trivial) to generate a cryptogram index from a word list. Adding the categories here doesn't add any information. (see for example [1] for more information) (and, yes, I don't think alphagrams are useful either, but to the extent that they are, they are useful for individual words; simple substitution cipher pattern lists are only useful with a complete index, and we don't want or need that here.) Robert Ullmann 10:36, 25 November 2008 (UTC)
Agree. These don't need to use the category system or be linked from entries. On the other hand, if someone wants to create and maintain pages in Appendix-space that list these (a la our Rhymes pages), I don't think anyone would have a problem with that. -- Visviva 16:35, 25 November 2008 (UTC)
Appendicize, per above. Appendix:English words with ABBCBB structure. bd2412 T 05:46, 28 November 2008 (UTC)
Yes, my intention was that this category be added to every English entry, or, at least, every English entry with repeated letters (not necessarily consecutively, so tot would be so categorized). Since people are objecting, and no one is enthusiastic about it but me, I'll delete the prototype.—msh210 18:07, 3 December 2008 (UTC)
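As Robert Ullmann notes above, a cryptogram pattern index can be generated mechanically from a word list instead of being stored as categories on entries. A minimal Python sketch, assuming plain words with at most 26 distinct letters; the names cryptogram_pattern and build_index are hypothetical and not part of any Wiktionary tool:

```python
from collections import defaultdict
from string import ascii_uppercase


def cryptogram_pattern(word: str) -> str:
    """Return the letter pattern of a word, e.g. "tot" -> "ABA".

    Each distinct letter is replaced by A, B, C, ... in order of first
    appearance. Assumes at most 26 distinct characters (IndexError otherwise).
    """
    codes: dict[str, str] = {}
    pattern = []
    for ch in word.lower():
        if ch not in codes:
            codes[ch] = ascii_uppercase[len(codes)]
        pattern.append(codes[ch])
    return "".join(pattern)


def build_index(words: list[str]) -> dict[str, list[str]]:
    """Group words by their pattern, as a solver's lookup table would."""
    index: dict[str, list[str]] = defaultdict(list)
    for word in words:
        index[cryptogram_pattern(word)].append(word)
    return dict(index)


if __name__ == "__main__":
    index = build_index(["tot", "did", "letter", "cat"])
    print(index["ABA"])  # ['tot', 'did']
```

Because the whole index can be rebuilt from a word list on demand, nothing needs to be recorded on individual entries, which is the objection raised in the thread.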


Redirects for combining forms from other languages?

We seem to have pretty much settled the question of short combining forms such as m'y, s'y, m'en, and m'a in the RfD discussions of those terms. I think we can also agree that the arguments in favor of keeping those do not apply to longer combining forms that are not, for example, likely to be mistaken for similarly spelled English words. Examples brought up in the discussion include m'était and m'arriveront. While the argument for having entries such as these would be pretty weak, I see no reason why we ought not redirect the commonly occurring combining forms to their uncombined verb, i.e., redirect j'était, m'était, t'était, s'était, l'était, d'était, c'était, n'était to était. Is there any particular reason why we ought not do so? bd2412 T 12:50, 29 November 2008 (UTC)

In any case, there is a need to be very careful: all the forms you mention exist, except j'était (the right form is j'étais) and d'était (this form simply does not exist). Possible issues about such redirects are:
  • this would imply the creation of a very large number of redirects: for many (or most) French verbs beginning with a vowel, there would be a need for more than 100 or 200 redirects (for each verb); for most French nouns and adjectives beginning with a vowel, there would be a need for 2 redirects; for many English nouns, there would be a need for 1 redirect (e.g. mother's), etc. Is this a real issue? I don't know.
  • the same form might exist in different languages,
  • the same form might imply different redirects, even for a given language (I have no example, but this might happen).
This is an interesting question, because words including a ' are very numerous in some languages (e.g. Breton). Even in French, there are a few examples which are proper words, such as presqu'île (a combining form, but an actual word), périph' (abbreviated word) or chem'not (a regional word).

Therefore, there is a need to define a policy for helping readers when they find such a word. A proposal might be to add a rule in some 'Help' page:

For searching something such as m'y, proceed in several steps:
  • if you feel this might be a word, first try to search m'y; if you don't find it, try to search m (or m'?) and y.
  • if you feel this might be a combination of several words, first try to search m (or m'?) and y; if you don't find them, try to search m'y.
Lmaltier 13:56, 29 November 2008 (UTC)
Oops, j'était and d'était came up as false positives when I was googling for combined forms. In any event, it may be 100 or 200 redirects counted by a particular lemma, but it would be no more than a dozen for any individual article (since each conjugation gets its own entry here). The system can easily handle that. I'm not advocating redirecting 's forms for English words. Presumably, people who use an English dictionary will know what the 's signifies (if they don't know enough English to know that, they might be better served with a dictionary in another language). But an English speaker might not know that "n'était" is a combined form, as English doesn't have forms that combine from the front (unless you want to count 'tis type words, for which we have entries). bd2412 T 15:21, 29 November 2008 (UTC)
But this affects many other languages that do something similar, but worse. In Galician, as in Spanish, some pronouns may be suffixed to the verb, but (unlike Spanish) certain articles appearing after infinitive verb forms with an enclitic pronoun undergo a spelling change and are attached to the previous verb, which also undergoes a spelling change. I have no idea how to add such combined forms as entries. There is no apostrophe or other mark indicating that it is a contraction. There is a severe spelling change, so one cannot look up the components without already knowing it's a combined form (and this can be very hard to recognize). Worse, the resulting combination after contraction is not a word. It's a scribal form representing what happens in speech when a noun's article comes immediately after a verb. It just happens that in Galician, the pronunciation change is reflected in writing. It would be like "eatwethe" or "walktheythe" if English did such things. --EncycloPetey 16:28, 29 November 2008 (UTC)
Indeed. And Spanish, though not as bad, does add accent marks in many such cases (dándole, dármelo, etc.), and contracts -os + nos into -onos. And a different kind of problem: in Hebrew, all one-letter words are written solid with whatever word comes after them, which can produce a long chain of many one-letter words followed by a single two-letter word — how to handle something like וְשֶׁכְּשֶׁמֵּהַפֶּה (v'-she-k'she-mei-ha-pe, and that when from the mouth)? We might not find one solution that works for all languages; for French (where apostrophes always mark the boundaries) and Hebrew (where there are infinitely many possibilities, bounded only by the extremes of syntax, and an infinite subset of them are attested), I advocate simply not having such entries, and perhaps adding fancy JavaScript to redlink pages to help users find what they're looking for. We're a dictionary, and while we've stretched the bounds of that in some respects, there are limits. For Spanish and Italian, maybe we want to develop an "only in"-type non-entry that has a link to each component word, plus a link to an appendix? (Ideally, this non-entry would be able to co-occur with a real entry for an actual word that happens to be identical.) And for Inuktitut, I advocate abandoning all hope. :-P   —RuakhTALK 17:42, 29 November 2008 (UTC)
For the contracted forms that I am immediately concerned about, this is not a problem. bd2412 T 17:55, 29 November 2008 (UTC)
For the “contracted forms” that you are immediately concerned about, there is no problem; we don't need entries for them. They're all sum-of-parts, with parts that are instantly discernible to anyone with any knowledge of French. —RuakhTALK 06:00, 30 November 2008 (UTC)
The problem is that there are plenty of people out there who speak English (and would use an English dictionary) but have no knowledge of French. They may still come across a French phrase in a book or magazine and turn to us in bewilderment. If they are unaware of the significance of an m' or j', they might not know to strip these when looking up the verb, and may be unable to find our entry on the word. bd2412 T 06:27, 30 November 2008 (UTC)
Yes, but they might also have no knowledge of French syntax, and thus how to decipher subject and object. Do you also advocate including entries for every single French clause (I suppose doing so could boost our entry count considerably :P)? Simply put, we cannot have entries for everything someone might want to look up. There comes a point where someone who wants to understand a language must learn some things about that language. We are a dictionary, not a translator engine. -Atelaes λάλει ἐμοί 07:11, 30 November 2008 (UTC)
I am proposing redirects (not entries) for words formed by the contraction of two terms with an apostrophe (i.e. no spaces). The problem is that someone unfamiliar with French may see "m'appelle", look it up, and find nothing here even though we have separate entries on m' and appelle. Clauses are different because the reader can at least find the individual words in the clause. bd2412 T 07:31, 30 November 2008 (UTC)
I think the best solution is to include common compounds of this sort in the respective entries, so that appeler or whatever has an example that includes "m'appelle." This would make the relevant entry show up in a search. -- Visviva 02:37, 3 December 2008 (UTC)
That would have us coming up with 5-6 example sentences for each French verb that starts with a vowel. Redirects would just be easier on the brain. Maybe we could have the combined terms on the page under ====Derived terms====. bd2412 T 06:24, 3 December 2008 (UTC)
Generalizing from this language-specific instance, I really like Lmaltier's idea about a Help: page that lists the bounds of our dictionary (because we do/will have bounds). It could be divided by language (or with subpages) and would include information like the outcome of this debate and the vote to disallow English forms with 's. Maybe include it at Help:Searching (which I just realized existed)? We could link to it from the page users get when their searches fail. --Bequw¢τ 07:53, 30 November 2008 (UTC)
Not sure what level to indent this on. As I've mentioned before w.r.t. the Hebrew forms Ruakh mentions above, I think these deserve entries, or at least redirects, so that people can find them who don't know where to split the word(s), as bd2412 argues above.—msh210 19:01, 1 December 2008 (UTC)
Wouldn't that lead to an infinite number of such entries? I'm not a technical guy, but that seems problematic. Last I checked, our application for an infinite amount of server space was still "under consideration". ;-) -- Visviva 02:37, 3 December 2008 (UTC)
I agree both with Msh210 and Visviva: I think these entries should be limited to attested forms (with quotes) and created only by contributors feeling these entries are useful (not by bots). I would propose the same rule for numbers and all such infinite lists. But this is a different discussion. Lmaltier 17:09, 3 December 2008 (UTC)
Would it help if I narrow my proposal to words spelled in the Roman alphabet? An English speaker confronted with a completely different alphabet is not likely to confuse those words for a comparable English word, or believe that the combined form is a single word of the type commonly found in a dictionary. bd2412 T 06:27, 3 December 2008 (UTC)
  • In Sanskrit there exists mandatory and highly elaborate sandhi at word boundaries, which joins words into chunks such as "mahāsmṛtidharastattvaścatuḥsmṛtisamādhirāṭ", and which can often be very difficult to decode, especially for beginners who are too lazy to memorize the sandhi grid completely, and if you don't know many of the words in the text you're studying (i.e. you have to guess where one word ends and the other one begins to look it up in the dictionary). Obviously, in cases such as this, and in the cases of polysynthetic/highly-agglutinative languages (someone mentioned Inuktitut), it would be pointless to resolve this mechanically by redirects or entries whose number would exceed that of the corresponding lemmas by several orders of magnitude. The best way to solve this IMHO would be to generate all (relevant/common, to the extent that the generation of this sort is feasible) possible outputs of this sort by means of well-defined templates and include them in the corresponding entries. Take a look at the possessive declension of Hungarian nouns at entries such as szerv. Technically "my organ" in whatever language would be SoP, not meriting a namespace entry, but it nevertheless shows up in search results and it's "there" for interested users to look up. --Ivan Štambuk 06:08, 3 December 2008 (UTC)
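Lmaltier's proposed Help-page procedure for terms such as m'y amounts to an ordered list of page titles to try. A minimal Python sketch, assuming a single straight apostrophe (') marks the split point (the typographic ’ is not handled) and that the caller decides whether the term looks like a word; lookup_candidates is a hypothetical name, not an existing MediaWiki or Wiktionary function:

```python
def lookup_candidates(term: str, looks_like_word: bool = True) -> list[str]:
    """Return page titles to try, in order, for a term like "m'y".

    If the term looks like a word, try the whole term first and then its
    parts; otherwise try the parts first. Only the first apostrophe is
    treated as a split point.
    """
    whole = [term]
    parts: list[str] = []
    if "'" in term:
        head, _, tail = term.partition("'")
        # "m" (or "m'"?) and "y", following the proposed Help-page wording
        parts = [head, head + "'", tail]
    ordered = whole + parts if looks_like_word else parts + whole

    # Drop empty strings and duplicates (e.g. for "périph'") but keep order.
    seen: set[str] = set()
    result = []
    for title in ordered:
        if title and title not in seen:
            seen.add(title)
            result.append(title)
    return result


if __name__ == "__main__":
    print(lookup_candidates("m'y"))                         # ["m'y", 'm', "m'", 'y']
    print(lookup_candidates("m'y", looks_like_word=False))  # ['m', "m'", 'y', "m'y"]
```

A script of the kind Ruakh suggests for redlink pages could consume such a list; this sketch only produces candidate titles and does not check whether those pages exist.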

KangXi Zidian is gone

The Kangxi Zidian that used to be at www.kangxizidian.com and is currently linked to from 21,000 pages as a source seems to have disappeared. Is there an alternative available, or do we need to remove every link? -- Prince Kassad 19:29, 29 November 2008 (UTC)

Hopefully, the links were made using a template so that we can fix all the links by simply updating the template. --EncycloPetey 19:38, 29 November 2008 (UTC)
That would be {{Han ref}}. Nadando 19:52, 29 November 2008 (UTC)
Of course, modifying only the template (and not the individual entries) would leave each entry with some junk syntax. Now I understand where junk DNA comes from. bd2412 T 21:33, 29 November 2008 (UTC)
No, it won't leave any junk in the entries. It's just a matter of unlinking the broken site. The template parameter is the dictionary key used in the Han Unification process, and the template breaks it down to page/character/present or not. It will still do that, just not link the image. I'll fix it. It would be very good if someone could figure out another source (or whether the images are still available from that site, but moved). Robert Ullmann 14:38, 6 December 2008 (UTC)
The images are still there : look here. I think the naming conventions are the same, maybe just a temporary shutdown. Koxinga 19:36, 13 December 2008 (UTC)
They had been gone for at least two months. Now that they're back, the link can probably be re-inserted. -- Prince Kassad 20:07, 13 December 2008 (UTC)
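Robert Ullmann describes {{Han ref}} as breaking the dictionary key down into page, character position, and whether the character is actually present. A minimal Python sketch of that decomposition, assuming the key follows the Unihan kKangXi/kIRGKangXi layout "PPPP.CCX" (four-digit page, two-digit position on the page, final digit 0 if the character is in the dictionary, 1 if it only has a virtual position); the template's actual code may differ, and parse_kangxi_key is a hypothetical name:

```python
from typing import NamedTuple


class KangXiPosition(NamedTuple):
    page: int
    position: int  # position of the character on the page
    present: bool  # False if the character only has a virtual position


def parse_kangxi_key(key: str) -> KangXiPosition:
    """Split a key like "0074.050" into page, position and presence flag."""
    page, sep, rest = key.partition(".")
    if not sep or len(page) != 4 or len(rest) != 3 or not (page + rest).isdigit():
        raise ValueError(f"unexpected KangXi key format: {key!r}")
    if rest[2] not in "01":
        raise ValueError(f"last digit must be 0 or 1: {key!r}")
    return KangXiPosition(page=int(page), position=int(rest[:2]), present=rest[2] == "0")


if __name__ == "__main__":
    print(parse_kangxi_key("0074.050"))  # KangXiPosition(page=74, position=5, present=True)
```

Only the image link depends on the external site; the parsing itself is unaffected, which is why unlinking the broken site requires no changes to the entries.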

Prominent interwiki links.

français is in the French Wiktionary

A while back, Conrad.Irwin created a preference (see Wiktionary:Preferences) to "[t]rial the javascript prominent interwiki links." What it does is, when you visit an entry for a foreign word that has an interwiki link to its language's Wiktionary (for example, when you visit français, since fr:français exists), it adds a little link under the L2 header. I've been using it for a while now, and I've neither experienced nor heard mention of any problems with it. I think it's a great feature, and I'd like to make it standard. (Technical details: I'd copy User:Conrad.Irwin/iwiki.js to MediaWiki:prominent interwikis.js, modify MediaWiki:Common.js to import it, and remove it from the preferences list at User:Connel MacKenzie/custom.js.) Does anyone object to my doing so? (I'll wait a few days to give people a chance to try it out a bit and make sure no one has any concerns or objections.) Thanks! —RuakhTALK 18:26, 1 December 2008 (UTC)

I just turned it on and refreshed a few times, but I don't see the link (in Safari/Mac).
I'm not convinced that a link to the entry in the word's native language deserves extra prominence at all, and I don't think it should be added to the body of the English-language entry. It is not guaranteed to have a better-quality entry than the English or any other Wiktionary, and, a priori, the majority of such links will be useless to the majority of readers. Michael Z. 2008-12-01 18:51 z
Re: not seeing the link: That's odd. Do other preferences work for you?
Re: lack of guarantees: Well, there's also no guarantee that a given Wikipedia entry will be terribly helpful, yet we include links to those — indeed, links that IMHO are significantly more prominent than the proposed FL-wikt links. But, a few points:
  • While it's not guaranteed to have a better-quality entry than ours, there are things that we exclude as a matter of policy, such as translations to other foreign languages, which can only be found at the native-language entry.
  • While it's not guaranteed to have a better-quality entry than that of some random Wiktionary, it is almost-guaranteed to be more useful to the typical reader. If I'm looking up a French word on the English Wiktionary, chances are much higher that I know some French than that I know some Spanish, and chances are that I know much more French than Spanish.
Can you clarify your statement that "a priori, the majority of such links will be useless to the majority of readers"? Are you saying that most readers looking up a French word don't know any French?
RuakhTALK 19:31, 1 December 2008 (UTC)
Now WT:PREFS seems broken completely. I checked a couple of items and hit “Save settings”, but they didn't show up on other pages. So I went back and reloaded the prefs page, and now they don't show up: “Category: Wiktionary pages with shortcuts” appears directly below “Save settings to refresh view.” (This happened once before, and I couldn't fix it, but yesterday I found that it had reset itself sometime.) I have at least 12 cookies from wiktionary.org, and don't know which ones to delete to reset the prefs.
I'm having the same problem with Chrome on Windows Vista (Firefox 3.0.4 works fine though). --Bequw¢τ 10:53, 2 December 2008 (UTC)
It's the WiktPrefs cookie, the one whose value is a long hyphenated string of ones and zeros. —RuakhTALK 15:21, 2 December 2008 (UTC)
Thanks. Refreshed, but it still won't work in my Safari. Michael Z. 2008-12-02 17:25 z
Regarding usefulness: there are 172 Wiktionaries. I can maybe read about 5 of those, and perhaps make use of information on the page in a dozen more. All of this information is already linked from the sidebar.
Now you are proposing duplicating one of these standard links in the body of the English-language page—this will give me ready access to about 150 web sites which I cannot use and don't intend to try. I can't even distinguish characters in Chinese, Devanagari, or Arabic, so I don't need the English-language interface being cluttered with tens of thousands of redundant links to these and many other sites.
It's great that some editors like this feature, and that you can add it to the interface. But the majority of readers can't benefit from 172 dictionaries, so there's no point in watering down an already very dense and complex default page design with this. Michael Z. 2008-12-01 21:18 z
Well, how often do you look up words in those scripts? You won't come across these links unless you're looking up these words. And remember that a large part of our target audience got here from Google, doesn't understand the WMF interface, and won't notice the interwiki links no matter how useful they'd find them. (And "tens of thousands"? It's true that there are many affected pages, but any given page will have only a small number of extra links, if any at all. Why are you opposed to this, but not to the "upload file" link that's on every single page of the entire site?) —RuakhTALK 13:10, 2 December 2008 (UTC)
If you're saying that I, or some average reader, rarely looks up non-Roman words, then please show some statistics to support this. I think it's a mistake to make assumptions about how people use Wiktionary.
If you're saying that the interwiki links are a bad interface, then let's get rid of them or improve them, rather than adding more interfaces to the page.
The upload file link is not hanging awkwardly from a top-level heading in the page body. Michael Z. 2008-12-02 17:25 z
Wait, so you can make the assumption that most people looking up a Foo-language word would be unable to derive any use from the Foo Wiktionary, but when I express doubts, you ask me for statistics, and tell me it's a mistake for me to make assumptions?
I'm not sure if the interwiki links are a bad interface; having them is certainly better than not having them. If you have any suggestions for how to improve them, I'm all ears. If your suggestions are such an improvement that these "prominent interwiki links" aren't useful any more, even better.
If you dislike the appearance, we can change it. What do you propose? IIRC, my original suggestion — the one that prompted Conrad to create this preference — was that we link to fr.wikt the same way we link to fr.wiki, using {{projectlink|wikt|lang=fr}} or the like. Would you like that better?
RuakhTALK 18:36, 2 December 2008 (UTC)
Different assumptions. You've implied that any reader looking up, e.g., a French word can make use of a French dictionary—I am skeptical, and I wouldn't build any interfaces based on this assumption unless I had some evidence supporting it. Or I might infer from your comment that English-language readers only look up English words, which I also doubt. On the other hand, I will categorically state that a very large majority of our readers will not want to look at many of the 170-odd other Wiktionaries at all.
Now that I can see it, I don't think it's terrible-looking, although I rather dislike the indentation, which zigs the left margin of the page right at the mainest (not comparable) heading. Also, the vertical space “above the fold” is very precious to us, and I object to wasting even a little of it on foreign-language content.
We do have the sidebar links on the left, but they have the problem of being disconnected from the content, and in this case we want to relate the two. Wikipedia puts a lot of context-relevant content in an implied right column: infoboxes and a majority of images. Since our content is less block-oriented than Wikipedia articles, I think we could make even better use of this, perhaps even with a full-height third column. Right now we only sporadically hang the somewhat awkward Wikipedia and other project link boxes on the right. I think that “see also” links and the TOC belong there too. Michael Z. 2008-12-04 01:46 z
Re: "You've implied that any reader looking up, e.g., a French word can make use of a French dictionary": What gave you that impression? I'm saying that a reader looking up a French word is likely to benefit from its Wiktionnaire entry, not certain to be able to use it. Our entries are filled with links that not everyone will want to click on, but as long as a reader is likely to want to click on a given link, it's worth having.
Re: "I might infer from your comment that English-language readers only look up English words": That's pushing it a bit, but I do think that someone with absolutely no knowledge of language X is much less likely to look up words in it than someone who speaks the language or is studying it. (And if that's not the case — if people who know something prefer not to use us — then that means we're a novelty, not a dictionary, and it doesn't matter what sort of fun and quirky links we have to amuse our readers.)
RuakhTALK 19:43, 4 December 2008 (UTC)
Support I've had it on for a while now, and I think it's a very useful feature. While Mzajac is right that the entry is not guaranteed to be of higher quality, I think that in many cases it is nonetheless (and generally has the advantage of being reviewed by native speakers). The link is subtle, and I wonder if it might be helpful to a number of readers (although certainly not all). -Atelaes λάλει ἐμοί 19:40, 1 December 2008 (UTC)
The link is already in the sidebar. Why not emphasize it with a different list bullet, or with bold link text. From an interface point of view, this would be better than adding a redundant link with different link text, while showing its status in the context of the full list of foreign-language entries. Its function would be analogous to en.Wikipedia's interface for foreign-language featured articles, which many readers may already be familiar with.
In short, add this functionality to the familiar existing interface, instead of adding a second, heterogeneous interface for the identical function. Michael Z. 2008-12-01 21:26 z
Someone has written code for the method you propose, Mzajac: w:Template:Link FA, used with w:MediaWiki:Common.js to highlight (for other purposes) specific entries in the language-interwiki list. Maybe we should use similar?—msh210 22:00, 1 December 2008 (UTC)
I prefer Mzajac's suggestion as well. --EncycloPetey 02:37, 2 December 2008 (UTC)
The problem with just highlighting sidebar links is that they aren't often located near the FL entries on the page. Go to the#Swedish and you won't see the sidebar link several screens up. An additional advantage is that Conrad's solution uses the English names (though the sidebar langs can be translated also) which is good on the English Wiktionary. --Bequw¢τ 10:53, 2 December 2008 (UTC)
Yeah, but this feature is meant for readers of foreign languages, no?—it should be in French, etc. If I could only read English, then I wouldn't be interested in it. If I can make any use at all of the linked page, then I should at least be able to work out “français is in the French Wiktionary” in French. Michael Z. 2008-12-02 17:25 z
That's not a bad idea. Currently we only store "fr"->"French" etc., but there's no reason we couldn't store "fr"->"%s est au Wiktionnaire français" or the like. —RuakhTALK 18:36, 2 December 2008 (UTC)
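To illustrate the idea of storing a localized message per language instead of only a language name, here is a minimal Python sketch of a message table with an English fallback. The live preference is not written in Python; the names `FL_WIKT_MESSAGES`, `ENGLISH_NAMES` and `fl_wikt_link_text` are hypothetical, and only the French string comes from the discussion above.

```python
# Hypothetical message table: each value is a format string whose %s slot
# receives the entry title. Only the French wording comes from the discussion;
# other languages would need native-speaker review before being added.
FL_WIKT_MESSAGES = {
    "fr": "%s est au Wiktionnaire français",
}

# Hypothetical code-to-English-name table, mirroring the existing "fr" -> "French" data.
ENGLISH_NAMES = {
    "fr": "French",
    "de": "German",
    "es": "Spanish",
}


def fl_wikt_link_text(title, lang_code):
    """Return link text pointing to the title's entry on the lang_code Wiktionary.

    Prefers a localized message; otherwise falls back to English wording,
    using the bare language code if no English name is known.
    """
    message = FL_WIKT_MESSAGES.get(lang_code)
    if message is not None:
        return message % title
    name = ENGLISH_NAMES.get(lang_code, lang_code)
    return "%s is in the %s Wiktionary" % (title, name)


print(fl_wikt_link_text("français", "fr"))  # français est au Wiktionnaire français
print(fl_wikt_link_text("Hund", "de"))      # Hund is in the German Wiktionary
```

Keeping the English fallback means a language without a vetted translation still gets a usable link.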

Update: I've now added a sample link to the top of this section, so y'all can see what it looks like. (That's what I see under the ==French== header at français.) As you can see, "prominent" might be too strong a word: it's not exactly a 72pt blinking marquee in bright magenta. —RuakhTALK 15:13, 2 December 2008 (UTC)

Okay, I finally see what it looks like—also found that it works in Firefox, but after refreshing the cookie it still doesn't show up in my Safari/Mac. Do we know if this is broken in Safari and some other browsers, or is it just something wrong on my machine?
Why is the text indented away from the left margin, which every other element on the page aligns with? This makes the language heading look awkward. Michael Z. 2008-12-02 17:25 z
Re: Safari: I don't know, I'll try to figure that out. (But I don't have Safari on this machine, so I can't do that right now.)
Re: alignment: If people prefer a different display, that's easy to change. For that matter, we can also add a class="" that would let users customize its appearance or remove it entirely.
RuakhTALK 18:36, 2 December 2008 (UTC)
Safari has just come out in a new version 3.2.1. When I updated my version a few minutes ago, some of the previous problems were corrected, including the WT:PREFS issue. --EncycloPetey 19:26, 4 December 2008 (UTC)
Well, I'd like to think my edit to MediaWiki:Common.js had something to do with it, but even if not, I'm glad the prefs are working for you now. :-)   —RuakhTALK 19:43, 4 December 2008 (UTC)
That's entirely possible. I did not have the opportunity to try the page between the time you made your edit and the point when I updated my software. --EncycloPetey 19:47, 4 December 2008 (UTC)

Counting the uncountable

I've recently found myself adding countable senses to our entries for leukemia and nitrogen, among others, and I'm wondering if this is the right way to handle these cases. There are an awful lot of these... as someone noted recently, most "uncountable" nouns can be counted in certain contexts. However, these contexts often differ greatly from one case to another. For example, "milks" can refer to either types or containers of milk... "nitrogens", on the other hand, can refer to either atoms or isotopes of nitrogen. ("Nitrogens" may also be used to refer to containers of nitrogen, but this doesn't seem to have much currency in print.)

I wasn't able to get much help from my usual references; the OED, WordNet, Macquarie and Webster's-1913 all ignore countability entirely. The MW3 asserts (tacitly, by providing a plural form) that both "leukemia" and "nitrogen" are countable... the Longman Dictionary of Contemporary English labels both words uncountable. I wasn't able to find any dictionary that tackles the issue head-on.

There seem to be four main approaches we can take:

1. Ignore the distinction entirely, and keep all such words as "uncountable" with an uncountable definition. This is the status quo for most entries.
2. Label the inflection line "countable and uncountable", with a single multi-part definition.
3. Label the inflection line "countable and uncountable", with separate definitions for the countable and uncountable uses.
4. Ignore the distinction entirely, and change all such entries to use the default, countable inflection line

I prefer #3, since it is the most transparent for the user. On the other hand, it is also more work for editors, and may make our entries more complex (and harder to maintain) than necessary. What do y'all think? -- Visviva 02:28, 2 December 2008 (UTC)

I prefer #3 as well, although it is less than ideal for some situations. Many plants have this problem (e.g. corn/corns, wheat/wheats), where the plural applies only when more than one variety is meant. For multiple individuals of a single strain or variety, the noun is uncountable. Reflecting this in an additional definition is awkward. Perhaps for these, we could include the information as a standard template used in a Usage notes section. --EncycloPetey 02:36, 2 December 2008 (UTC)
A fifth option is to exclude countability from the inflection line and mark only uncountable senses. I would rather that the "-" option was considered as "plural form to be added" instead of treated as uncountable. Option 3 is satisfactory in its completeness, but not in its esthetics, IMHO. If the uncountable tag provided a blue link to a useful article in an Appendix or at WP, that would be wonderful. Any large classes of nouns that had similar characteristics in this regard, were conveniently identifiable, and had mostly regular contributors might merit templates.—This unsigned comment was added by DCDuring (talkcontribs).
Remember that entries will continue to be refined. If we go with no. 3, then will 90% or 99% of all “uncountable” entries end up marked “countable and uncountable?”—if so, then the distinction will lose its meaning.
Aesthetically, I like entries which combine the senses, and say something like sausage (1): “a type of food, or a length of it, or an example of one.” The countability is self-evident from the elements of the definition. But again, I suppose this will eventually get refined and subdivided into two or three senses.
I remain a proponent of the term mass noun, which to me means “not usually counted (but, like every other English noun, sometimes counted).” Michael Z. 2008-12-02 05:31 z
I go with option 5 as proposed by DCDuring. In many cases this would allow us to give the normal definition as uncountable and to describe what might be counted with a countable definition line. Conrad.Irwin 09:26, 2 December 2008 (UTC)
Option 3 does that as well. The problem with option five is that it leaves the plural in the inflection line without noting there (for normally uncountable nouns) that the plural form listed is unusual. --EncycloPetey 10:32, 2 December 2008 (UTC)
Option 5 leaves it the same as normal, we have no way of indicating which definitions are more common. (Not a counter-argument, just an observation) Conrad.Irwin 19:27, 2 December 2008 (UTC)
I'm not a huge fan of "countable and uncountable", because to me it suggests that the word is both at once, when in fact it's one at a time. I'm also not a huge fan of stretching to include rare countable senses like "a specific variety of milk", because you can do that with any uncountable noun, and it seems nonce-y to me. (Note: that may not be true with this specific example; it may well be that milks (varieties of milk) is a common term in some field. Hopefully my point stands whether or not it applies to this case.) How about:
6. Have the inflection line say “milk (usually uncountable; plural milks)”, and don't bother defining the term countably unless there are particularly common or non-obvious countable uses. (In the case of milk, I think "an order of milk" is probably warranted, since IME at McDonald's you ask for "a milk" rather than "milk" even though the latter would work fine.)
 ? —RuakhTALK 13:03, 2 December 2008 (UTC)
This looks like an improvement to me, although a comma will suffice instead of the semicolon. It still has to be relatively simple to enter, and I think we still have to account for the (rare?) cases which are only uncountable. Michael Z. 2008-12-02 17:30 z
I'd be down with a comma. And yes, we need to account for only-uncountable cases, -slash- cases where the plural isn't well-enough attested to merit an entry. —RuakhTALK 19:32, 2 December 2008 (UTC)
6 sounds good. Of course, we can't automatically change {{en-noun|s|-}} to read that, as it currently gives no preference to the uncountable, merely saying the noun appears as both types, so we'd need to go through them by hand.—msh210 18:42, 2 December 2008 (UTC)
Oh, darn, that's a good point. :-/   —RuakhTALK 19:27, 2 December 2008 (UTC)
We could, however, change {{en-noun}} to give this output for {{en-noun|-|s}}, which currently just gives "uncountable". -- Visviva 01:54, 3 December 2008 (UTC)
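A minimal Python model of the proposed behaviour may make the distinction concrete. It assumes `{{en-noun}}` is only called with the parameter patterns discussed here (none, `-`, `-|s`, `s|-`, or a single plural); the function names and the exact output wording are illustrative assumptions, not the template's real code, which supports more forms than this sketch.

```python
def plural_of(headword, param):
    """Expand a plural parameter: 's' or 'es' are suffixes, anything else is a full form."""
    if param in ("s", "es"):
        return headword + param
    return param


def inflection_line(headword, params):
    """Render a simplified inflection line for a subset of {{en-noun}} patterns.

    Hypothetical model of the proposal in this thread:
      []        -> countable, regular plural
      ["-"]     -> uncountable only
      ["-", p]  -> usually uncountable, with a plural (option 6)
      [p, "-"]  -> countable and uncountable, no preference
      [p]       -> countable, given plural
    """
    if not params:
        return "%s (plural %s)" % (headword, headword + "s")
    if params == ["-"]:
        return "%s (uncountable)" % headword
    if len(params) == 2 and params[0] == "-":
        return "%s (usually uncountable, plural %s)" % (headword, plural_of(headword, params[1]))
    if len(params) == 2 and params[1] == "-":
        return "%s (countable and uncountable, plural %s)" % (headword, plural_of(headword, params[0]))
    if len(params) == 1:
        return "%s (plural %s)" % (headword, plural_of(headword, params[0]))
    raise ValueError("pattern not covered by this sketch: %r" % (params,))


print(inflection_line("milk", ["-", "s"]))  # milk (usually uncountable, plural milks)
print(inflection_line("milk", ["s", "-"]))  # milk (countable and uncountable, plural milks)
```

The model shows why the change is safe only for entries that really use the `-|s` order: `s|-` entries keep their neutral "countable and uncountable" reading.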
Yes, but first would need to check how many entries have {{en-noun|-|s}}, placed there by editors who meant {{en-noun|s|-}}, and who therefore didn't mean to give preference to the uncountable. I know I've done that (though I think I've fixed each time I've done so).—msh210 18:02, 3 December 2008 (UTC)
Done, I think. My scan of the latest daily dump turned up 13 entries using {{en-noun|-|s}}, and I have fixed those which seemed to need fixing. See list at User:Visviva/uncountable-countable. If anyone wants to double-check the scan, that would be wonderful; my ignorance of XML is exceeded only by my ignorance of Python. -- Visviva 08:48, 13 December 2008 (UTC)
Option 5 would be my personal preference (because long inflection lines tend to distract too much attention from the definitions below them, an important detail neglected in the general trend to put as much information into inflection lines as could possibly fit in there) but #6 looks like a fair compromise. -- Gauss 20:01, 2 December 2008 (UTC)
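For anyone who wants to double-check a dump scan like the one above, here is a minimal Python sketch that streams a MediaWiki XML dump with the standard library's `xml.etree.ElementTree.iterparse` and reports pages whose wikitext contains `{{en-noun|-|s}}`. It assumes a pages-articles style dump with one revision per page; the function names and the regular expression (which tolerates whitespace around parameters, but not other variants such as named parameters) are illustrative, not the script that was actually used.

```python
import re
import xml.etree.ElementTree as ET

# Matches {{en-noun|-|s}}, allowing optional whitespace around each part.
EN_NOUN_UNCOUNTABLE_FIRST = re.compile(r"\{\{\s*en-noun\s*\|\s*-\s*\|\s*s\s*\}\}")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_template_uses(source, pattern=EN_NOUN_UNCOUNTABLE_FIRST):
    """Yield titles of pages whose wikitext matches pattern.

    source may be a file path or a binary file-like object holding a
    MediaWiki XML export (assumed: one revision per page).
    """
    title = None
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            if elem.text and pattern.search(elem.text):
                yield title
        elif name == "page":
            elem.clear()  # drop the finished page's children to limit memory use
```

Streaming with `iterparse` avoids loading the whole dump into memory, which matters for a full daily dump.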

Previous and Next - Template?

Is there a template and/or guidelines for "previous" and "next" words in a list? Entries for 1, a, and b place the "previous" and "next" under See also. Entries for 2 and 3 place the "previous" and "next" right under the Symbol header. --AZard 20:49, 2 December 2008 (UTC)

I'd prefer it under "See also", unless there is good reason not to (e.g., it is attached to an illustration, or would just look weird at the bottom of the entry). -- Visviva 12:20, 3 December 2008 (UTC)
Agreed. I think placing them on the inflection line is a bad idea (aesthetically and logically). --Bequw¢τ 20:34, 3 December 2008 (UTC)
There are several other instances of this kind of infobox - see helium and terzo for a couple of examples. SemperBlotto 17:09, 3 December 2008 (UTC)
There are infoboxes in use on the signs of the Zodiac (see Virgo, e.g.) and there are infobox templates available for cardinal and ordinal numbers (using {{cardinalbox}} and {{ordinalbox}}). These numerical boxes are already in wide use for Hungarian, Italian, and Latin entries. A similar template could be created for alphabetical symbols, provided that the infobox is used within a language section, and not in a Translingual section. Alphabetical order, even for Roman-letter European languages, is not the same between languages. Polish dictionaries, for example, alphabetize with the sequence "a ą b c ć ...". Although there are no words that I know of beginning with a-ogonek (ą), there are Polish words that begin with "ba-" and "bą-", and these are traditionally alphabetized separately in dictionaries with all the "ba-" words coming first. The Serbian alphabet begins with "a b c č ć ..." when written in Roman letters, but begins with "а б (=b) в (=v) г (=g) ..." when written in Cyrillic script. So even the sequence of the first few letters isn't consistent within Europe. --EncycloPetey 20:44, 3 December 2008 (UTC)
I'm starting to appreciate these templates. Do you use a tool, or code them manually? Can anyone create a template? --AZard 04:06, 4 December 2008 (UTC)
I code manually. Anyone can create templates, but where a new kind of template is created, it usually goes through some kind of testing and discussion to see whether it works, and whether the community is inclined to use it. Some people create new templates that make other people's lives easier, others create templates that permit desired standardization, but some people create unnecessary or improper templates that are deleted. If you're offering to try your hand at creating an alphabetic navigation box template, then I'd say to go for it. But, I'd also caution you not to paste it into too many articles before getting feedback. When I set up major templates like that, I find that other people's feedback helps to streamline the coding and improve the visual look, as well as weed out coding errors and other problems. The result may mean changes to the template and the way parameters are done, so applying it too widely right away may mean that I have to go back and re-edit every place I'd tested the template. --EncycloPetey 19:02, 4 December 2008 (UTC)
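The point that "previous" and "next" depend on the language's alphabet can be shown with a minimal Python sketch: the same lookup gives different neighbours for c in Polish and in Serbian Latin. The alphabet lists are the standard letter orders (Serbian Latin treats dž, lj and nj as single letters); the function name `neighbours` is hypothetical and not part of any existing template.

```python
# Letter orders as single strings; split() turns them into lists of letters,
# so digraph letters such as Serbian "dž" stay intact.
POLISH = "a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż".split()
SERBIAN_LATIN = "a b c č ć d dž đ e f g h i j k l lj m n nj o p r s š t u v z ž".split()


def neighbours(alphabet, letter):
    """Return (previous, next) letters in alphabet; None marks either end."""
    i = alphabet.index(letter)  # raises ValueError if letter is not in this alphabet
    prev_letter = alphabet[i - 1] if i > 0 else None
    next_letter = alphabet[i + 1] if i + 1 < len(alphabet) else None
    return prev_letter, next_letter


print(neighbours(POLISH, "c"))         # ('b', 'ć')
print(neighbours(SERBIAN_LATIN, "c"))  # ('b', 'č')
```

A navigation box built this way would need one alphabet list per language section, which is why it cannot sit in a Translingual section.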

New surname categories

The surname categories are to be changed into parts of speech, like the given names. For most languages it is quite simple: "Category:fr:Surnames" becomes "Category:French surnames". English surnames are a problem. Until now, "Irish/English/Scottish surnames" have meant "English surnames used in, or typical of, Ireland/England/Scotland". That's the usage in many surname dictionaries, too. But in the Wiktionary, "Category:Irish surnames" must now mean surnames in the Irish (Gaelic) language, such as Ó Murchadha. "Category:English surnames" must mean any surname in the English language, including Murphy, McDonald, Wong, Patel.

I propose to create subcategories for English surnames by language of origin. "English surnames from Irish" will include most surnames typical of Ireland. Many surnames of England may be found in "English surnames from Middle English", etc. This is a dictionary, not an atlas. Category:"Scottish surnames", "American surnames", "Jewish surnames", "Islamic surnames" must be abolished because these are not languages. Jewish surnames are a distinctive group, not defined by language of origin, so "English surnames of Jews/of Jewish usage" might be created. There is nothing to stop anyone from making additional categories, e.g. "English surnames of US (=typical of US and rarely found anywhere else)". But they shouldn't replace the language subcategories, and if you only add one surname, it's pointless to make a new category. Usage can be explained in the entry. There is a new Template:surname.

If you have objections, or new ideas, please tell them now. I mean to start creating the categories next week.--Makaokalani 14:36, 3 December 2008 (UTC)

Objection to the "of Jews/of Jewish usage", as it gets into w:who is a Jew and, even discounting that, is pretty much useless, as many, many names have been used by Jews. (My own is an example: I'm a Jew by anyone's standard, I think, but have a surname, Hamm, shared, among Jews, by only my immediate family as far as I'm aware; most people with my surname are of German extraction or African-American.) In other words, I disagree with what you wrote, that "Jewish surnames are a distinctive group". Perhaps you're thinking of "English names from Yiddish"? Yiddish is a language, of course.—msh210 17:59, 3 December 2008 (UTC)
I think we could allow a category for "Jewish" surnames, but not for American, Scottish, etc. This is the one exception I can think of that might actually be useful to our users, since it is not going to be mistaken for a language. There is the question of what to do with "last" names that aren't technically surnames, but I don't think that problem is significant enough to worry about. We can call them surnames, and the few people to whom the difference matters will understand why we've lumped them together. The biggest question in my mind is how to handle surnames used in India, when names may be written in either a native or Roman script. Yes, Patel could be classified as an "English" surname, but that's a bit misleading. I think the proposed solution of regional origin subcategories for English surnames would be a good solution for this. So, "English surnames from Hindi", "English surnames from Tamil", "English surnames from Punjabi", etc. For India, it might be worth also having a region overcategory for "English surnames from languages of India". (with that phrasing to avoid the ambiguity of Indian) --EncycloPetey 18:06, 3 December 2008 (UTC)
It's true that not all Jews have Jewish surnames, and also (incidentally) that not all people with Jewish surnames are Jews, but that doesn't mean there's no such thing as Jewish surnames, just that they form a fuzzy set. Based on my own experience, some probable members include Green, Gold, Cohen, Katz, Levi, Goldwasser, Finkelstein, and Gur. Indeed, to some extent it's possible to distinguish Ashkenaz surnames from Sefardi and Mizrakhi ones. Is it worthwhile? Maybe. There are definitely novels where the reader is supposed to infer a character's Jewishness from such things as his/her surname. I'm not sure to what extent we can help with that, and to what extent we can only hinder. (BTW, if you care, you're not completely alone: google:site:il Hamm does pull up some Jews.) —RuakhTALK 19:05, 3 December 2008 (UTC)
This is a fascinating idea, but I think there will be a lot of incorrect classifications. My own surname (Million) is a fine example, as even I am not completely sure of its origin. Family tradition held that we were French, but extensive searching through genealogical sources has revealed that my family probably came from Ireland in the 12th century with a surname something like O'Mallion. Unfortunately, when you get that far back in the records, you tend to find surnames that mean very little - "son of", "from", etc. People with the same surname may not be related in any way other than having come from the same village, while two close relatives may have unrelated surnames simply because one was born in a different place. But I'm for trying this anyway. It sounds difficult but worthwhile. -- Pinkfud 19:27, 3 December 2008 (UTC)
A category for "English surnames of uncertain origin" (or something similar) is possible. --EncycloPetey 20:23, 3 December 2008 (UTC)
Would this idea also be applied to other peoples with a significant diaspora, or recognizable names? Mennonites in Canada and elsewhere, for example, are associated with a set of surnames of mostly Germanic but also Slavic origin. I'm sure there must be other examples. Michael Z. 2008-12-04 01:15 z
There will be mistakes and inaccuracies. Some surnames have five or six different origins. Many surname entries are stubs without an etymology. If I don't know the origin, I'll put them in "English surnames" and hope that someone wiser will check them. Most surnames of England were formed in the late Middle Ages so "from Middle English" is convenient for anything English sounding, including patronymics: Johnson, Jackson, Jenkins. Only when the given name and surname are identical: Abraham, Alexander, I'll tag them "from Hebrew", "from Ancient Greek".
Of course many surnames are shared by Jews and non-Jews. What's the harm in an additional category "English surnames of Jews" , or "English Jewish surnames", whatever the best wording? "Jewish surnames" is too confusing. Someone will create "Category:fr:Jewish surnames" and we're back where we began. The present system is a mixture of topic and POS categories and a headache to anyone. There are about 1,100 surnames in the categories, but obviously also many surname definitions without a category. Notice that place names will stay in topic categories. London is not defined as "a place name", but as "a city".--Makaokalani 13:30, 5 December 2008 (UTC)

Senses vs. translations

I think we have a fundamental problem with senses vs. translations. I mean that the translations appear in a separate table after all of the individual senses, and if a sense gets added or removed, the translations table remains a separate entity, so it can get badly out of date. (See e.g. abate.) The "obvious" solution would be to attach translations to each individual sense (as we do with citations/quotations); has this been discussed before? Is there any good reason for the separate table of translations? Equinox 23:15, 3 December 2008 (UTC)

Offhand, I can think of two stumbling blocks.
  1. A graceful interface for the reader. Perhaps each of quotations and translations needs a tabbed reveal interface. I'd like to see something more elegant than the current collapsible translation bar.
  2. A usable interface in wikitext, which still outputs reasonably structured HTML. For example, dog has 10 senses, and the first has 180 translations. How does an editor deal with this in the edit box?
These can be overcome, but I don't think we have a solution at hand. Michael Z. 2008-12-04 01:26 z
I don't think this is tenable under our current layout, just because of the sheer volume of text involved. We don't want hundreds of lines between senses, unless we have some way to edit senses individually -- which MediaWiki doesn't and probably never will provide. Also, I think it should be possible for a user to rearrange senses without sorting out all of the translations; otherwise the barriers to constructive change become too high. What we need is a way to tell when the translations (and other sense-dependent sections like Synonyms) have gotten out of sync with the definitions, so that the entry can be tagged for cleanup. As a side note, I would note that {{jump}} would potentially provide a way of doing this, although like most of my creations it needs some serious overhauling before it will ever be ready for production. -- Visviva 03:04, 4 December 2008 (UTC)
This same problem affects synonyms, antonyms, other -onyms, etc. Yes, this issue has been raised many times, but no satisfactory alternative has even been proposed that would (1) work with our software limitations, (2) be at all feasible for a newcomer to learn how to edit, (3) not be prone to breaking with every page edit, and (4) be possible to implement without altering every page by hand to the new system. It is a nice idea, but there must be a practical suggestion for actually doing it in order to have anything happen. --EncycloPetey 18:50, 4 December 2008 (UTC)

Summarize Beer parlour discussions in subject pages?

Does it make sense to summarize significant Beer parlour discussions in the associated subject pages? For example, in Nov 2008, there was a discussion on alphagrams. So then, the Anagram page would include a short summary of that discussion. It can go under Wiktionary:Anagrams or Wiktionary talk:Anagrams. This would help newbies like me see what discussions happened in the past. Including a link would be useful, but it seems BP discussions are archived, which might break the link. Summarizing relevant votes might be good too.

Take a look at Wiktionary talk:Anagrams --AZard 17:00, 4 December 2008 (UTC)

Visible differentiation of sister project links

(moved from Wiktionary:Requests_for_deletion#Tennessee_Valley_Authority)

Having given this some more thought, there is a point I'd like to make. One of the marvelous things about hyperlinks is that they're not inherently constrained to any given site. That's why we can surf the world at the click of a mouse. Unfortunately, that same feature is a double-edged sword. There are certain sites where that behavior can be unexpected - and this is one of them! Personally, even though I'm a contributor both here and at en:wp, I find it highly annoying to click a link here and suddenly find myself there. I call that "stealth linking", and to me it's impolite at best. (Keep in mind that the same trick is how some "drive-by" malware gets installed. The viewer clicks a link that actually leads to an installer script).
As a Geologist, I'm often at a certain USGS website. There's a link there that leads to a map repository at another agency. The USGS webmaster seems to agree with me, because that link explicitly warns the viewer that they will be leaving the current site. For sites that claim any form of "authority", that's good practice. Therefore, I would prefer these matters to be settled with a link that leads to a short description and the Wikipedia template, or else by wording like "See (XYZ) at Wikipedia". That way, the viewer knows he is leaving Wiktionary, and understands why. Just my two cents... -- Pinkfud 12:55, 4 December 2008 (UTC)
Well, neither we nor Wikipedia should be claiming any form of authority, certainly not the kind of authority asserted by US government websites. Perhaps a site-wide disclaimer is in order? Wikipedia is a sister project; it's not like we're shunting people off to Urban Dictionary (perish the thought). -- Visviva 13:24, 4 December 2008 (UTC)
Like it or not, the Federal agency that writes my paycheck allows Wikipedia to be used as a source for internal documents - though not (yet) for papers to be published. But what I meant by "authority" was not the official sense, but the idea that a site purports to give factual and (hopefully) accurate information, as opposed to a site whose focus is commercial, personal ramblings, or whatever. I can even think of a couple of "Blogs" that have authoritative status within their circle of readers. In that sense, any site that achieves such recognition is an "authority". -- Pinkfud 13:46, 4 December 2008 (UTC)
Perhaps a solution to the issue Pinkfud's raising is to use links of this style instead of this in definition lines. I'm not certain, though, that the issue needs solution.—msh210 17:02, 4 December 2008 (UTC)
FWIW, our default skin does format links to Wikipedia differently from internal links, in that the former are a specific blue-gray color. However, internal links already come in at least five different colors (blue, purple, red, mute red, black), so it's not necessarily obvious that this sixth color denotes a quasi-external link. —RuakhTALK 18:38, 4 December 2008 (UTC)
Not to mention "missing-plural green". I don't think that the sister projects deserve to be treated the same way as external links. For non-contributing or infrequent users we have no choice but to accept the existing standard for indicating semi-out-of-site links, or to select whichever of the most popular standards best fits the need to protect the Wikimedia brand by not disappointing or surprising users. Only if there is no suitable model in broad use should we invent our own. If we are considering inventing our own system, it might pay to consult with WMF so as at least not to be needlessly inconsistent with any emerging efforts elsewhere in Wikiworld. It might not be a bad idea to color-differentiate links that shift out of the language of the originating page or out of the user's preferred set of languages, and even to color differently the links that go back to a user's initial session-specific or preferred language. DCDuring TALK 20:08, 4 December 2008 (UTC)
Excellent thoughts, all. I'm glad you can at least see the issue. Perhaps a simple template, called {{out}} for example, and made to accept the "pipe" could be used. Usage would then be {{out|title|wikipedia}} which would result in an automatic link plus "at Wikipedia" (or other sister projects), and would also auto-add such links to a cat or similar page that keeps track of such usages. (Just a thought, I don't know if it could even be done that way). -- Pinkfud 21:29, 4 December 2008 (UTC)
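Pinkfud's {{out}} idea can be sketched as a plain function showing what such a template might expand to. This is a hypothetical illustration only, not an existing template: the project table and the tracking-category name are assumptions; only the interwiki prefixes (w:, commons:, species:) are the standard Wikimedia ones.

```python
# Hypothetical sketch of what an {{out|title|project}} template might expand to.
# Interwiki prefix and display name for a few sister projects.
SISTER_PROJECTS = {
    "wikipedia": ("w", "Wikipedia"),
    "commons": ("commons", "Wikimedia Commons"),
    "wikispecies": ("species", "Wikispecies"),
}

# Assumed tracking-category name, made up for this example.
TRACKING_CATEGORY = "[[Category:Entries with sister-project links]]"


def expand_out(title, project="wikipedia"):
    """Return wikitext: a piped link, 'at <Project>', and a tracking category."""
    prefix, name = SISTER_PROJECTS[project.lower()]
    return f"[[{prefix}:{title}|{title}]] at {name}{TRACKING_CATEGORY}"


print(expand_out("lipstick on a pig"))
```

Because the project name is spelled out in the visible text, the reader knows he is leaving Wiktionary before clicking, and the category gives a list of every such usage.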
We add a pipe to hide the nature of the link, and then look for new technologies to develop to expose it again. Why not just omit steps two and three, as in w:lipstick on a pig? Everybody knows that w stands for Wikipedia. Michael Z. 2008-12-05 00:19 z
It is a great weakness of our approach when we assume that admins and regular contributors are the targets for our entries. I would be surprised if most users who have made fewer than ten visits would know that "w:" meant that the following link was to Wikipedia. Yes, they would learn after it happened to them (once, twice, thrice, more?). Providing new experimenting users with little unexpected surprises is a good way to reduce the likelihood of their becoming regular users, thereby keeping demand on the WMF servers low, I suppose. DCDuring TALK 00:54, 5 December 2008 (UTC)
Yes, DCDuring has another very good point. Also, when you look at the sheer traffic volume generated by Wikipedia as opposed to Wiktionary, it's obvious that far more people know about and use the former. It follows, then, that any user who intentionally came here is looking for a simple definition - "what does this word mean?" - and is not interested in a long-winded encyclopedic treatise on the subject. Transparently sending them to exactly what they were trying to avoid is a disservice, to say the least. Finally, let me point out that many here love to expound upon "what Wiktionary is not". I submit that Wiktionary is also not an index to the other projects! Transparent redirects create exactly that effect - we become simply a way to find articles, a Google of Wiki project content if you will. -- Pinkfud 01:21, 5 December 2008 (UTC)
Sister project links in boxes or under "See also" are important for our users, in part because they liberate us from wasting time duplicating content readily available from those projects. "In-line" links are the issue. They are temptingly useful for proper nouns and for SoP multiword technical terms that cannot be described briefly and intelligibly in a dictionary with a broad audience. DCDuring TALK 01:37, 5 December 2008 (UTC)
Well, I'm not advocating adding more links to Wikipedia.
I'm just suggesting that we mark them by the simplest possible method: making the shortcut w: link transparent to the reader. This is unambiguous, less cryptic for the reader than the “external link” icon (I really did mean that the “w” icon means Wikipedia to a lot of the general public) and also text-based and less disruptive than the icon in both visual browsers and screen readers, easy to enter, and consistent with other Wikimedia projects. It is easier to interpret than many abbreviations used in paper dictionaries, and its nature can be tested by the reader immediately, instead of having to leaf through to find the legend at the back (or is that the front?). This is suitable for readers.
The only problem is that the tooltip title text parrots the link text w:lipstick on a pig instead of actually doing something useful, such as showing “lipstick on a pig” in Wikipedia (does anyone know if we can change that?). Michael Z. 2008-12-05 01:48 z
I'd be down with doing it experimentally on several (a score?) high-traffic proper noun entries (possible subjects: porn, christmas, new year, great depression, electoral college, inauguration, TARP). If it doesn't blow up, we could have a vote to standardise on it. DCDuring TALK 02:14, 5 December 2008 (UTC)
That sounds like the thing to do. Willing to help in any way I can! -- Pinkfud 08:24, 5 December 2008 (UTC)
I agree that a practical experiment might be good at this point. I've no strong opinion, and see things I like and dislike about each approach. Let's try out the alternatives and see what happens. --EncycloPetey 09:20, 5 December 2008 (UTC)
I do not think we need to be any more explicit than the different colour the links already have. Adding the prefix visibly, or an icon, just increases the visual clutter, of which we have too much. Though I would agree that a more useful tooltip is desirable, I think the effort needed to do that would outweigh the slight benefit. If you want to know where a link goes, hover the mouse over it and look in your status bar, where you will (in most environments) see where you are headed. Both Wikipedia and Wiktionary are owned by the same company and written by a similar set of people; there is no need to warn people that "Wiktionary is not responsible for the contents of external links", because Wiktionary isn't really even responsible for its own content... If we were to have to do something, I would suggest using a mini-globe in place of the [2] blue icon for external links. But then, for the other sister projects, there is no such iconic logo. (Though all Wiktionaries are going to get the tiles soon, for those who haven't been following the request.) Conrad.Irwin 09:58, 5 December 2008 (UTC)
The lag I often experience in seeing hovertext would make it less than effective for inexperienced or impatient users. The tiles might work for this purpose and would be a worthwhile object of experimentation. DCDuring TALK 12:24, 5 December 2008 (UTC)
I think you're completely missing the point here. Firstly, it has nothing to do with "disclaimers", it has to do with the negation of a user's choice as to what content he wanted to see. I know Wikipedia exists. If I want a full article, that's where I go. But if I only want a simple DICDEF, that's what I expect to get here, and I don't want to get tricked into going to the article I didn't want. Secondly, your comment about the so-called different blue is lost on me. I happen to have some blue-blindness. I can see blue, but unless the 2 shades are side-by-side, I can't tell them apart. Even when they are side-by-side, I only see the external version as slightly grayer than the other. Worse, I see blues at the edge of my visual field as darker than those near the center. So that's all pretty useless to me, and my condition isn't all that uncommon. -- Pinkfud 10:27, 5 December 2008 (UTC)
I hadn't realised that the blues were different in about a year of heavy use here and have never been diagnosed or noticed a color-vision problem. But, I'm just impatient and insensitive or unaware. The idea of using the same hue for all of WMF is a good one, but I doubt that we can rely on relatively subtle differences in color alone. I wonder how w:Americans with Disabilities Act of 1990 applies to an entity like WMF. BTW, most of us on these pages routinely use pipeless links to sister projects, not entirely out of laziness, but to give our fellows a hint at what lies beneath. I appeal to those with more Web experience and discernment than I to recollect any robust-seeming solutions or standards they may have observed that bear on this problem. DCDuring TALK 12:15, 5 December 2008 (UTC)
Not to get off-track, but did you realize that the vast majority of people with minor color vision deficiencies go through life without ever knowing it? We see the world as we always have seen it, so there's no reference point for comparison. Until wifey brings home two color samples from the paint store and asks which shade you like better. Then it's "Uhm, there's a difference?" As for the tool-tips: Yeah, I never wait for it to pop up. And the status bar? That's usable. But again, I'm old and wear trifocals. I seldom bother trying to read that tiny text down there - unless I'm suspicious of the link for some other reason. -- Pinkfud 19:39, 5 December 2008 (UTC)
The status bar is not a universal interface, so it is not a valid excuse to avoid clarifying the nature of links. As far as I know it is not part of the HTML recommendation. It is hidden by default in Safari, probably in a number of mobile browsers, and in the Lynx text-only browser.
The title attribute (usually providing tool-tips) is part of the HTML recommendation, and is also referred to in accessibility guidelines.[3]
I have good colour vision. I may have noticed some time in the last four years that external links have a different colour, but promptly forgot completely, even though I stare at them every day. Michael Z. 2008-12-05 20:06 z

Amending ELE example entry

I have started a vote to amend the page layout example in the ELE, specifically the Pronunciation section in the example, to:

===Pronunciation===
* Phonetic transcriptions
* Audio files in any relevant dialects
* Rhymes
* Homophones
* Hyphenation

Reason for the vote: There is confusion generated by the ELE example as it currently reads, since the Pronunciation section order in the example is taken as a policy recommendation by some editors. True, there is prefatory text about how variations are possible, but this is often overlooked and the example is taken as prescriptive.

Effects of the proposal, if approved: The proposed change will bring the example into line with current practice, and will promote a more logical sequence by:

  1. adding "phonetic transcriptions", which is currently missing from the example.
  2. grouping transcriptions and related audio.
  3. placing items pertaining to other words (rhymes, homophones) after the items pertaining to the current entry.
  4. placing non-pronunciation items included in the section (hyphenation) last in the sequence.
  5. inserting a space after the asterisk bullet (*) per AF preferred format.
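Because the proposal is about sequence, it can be checked mechanically. A minimal Python sketch, assuming each bullet can be classified by how its text begins; the marker table and the helper names are hypothetical and are not part of AutoFormat or of the vote.

```python
# Proposed order of items in a ===Pronunciation=== section (from the vote text).
PROPOSED_ORDER = ["transcription", "audio", "rhymes", "homophones", "hyphenation"]

# Hypothetical mapping from a bullet's leading text to an item kind;
# real entries vary, so treat this as an illustration only.
MARKERS = {
    "{{IPA": "transcription",
    "{{SAMPA": "transcription",
    "{{audio": "audio",
    "{{rhymes": "rhymes",
    "{{homophones": "homophones",
    "Hyphenation": "hyphenation",
}


def classify(line):
    """Return the item kind for a bullet line, or None if unrecognised."""
    body = line.lstrip("*: ").strip()
    for marker, kind in MARKERS.items():
        if body.startswith(marker):
            return kind
    return None


def follows_proposed_order(section_lines):
    """True if the recognised bullets never go backwards in PROPOSED_ORDER."""
    ranks = [PROPOSED_ORDER.index(k) for k in map(classify, section_lines) if k]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def bullet_spaced(line):
    """Point 5: a bullet line should have a space after its asterisk(s)."""
    return not line.startswith("*") or line.lstrip("*:").startswith(" ")
```

For example, a section listing IPA, rhymes, homophones and hyphenation in that order passes, while one that puts rhymes before the IPA transcription does not.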

Yes, some entries require more complex formatting, and there are some issues of item indentation and grouping that have yet to be settled, but for simple cases the above sequence should suffice. The proposal, if approved, would not change any explicit policies, but would make the ELE example resemble more what is implied in other places. --EncycloPetey 23:01, 4 December 2008 (UTC)

Standard way to include technical character code and Unicode data

I've noticed at least three formats for including technical data on specific characters, letters, or symbols, such as but not limited to Unicode data.

  • CJKV Characters, of which there are thousands:
  • Korean syllables, of which plenty are being added now:
  • Other characters, punctuation, accent marks, letters, symbols, etc. Probably the few such articles which include technical info all take independent approaches:
    • ҄ (kamora)

Let's see if we can pick one format for all such "symbol" or "character" entries. — hippietrail 00:08, 5 December 2008 (UTC)

Keyboarding and encoding information are definitely technical info, removed from what the character itself is. But I'm not familiar with the information about the Han character: is it about classification of the Chinese character itself, or is it information about its representation in particular computer systems? Maybe these two things should be treated differently. Michael Z. 2008-12-05 00:28 z
The radical and stroke count relate to the character as such; cangjie and four-corner are keyboard input systems. -- Visviva 00:50, 5 December 2008 (UTC)
I know radical+stroke is usually language-independent, but there are characters with a different number of strokes at least between Chinese and Japanese, even some which share a Unicode codepoint!
Cangjie and four-corner I assume are Chinese specific. I know even Mandarin and Cantonese users have different input methods. Japanese and Korean input is different again. — hippietrail 01:00, 5 December 2008 (UTC)
Then CJKV character entries have a bunch of different info. There's a translingual section which can include an etymology of the writing of the character and meanings the character has independent of which language it appears in. The translingual section also has a technical section which includes info on where to find the character in several character dictionaries as well as its Unicode codepoint.
So yes, the technical mumbo jumbo should clearly be separate from the lexical stuff which is our main focus. That said, all characters and symbols have some associated technical mumbo jumbo. Since people are already including this data but seem to be unaware of the other groups of users who are including similar data in other formats, it seems that now is the time to look at them all and come up with one way to do it.
I feel strongly that most of this stuff has nothing to do with usage and that we should not turn "Usage notes" into a grab-bag section. We should have some kind of new "Technical data" section. Name and layout of said section to be discussed here. — hippietrail 01:00, 5 December 2008 (UTC)
I agree that "Usage notes" isn't ideal (although I think it is accurate; this information does pertain to a kind of usage, just not the kind that we usually think of.) I would suggest two options: 1. Some kind of infobox (perhaps combined with a navbox); a simple two-column layout seems ideal for this kind of information. 2. a standard "Technical notes" section exclusively for Symbol-type entries; would fit nicely with the existing "Usage notes" and "Dictionary notes". Leaning towards #1 atm; I've always liked infoboxes, though we don't have much use for them normally. -- Visviva 12:57, 5 December 2008 (UTC)
Would the radical & stroke info belong with the etymology? Does it deal with the origin or composition of the character, or is it more like its orthography and expression?
The technical info is analogous to pronunciation—how the “term” is represented or expressed. Perhaps Hyphenation could be combined with it. Michael Z. 2008-12-05 17:43 z
I don't think it would fit well in the etymology; it is basically graphological. If this stuff is analogous to pronunciation, I suppose its placement should be analogous also (i.e. before the POS section if any). In this case I like "Technical data" better than "Technical notes". Propose the following, as a starting point only:
1. Technical information related to character input, composition and encoding to be in a "Technical data" section
2. "Technical data" section to be placed in the same location as Pronunciation
3. In general, no entry should have both a Pronunciation and a Technical data section.
4. In entries with a Technical data section, the POS section can be omitted if the character does not have a specific associated meaning (this would apply to Hangul syllabic blocks and many alpha/numeric characters).
5. For characters that are not language-specific, the Technical data should be in the Translingual section only.
Thoughts? This would certainly require a VOTE to be accepted as standard, but perhaps a few demonstration entries could be prepared (I'm looking at you, ). -- Visviva 16:06, 6 December 2008 (UTC)
Regardless of any conclusions reached here, I don't see any particular need to disrupt the existing Han character entries, which are just fine the way they are. The Han characters are unique in the world, and it does no harm if our treatment of them is somewhat unique as well. -- Visviva 16:06, 6 December 2008 (UTC)
I agree with what Visviva says: Usage notes is definitely more suited to actual usage of the word, as is done at 熱い along with many, many other entries I'm sure. 50 Xylophone Players talk 20:19, 6 December 2008 (UTC)

Why make “Pronunciation” and “Technical data” sections not both appear? A character may certainly have a spoken name as well as a Unicode code point. I'd rather see a clear definition of what belongs in the two sections than a complex formula for avoiding having both.

I also would rather see this follow the POS, but I don't know if I have a firm reason except for avoiding pushing it still further down the page. Keep in mind that a particular symbol may have more than one technical expression, each with its own code point (e.g., an apostrophe can be a typewriter apostrophe ' , a typographer's usual right single quotation mark ’ , or a modifier letter apostrophe ʼ ; a diacritic letter can be encoded with a combining diacritic, or as a single combining-form character). It also may include information about historical or language variants, such as the Serbian and Macedonian cursive glyphs for the Cyrillic letters б, д, т, п, and stylistic variants like ligatures, and italic and bold fonts. And each may have code points in different encoding schemes, but I'd rather stick with Unicode where possible.
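The apostrophe and diacritic cases are easy to see at the code-point level. A short Python sketch using the standard `unicodedata` module prints the code point and Unicode name of each apostrophe variant, then shows that a letter plus a combining diacritic is a different string from the single precomposed character until it is NFC-normalised. The choice of é as the diacritic example is mine, for illustration.

```python
import unicodedata

# The three apostrophe-like characters mentioned above.
APOSTROPHES = ["\u0027", "\u2019", "\u02BC"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in APOSTROPHES:
    print(describe(ch))
# U+0027 APOSTROPHE
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+02BC MODIFIER LETTER APOSTROPHE

# A letter with a diacritic can be encoded two ways:
decomposed = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00E9"    # LATIN SMALL LETTER E WITH ACUTE
print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

This is one reason a single symbol entry may need to list more than one code point.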

Can we lump this under a more descriptive heading than “Technical data?” As I see it, this is all about written representation, including computer input method and encoding, but also may include information about handwriting, brush strokes, cuneiform stylus technique, typewriting, and metal typesetting (e.g. hyphenation). Michael Z. 2008-12-07 22:50 z

Hmm... "Graphology"? "Representation"? "Form"? "Written form"? (could be superordinate to "Alternative forms") "Composition"? "Production notes"?
A lot of the things you're mentioning seem more like "Usage notes" material to me (e.g. stylistic variation), though I can see how it could go either way.
As for "Pronunciation" and "[Technical data]", there are likely to be some cases that call for both, e.g. characters in a language-specific alphabet. In most cases it's the name that has a pronunciation, not the character itself, but let's bracket that issue for another time. But Translingual sections, where the technical data would usually belong, shouldn't have Pronunciation sections at all, ever. And even some language-specific characters (e.g. at least 96.4% of Hangul "syllables") don't have distinct Pronunciations. There shouldn't be an absolute prohibition, but I would think entries with both sections would be the exception.
For me, the most important thing is that this section, whatever we call it and wherever it goes, be able to replace the POS for entries with no semantic content. That would make entries like , which will never contain anything but technical data, significantly less absurd. -- Visviva 02:17, 8 December 2008 (UTC)
Maybe we just need some standard subheadings under “Usage notes”: Hyphenation, Input method, Encoding, Radical. Michael Z. 2008-12-08 04:18 z
Well, this whole cycle of discussion is mostly due to the objections of many users to using "Usage notes" for these purposes (see e.g. Hippietrail's initial posting above). I'd prefer to find a solution that everyone finds acceptable. -- Visviva 16:11, 8 December 2008 (UTC)
Something like this would be my ideal entry for a wholly asemantic Unicode entity. (Somehow I don't see this meeting with much approval, but I thought I'd run it up the flagpole). -- Visviva 16:11, 8 December 2008 (UTC)
I like the simplicity, but it may puzzle readers because it omits the standard elements. Every other entry has a main heading which classifies the term (POS or “Symbol”), over the term itself. This looks like the main entry is missing, and only “Technical data” about it got left behind. Michael Z. 2008-12-08 18:15 z

What do dictionaries call this kind of information? What kind of specialty dictionaries cover these topics?

Are we venturing into encyclopedia territory by discussing not just words (including the names of letters and symbols), but how they are written and encoded? Michael Z. 2008-12-09 06:39 z

Amending ELE to Homophones Template

I propose amending the relevant homophones examples in the ELE to the homophones template. If you want to see the proposed edits in context, they have already been made in WT:PRON#Homophones on 11/24/08. The homophones template has been around since Feb 2008, so I hope these changes will be straightforward. If there are no objections, I guess a formal vote will follow. These are the four proposed edits:

 * {{temp|homophones|rite|wright|write}}

* {{homophones|rite|wright|write}}

* {{homophones|beta}} {{i|in [[non-rhotic]] accents}}
* {{homophones|rite|wright|write|ride}} {{i|in accents with [[flapping]]}}
Please see Wiktionary:Votes/2008-01/Homophones section, which could be a complicating factor. No consensus was achieved in that vote, but the homophones template did not exist at that time. --EncycloPetey 19:36, 5 December 2008 (UTC)
Level 4 header for homophones is a separate issue. I believe you can bring up a re-vote at any time. For now, the homophones template should be the standard for the current formatting. --AZard 20:35, 5 December 2008 (UTC)
As I said, the template did not exist at the time of the vote. The discussion on the vote includes many additional comments beyond the L4 header issue, and these comments may (or may not) be considered useful in terms of the current proposal. --EncycloPetey 20:39, 5 December 2008 (UTC)
Since there have been no objections to the homophones template, I have started a vote. The vote will end concurrently with EncycloPetey's proposal vote. I have to say that it was quite difficult to follow the instructions on initiating a vote. I can see why nobody wants to initiate votes on minor changes. --AZard 00:18, 19 December 2008 (UTC)

Username restrictions

I think we should remove the section in Wiktionary:Usernames and user pages which reads,

"(Usernames) should be fairly easy for the typical English-speaker to recognize, remember, and type. This generally means being fairly short; using the Latin alphabet (though some digits, spaces, and/or punctuation may be included as well); and avoiding long unpronounceable sequences of characters. (This rule is somewhat flexible, and compromises may be made in borderline cases.)".

With the advent of unified login, users must pick a username for all projects. I would be a bit bummed out if the Greek Wiktionary told me that I could not use my unified login username for their project, because it was not composed of Greek characters (they don't actually have such a policy, but would be every bit as justified in having such a one as we are in demanding Latin characters). It may be reasonable to replace it with a more moderate restriction, perhaps against having usernames that don't link for everyone (such as the creator of Yabim, whose userpage I can link to with my Windows partition, but not with Ubuntu). Thoughts? -Atelaes λάλει ἐμοί 06:39, 7 December 2008 (UTC)

Agree with this assessment by Atelaes (talkcontribs), most certainly makes sense. Cirt (talk) 07:07, 7 December 2008 (UTC)
Me too.—msh210 18:49, 8 December 2008 (UTC)
Rightly or wrongly, ASCII is the default character set of Web culture. It's great that MediaWiki allows usernames to contain non-ASCII characters, and even to consist solely of non-ASCII characters, but I don't think we need to allow that if we don't want to. For some relevant statistics, I looked at the usernames of administrators on various projects in non-Latin-script–using languages. Here are the results:
project total Latin native link
Arabic Wiktionary 6 6 (100%) 0 (0%) ar:Special:ListUsers?group=sysop
Arabic Wikipedia 22 20 (91%) 2 (9%) w:ar:Special:ListUsers?group=sysop
Greek Wiktionary 6 6 (100%) 0 (0%) el:Special:ListUsers?group=sysop
Greek Wikipedia 16 15 (94%) 1 (6%) w:el:Special:ListUsers?group=sysop
Hebrew Wiktionary 4 3 (75%) 1 (25%) he:Special:ListUsers?group=sysop
Hebrew Wikipedia 50 27 (54%) 23 (46%) w:he:Special:ListUsers?group=sysop
Mandarin Wiktionary 6 5 (83%) 1 (17%) zh:Special:ListUsers?group=sysop
Mandarin Wikipedia 90 77 (86%) 12 (13%) w:zh:Special:ListUsers?group=sysop
Russian Wiktionary 4 4 (100%) 0 (0%) ru:Special:ListUsers?group=sysop
Russian Wikipedia 70 63 (90%) 7 (10%) w:ru:Special:ListUsers?group=sysop
Which is not in itself an argument for forbidding these foreign alphabets, but at least suggests that the situation is not symmetric between our forbidding foreign alphabets and other-language projects forbidding ASCII letters. Even the Hebrew Wikipedia, which has by far the greatest proportion of native-script usernames (marked in pink), has a majority of Latin-script usernames. (Though if we ignore bots, I think it's 24 Latin to 23 Hebrew, so basically tied.)
(BTW, the Mandarin Wikipedia doesn't add up because it had one numbers-only username. I could have counted that as Latin, because Mandarin has its own system for writing numbers, but I decided it made more sense not to count it for either one.)
RuakhTALK 14:48, 7 December 2008 (UTC)
I agree that, with unified login, this is obsolete and should be removed. I have seen a user or two with a Hangul username, and I don't recall anybody making a fuss about it. That said, for reasons of editors being able to recognize and communicate with one another, I would strongly favor an ASCII-only username policy if it could feasibly be implemented -- but, with unified login, it really can't. It's a good thing the Great High Gods of Wikimedia know what's best for us. </snark> (How many scarce WMF resources were wasted implementing this completely unnecessary "feature"? I know it had the devs tied in knots for months...) -- Visviva 15:34, 7 December 2008 (UTC)
By the way, regarding %D9%90 (talkcontribs), who created Yabim: that username consists of a single Arabic diacritic, the kasra, which makes a short /i/ sound. Even if we allow Arabic-script usernames, we don't have to allow usernames consisting solely of diacritics, without any regular characters for them to attach to. —RuakhTALK 16:19, 7 December 2008 (UTC)
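Checking for a diacritic-only username is straightforward with Python's standard library. This is a hypothetical helper, not a MediaWiki setting; it decodes the percent-encoded title and tests whether every character is a combining mark (Unicode general category Mn, Mc or Me), ignoring spaces and underscores, which MediaWiki treats as equivalent in titles.

```python
import unicodedata
from urllib.parse import unquote


def is_diacritic_only(username):
    """True if every character (ignoring spaces/underscores) is a combining mark."""
    chars = [c for c in username if c not in " _"]
    return bool(chars) and all(unicodedata.category(c).startswith("M") for c in chars)


name = unquote("%D9%90")                        # percent-encoded UTF-8 from the URL
print(hex(ord(name)), unicodedata.name(name))   # 0x650 ARABIC KASRA
print(is_diacritic_only(name))                  # True
print(is_diacritic_only("\u0628\u0650"))        # False: BEH + KASRA has a base letter
```

A rule like this would still allow Arabic-script usernames in general while excluding names with nothing for the marks to attach to.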
I agree with that about diacritic-only usernames.—msh210 18:49, 8 December 2008 (UTC)
I think keeping the current policy for en:wikt-created accounts and accepting most accounts that come from other projects (with the aforementioned caveats) is fine. If worst comes to worst, users will probably ask the user for a Latin transliteration they can use when referring to them. Circeus 15:52, 9 December 2008 (UTC)

Move to Wiktionary:Preferences Gadgets tab in Special:Preferences ?

Hello. I have started a new thread and proposal at Wiktionary talk:Preferences, see here: Wiktionary_talk:Preferences#Move_to_Gadgets_tab_in_Special:Preferences_.3F. Would most appreciate your input/thoughts. Thank you, Cirt (talk) 07:14, 7 December 2008 (UTC)

We need to specify en:… categorisation, rather than just assume it

I recently corrected the erroneous categorisation of aceto and aceti in the English plurals and en:Latin derivations categories, moving them to their proper Italian æquivalents. This kind of thing is a pretty persistent and widespread problem, as evidenced by Robert Ullmann’s recently-created list of English missing forms. It arises from the labour-saving fact that multilingually-applicable templates, such as {{etyl}} and {{plural of}}, automatically categorise the entries in which they feature as English ones, unless the lang= parameter is otherwise defined. For English entries this is fine: it saves time both in their creation and in cleaning them up, especially when they are created by inexperienced editors. However, in the case of foreign-language entries, neglecting to properly define the lang= parameter for every template used therein incorrectly categorises that entry in an English-language category, as exemplified above.
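The failure mode comes from a silent default. A minimal Python sketch, assuming a template that reads an optional lang= parameter and falls back to "en" when it is absent; the function name and the small language table are illustrative only, not the template's actual code.

```python
import re

# Illustrative subset of language codes and their category names.
LANGUAGE_NAMES = {"en": "English", "it": "Italian", "la": "Latin"}

LANG_PARAM = re.compile(r"\|\s*lang\s*=\s*([a-z-]+)")


def plural_category(template_call):
    """Category a {{plural of|...}} call lands in if lang= silently defaults to en."""
    match = LANG_PARAM.search(template_call)
    code = match.group(1) if match else "en"  # the labour-saving default
    return f"{LANGUAGE_NAMES.get(code, code)} plurals"


print(plural_category("{{plural of|aceto}}"))          # English plurals (the error)
print(plural_category("{{plural of|aceto|lang=it}}"))  # Italian plurals
```

Nothing in the first call signals a problem; the Italian entry simply lands among the English plurals, which is why such errors are hard to spot by eye.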

At a time when the majority of entries are for words in English (as is the case now, AFAIK), this makes sense. However, eventually, the number of entries for foreign-language terms will, præsumably, outnumber the number of entries for English terms (for the simple fact that the sum of words in every other recorded language far outweighs the number of words in the English language). For this reason, I believe that we ought to explicitly specify lang=en in templates in which it is præsently assumed. AutoFormat should be able to do the bulk of the work, because, AFAIK, the majority of instances are unambiguous. Once A.F. has done what it can and the majority of templates have their lang= parameters specified, those autocategorised as English can be redefined into a clean-up category for human editors to categorise correctly. This would solve our problem of entries’ incorrect categorisation.

Any thoughts?  (u):Raifʻhār (t):Doremítzwr﴿ 18:38, 7 December 2008 (UTC)

I agree with Doremítzwr on this one. Yes, there are times to give English an exception from normal policies because we're the English Wiktionary. However, in this case, I think it would be better to use the "en" prefix. I have been using en as the second parameter for etyl for some time now, and I know I'm not the only one. I think AF could certainly help out, as it has the capacity to know what L2 it's working with (the only necessary piece of information to do this job). Additionally, as we are really the flagship of the Wiktionaries, this change will make it easier for non-English Wiktionaries to import our content. -Atelaes λάλει ἐμοί 20:49, 7 December 2008 (UTC)
I want to be sure I understand, because I interpret the title of this section differently from the thrust of the text. The title seems to imply renaming categories, but the text seems to be about using "lang=en" explicitly rather than assuming it. I agree with the use of explicit "lang=en" or "en" where the template calls for an ISO code. --EncycloPetey 22:10, 7 December 2008 (UTC)
Both would be good for consistency, but my proposal was concerned with making lang=en assumptions explicit, rather than changing category titles.  (u):Raifʻhār (t):Doremítzwr﴿ 23:16, 7 December 2008 (UTC)
Oh, I misunderstood. In any case, I support both. -Atelaes λάλει ἐμοί 15:18, 8 December 2008 (UTC)
The use of "lang=en" or "lang=English" (or en as the 2nd parameter in {etyl}) is superfluous noise, and should be removed wherever seen. The solution Doremítzwr offers for AF to implement is backwards: it should be removing lang=en on sight, and adding lang=xx whenever missing (it does this for {IPA} for example; easy to generalize). All that is needed is a list of templates we want to consistently provide lang= for when not in the English section. (Importing things into other wikts is a strawman; they require plenty of translation anyway.) Robert Ullmann 16:52, 8 December 2008 (UTC)
The cat structure is not broken, and therefore doesn't need to be fixed. We can work on fixing things that do. Robert Ullmann 16:52, 8 December 2008 (UTC)
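Robert Ullmann's suggested direction (drop lang=en in English sections, add lang=xx where it is missing elsewhere) can be sketched in Python. This is not AutoFormat's code: the template list, the language-code table and the helper names are assumptions made for the example, and only template calls without nested braces are handled.

```python
import re

# Assumed list of templates that should carry lang= outside the English section.
LANG_TEMPLATES = ("plural of", "IPA")
# Assumed mapping from L2 header text to language code.
LANG_CODES = {"English": "en", "Italian": "it", "Latin": "la"}

L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.M)
LANG_EN = re.compile(r"\|\s*lang\s*=\s*en\s*(?=[|}])")
HAS_LANG = re.compile(r"\|\s*lang\s*=")
TEMPLATE_CALL = re.compile(
    r"\{\{(?:" + "|".join(re.escape(t) for t in LANG_TEMPLATES) + r")\|[^{}]*\}\}"
)


def fix_section(language, body):
    """Normalise lang= in one language section."""
    code = LANG_CODES[language]
    if code == "en":
        # lang=en is the default, so it is redundant noise here.
        return LANG_EN.sub("", body)

    def add_lang(match):
        call = match.group(0)
        if HAS_LANG.search(call):
            return call  # already explicit; leave it alone
        return call[:-2] + "|lang=" + code + "}}"

    return TEMPLATE_CALL.sub(add_lang, body)


def fix_page(wikitext):
    """Apply fix_section to every L2 section whose language is known."""
    parts = L2_HEADER.split(wikitext)  # [preamble, lang1, body1, lang2, body2, ...]
    out = [parts[0]]
    for language, body in zip(parts[1::2], parts[2::2]):
        out.append("==" + language + "==")
        out.append(fix_section(language, body) if language in LANG_CODES else body)
    return "".join(out)
```

Running fix_page twice gives the same result as running it once, which matters for a bot that revisits pages.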
The noise argument is, IMO, a very weak one; even if we continue to assume English as the default catting language, what harm does its explicit expression cause? Treating English consistently with other languages would be useful if only because seeing the lang=en and {{etyl|…|en}} parameters explicitly stated in English entries would probably prævent people from neglecting their explicit statement in foreign-language entries.
Our cat. structure works in theory, but in practice many pages are miscategorised, the overwhelming majority of which being foreign-language entries miscategorised as English ones. This is only realistically explicable by the human error of neglecting to state explicitly the lang=… and 2nd {{etyl}} parameters. If we required these parameters to be explicitly stated in English entries, rather than assuming that unstated parameters intend English categorisation, all these cases of human error would show up in a clean-up category; at præsent, our system hides erroneously-categorised foreign-language entries amidst genuine English entries in English-language categories. This would virtually never happen if lang=en &c. had to be explicitly stated (because no one would accidentally type lang=en or en in the templates of a foreign-language entry, AFAIK).  (u):Raifʻhār (t):Doremítzwr﴿ 02:03, 9 December 2008 (UTC)
While the task of cleanup with our system is somewhat more complicated than it would be under your proposal, it is still feasible. The most obvious thing is to go through the entries in a given English category and see which ones lack an ==English== header (although this would skip those multilingual entries where a non-English section is applying an English category). This is the sort of list-chomping task which even the Wiktionary-unfriendly AutoWikiBrowser can handle. However, I honestly haven't been aware of this as a major problem... As "current shortcomings of Wiktionary" go, I wouldn't expect it to rank in the top hundred. -- Visviva 16:21, 9 December 2008 (UTC)
Oh, good idea. I've taught AF to be very suspicious of entries that do not contain "==English==" and also do not contain "lang=" anywhere (;-) Robert Ullmann 16:59, 9 December 2008 (UTC)
AF will catch these. As Visviva says, this isn't a problem to cause some major re-doing of things. Is not broken, just needs some(thing) paying attention. The specific templates do need to be identified. (And note to D: "noise" is a serious problem in what is essentially a UI. It is a mistake to ignore it.) Robert Ullmann 16:32, 9 December 2008 (UTC)
Can AF add the lang= parameter to context tag templates and header templates in FL entries since they also categorize? --Bequw¢τ 19:20, 16 December 2008 (UTC)
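The check Visviva suggests and Robert Ullmann describes teaching AF above (flag an entry in an English category when its wikitext has neither an ==English== header nor any lang= parameter) can be sketched roughly as follows. This is a minimal illustration, not AutoFormat's actual code; the function names are hypothetical, and it assumes the page wikitext has already been fetched into a dict keyed by page title.

```python
import re

# Hypothetical sketch of the heuristic discussed above; not AF's real code.
# Matches a level-2 "==English==" header on its own line (not "===English===").
ENGLISH_HEADER = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)


def is_suspicious(wikitext: str) -> bool:
    """True if the entry has no ==English== header and no lang= parameter anywhere."""
    has_english_header = bool(ENGLISH_HEADER.search(wikitext))
    has_lang_param = "lang=" in wikitext
    return not has_english_header and not has_lang_param


def flag_entries(pages: dict[str, str]) -> list[str]:
    """Given title -> wikitext for members of an English category, list suspicious titles."""
    return sorted(title for title, text in pages.items() if is_suspicious(text))
```

As Visviva notes, this misses multilingual entries where a non-English section applies an English category, because the ==English== header elsewhere on the page suppresses the flag.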

Subdivide Category:Requests for photographs?

American Sign Language entries are beginning to dominate Category:Requests for photographs, and will probably overwhelm the category eventually. Does it seem reasonable to create a subcategory like [Category:Requests for photographs (American Sign Language)]? If so, how should entries be added to such a subcategory? We could modify {{rfphoto}} to accept an ISO language code, or would it be better just to create a new template (e.g. {{rfphoto-ase}})? Rod (A. Smith) 23:36, 7 December 2008 (UTC)

A separate category for sign-language requests sounds good to me. I'd use a new template. RJFJR 16:59, 8 December 2008 (UTC)
If there is an ISO code for ASL, then why not just adapt {{rfp}}? --EncycloPetey 20:16, 8 December 2008 (UTC)

Dates etc

I've been thinking about how to indicate when a given word entered the language. Sometimes an etymology section will say something like "first attested circa whenever", but this is not very satisfactory because different senses of the word will have come into use at different times. So anyway, have a look at bead. Here I've tried to add a little extra bit on every def line to indicate the century in which that sense was first used. I've just HTML-coded it, but if we decide this is desirable it could obviously be templated. So what do people think? Unobtrusive enough? Too garish? Ƿidsiþ 09:35, 9 December 2008 (UTC)

It's admittedly not obtrusive, but it is very confusing. If I had not seen this topic, I would have no idea what is going on. IMHO, this information is clearly for the etymology section (possibly a template could be made derived from the {{timeline}} template?). Circeus 15:48, 9 December 2008 (UTC)
I'm with Circeus on this one; it's a nice idea but is just too opaque. Also in many cases more explication is needed or desirable (who coined it, in what context, with what spelling, etc.). I would prefer to have bullet points under Etymology, marked with {{sense}}. I also like the idea of something like {{timeline}}. On the other hand, if bloated Etymologies become too much of a drag on layout, it might be worth considering interlinear etymologies.
There is also the small matter of verifiability for these; it's difficult to prove a negative, so is a given date a) the first attestation we can find, or b) the date given in authoritative works such as the OED? In either case, we should note the source. Further on this, IMO we should always include the first known use in the interlinear citations, even if it is otherwise not an ideal example. -- Visviva 16:12, 9 December 2008 (UTC)

Well, adding this kind of info in the Etymology section means it would need as many lines as the POS has, as well as a lot of duplication of information, which seems crazy. Interlinear etymology is something very different, I do not call this proposal "etymology" -- each sense has the same etymology as the last, the word is just developing over time. As for verifiability, the reason I kept it to within a century is that in many cases the first attestation is given slightly differently in different sources, and this allows us to give a ballpark figure without infringing copyright of any one source. Ƿidsiþ 16:23, 9 December 2008 (UTC)

Well, this would fall under the general rubric of a word's historical development, which AFAIK is what etymology is about. (And for many words, different senses have actually been coined independently, though they otherwise share the same derivation.) But you're right that putting it all under Etymology can get messy, as well as causing an undesirable amount of visual noise.
I can understand that there may be reasons for providing only the century in some cases, but I think any approach we take needs to allow for more information to be added, if someone is willing and able to add it.
What I meant by "interlinear etymology" was something like this:
  1. (farbology) To farb
    He farbed the gozzles and placed them in the snarkalot.
    Coined in this sense by Dr. Foodlesnaffer in a 1936 paper in the Journal of Irreproducible Results.[1]
    • <quote if available>
It's not kosher under current ELE, and there may be good reasons to avoid it, but it seemed like it might be worth considering. -- Visviva 17:40, 9 December 2008 (UTC)

I don't think repeating a date from a particular source can be considered copyright infringement. Of course, it's good form to cite the reference in such a case.

Regarding the format, I find the yellow highlights are way too much: they are more prominent than bold text. They need formatting to make them distinctive from their context while reading the definition line, not to attract attention from across the page. Conversely, it's possible that for certain kinds of colour blindness they will lose all distinctiveness whatsoever. And why C9, which I've never seen before, instead of the transparent 9th c.?

Instead of the deprecated font element, they could be marked up as <abbr title="attested in the ninth century">9th c.</abbr>, whose meaning is clear.

I don't mind attaching sense-specific information to the definition lines, but this is really contrary to our current practices. Have a look at trooper for another approach. Michael Z. 2008-12-09 18:35 z

Yes, I agree 9th c. is better, and the yellow was only for the sake of argument really. The problem I have with entries like trooper is that you end up reeling off all the definitions again in the Etymology section, which seems a pointless waste. Why bother with a POS section at all if all the definitions are listed elsewhere? Ƿidsiþ 18:49, 9 December 2008 (UTC)
The two serve different functions, and each serves its own function better. If you removed either one, or if you combined the two, the entry would be significantly worse.
Specifically, in trooper, the etymology does not repeat the definitions; it only refers to them. If the etymology were incorporated into the definition lines, then it would really suck for someone who just wants the definition.
But putting the etymology after the definitions might be an improvement. The definitions are meant to be comprehensive, but as concise as possible. An etymology may grow to a very large size, with lots of detail, continuing to refer to the senses (the article core). Michael Z. 2008-12-09 19:14 z
I think the etymology section is the place for such things. That'd give us space to specify that the date mentioned is the earliest attested use (rather than the earliest use, or, well, anything else ("9th c." is highly ambiguous)). Yes, it'd make the etymology section a little wordy, but I think that's the best of all the options anyway. At least, I think so right now.—msh210 19:13, 9 December 2008 (UTC)
An etymology can get very wordy (e.g. горілка, ковбаса), and I think this is a reason to keep this information in one place, rather than taking a small selected part of it and moving it elsewhere in the entry. Michael Z. 2008-12-09 19:23 z
Ultimately, an approach along the lines of what Widsith proposes will be necessary. It's all fine and good to say that we should simply put it in the etymology section, but eventually, when we get up to the standards of the OED (something I feel confident will happen, if not necessarily all that soon), this will be completely impractical. The etymology sections will become ridiculously long, and very difficult to read. However, what Widsith proposes is not enough. We also want to be able to include the earliest quote, if nothing else. However, putting such things, with our current format, is completely impractical. It would break up the senses too much. So, here's what we need. We need to have a box to put things in for each sense, which isn't seen unless called. Conrad.Irwin's paper view was so close to perfection. Each sense needs to have a bunch of buttons next to them which allow the user to see individual information, such as a sense etymology. Usage notes falls under the same category as etymology (it currently works under the current format, but will not stay that way forever). This would also be the perfect place to put translations and synonyms, things which are currently somewhat detached from the senses they serve. Would anyone with some JS skills be willing to take a crack at this? It's something which will need to happen eventually. However, for the time-being, I'm not too worried, as we are still working at getting etymologies for basic words. We're certainly not the OED yet. -Atelaes λάλει ἐμοί 07:10, 10 December 2008 (UTC)
You just prompted me to try out the paper and toggle views. Wow. Michael Z. 2008-12-10 20:06 z
I'm glad that you liked it - there are several issues with it at the moment, but most of them can be greatly improved with little effort. One of the points of building it was to show that most of the information we have on wiktionary could be reformatted into a new layout reasonably easily. It has been the case for as long as I've been here (a mere year or two) that people have been grumbling over the entry format - and while I am certain we will never find something that makes everyone happy, I do think we could do a lot better than we do right now. Our current policy of having senses nested under an Etymology heading is very naive - although some senses have common ancestors it seems to me highly unlikely that there is no information we can give about individual senses and splitting each sense into its own etymology is stoopid. By putting the Etymology under the sense we can thus add information clearly to the relevant place, by using some kind of subsensing we can share the common parts of etymology in a supersense [I've not looked closely at this idea, it just came to me while writing this]. The problem is that whenever a change in layout is proposed comes the realisation that we don't have the manpower to convert a million entries into a new format. As the parser demonstrates, well over 99% of this can be done automatically - or at least semi-automatically, but that still leaves thousands of entries that will need to be slowly manually taken care of. While I would love to invest some time into this, it's not really sensible until we have a proposal that everyone agrees to for a new layout - otherwise that's a week's effort gone down the pan. It is interesting to note that in the 2005 discussion, all the proposals were divided by type of data, and each type of data section split individually into subsections - what I would like to see is totally the opposite. 
[Well, I think split by language and then PoS is reasonably sensible, though there are cases where that doesn't work either - particularly for verbed nouns]. I started a page at Wiktionary:Layout woes a while back - it would be useful if people can keep adding to it. Conrad.Irwin 01:50, 14 December 2008 (UTC)
Someone at some point mentioned an Etymology namespace.—msh210 22:48, 10 December 2008 (UTC)
I think we already have enough namespaces. If we create an Etymology namespace, it'll start a trend which will result in users having to look thirty different places to find info on one word. No good. -Atelaes λάλει ἐμοί 23:25, 10 December 2008 (UTC)
I'd agree with Atelaes on this one. Conrad.Irwin 01:50, 14 December 2008 (UTC)
I agree as well. I don't see any reason we couldn't coordinate this with the Citations namespace, which already handles citations for a word. Perhaps including the earliest quote (when available) or a date and note (when it isn't). Perhaps even using some template or visual indication that it is the earliest known usage in the language. However, since we treat Old English as a separate language, this won't always be complete information. --EncycloPetey 20:53, 14 December 2008 (UTC)
Sounds good.—msh210 19:12, 15 December 2008 (UTC)

Lexicon of Linguistics

This might interest those who write articles related to linguistics: I quote from an e-mail I got from the contact person of the Lexicon of Linguistics:

Hi, Feel free to quote. If the quote is extensive we'd appreciate a mention of the source.

Kind regards, Johan Kerstens

Hendrik Maryns wrote:

Hi,
I was wondering what the copyright and license conditions of the Lexicon
of Linguistics are. The reason I ask is that I want to write an article
about ‘unergative’ for Wiktionary, and want to take over the
description of the lexicon. But I have to make sure I am not violating
copyright here.

—This unsigned comment was added by Hamaryns (talk • contribs).

Sounds like CC-BY, which is good. All content taken from there should be tagged with a consistent template, both to provide the desired attribution and in case there turn out to be problems in the future. -- Visviva 16:01, 9 December 2008 (UTC)

false friends

Should false friends be explicitly mentioned and discussed in foreign-language entries (e.g. défiler/defile, demand/demander)? Which section, if any, is most appropriate for this, if it is to be included? Circeus 15:56, 9 December 2008 (UTC)

It would fit best under Usage notes, I think. Definitely good information for us to include. -- Visviva 16:02, 9 December 2008 (UTC)
That's tough. They can be useful, but there are so many potential false friends when one considers that we cover every word in every language. How do we determine which of the words in the world are going to be confusing? If one Romance language has a false friend with an English word, it's a decent chance that most Romance languages will have a similar word. --EncycloPetey 01:28, 15 December 2008 (UTC)
I was thinking only of false friends between foreign languages and English: although we cover all languages, we do it primarily for English speakers. Circeus 02:56, 17 December 2008 (UTC)

do we want those links?

[Image: Do we want those links.png]

Mutante 11:17, 14 December 2008 (UTC)

You mean, do we want translations in form-of entries? I should hope not. -- Visviva 11:24, 14 December 2008 (UTC)
Agreed. And especially not translations in foreign-language form-of entries. —RuakhTALK 13:16, 14 December 2008 (UTC)
I distinctly remember people advocating for them (it was a while ago that I last saw the topic brought up; I can't remember who or where), though I myself very much disapprove. Circeus 19:33, 14 December 2008 (UTC)
Maybe when our bots and servers have nothing better to do. DCDuring TALK 19:46, 14 December 2008 (UTC)
Please no (except for English genitives, those should always list corresponding foreign language genitives). -Atelaes λάλει ἐμοί 20:07, 14 December 2008 (UTC)
No, we list Translations only for English entries. The other Wiktionaries handle translations for their languages. Should plural English eels list translations? Probably not, since many languages have more than one plural form, and those plural-form pages point to the main entry form anyway. As DCDuring notes, there are better things for us to be doing right now, anyway. --EncycloPetey 20:49, 14 December 2008 (UTC)
Probably not, for the reasons above. Looking at the diagram, I can't help observing that even the links from the left-hand column to the right-hand column (not vice versa) don't make a lot of sense in most cases: The entries in the right column typically say nothing more than that they are the plural of the corresponding word in the left-hand column, so a user who clicks on the link in the article of the singular form learns nothing new. (Yes, those links are nevertheless justified because pronunciation information for the plural is in the article for the plural. It's just that such pronunciation sections are quite rare.) -- Gauss 00:11, 16 December 2008 (UTC)
No, they are way too burdensome, and mostly redundant, as per above. Possible exception would be if a grammatical plural differs in meaning from the semantic plural (of an English word). Most of the time, the semantic and grammatic plurals agree, for which case there is no need for direct translations of plurals. (The translations are already available indirectly, by jumping to the English singular, translating, then the foreign plural is in the inflection line.) —AugPi 00:58, 19 December 2008 (UTC)

Straw poll: non-English definitions

Is it ever acceptable to define a non-English entry using a non-English word? User:Ruakh thinks so[4] [5] [6], but I disagree. On the English Wiktionary, definitions should be given in English. --EncycloPetey 01:26, 15 December 2008 (UTC)

Personally I'm less troubled by the non-English definition than by the idea that we are relying solely on another dictionary -- one dictionary -- for our definition of a word in a major world language. I mean, if it's simply an alternative form, that should be fairly easy to substantiate independently, and if it has some other meaning, that shouldn't be too hard to substantiate either. If -- just for the sake of argument -- if we really don't have any editors who know Spanish well enough to judge one way or the other, then we should reconsider whether it's a good idea to have the entry at all. Personally, I would be inclined to replace the problematic definition with {{alternative form of}} + {{rfv}}, and to delete it out of hand if no verification is forthcoming. I do agree with the point that definitions here on EN should always be at least partially in English. -- Visviva 01:49, 15 December 2008 (UTC)
That makes sense. —RuakhTALK 02:12, 15 December 2008 (UTC)
I don't think you're being completely honest here: for example, I've more than once seen you define a foreign-language form-of in terms of its lemma (and I wholeheartedly support you in doing so). Judging from the DRAE, quejar is some sort of variant of aquejar; so if I may venture a guess, I'd hazard that your objection is really to the equals sign, rather than to the link? If so, I'm not opposed to having some sort of English text instead of the equals sign; the question is, what to say? "Variant of _____"? "Alternative form of _____"? "See _____"? I'd feel a lot more comfortable if we made that decision based on facts, rather than based on an urgent and dire need for some English text, any English text. (Hmm, even without the facts, "see _____" might be preferable to "=_____", because the latter could imply that the two are exactly equivalent, whereas for all I know they might differ in some way that the DRAE isn't deigning to mention. How would you feel about "See ____"?) —RuakhTALK 02:12, 15 December 2008 (UTC)
Now you question my honesty. When the entry is an inflected form or alternative form of the same word, then I have no problem with referring the user to the lemma / main spelling of that word. However, I personally detest referring the reader to another word for a definition (in any language; except as a translation), and strongly oppose referring the reader to another non-English entry. Neither "=" nor "see" helps the reader in this case, because we don't yet know the meaning. Telling someone to "see X" provides no information about how the two entries are related. Using "=" expresses synonyms, for which we have a separate section. If quejar deserves to have an entry, then it deserves a definition, not a cop-out. --EncycloPetey 02:54, 15 December 2008 (UTC)
Re: questioning your honesty: I'm sorry, I didn't mean it like that. I just meant that what you were saying didn't seem to match your actual beliefs, not because you were lying about your beliefs, but because you seemed to be (internally) misidentifying what bothered you. Perhaps I was wrong. I agree that quejar deserves some sort of additional information, but since we don't have that information at the moment, I think =[[aquejar]] or ''See '''[[aquejar]]'''.'' is very helpful, and that {{rfdef}} is overkill (since that template implies that we have no definition, rather than a sub-ideal one.)
BTW, I do think that =[[foo]] is completely fine in some cases. For example, in context קֶשֶׁת (késhet, bow) can be used with an implied בענן (b'anán, in the clouds), and I'd have been totally fine with defining it as =[[קשת בענן]], because that's really what it is. (As it happens, we do in fact give its English translation at both entries, and I'm down with that, but I don't think it's necessary at all.)
RuakhTALK 03:34, 15 December 2008 (UTC)

The =X form isn't a definition at all. It's a cross reference, equivalent to a form of link, except that it doesn't add the appropriate category. Do our guidelines mandate converting manually-constructed entries to templates? If we really want to use this form, then the form of templates should have a parameter to display in an abbreviated version, but I don't see the advantage. Michael Z. 2008-12-15 16:02 z

If there is a known appropriate "form of" description, then I for one have no particular desire to use (or display) the =X form. IMHO the question is how to handle cases where we're not sure of the appropriate description (is it an "alternative form"?), or where there doesn't seem to be one. —RuakhTALK 16:27, 15 December 2008 (UTC)
Yeah, I say alternative form of. DAVilla 06:37, 18 December 2008 (UTC)
In this specific case, we also have a "fake lemma" form (quejar for quejarse), a case that must be dealt with, and it would be a good idea to do so now. For French entries, I've settled on pure lemma only, with optional redirect from the reflexive (if only because that is the normal treatment in French dictionaries), but this doesn't work with Spanish. Maybe {{form of|Non-reflexive form}} could do the trick? In any case, both lines are almost certainly situations in which to use {{form of}} or one of its derivatives. Providing a full definition would be pointless.
On the broader question, I believe that yes, there are situations where a foreign word must be used to define another. This is mostly for cases where two foreign institutions have to be related in their definitions (usually, the words in question are actually also considered proper in English too, though). Circeus 00:52, 19 December 2008 (UTC)

Stuff that we are missing

I've just refactored w:Aught ought naught nought (in preparation for renaming it). We are missing several senses and several articles. I encourage everyone to fill in the gaps (the Wikipedia article cites sources, which may be of help) so that all of the interwiki links from Wikipedia to here, throughout the article body, work. Uncle G 12:18, 15 December 2008 (UTC)

i am not old enough 2 drink beer

so am i still allowed 2 visit this "beer parlor"?

Yup! It's alcohol free - sadly. Conrad.Irwin 22:32, 15 December 2008 (UTC)
But you are allowed to bring a bottle (nobody would know). SemperBlotto 22:39, 15 December 2008 (UTC)
http://www.unc.edu/depts/jomc/academics/dri/idog.html —RuakhTALK 23:55, 15 December 2008 (UTC)

where to put abbreviated forms

Currently some entries have abbreviated forms listed as synonyms (e.g., inferior vena cava), some have them as alternative forms (e.g., immunoglobulin), some have them on the inflection line (with perhaps the use of {{abbreviated}}; e.g., gross national product). Is there a policy I'm unaware of that states where to put them? If not, I'd be curious where most are (any way to be able to tell?) and where people think they should be.—msh210 23:36, 15 December 2008 (UTC)

I've not seen a policy. I'd prefer to see abbreviations listed as Alternative forms. Synonyms should be reserved for other words, not used for forms of the same word. However, if there is more than one definition, and if the abbreviation pertains to only one or a few definitions, then I can see an alternative might be necessary (as for page, where p. does not apply to all definitions). --EncycloPetey 23:40, 15 December 2008 (UTC)
Well, can't alternative forms use {{sense}}?—msh210 23:44, 15 December 2008 (UTC)
Possibly, but that is normally used after the definitions. Placing that template in an Alternative forms section at the outset of the entry would not carry the full contextual information. It might make sense if the Alternative forms section in such a situation is nested after the POS (which is allowed). --EncycloPetey 23:49, 15 December 2008 (UTC)
I used {{sense}} on invariant section - but then its use in the etymology section on that page is just plain wrong according to our current layout policy. Conrad.Irwin 23:51, 15 December 2008 (UTC)

Is WikiSource a durable archive?

If a publisher publishes books:

  1. without copyright, with the permission of the authors,
  2. does not succeed in getting them into libraries or leaving a visible trace at Google Books or Amazon, but
  3. posts them at WikiSource, then
    Is it not durably archived for CFI purposes? DCDuring TALK 16:22, 20 December 2008 (UTC)
As we accept cites from wikipedia, wikisource should be treated as durably archived. Conrad.Irwin 16:42, 20 December 2008 (UTC)
I didn't think we did accept WP cites as valid for attestation. RU's were, I thought, just leads for possible new entries, for which purposes they are useful, and as usage examples, for which they are stopgaps. DCDuring TALK 17:12, 20 December 2008 (UTC)
They (Category:New words from Wikipedia) can be considered valid citations as illustrations of use, but do not meet the strict "3 independent cites in durable media" CFI rule. But in this case it isn't the durability that is in question, it is the independence; people have created 'pedia articles to try to attest wikt entries and vice versa. For Wikisource, I think the issue is similar: it is durably archived, but material that may be essentially spammed into the wikis can't be considered to be an independent use.
We don't accept cites from Wikipedia. —RuakhTALK 19:45, 20 December 2008 (UTC)
That's a good question. Has it come up? Presumably if the book was published in the U.S., it would be in the Library of Congress, right? —RuakhTALK 19:45, 20 December 2008 (UTC)
Sadly, no. It was the early 70s. It was California. It was copyright-free. It must not have seemed necessary. Nor is it listed on Worldcat. DCDuring TALK 21:46, 20 December 2008 (UTC)
IOW, it seemed like a good idea at the time to dispense with such formalities. DCDuring TALK 00:15, 21 December 2008 (UTC)
Are they listed in Books in Print for the relevant years? -- Visviva 06:10, 21 December 2008 (UTC)
I don't know that we require presence in Worldcat, Amazon etc. as a criterion for "durably archived." Perhaps we should ... but under our current definition, I would think that if these works' publication in print can be verified by some acceptable means, that would be good enough. On the other hand, being on WS doesn't matter one way or the other, except that it makes them much more convenient to quote.
This is an area we should put some thought into, as many of the same concerns apply to unpublished resources in archives (letters, manuscripts), a potentially rich source of raw material which, to my knowledge, we have not yet considered tapping. These types of resources are disallowed on Wikipedia, but our situation is somewhat different. -- Visviva 06:10, 21 December 2008 (UTC)
I view this as a matter of establishing reasonable rules that allow us to avoid being manipulated, not that the case I have in mind seems highly likely. In the case of Usenet, our rules can be gamed by someone with patience. In the case of a printed work that is electronically available and has no publicly accessible copy to allow it to be inspected, we could be deceived. Worldcat, Amazon, and Google Books provide evidence that the title exists, as would a record in Books in Print. If the electronic copy is not provided by an established institutional process (Gutenberg, Google, Amazon, others?), it seems that it might still be subject to question as to authenticity, but the existence of a copy in a library at least means the potential for confirming the correspondence of the electronic copy (at WikiSource) would exist. (BTW, I assume WikiSource has an adequate means of covering itself with respect to copyright issues.) DCDuring TALK 08:36, 21 December 2008 (UTC)

contraction, prevocalic, clitic, elision, apocopic

Looking at the definitions of c', d', l', m', j', 't, and s' I see that they are variously described in the various languages of these entries as a "contraction of", "prevocalic form of", a "clitic form of", an "elision of", a "proclitic form of", and an "apocopic form of" the respective words they contract. Shouldn't all of these be the same? If so, which one is the most appropriate? And if not, what is the distinguishing factor? bd2412 T 03:24, 21 December 2008 (UTC)

Re: "Shouldn't all of these be the same?": I'm not sure. Different languages have different pedagogical traditions, and it might make sense to respect those, even when there's no relevant difference in the languages themselves. (For example, I think it would be silly to describe the French passé simple as a "preterite", or the Spanish pretérito as a "past historic", even if abstractly it would make sense to use the same term for both.) And some of the terms only apply to some of them; for example, 't is not prevocalic, apocopic, or proclitic.
More specifically, my thoughts on each:
  • "Contraction" seems good; I don't actually know whether it's accurate, but I think anyone coming across it will know exactly what we mean. (It's vague, in that it doesn't explain what motivates the contraction, but a usage note is probably better for that, anyway.)
  • "Prevocalic form" seems decent for the French forms, at least; it's not perfectly accurate, since the rules are more complex than just "use it before a vowel" (« le onze avril », « dis-le au professeur », etc.; and stuff like « je m' fuis » is common in netspeak, song lyrics, and such), but I think it's a good enough first approximation for a definition.
  • "Clitic form" is completely wrong. It's true that these forms are clitics (except maybe 't, I'm not sure), but the corresponding full forms are also clitics; we might as well define cars as "Noun form of car" and thee as "Archaic form of thou."
  • "Elision" is a standard term for French, though in my experience it's usually said of the phenomenon rather than of the resulting forms. It's vague, in that there are many kinds of elision, but someone who knows something about French will know that "elision" is standardly applied to this phenomenon, just as "liaison" is applied to its converse. (On the other hand, someone who knows something about French probably won't need to look up these forms anyway.)
  • "Proclitic form" has the same problems as "clitic form", only worse. (A proclitic is a clitic that comes before whatever it attaches to, as opposed to an enclitic, which comes after.)
  • I'm not sure if "apocopic form" is technically accurate or not (except for 't, where it clearly isn't), but that's not how I'm used to "apocope" being used. Also, like the other options, it seems vague, but perhaps unavoidably so. Still, I think I've mostly seen that from Italian-speaking editors, so maybe it's standardly applied to Italian?
RuakhTALK 07:38, 23 December 2008 (UTC)
Sorry, I meant t', not 't. How about a combination of the above, e.g. Proclitic contraction of... ? bd2412 T 08:44, 23 December 2008 (UTC)
That's a good idea, though that specific combination would have the same problem as just "proclitic form" (we'd never define cars as "Plural noun of car" or thee as "Archaic pronoun of thou"). —RuakhTALK 22:17, 23 December 2008 (UTC)

Position of the search box

Hoi, the search box is the most relevant thing in the side bar. It is now possible to have it at the very top. I think it would be a good idea to consider moving the search box to the top on Wiktionary as well. Thanks, GerardM 13:31, 21 December 2008 (UTC)

Above the navigation box, or above the logo? Can you link to an example wiki? Michael Z. 2008-12-21 16:43 z
See, for example, dewiki (no link because linking cross-project and language gives an invalid title error) -- Prince Kassad 17:26, 21 December 2008 (UTC)
That looks like a good idea to me. Michael Z. 2008-12-21 20:25 z
Linking to w:de:Main Page works OK (project first, then language)... I like this idea. --- Visviva 03:46, 22 December 2008 (UTC)
Excellent idea. -Atelaes λάλει ἐμοί 04:40, 22 December 2008 (UTC)
I like it too. Has anyone tried enwikt on a mobile device? Is moving the search box helpful in this case? --Bequw¢τ 08:12, 23 December 2008 (UTC)
Now done, can easily be reverted by undoing my edit to MediaWiki:Sidebar. It's momentarily disorientating, but so are all interface changes. Conrad.Irwin 10:05, 23 December 2008 (UTC)
I liked it where it was. Can we have a copy at the bottom as well? SemperBlotto 12:31, 23 December 2008 (UTC)
Thanks, GerardM 15:06, 27 December 2008 (UTC)

Logs

We need to make the log page have more search options. For example, here we need to be able to search for users who don't have any talk-page comments and users who never had any contributions; we want to search for blocks with expiry times of less than 3 days, or more than 1 week but less than half a year, etc.; and we want to search for, or exclude (like the boolean operator "NOT"), certain block summaries (such as "Spamming promotional material", "Vandalism", "Intimidating behaviour/harassment", "Inserting false information", etc.). 96.53.149.117 00:51, 22 December 2008 (UTC)

Why would you want to search for such things? If you are researching you can download wiktionary and search manually. If you want functionality added to the software you can try filing a bug at https://bugzilla.wikimedia.org/ but I suspect they'll get put at a very low priority. Conrad.Irwin 13:10, 22 December 2008 (UTC)
Who is "we"?
By the way, it seems pointless to search for users who never had any contributions, since the vast majority of those will be auto-blocks of IP addresses creating accounts. (It's true that we have a policy of blocking accounts with unacceptable usernames, but until an account makes a contribution, most of us will never notice it. It's not like we have nothing better to do than patrol the user creation log for offensive usernames.)
RuakhTALK 07:45, 23 December 2008 (UTC)

English plural gerunds

As in comings, goings, etc. These are coming up with some regularity in my tracking lists, so I'd like to have a consistent, templated approach. Questions:

  1. Is an entry like launchings OK, or should different terms be used? (in particular, should we use "gerund" here in addition to/instead of "present participle"?)
  2. Should there be a reciprocal link from the present participle page, and if so, what form should it take? I don't suppose we'd want to use {{en-noun}} under a ===Verb=== heading...

-- Visviva 05:03, 23 December 2008 (UTC)

This seems to me to be another regular form of English verbs (in that you can pluralise any -ing), so it would make sense to link them straight back to the lemma somehow. Launchings fits our current style for doing that, and while part of me feels it would be nicer to try to "define" it, I can't think of a sensible wording for such a definition. Conrad.Irwin 09:48, 23 December 2008 (UTC)
I like the direct link to the lemma. But, as it stands, the template does not permit a wikilink (bracketed) argument, so the entries would be excluded from the entry count. DCDuring TALK 12:05, 23 December 2008 (UTC)
Honestly, I've never understood why we want these to count; they're not real entries by any stretch. But yes, I'll take a stab at fixing that (or maybe the underlying {{en-term}}). -- Visviva 12:48, 23 December 2008 (UTC)
Numbers motivate some. OTOH, RU has proposed a new kludge to manage the WM counter, which may obviate any need for such tinkering. DCDuring TALK 13:00, 23 December 2008 (UTC)
It is no longer necessary to play link tricks with these templates, I've created a different kludge. See below. Robert Ullmann 06:21, 24 December 2008 (UTC)
Yay, thanks. FTR, I have eliminated {{en-term}} from the template for now, so it can take wikilinked args if necessary. -- Visviva 09:23, 24 December 2008 (UTC)
Hmm. I think they should each be treated on their own merit. As an example, swimming seems to be a gerund to me ("swimming is good for you") and I can't think of a use for "swimmings"; but launching seems to be a noun ("NASA made several launchings last year"). So I would keep "swimming" as a gerund (or present participle, for the grammar police) but make "launching" into a (verbal) noun. SemperBlotto 12:13, 23 December 2008 (UTC)
Indeed, "swimming" doesn't seem to have a plural in its usual sense (although "swimmings" does occur in several obscure contexts). This is particularly interesting since I can easily imagine contexts in which it would make perfect sense to speak of "swimmings", but it seems that there are always other words handy to take up the slack (swim, stroke, lap, etc.). -- Visviva 12:48, 23 December 2008 (UTC)
I'm tempted to suggest that this is just a difference between a countable gerund and an uncountable gerund, but I suspect that would cause grammarians to pursue me with pitchforks and torches. Still, I have a hard time accepting that "launching" can be a noun in its own right when it simply means "the action of the verb 'to launch'"; that's exactly what I would expect the gerund of "launch" to mean. Will scrounge around in my grammar books for a plausible solution. -- Visviva 06:31, 24 December 2008 (UTC)
Is that a U.K.-ism? "Launchings" sounds really weird to me; I'd say "launches". (Compare google:"the NASA launches", 41 distinct hits, to google:"the NASA launchings", 1.) —RuakhTALK 09:11, 24 December 2008 (UTC)
I don't think so; most of the hits on Google News are from US sources. It's more of an international awkward-ism. -- Visviva 09:23, 24 December 2008 (UTC)

I think you need to be careful to distinguish between present participles and "gerundial nouns". Compare: opening the shop vs. the opening of the shop. The first is clearly a verb as it has an object. The second is clearly a noun. It is only this second type that can have a plural.--Brett 13:26, 8 January 2009 (UTC)

This is, indeed, the key distinction. (I would never have questioned the distinction itself, but posted originally because I was unsure of how we wanted to approach it, since our existing present participle entries haven't generally addressed the issue.) As it happens, my copy of the Cambridge Grammar of the English Language arrived today (my Christmas present to myself ;-), and I read with interest their discussion of gerunds on pages 1220ff, particularly this bit:

At the level of words, what is important is to distinguish gerund-participle forms of verbs from nouns (the reading of the poem) and adjectives (a very inviting prospect).

If we follow this approach -- which makes a lot of sense IMO -- then many, many of our existing present participle entries will need new ===Adjective=== and ===Noun=== sections. Obviously we want to be somewhat conservative with these, but I'm inclined to think that any -ing form with an attested plural should in fact be treated as a (gerundial) noun. In particular, adding a "Noun" heading will save us from the paradox of having plurals of words we otherwise treat only as verbs. Of course, as SemperBlotto says above, these will need to be dealt with on a case-by-case basis... but there certainly are a lot of them. -- Visviva 02:00, 9 January 2009 (UTC)
That's been my feeling as well. The only practical test I would add is that, if the potential noun is actually used as a noun, then it can take adjectives and determiners. Consider "the first opening of the shop," where the and first describe the opening. This won't work with a construct like "We plan on opening the shop tomorrow," where it's a verb with object. --EncycloPetey 02:08, 9 January 2009 (UTC)
A verb's "-ing" form is always a verb. If it also behaves like an adjective (comparative, gradable, predicate use), then it's an adjective; if it behaves like a noun (forms a plural, modifiable per EP), then it's a noun. So there could be three PoSs. Does that capture most of it? That doesn't seem too different from what we do now when we are paying attention. We just need to pay more attention. DCDuring Holiday Greetings! 02:20, 9 January 2009 (UTC)
I'd agree, but right now we have a policy of always listing adjectives and nouns before verbs. Personally, I'd rather have no adjective or noun section, except in cases where a given POS is particularly common and/or has its own senses (as with reading and inviting), than relegate the verb section to the bottom of the page. —RuakhTALK 05:19, 9 January 2009 (UTC)
Is that actually policy? I'm not seeing it in WT:ELE. My impression of current practice is that alphabetic order is followed unless one part of speech is much more salient than the others, in which case that POS section goes first. So, in a case like launching we would have #Verb (present participle of) followed by #Noun ([gerund] of). Even if it is counter to policy, that seems like the only sane way to go. -- Visviva 06:19, 9 January 2009 (UTC)
I guess I don't know if it's policy. It's in italics at WT:POS. WT:POS bills itself as a think-tank rather than a policy, but I've seen some very rude language directed at those whose edits don't conform to it. ;-)   Past relevant BP discussions include this one from July '06, where Connel asserts, "The alphabetic order of POS headings was one of the oldest conventions that actually gained consensus here on en.wikt:.", and this one from July '07, which failed to reach a resolution between those who thought that alphabetical order was generally a good idea (i.e. that exceptions should be allowed) and those who thought it was always a good idea (i.e. that exceptions should not be allowed). (N.B. WT:POS makes exception for headers like "Character" and "Syllable", saying they should come first, but that wrinkle never seems to come up in the discussion.) —RuakhTALK 23:08, 9 January 2009 (UTC)
I agree with both Ruakh and Visviva. I use whatever order makes the most lexical sense, and default to alphabetical when there isn't a compelling reason for some other sequence. There are individuals who hold to strict alphabetical sequence, but that's never been written into any policy I've seen, and the idea has gone in and out of fashion as far as I can tell. I've no idea how the community stands currently on the issue, since it's seldom discussed by more than a few people. If I had to guess, I suspect most community members are ambivalent about the issue. --EncycloPetey 00:18, 10 January 2009 (UTC)
In the narrow case of gerunds (and past participles ?), we wouldn't seem to be opening ourselves up to chaos and conflict if we deprecated any sequence of PoSs other than Verb, Adjective, Noun, and then any other (Interjection, perhaps?). All PoSs (except Verb?) needing to be attestable of course. Or could we just have a few "model" entries (protected?, in appendix?)? DCDuring Holiday Greetings! 00:31, 10 January 2009 (UTC)

Over at Simple.wikt, the general practice is to list the adjective if it is gradable, can be modified by very or can complement seem or become. Generally, we don't list "gerundial nouns" unless they're listed in standard dictionaries: words like building. The others could be added, but we don't have a lot of manpower and they are very low priority.--Brett 18:01, 9 January 2009 (UTC)
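DCDuring's tests for an "-ing" form, together with the Verb, Adjective, Noun ordering suggested above, can be sketched as a small decision function. This is a hypothetical illustration: the evidence flags (comparative, gradable, predicative, plural, determiners) are assumed inputs that an editor would establish from attested citations, and the function is not part of any Wiktionary template or bot.

```javascript
// Hypothetical sketch: decide which PoS sections an English "-ing" entry needs.
// The evidence flag names are illustrative assumptions, not a real data model.
function posSections(evidence) {
  var e = evidence || {};
  // A verb's "-ing" form is always listed as a verb.
  var sections = ["Verb"];
  // Adjective tests: comparative forms, gradability (e.g. "very inviting"),
  // or predicate use.
  if (e.comparative || e.gradable || e.predicative) {
    sections.push("Adjective");
  }
  // Noun tests: an attested plural, or modification by determiners and
  // adjectives (e.g. "the first opening of the shop").
  if (e.plural || e.determiners) {
    sections.push("Noun");
  }
  // Sections come back in the Verb, Adjective, Noun order proposed above.
  return sections;
}

// "launching": attested plural "launchings"; returns ["Verb", "Noun"]
console.log(posSections({ plural: true }));
// "inviting": gradable and predicative; returns ["Verb", "Adjective"]
console.log(posSections({ gradable: true, predicative: true }));
```

A form like "opening the shop" with no noun or adjective evidence would get only the Verb section, which matches Brett's point that only the gerundial-noun use can take a plural.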

Esperanto words to show the exact meaning of words across languages

Dr. Probal Dasgupta, the president of the Universal Esperanto Association, presented a brilliant idea at the 5th Asian Congress of Esperanto in Bangalore in February 2008 (with 160 participants from 29 countries): just as we use IPA to show the exact pronunciation of words, we can use Esperanto equivalents to show the exact meaning of words across different languages. I believe Wiktionary, and indeed the whole world of Wikipedia and its sister projects, is the best place to start this. It would be a good way to standardize the meaning(s) of words across languages, and the Wiki world would be the first place to make this revolution happen in the world of words (dictionaries, encyclopedias, etc.).

I would really appreciate your comments on the idea.

Ahmad Mamdoohi

23 Dec. 2008

There is an Esperanto Wiktionary, which would be the place to carry out such a project. Of course, Esperanto encounters the same difficulties as all other languages in representing universal translations, which is that a great many words have subtle differences in use across languages, so one-to-one translation is impossible, and translation of idioms is particularly difficult. bd2412 T 08:51, 23 December 2008 (UTC)
This is a nice idea, and it works to a certain extent for very simple words - like alphabet <-> alfabeto where it is actually the same word just spelt out in the style of the language. For anything more complicated, (rat or mouse) <-> sorcio, (thank you, cheers) <~> (dankje, bedankt) I point you to BD2412's response. Conrad.Irwin 09:58, 23 December 2008 (UTC)
This concept is called a pivot language, and the idea of using Esperanto is hardly new. It is useful, and machine translation systems often use it in part. It doesn't come close to solving the general problem; as noted, one needs pivot identifiers (whether words or just database IDs) for each specific sense of each word (e.g. run needs dozens). The Omega wiktionary project attempts this, calling the IDs "defined meanings". The specific problem with Esperanto is that it is too small; a pivot language with reasonable coverage (all of the senses of all of the words in everyday dictionaries of, say, the top 50 world languages) would require on the order of 2-5 million words or identifiers. All that said, using Esperanto as a pivot language is interesting. One might use the Esperanto word plus a sense identifier as the identifier, thus making the process of machine translation more understandable than pure numerical IDs. You will still find the size of the Esperanto vocabulary limiting. Robert Ullmann 03:50, 26 December 2008 (UTC)
Such a project has been carried out in the past, using Latin as the pivot language. I have a 17th-century Latin dictionary which defines each word in Latin, then translates each Latin word into English, French, Spanish, Italian, German, Greek, and Hebrew. I have a facsimile of a 16th-century trilingual dictionary that does this from Latin into Polish and Lithuanian. The idea isn't new, but it does have problems, as others have noted above. --EncycloPetey 20:06, 29 December 2008 (UTC)

Links in form-of templates etc

We've been adding explicit links to make pages "count"; this was or is reasonable in some cases, but has caused some problems, and will cause more. We just spent time fixing {{inflection of}} (for example). I've set up a better kludge. See Wiktionary:Page count and WT:GP#Template:count page: Building a Better Kludge.

All you need to know is that you no longer have to bother with linking or not linking for page count. (There may be other reasons to use an explicit link.) It is automatic. If you are creating or modifying a template, by all means allow explicit links if reasonable, but you needn't break other functions or do convoluted things to do so. Robert Ullmann 06:27, 24 December 2008 (UTC)

Textarea font change

The font specification for the edit field was recently changed in a software update. This only affects some browsers. Discussion is at WT:GP#Textarea font change. Michael Z. 2008-12-24 17:14 z

Japanese expert help requested for two new categories

I've just created two new categories for phenomena in the Japanese language: Category:Japanese words with multiple readings and Category:Japanese words with nonphonetic spellings.

I know only a little about Japanese, so I'd like to bring these categories to the attention of more knowledgeable contributors who could work on the category description pages. In particular, I don't know whether these phenomena occur only in a few special common particles and the historical pronunciations of a small set of words, or whether there are many and varied reasons for their existence. — hippietrail 01:29, 25 December 2008 (UTC)

I've found there are four words now in Category:Japanese words with nonphonetic spellings: かはづ, , , . The first one is the historical spelling of かわず, not a nonphonetic one. The last one is always read /o/, and it is thus phonetic. Since I really can't think of any word other than and , the category needs to be deleted. Or, perhaps you can list words with おお for /oː/ such as 大きい (おおきい, not おうきい), (こおり, not こうり), and 蟋蟀 (こおろぎ, not こうろぎ), words with ええ for /eː/ such as お姉さん (おねえさん, not おねいさん), and words with を for /o/, such as てにをは (not てにおわ). They are all phonetic in the sense that you can pronounce them correctly from their spellings, but you cannot write them properly if you don't know them. — TAKASUGI Shinji 07:04, 25 December 2008 (UTC)
Well, it is often tricky to come up with a category name that reflects the intent of the category while being short and not sounding awkward. Every source I've ever seen lists は, へ, を only as ha, he, wo; that is how I type them on my Japanese computer, and our articles here on Wiktionary and on Wikipedia follow the same convention. To the overwhelming majority of non-Japanese speakers this comes as a surprise, and they would think of it as nonphonetic. Perhaps you could start by improving the Wiktionary and Wikipedia articles that don't fully cover these characters and sounds. And, of course, please also suggest a better category name. — 165.228.191.249 07:55, 25 December 2008 (UTC)
I'd forgotten to list the interjections like こんにちは, こんばんは, では, and それでは, which end with the topic marker は /wa/. - TAKASUGI Shinji 11:30, 25 December 2008 (UTC)

blue blank citation tab?

I've found an oddity: At unnerstand the citation tab is showing as blue but when I click it I get a page not found message. Is there a reason it's blue? RJFJR 01:26, 26 December 2008 (UTC)

can't duplicate using Firefox 3, Windows XP. DCDuring Holiday Greetings! 02:05, 26 December 2008 (UTC)
Looks red for me in FF 3.0.5 in Ubuntu 8.10. -Atelaes λάλει ἐμοί 02:42, 26 December 2008 (UTC)
It's still blue in IE6 for me, though it is red in firefox. RJFJR 03:00, 26 December 2008 (UTC)
It first creates the tab, then uses an ajax call to check for the page, and make the tab red. Ajax on IE6 has several different failure cases, including at least one that just causes frequent "random" errors. (I.e. not really random, something presumably deterministic is happening, but appears that way.) I don't know of any fix other than "Use Firefox" (;-) (IE7/8 do not exhibit these problems.) Robert Ullmann 03:25, 26 December 2008 (UTC)
Wiktionary is now giving me "Error, unterminated comment" (line 761, char 21) on every page load except the initial page and hard refreshes. This may be something to do with the IEs4Linux setup, as it sounds as though the cache is broken somehow. When I get this error message, I get no Citations tab at all; without the error message I get a red Citations tab at unnerstand. Do you have any WT:PREFS enabled? It's possible that one of them causes a JavaScript error which prevents the rest of the JavaScript from being run. Conrad.Irwin 12:44, 26 December 2008 (UTC)
I have acceleration turned on in prefs (that's the only one I have enabled). I've been getting a message that there is an error on the page. I've just checked the details: line=278, char=11, Error=Object doesn't support this property or method. RJFJR 13:14, 26 December 2008 (UTC)
Thank you, the problem should now be fixed if you clear your cache (ctrl+shift+F5). For the techies: if you override Array.prototype.indexOf in IE6, it will cause indexOf on strings to fail with "Object doesn't support this property or method"... Conrad.Irwin 13:59, 26 December 2008 (UTC)
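As Conrad's note shows, overriding a built-in method on a shared prototype can break unrelated code in IE6. A minimal sketch, assuming the goal is simply to search arrays in IE6-era JavaScript, is a standalone helper that leaves Array.prototype alone. The name arrayIndexOf is hypothetical, and this is not the change that was actually made to the site scripts.

```javascript
// Hypothetical helper: search an array without modifying Array.prototype,
// so String.prototype.indexOf and any native Array.prototype.indexOf are untouched.
function arrayIndexOf(arr, item, fromIndex) {
  var len = arr.length;
  // Truncate the start toward zero; a missing or non-numeric start becomes 0.
  var i = Number(fromIndex) || 0;
  i = i < 0 ? Math.ceil(i) : Math.floor(i);
  // A negative start counts back from the end of the array.
  if (i < 0) {
    i = Math.max(0, len + i);
  }
  for (; i < len; i++) {
    if (arr[i] === item) {
      return i;
    }
  }
  return -1;
}

console.log(arrayIndexOf(["a", "b", "c"], "b")); // 1
console.log(arrayIndexOf([1, 2, 1], 1, 1));      // 2
console.log("abc".indexOf("c"));                // 2, string method left as-is
```

The sketch uses var rather than let or const because IE6 supports neither; callers pass the array explicitly instead of calling a method on it.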

Voting and Consensus

I'm looking at recent votes and other votes throughout the timeline, and it occurs to me that we don't have any clear-cut definition of how many votes a nomination needs for someone to become an administrator. Don't get me wrong: I don't want our process to become any more complicated.

But right now it looks like we don't even take the abstaining votes into account. I understand that abstaining isn't truly casting a vote, but maybe we should consider the fact that editors, often established editors here, do not feel comfortable casting a support or oppose vote. Abstentions shouldn't be used to decide a vote where there is an overwhelming volley of support votes, but maybe when there are fewer support votes, or support is just over half of the total (support, oppose, and abstain combined), the abstaining votes should be factored in. After all, people can always be renominated later.

On the other hand, I might be totally out of line in suggesting that we are doing something wrong. Have at. --Neskaya kanetsv 21:56, 26 December 2008 (UTC)

You're not out of line at all; it's a perfectly reasonable suggestion. But personally, I think an abstention should count as an abstention, not as a weak vote in opposition, because experience shows that editors often do wish to abstain completely. If you want to weakly oppose, you could do something like this:
  1. Symbol oppose vote.svg Weakly oppose (please count this as just 1/2 a vote). I'm not O.K. with some of the comments he's made toward certain other editors (here and here, for example), but since those editors are voting in support, I can't bring myself to cast a normal "oppose" vote. —RuakhTALK 23:24, 26 December 2008 (UTC)
(We don't explicitly provide for that, but I'm guessing no one would object, at least on procedural grounds.)
RuakhTALK 23:24, 26 December 2008 (UTC)
A reasonable thought, but I tend to agree with Ruakh. There should be an option for a complete and genuine abstention. -Atelaes λάλει ἐμοί 23:51, 26 December 2008 (UTC)
That does make more sense. I'm simply looking at the overall records of adminship votes. One thing an overwhelming volley of abstentions would say to me is that an editor may be too new. I don't think they should count as anything other than abstentions, but a great many abstentions carry the comment "too new" or "not enough interaction", and maybe that says something about the editor. I'm not quite sure how to phrase what I'm trying to say; it sounds good out loud, but the words don't look quite right written down.
Also, perhaps we should more solidly decide on a line of what constitutes consensus on one of these votes? I do like the weak oppose option. I don't often feel that an oppose is warranted but I don't always want to fully abstain or support, either. It seems also like we don't have enough options. --Neskaya kanetsv 03:11, 27 December 2008 (UTC) edited again. --Neskaya kanetsv 03:13, 27 December 2008 (UTC)
A more robust voting system might be worthwhile, but I'm at a loss as to what form it would take. I'm open to suggestions. -Atelaes λάλει ἐμοί 03:30, 27 December 2008 (UTC)
Could we require a supermajority of all true votes (support/oppose) and a simple majority of all participants (support+oppose+abstain)? Abstention frequently indicates vague misgivings/uncertainty; if half or more of the community has either vague or strong misgivings about a vote, it should probably not pass. -- Visviva 04:32, 27 December 2008 (UTC)
Hrm. I think a two-thirds majority would be a good thing as far as support/oppose votes go. It would make deciding the vote a little more time-consuming, but it would also eliminate any real question of whether someone became an admin without enough consensus (no offense to the person whose vote brought this to my attention, honestly). I also think that the simple majority is a good thing: out of X total votes cast, at least half would need to be in support. Your wording, Visviva, was good; it sounded like what I'd been trying to say earlier. Even abstentions with vaguely worded comments, such as "too new", are misgivings.
Please though do note that I originally intended this to only apply to votes as to whom we make an administrator. Anyway, I shall let other people continue with discussion. --Neskaya kanetsv 06:34, 27 December 2008 (UTC)
(after edit conflict) Re: abstention indicating uncertainty or vague misgivings: Yeah, I think that's true, at least for administrator votes. Maybe we should try to encourage editors to be bolder in voting "oppose" if they have misgivings? I'm worried that making abstentions more meaningful might just discourage editors from even abstaining. But on the other hand, the abstaining is kind of useless (and I say this even though I think I do it more frequently than most), so maybe it's O.K. if we lose some of it.
Another approach might be to ignore "abstain" votes as we do currently, but to require the "support" votes to exceed both a certain percentage (say, 70–80% of true votes) and a certain number (say, 10–12 established users, for some to-be-determined value of "established"). If twelve established users vote "support" and no one votes "oppose", then I don't care if fifty people abstain.
RuakhTALK 06:46, 27 December 2008 (UTC)
Heh, edit conflicting. Have to say that is probably one of the first times that's happened from an edit I'm making.
However. Abstains mean something, at some point, because a true abstain is where you see the vote, and then don't even bother clicking on one of the sections. I'm sure we have editors who do that, however it seems that we also have administrators who see the vote and for one reason or another abstain. We have non-administrators who abstain and make comments indicating the reason for the abstaining, and those reasons are usually a vague misgiving or uncertainty as Visviva said. They're not properly abstaining in that they have no opinion whatsoever.
You make another set of valid points, really. I think I might add that several abstains from established editors also means something. What that something is it seems we have yet to decide. --Neskaya kanetsv 07:51, 27 December 2008 (UTC)
As a point of reference, this is what is said at Wikipedia, "After seven days, a bureaucrat will determine if there is consensus to give you admin status. This is sometimes difficult to ascertain, and is not a numerical measurement, but as a general descriptive rule of thumb most of those above ~80% approval pass, most of those below ~70% fail, and the area between is gray." http://en.wikipedia.org/wiki/Wikipedia:ADMIN#Becoming_an_administrator
Neutral or abstain votes are not part of the count. "failed means the candidate received fewer support than oppose votes. consensus not reached means the candidate received at least as many support as oppose votes, but support was deemed by bureaucrats to be insufficient..." http://en.wikipedia.org/wiki/User:NoSeptember/List_of_failed_RfAs_(Chronological) --AZard 14:18, 27 December 2008 (UTC)
As another quick note, we aren't Wikipedia here. If you look through some of the adminship votes, it is not always the bureaucrat who makes the initial decision, although they always have the final say. While it may not be important for abstaining or neutral votes to be taken into account at Wikipedia, I believe it is far more important for even vague misgivings to be noted here, because we have a much smaller body of voting editors. Therefore, I do not think we need to follow in Wikipedia's footsteps on this policy; rather, we should discuss and work out our own. I don't have a great deal of time to work on this right now, and AZard obviously brings up a good point: even with a simple-majority and a supermajority requirement in place, there needs to be some flexibility. --Neskaya kanetsv 20:16, 27 December 2008 (UTC)
Abstain means abstain, if the people thought "too new" they would oppose. I think a simple majority suffices, but would be happy for a discretionary vote or two in either direction on the part of the vote-closer (who may or may not be a bureaucrat). Conrad.Irwin 21:06, 27 December 2008 (UTC)
Given all that an administrator can do (a black-hat administrator could probably manage, with some skill and effort, to steal some passwords using JS and XSS), and given that it's supposed to be a one-way trip (it can't be reversed without intervention by a steward), I think "simple majority" is way too low a bar. —RuakhTALK 02:12, 29 December 2008 (UTC)
Agreed. I'd always assumed a supermajority was required. Of course, there have been very few cases where it would have made a difference -- most adminship votes are unanimous or nearly so -- but if 45% of voting editors actively oppose a particular nomination, I don't think it should pass. IMO we should definitely require something more than a simple majority -- whether it be a supermajority, simple majority including abstentions, or minimum number of support votes (or, as I would prefer, all three). -- Visviva 04:44, 29 December 2008 (UTC)
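The combined rule Visviva prefers can be sketched as a single check. This is a hypothetical illustration, not adopted policy: the 2/3 supermajority comes from Neskaya's suggestion, the majority-of-all-participants test from Visviva's, and the minimum of 10 supporters from the low end of Ruakh's 10–12 range; all three thresholds are assumed, adjustable parameters.

```javascript
// Hypothetical sketch of a combined admin-vote test. Thresholds are
// illustrative defaults drawn from the discussion, not Wiktionary policy.
function adminVotePasses({ support, oppose, abstain },
                         { supermajority = 2 / 3, minSupport = 10 } = {}) {
  const trueVotes = support + oppose;        // abstentions excluded
  const participants = trueVotes + abstain;  // abstentions included
  if (trueVotes === 0) return false;

  // 1. Supermajority of the support/oppose votes.
  const superOk = support / trueVotes >= supermajority;
  // 2. Support from more than half of everyone who took part,
  //    so a large number of abstentions can block a nomination.
  const majorityOk = support > participants / 2;
  // 3. An absolute floor on the number of supporters.
  const countOk = support >= minSupport;

  return superOk && majorityOk && countOk;
}

console.log(adminVotePasses({ support: 12, oppose: 2, abstain: 3 }));  // true
console.log(adminVotePasses({ support: 11, oppose: 9, abstain: 0 }));  // false: 55% < 2/3
console.log(adminVotePasses({ support: 12, oppose: 0, abstain: 50 })); // false: abstentions dominate
```

The last example is the case Ruakh describes (twelve supporters, no opposition, fifty abstentions); his variant, which ignores abstentions, amounts to dropping check 2.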

collective noun - template, appendix and category

I've been working recently with Appendix:Collective nouns, and I want to add the verified ones to Category:English collective nouns, but I'm considering a template (Template:collective noun is the obvious choice) to add before the definition, so {{collective noun}} would give something like (collective noun) and add the entry to Category:English collective nouns. Is this a worthwhile idea, or is there a simpler way? --Jackofclubs 12:25, 28 December 2008 (UTC)

Sounds good, but it might be good to have {{collective noun}} simply stand in for the definition, similar to {{surname}}, so that it would generate something like "[[collective noun|Collective noun]] for a group of [[{{{1}}}|{{{2|{{{1}}}}}}]]." (plus the category) -- Visviva 04:51, 29 December 2008 (UTC)

EncycloPetey gone AWOL: WOTD cycled back

I've just noticed that since the 26th we've been running the 2007 words, and EncycloPetey seems to be AWOL ATM. The word for the 29th was supposed to be luxate according to his planning page; can an admin switch it in? I'm willing to fill in for Dec. 29-31 and will come up with a January list if he hasn't resurfaced by Thursday morning, but should the words for Dec. 26-28 (bicameral, ribbit and graminivorous) be retroactively switched in? Circeus 04:14, 29 December 2008 (UTC)

I've edited the appropriate (I think) page. Yes, we should change the words from 26-28 so there is a record. Nadando 04:41, 29 December 2008 (UTC)
Since those words haven't actually had their 24 hours of fame, maybe they should be added to the January list? -- Visviva 04:53, 29 December 2008 (UTC)
That's what I intended to do if that was people's preference. I'm neutral to either, but I'd rather a decision be taken as I use the WOTD for a completely unrelated feature elsewhere and this is stopping me dead in updating it. Yeah, I know, I'm selfish XD Circeus 04:58, 29 December 2008 (UTC)

Sorry for the absence. I tried to take care of the rest of the month on the 24th, but was beset with repeated power outages in my area. The next few days I was all but bed-ridden with flu, and had not the clarity of thought or presence of mind to attend to WOTD. The redundancy in the WOTD recycled templates was built for just such an eventuality as we just had, or as Connel once put it: "in case EncycloPetey gets hit by a bus." Thanks to Nadando for editing the rest of the month.

As for the words that didn't actually feature: we normally leave in the words that did feature, rather than post-modifying them. This has happened a couple of times before, but only for one isolated day at a time, rather than three. --EncycloPetey 19:58, 29 December 2008 (UTC)

Category:Catachreses

I have created this lexicographic hierarchy. I suspect that the term isn't known to a lot of people, so I'm advertising it so that people can catch on and populate with appropriate terms and phrases. __meco 10:59, 29 December 2008 (UTC)

I think I found one: scapegoat. Nadando 19:25, 29 December 2008 (UTC)
That category’s definition of catachresis is somewhat different from those offered by Dictionary.com. I wouldn’t call scapegoat catachrestic according to Dictionary.com’s definitions (nor that of the COED [11th Ed.]). Meco, do you care to provide a reference for your definition? (BTW, kudos for creating this category — it is sorely needed.)  (u):Raifʻhār (t):Doremítzwr﴿ 19:36, 29 December 2008 (UTC)
People may also wish to look at the description and examples in the WP article on Catachresis, which has a very different explanation. --EncycloPetey 19:52, 29 December 2008 (UTC)

Catachresis is usually the result of context within a sentence, and not inherent within a word, so there may not be many words to put into this category. The only examples I can think of to place in the category are words like face (of a clock), hand (of a clock), leg (of a table), and similar words. I don't think that scapegoat qualifies as one under any definition I've heard before. --EncycloPetey 19:50, 29 December 2008 (UTC)

We (I at least, anyone else who cares to own up is heartily welcomed) have a problem. It appears that the sense which my blurb is based upon is the primary sense in Norwegian (katakrese), but that the primary (at least) definition in English has a somewhat different focus. This warrants an investigation into whether there exists in English the same definition as in Norwegian (which we would then be only too happy to include as an added sense to the word) or whether there exists another linguistic term, heretofore unknown, which describes this etymological phenomenon in English. __meco 13:01, 30 December 2008 (UTC)
The OED gives only one definition (in brief, "improper use of a word"). I think our definition could use some rounding out, but there is no apparent etymological dimension to catachresis in English. Etymologically, I would find it difficult to distinguish a term that arose through catachresis (which I guess is the Norwegian meaning?) from one that arose through the normal historical shifts and transfers of meaning ... It seems like this would have to be a judgment call, since most uses of most words would have been considered an abusio if you go back far enough. -- Visviva 13:45, 30 December 2008 (UTC)
For a better context in this discussion, let me rephrase here the definition that appears in Category:Catachreses:
  • D1. A term whose etymology reveals that its present use is based on a misunderstanding, often resulting from a confusion of two words similar in appearance, but different in meaning.
The part of the definition starting with "often" seems to be optional per the use of "often". However, what remains after cutting off the part is insufficient to apply the definition to a particular case, to me anyway:
  • D2. A term whose etymology reveals that its present use is based on a misunderstanding.
In contrast, the current Wiktionary definition is:
  • D3. The misuse of words; applying a term to something which it should not properly describe.
A Google Books search suggests that the English "catachresis" is a figure of speech, one which is contrasted with metaphor. W:Catachresis also classifies catachresis as a figure of speech. Thus, whether something is a catachresis can mostly be decided only for a word in a given sentence, not for a word in isolation.
Sourcing the definition of the Norwegian concept of katakrese would help. I would like to see a source that makes it clear the Norwegian "katakrese" is indeed semantically different from English "catachresis".
--Dan Polansky 21:35, 30 December 2008 (UTC)

regional dialects' categories' names

We have:

I think we should standardize these all to one style. Thoughts? (Has this been discussed before?)—msh210 20:00, 30 December 2008 (UTC)

I personally like the style Category:Austrian German, but am undecided whether exceptions should be made for things like Singlish. Also, this would make things very awkward when the place name and language name coincide (in part or in whole), such as for Northern English English (now at Category:Northern English dialect), English English (now sort of at Category:England and Wales), and French French (now sort of at Category:European French), so perhaps it's not such a good idea.—msh210 20:00, 30 December 2008 (UTC)
I don't believe that uniformity is as desirable as one might ideally expect, as msh210 has pointed out. However, I do agree that we should not use ISO language prefixes for these categories, just as we do not for the master category of each language. The ISO codes are only useful or meaningful for topical categories that can exist in many different languages. For specific dialects, the words exist in only one language, so the ISO code prefix would be redundant with part of the category name. --EncycloPetey 08:47, 31 December 2008 (UTC)

December

濠洲, images of words, copyright implications

Yesterday while browsing through used bookshops I found an old Japanese atlas which contained a map of Australia using the old ateji spelling 濠洲. Thinking fair use, I took a photo. Finding the image upload process here tricky, I went to the IRC channel #wikimedia-commons to ask for help and provide feedback. I don't know much Japanese, so I couldn't find publisher or copyright info in the atlas. It was only a few pages thick and quite old. My best guess is that it was intended for students and dates from between about WWII and the 1960s.

Now the Commons guys think if I include the whole photo it's probably a derivative work and I need to establish that it's not copyright. My photo is just the portion of the map immediately surrounding the recognizable shape of Australia for context.

I've included the full map and two cropped versions providing progressively less context but with progressively fewer potential copyright problems.

Does anybody have any thoughts? Photographic citations of words in context will be a topic for us sooner or later. — hippietrail 03:39, 17 December 2008 (UTC)

In principle, I don't think there's any difference between a photographic citation and a textual citation; both should be fine for us to use. However, if there's enough context to be meaningful, I expect that has to be considered fair use, which means it can't be hosted on Commons, which means we would have to enable image uploading here. I gotta say, I'm not in a great rush to open those floodgates. -- Visviva 04:20, 17 December 2008 (UTC)
Not sure why we need the photo as a citation. Someone who knows Japanese can identify the year of publication, title, publisher, editor, whatever, and create a regular citation for the term. That might not be feasible in this particular case (try getting someone who knows Japanese to go to that bookstore with you!), but I don't think photo citations are something we'll need much. (In fact, having the photo without the bibliographic info doesn't help much as a cite anyway.)—msh210 17:21, 17 December 2008 (UTC)
Indeed, it would be better to photograph the details of the work (publisher, title, authors/editors, publication year, etc) and have it translated by someone that speaks Japanese. I don't think Fair Use covers situations like this. EVula // talk // 16:49, 18 December 2008 (UTC)
Fair use is not in fact sufficient reason to use something in a MediaWiki project. Photographing the publishing details would be as much of a copyright infringement as any other page. Identifying where the publishing details are would itself require someone that reads Japanese, unless you can tell me where this info usually is in old Japanese atlases (-: — hippietrail 03:52, 19 December 2008 (UTC)
Re: "Photographing the publishing details would be as much of a copyright infringement as any other page.": Is that true? I thought copyright was about the expression of ideas; the presentation of factual information in a standard format cannot be copyrighted. (At least, that's how it is in the U.S. We used to have a "sweat of the brow" doctrine, but there was some sort of case involving a phone book where the Supreme Court ruled that there was nothing creative in the phone book's format, and the information in it was not copyrightable. I suppose other countries might have different rules.) Or are you saying that in old Japanese atlases, the publishing details may have been presented in a creative, copyrightable format? —RuakhTALK 19:39, 20 December 2008 (UTC)
To be clear, photographing a copyrighted work is never copyright infringement (despite what some institutions will have you think). But publishing or distributing that photograph may be.
However, temporarily putting a photo on some website so that a Japanese reader can determine whether it is copyrighted sounds like scholarship to me. In my inexpert opinion, this use would be protected by fair dealing or fair use (depending on the laws of your country). Michael Z. 2009-01-08 02:48 z
I fully support the upload, and would also like to remind you that Japanese copyright lasts not 70 years but 50 years after publication. Therefore everything from WW2 and issued before 1959 is copyright free. Bogorm 12:12, 26 December 2008 (UTC)
I returned to the shop armed with notes on the characters for Japanese era names so I could work out dates. All the dates inside the covers and on the front were between 1946 and 1950. So this atlas appears to now be in the public domain. — hippietrail 01:54, 8 January 2009 (UTC)

hundred and thousand

(Moved from Talk:hundred#classification)

I know it is a tradition to classify hundred as a cardinal number and dozen as a noun, but on what ground is it justified? If you examine them grammatically, you'll find they are alike, while twenty through ninety are true numerals.

ten men / *a ten men / ?tens of men / *a few ten men
twenty men / *a twenty men / *twenties of men / *a few twenty men
*dozen men / a dozen men / dozens of men / a few dozen men
*score men / a score men / scores of men / a few score men
*hundred men / a hundred men / hundreds of men / a few hundred men
*million men / a million men / millions of men / a few million men

What do you think? - TAKASUGI Shinji 14:55, 21 December 2008 (UTC)

Many cardinals also behave as nouns, forming plurals, being the object of prepositions, etc. That is why we usually show them as cardinals and nouns. However, I see no reason to remove "hundred"'s classification as a cardinal. As for "dozen", any discussion belongs on its talk page or at WT:TR. Discussing general questions about the entries for cardinals would belong at WT:BP. DCDuring TALK 17:23, 21 December 2008 (UTC)

(End of move)

From the comparison above, I'd like to classify hundred and thousand as nouns just like dozen, not as cardinals like ten, which can be indefinite determiners. There must be published linguistic analyses. Do you have any ideas? - TAKASUGI Shinji 00:16, 22 December 2008 (UTC)
There is a problem in your table. The phrase "a score men" is not grammatical; it should be "a score of men". This is one difference which distinguishes a numeral from a collective noun like score.
A numeral (specifically a cardinal numeral) expresses a count and may function as either a noun or adjective in providing the count: "There were ten men." / "Ten were there." You cannot do this with dozen: "There were dozen men." / "Dozen were there." Neither of these sentences is grammatical. So, I would tend to agree that neither hundred, nor million, nor thousand is a numeral grammatically, since none of these words functions in such constructions: "There were hundred men." / "Hundred were there". These words are nouns only. However, when used as "a hundred" or "one hundred", etc. these words become part of a compound numeral word. --EncycloPetey 01:07, 22 December 2008 (UTC)
I've done some additional thinking and poking into grammars. It is possible to use hundred, thousand, etc. as numerals in a limited way. Specifically, they seem to work as numerals so long as they are preceded by a determiner, such as an article (definite or indefinite), a numeral, a demonstrative, or an indefinite.
(with the indefinite article) There were a hundred people present. / A hundred were present.
(with the definite article) There were the hundred people present that we had expected. / The hundred were present.
(with a numeral) There were one hundred people present. / One hundred were present.
(with a demonstrative) There were these hundred people present. / These hundred were present.
(with an indefinite) There were some hundred people present. / Some hundred were present.
Based on this, we could call these words both numerals and nouns, with Usage notes included to explain their limited functioning as numerals. --EncycloPetey 06:47, 22 December 2008 (UTC)
Thank you for your reply. I have edited the two articles. Please check hundred#Usage notes and thousand#Usage notes. - TAKASUGI Shinji 03:25, 24 December 2008 (UTC)
Unlike the others, dozen cannot take the -th derivational suffix. You might also want to make a note about googol being even more nouny than dozen. Ishwar 14:56, 21 January 2009 (UTC)

Page titles for phrases needing referents

I've come across a problem with translations of certain words into Irish. There are certain phrases in Irish which include a referent in the middle of the phrase. Examples are that, where the phrase is an ... sin, and thirteenth, where the phrase is triú ... déag. For the first, removing the referent changes the meaning; for the second, I've simply never seen it happen. Is there a convention currently in place for dealing with this in translations and page titles or can one be created? Many dictionaries use ~ to stand in for the headword in their entries. I wonder if we could use it to stand in for a generic referent. Using generic language-appropriate words, of course, changes the meaning. an ceann sin is that one, instead of that .... Suggestions very much appreciated. —Leftmostcat 21:57, 30 December 2008 (UTC)

I like the idea of using the '~' as it is a character that has fewer other uses than most, both in normal text and on computers. Perhaps you could create an entry at triú ~ déag and see how it goes? Conrad.Irwin 14:46, 1 January 2009 (UTC)
We have thus far kept the mainspace fairly clear of entries containing placeholder symbols (as opposed to placeholder words) like "~" and "..." and "X". IMO this is good because it keeps us from fragmenting content in opaque ways. I think something like "triú ~ déag" would be acceptable if there are no other options, but I would really prefer if we could find another way...
Where "foo bar" is different in meaning from "foo ~ bar", as I guess is the case for an ~ sin, could we have two separate inflection lines within one POS section? We use this approach for some other, very different, situations. Then one inflection line could present the "foo bar" form, and one the "foo ~ bar" form.
Where only the "foo ~ bar" form exists, I don't see why we can't just put the entry at [[foo bar]], and explain the usage through examples and usage notes. -- Visviva 15:04, 1 January 2009 (UTC)
Something about this solution seems off to me, though I'm having difficulty putting it into words. For one thing, this doesn't seem to me to be any less opaque—simply less spread out. It also seems to lump together content which doesn't follow our convention for lumping together. This seems like kind of a weak argument against, but it just seems like a different but no more acceptable solution to the problem. The point of posting was to maybe come to an agreement on a standardized way of separating this content correctly so that the opacity becomes less of a problem. —Leftmostcat 06:59, 2 January 2009 (UTC)
Question: Someone who knows Gaelic, knows the word triú ... déag, and wants to look it up in enwikt will look where? The answer to this question is not necessarily where we should have the entry, but is likely.—msh210 20:52, 1 January 2009 (UTC)
This is an interesting question, and one I don't have a ready answer for. Neither of the solid online dictionaries for Irish right now is completist in the way that Wiktionary is, and neither has entries for either "thirteenth" or "that" directly. One does have an entry for "that one", as an ceann sin. I don't think that I'd search for that, personally. That said, I don't think I'd search for either tríú déag or tríú ... déag either. If anything, I would search for tríú and déag separately. This brings up a possible answer, though: it's possible that these phrases can somehow be considered SOP. tríú is third and déag is similar to -teen. an ... sin seems a bit less clear-cut. an here is the, sin something like that. Still, I suppose this can be adequately explained in the entry for sin. That seems to clear up the problem for those two phrases and a number of related phrases. I'm not sure, however, if that means the problem goes away. It still seems feasible that the question could come up again. I just don't know if that means it's worth continuing this discussion.
To me, this leaves another sort of problem. In translation lines, this can be "t-template ... t-template" but this reduces our ability to do any sort of automatic processing on the result. Most automated systems would probably see this as "tríú" and "déag" being provided as translations for "thirteenth". I wonder if we could standardize or even template this sort of situation so that automatic processing is eased. —Leftmostcat 06:59, 2 January 2009 (UTC)

New context tag and category for copulas

At WT:TR#awful two editors have thought it useful to have a category for verbs that function as copulas in at least one of their senses. I thought it might be useful if there were a context tag that applied to the senses that the verb has when functioning as a copula. At least in UK grammar schools, "linking verb" seems to be the common terminology. Should we use "copulative" or "linking verb" as a tag? Either would seem to require a link to Appendix:Glossary or to the entry.

w:Copula (linguistics) and w:List of English copulae are useful background. Does anyone have further thoughts on the subject? DCDuring Holiday Greetings! 19:19, 31 December 2008 (UTC)

Personally I prefer copula / copulae. The term "linking verb" is used in US education, but primarily in the lower grades and without much explanation about what "linking" means. Given the decline in grammar education, either "linking verb" or "copula" would be equally opaque to most Americans. The term copula is more likely to have cognates in other languages, and therefore more useful for our non-English users. The term copula is also more generally useful, since the description "linking verb" refers to a sentence position such verbs do not always take, even in English. For example, the sentence "I want to know what the problem is." places a copula at the end of the sentence. --EncycloPetey 22:55, 31 December 2008 (UTC)

For some words in the WP list (eg, "He acted happy".) there seems to be a distinct meaning for a sense that is "copulative". For others it seems to be an optional part of a few of the meanings (eg, "He arrived famished in New York at 10pm."). DCDuring Holiday Greetings! 23:26, 31 December 2008 (UTC)

It seems to me that the latter situation is limited to adverbial use. So in the sentence "He acted happy," the adjective happy is used adverbially to describe how he acted. The same is true of the other example you've given. So, I'm not sure there is a distinct sense in each case, but there does appear to be a grammatical context difference. --EncycloPetey 11:24, 1 January 2009 (UTC)
I am not sure about many of the 37 verbs that appear on the WP list of English copulae. However, "appear" is one that:
  1. seems to take a wide range of subjective complements and
  2. appears on many lists of copulae.
If one can deem any subjective complement to be an adverb, then there is no point to the exercise. How would one determine when happy#Adjective is being used as happy#Adverb?

I note that there are many lists of copulae that claim to be "fairly complete", but don't have the same members.

I am also curious about those verbs that have objective complements ("You make me happy to be alive.") and reflexive complements ("He drank himself sober"; "They laughed themselves silly."). Our entries sometimes don't even have senses that I can, with effort, construe with an adjective, though I know the construction exists. Is there a label for such verb constructions and the verbs that participate in them? At least one grammarian included a few verbs that take objective complements in his list of copulae. DCDuring Holiday Greetings! 12:15, 1 January 2009 (UTC)

Each of these will need to be considered on the merits, which means we will need to give some thought to what sort of tests there can be for copularity. For "appear", I think the distinction is fairly clear -- "he appeared happy" does not mean "he appeared in a happy way" but rather "he appeared (to be) happy". For "arrive", I'm inclined to agree with EP; the verb is not really attributing anything to the subject... "act" is a little trickier, as I would normally say that "he acted happy" means "he acted (in such a way as to seem to be) happy", not "he acted in a happy fashion/while happy". Perhaps that is still copular -- it certainly doesn't seem adverbial -- but it is a little different from words like "seem" and "become".
On a somewhat unrelated note, I am concerned that copula and friends are being made to carry a lot of content that should properly be placed in a grammatical appendix; entries should reflect only how a word is used, regardless of whether that usage is correct or not. -- Visviva 15:45, 1 January 2009 (UTC)
The substantive value of this is to amend or add to our definitions and usage examples to reflect instances of copulative-type use where this is missing or unclear, which is surprisingly often. I thought that the basic test is whether the verb in the usage under consideration can take an adjective as subjective complement. (Nouns seem easier to confuse.) We might want to exclude those where the usage is at present limited to only a small number of adjectival possibilities. But this would possibly make Wiktionary less useful as an aid to reading works written in earlier Modern English. I don't see any particular "bright line" to distinguish copulative verbs from intransitive-verbs-that-take-subjective-complements-that-look-exactly-like-adjectives-but-must-be-adverbs-because-the-verb-isn't-copulative.
I think that in "John arrived hungry" "hungry" is clearly attributed to John and not his arriving. The question might be whether "arrived" plays an essential role in the attribution. This is perhaps the weakest of the purported copulae. It just seems quite arbitrary to deem all the adjectives that can follow "arrive" to be adverbs. Further, consider: "Come hungry." "The workers fell idle at the start, sat idle, remained idle, grew increasingly concerned, and left worried that the company would go broke." "The ingot glowed orange." "Loyal IBMers bleed blue." "Tom tested positive." "He pleaded guilty." "The carrots run small this time of year." DCDuring Holiday Greetings! 16:37, 1 January 2009 (UTC)
As usual, I'm coming to the discussion late, but... The CGEL calls this type of complement a "predicative complement". The distinctive properties of PCs are discussed on pp. 253 to 257. It calls verbs that take these simply "verbs taking predicate complements". These are further subdivided into: complex intransitives with depictive PCs (e.g., she felt lonely/an outsider in her own house), complex intransitives with resultative PCs (e.g., He became aware/president), complex transitives with depictive PCs (e.g., I thought it important/a bother), complex transitives with obligatory resultative PCs (e.g., you drive me nuts), and complex transitives with optional resultatives (e.g., bust it open). This is discussed on pp. 263 to 266. The designations copula/copular are limited to certain uses of be. The he arrived happy example would belong to the first group, recognizing that the PC would be optional.--Brett 18:13, 18 February 2009 (UTC)
That helps. They don't even consider become and seem as in any way copulas, eh? I wonder what Quirk says. DCDuring TALK 20:03, 18 February 2009 (UTC)
I don't have access to Quirk, but Greenbaum calls many verbs copular, as does the Longman Grammar of Spoken and Written English, which is based on the analysis of Quirk & Greenbaum.--Brett 13:39, 19 February 2009 (UTC)
Thanks again. "Copulative" applied to either the intransitives with all PCs or just the depictives would add a bit of information without requiring putting too much strain on the mythical normal user. A few (~1-5%?) might have heard of copulas and remember something about them. The GEL analysis is finer, but requires more explanation and some not-so-intuitive vocabulary. I wouldn't mind having some hidden-by-default contexts and categories for finer grammatical analysis. DCDuring TALK 15:27, 19 February 2009 (UTC)

That sounds fine to me, but how far do we take this valency thing? All the way, I say, but then you knew I would. We already list transitive and intransitive (and ditransitive?); now we're going to mark verbs as copulative. Algrif, and perhaps others, already mark some verbs as "catenative". This is a subset of verbs taking non-finite clausal complements, which in turn is a subset of all verbs taking clausal complements. There are, of course, verbs that license locative complements (e.g., put it this side of the desk) and those taking preposition complements. And this is just English. I have no idea what other types of complements are allowed in other languages.--Brett 20:19, 19 February 2009 (UTC)