Wiktionary:Beer parlour/2008/November

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.


Replace all etymon templates with proto and etyl

I was wondering what people's thoughts would be about replacing all etymon templates (e.g. {{L.}}, {{AGr.}}) with {{etyl}}. The advantage would be that we would have a consistent, fairly intuitive format for etymology templates (just one set of codes to memorize). Additionally, this would also allow us to make widespread changes in format, or allow users to customize their experience (perhaps we could allow users to link to SIL's site, instead of the 'pedia, or no links at all). Finally, and most importantly in my opinion, this will allow us to sync up our allowed L2 headers with etymon languages. The disadvantage would be that we'd be putting a lot of eggs in a single basket. {{etyl}} is under sysop-only protection, which is about as safe as we can make it, but if I turn out to be another Wonderfool, a couple of edits to the template after such a switch is made could severely backlog the server. A couple of caveats: First, {{etyl}} is not yet capable of handling dialects, although this is something I would like to change in the near future (see User_talk:Robert_Ullmann#Standardizing_dialects for a glimpse of my lack of progress on the issue; any bright ideas would be most welcome). Because of this, we would want to keep dialect-specific templates, such as {{LL.}}, {{VL.}}, etc. Also, {{etyl}} can only handle languages with ISO codes, which excludes reconstructed languages, such as Proto-Indo-European, as well as macro-languages, such as Germanic. Proto-langs are no problem, as {{proto}} should be used for them anyway. Btw, I've gone through a bunch of the old-fashioned proto-lang templates, such as {{PIE.}}, switched them to {{proto}}, and nominated them for deletion here, if anyone would like to comment. As for macro-languages, I figure we can leave those for the time being, at least until a solution is decided. So, if anyone agrees with this, but has some issues with the current implementation of {{etyl}}, they should be noted now.
Also, if any of our bot owners would be willing to take up this task, should the community response be positive, that would be appreciated. -Atelaes λάλει ἐμοί 02:13, 11 August 2008 (UTC)[reply]

That's what I've been trying to do with the etymologies I've added or edited, at least if I knew enough to look for the code. (Who knew that Anglo-Norman had a code?) I steer clear of all reconstructed languages. Do all of the Germanic languages used in Webster 1913 etymologies have ISO 639 codes or are they like the various Latins? DCDuring TALK 02:24, 11 August 2008 (UTC)[reply]
I have not come upon any Germanic languages which do not have ISO codes. Could you give an example of one you could not find? -Atelaes λάλει ἐμοί 02:30, 11 August 2008 (UTC)[reply]
It would be Middle and Old (Dutch, Frisian, Saxon) that I'd be concerned with because Webster 1913 uses them. I guess the real scope of my concern is with Webster 1913, so that we don't lose that information needlessly and so that we don't waste time searching for codes that don't exist. DCDuring TALK 03:09, 11 August 2008 (UTC)[reply]
{{dum}}, {{odt}}, {{ofs}}, {{gml}} (Middle Low German is another name for Middle Saxon) {{osx}}. Couldn't find Middle Frisian. And, if we do this, step 1 would certainly be, switch those templates which currently have an ISO counterpart, and worry about the rest later. -Atelaes λάλει ἐμοί 18:21, 11 August 2008 (UTC)[reply]
There isn't a code for Middle Frisian because almost nothing was ever written down in Middle Frisian. Old Frisian continues into the 16th century, then modern Frisian languages appear in the nineteenth. There is almost nothing available for the intervening period. --EncycloPetey 18:57, 13 August 2008 (UTC)[reply]
Sounds good; I use etyl whenever possible. The only problem I remember off-hand is Afrikaans, for which I couldn't locate a code to use in laager. For Canadian French, I simply created {{fr-ca}} and used {{etyl|fr-ca}} in Canadien. Michael Z. 2008-08-11 02:36 z
Afrikaans code = afr. Canadian French = fre. "afr" has worked for me. But there are many codes that don't work with etyl, possibly because they are language families (e.g. today pes (Persian) didn't work with etyl). With Afrikaans and South African English you can also find African languages that don't yet work. DCDuring TALK 03:09, 11 August 2008 (UTC)[reply]
I can't find a reference that supports Canadian French = fre. ISO 639-2 has French = fre/fra,[1] and ISO 639-3 doesn't have fre at all.[2]. It looks like the region code fr-ca is necessary. Michael Z. 2008-08-11 06:56 z
Shouldn't that be fr-CA (uppercase country code)? —RuakhTALK 13:29, 11 August 2008 (UTC)[reply]
Yup, fr-CA would be the recommended style and that's how I should have created it. (But, technically, it ought to be case insensitive.) Anyway, as Robert reminds us, it's probably not a good idea to multiply the number of dialects with codes by the number of possible regions. Michael Z. 2008-08-12 15:51 z
We have language code templates for the exact set of languages we use as L2 headers. Creating templates like this for an entirely open-ended set of regional dialects (potentially hundreds of thousands) would be an absolute disaster area. Canadian French is a context label {{Canada|lang=fr}} on definitions. In etymologies, we are always going to need descriptive qualifiers; it is not possible to just add code after code. etyl should be used if and only if the language is accepted as an L2 header and thus is coded. If you simply must use etyl for everything, forcing them into it, then etyl needs a qualifier parameter ({{etyl|fr|Canadian}} or some such syntax). (The point made above about sync'ing L2 languages with Ety languages is valid, as long as one keeps in mind that one is syncing L2 languages (with code templates) with a small subset of the huge number of dialects and variants in Etys.) Robert Ullmann 14:04, 12 August 2008 (UTC)[reply]
Is there a master list of L2 headers? Can I assume that we can use any valid ISO code, but not introduce regional variants without discussion?
(cutting into the middle of this comment) see WT:CFI, where the list of languages allowed uses ISO 639-(1,3) as the presumptive starting point, with some disallowed (Klingon), and other allowed with some discussion. Robert Ullmann 18:11, 17 August 2008 (UTC)[reply]
There ought to be some mechanism to handle this, for example to preserve the etymological information when my dictionary reference says that the source of a word is “Canadian French.” I will quote you at template talk:etyl#Regional language tags. Thanks. Michael Z. 2008-08-12 15:44 z
You can say "From Canadian French language", putting only the "French" in with a template, but noting Canadian as text. --EncycloPetey 18:59, 13 August 2008 (UTC)[reply]
I tried that first: From Canadian {{etyl|fr}}, in the entry Canadien. It works, but it is lacking in a couple of ways.
  1. The link text is awkward, particularly in this entry: “From Canadian French Canadien (“Canadian”)” Splitting up the single noun “Canadian French” into linked and unlinked text makes it a bit confusing.
  2. The raw text “Canadian” doesn't add any semantic information into the database that is Wiktionary. There is no category to consistently find French Canadian derivations.
 Michael Z. 2008-08-13 22:55 z

I agree with this, I've been using {{etyl}} exclusively where possible in all the ety sections I add or edit. As far as I know, all the language codes that work with {{etyl}} are listed at Wiktionary:Etymology/Templates (WT:ETY/TEMP) and those languages, families, etc that don't are listed at Wiktionary:Languages without ISO codes. Obviously feel free to update either of these. Thryduulf 10:56, 11 August 2008 (UTC)[reply]

I support standardizing on {{etyl}}; and if there are any etymology languages that don't have codes that we don't want to create pseudocodes for, I'd advocate using a name like {{etyl-fake-lang}} rather than {{FL.}} for its etymology template. As for admin vandals — well, that hasn't stopped us from using {{context}}, {{infl}}, {{en-noun}}, {{term}}, {{a}}, and so on. —RuakhTALK 13:29, 11 August 2008 (UTC)[reply]

I do also support this idea, since I reckon all forms of standardization would help on this Wiktionary. It's already hard enough for newbies to contribute, so I will support all ideas that can ease this. Also, I reckon this is a job a bot could help out with? If needed, I may run my bot to change templates. --Eivind (t) 18:11, 11 August 2008 (UTC)[reply]

Here is a list of etyl codes, although it's not complete, it does help finding them for languages like Old Irish or Low German. Nadando 00:22, 12 August 2008 (UTC)[reply]

There are cases, however, where we cannot (currently) use the {{etyl}} template, as there are a few old languages without codes, and there are words whose etymological origin is not known with enough specificity, such as words known to originate in a southern Slavic language or a Mayan language or a Tupí language, but where the specific language of origin is not known. For these situations, we have broader etymological categories and may have to use the older style templates. --EncycloPetey 19:02, 13 August 2008 (UTC)[reply]
Agreed (see my intro for this thread). But would you support changing all old-style etymon language templates which have etyl counterparts? That way we can see what's left and what needs work. -Atelaes λάλει ἐμοί 19:36, 13 August 2008 (UTC)[reply]
Seeing what wasn't available for use with {{etyl}} so we could work out how we want to deal with them was the reason I created Wiktionary:Languages without ISO codes - see its talk page. Thryduulf 21:35, 13 August 2008 (UTC)[reply]
Right, but lots of those common language families or macrolanguages like Slavic, Germanic, or even Indo-European (ine) do have their own code which can effectively be used in {{etyl}} and thus deprecate old-style templates. Wiktionary:Etymology/language templates already contains {{etyl}}-style alternatives for some of them. I would also support the idea of AutoFormat changing old templates to {{etyl}} and {{proto}} in etymologies. --Ivan Štambuk 19:21, 13 August 2008 (UTC)[reply]
We do run into one problem using those templates though, and this is that it becomes harder for the bots to know when a code represents a valid language and when it doesn't. We ought to tag the macrolanguage templates in some way for the benefit of Robert's bots, or else we'll end up having AF treating "Berber" and "Slavic" as valid language headers. --EncycloPetey 02:31, 16 August 2008 (UTC)[reply]

Relevant to this discussion are three sections on WT:RFDO, WT:RFDO#Template:Haw., WT:RFDO#Template:Icel. and WT:RFDO#Manx.. These are the specific etymology templates for Hawaiian, Icelandic and Manx respectively and have all been orphaned and replaced with {{etyl}}. The Hawaiian template currently has a consensus to delete, the other two have so far not garnered any responses. Thryduulf 21:39, 13 August 2008 (UTC)[reply]

Well, I'm counting six supports (besides myself) and no opposes. Should I go ahead and put the request to AF, or should this be put to a vote? Anyone? -Atelaes λάλει ἐμοί 05:26, 21 August 2008 (UTC)[reply]

I don't see the need for a vote. I'm happy (and apparently nobody else has any massive objections) for either AF or another bot to do the work as described here and at WT:RFDO#Dotted etymology templates. Thryduulf 09:25, 29 August 2008 (UTC)[reply]
While this is an old discussion, I wanted to draw people's attention to Connel’s remark at Wiktionary_talk:Etymology#New_template, namely that the old forms are useful for Webster 1913 import.
Presumably automated conversion would address this issue – any thoughts?
Nils von Barth (nbarth) (talk) 02:44, 28 September 2008 (UTC)[reply]

I'd like to know what the result of this was. The Webster-1913 style templates are retained as redirects (since there is no actual naming conflict...they all have a period in the actual template name.) --Connel MacKenzie 16:07, 13 November 2008 (UTC)[reply]

The result is that AutoFormat is working through converting all the {{F.}} style uses to {{etyl|fr}} ones. It isn't currently doing all of them, just those listed at User:AutoFormat/Ety temps. I believe the intention currently is to add all the others that translate to a single ISO code later.
As far as I am aware there has been no decision yet about what to do regarding templates that do not correspond to an ISO code (e.g. {{LL.}}). I can't remember what the state of play is regarding dotted templates that correspond to ISO codes for things other than a single language (e.g. {{Gael.}}).
I personally don't have a problem with retaining the dotted templates for ease of inputting/importing if others see a value in this. As it's been quite a while since any of these orphaned templates were deleted, I think it's fair that any more should be brought to WT:RFDO rather than be deleted under the previous consensus. I know Robert Ullmann has a report detailing the usage of these templates, but I can't find it at the moment. Thryduulf 20:35, 15 November 2008 (UTC)[reply]
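To illustrate the kind of conversion described in this thread, here is a minimal Python sketch. It is not AutoFormat's actual code. The function name and the mapping table are hypothetical; only the {{F.}} to {{etyl|fr}} pair is taken from the discussion, and the other two entries are assumed for illustration. Dotted templates with no entry in the table (such as the dialect template {{LL.}}) are left untouched, matching the "switch those with an ISO counterpart first, worry about the rest later" approach.

```python
import re

# Hypothetical mapping from dotted etymon templates to ISO codes.
# "F." -> "fr" is mentioned in the discussion; the other entries are
# assumptions added for illustration only.
DOTTED_TO_ISO = {
    "F.": "fr",
    "L.": "la",
    "Haw.": "haw",
}

# Matches a bare template call such as {{F.}}: a name that ends in a
# period and takes no parameters (no "|" inside the braces).
DOTTED_TEMPLATE = re.compile(r"\{\{([^{}|]+\.)\}\}")


def convert_etymon_templates(wikitext: str) -> str:
    """Replace known dotted templates with {{etyl|code}}; keep the rest."""

    def replace(match: "re.Match[str]") -> str:
        code = DOTTED_TO_ISO.get(match.group(1))
        if code is None:
            # No single ISO code known (e.g. {{LL.}}): leave as is.
            return match.group(0)
        return "{{etyl|" + code + "}}"

    return DOTTED_TEMPLATE.sub(replace, wikitext)


print(convert_etymon_templates("From {{F.}} x, from {{LL.}} y"))
```

Keeping the mapping in a plain table means new codes can be added one at a time as they are agreed on, without touching the matching logic.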

Regional context

There are many Portuguese words with spelling variations between two main dialects: Brazilian Portuguese and European Portuguese (e.g., "contato"/"contacto", "registro"/"registo", "elétron"/"eléctrão" etc.). So, following the examples of "metre"/"meter", "parlour"/"parlor" etc., I have been adding the related entries and context templates. However, I have not yet seen any context for conjugated or inflected forms. That is, "liters" should obviously be related to "liter", and therefore mainly used in the US, right? The problem is that there are a huge number of Portuguese words with more than one inflected or conjugated variant, depending on the dialect.

  • A "European man" is homem europeu; but a "European woman" can be either "mulher européia" (in Brazil) or "mulher europeia" (in Portugal).
  • "To love" means amar; but "we loved" means "nós amamos" (in Brazil) or "nós amámos" (in Portugal).

So, should I add context to these inflected or conjugated forms, as I did for amamos? And that leads me to another question: Should I add context to all forms of a word used mainly in just one dialect, as I did for golos, relating to golo? An "Alternative spellings" section in an entry whose definitions have no context seems incomplete, and an entry with no qualifier at all seems incomplete too. Daniel. 20:24, 4 November 2008 (UTC)[reply]

Ideally, yes, the context you've added is desirable for all forms of words, including inflections. The alternative spellings and qualifiers are wanted on all such entries. --EncycloPetey 20:28, 4 November 2008 (UTC)[reply]
Ok, then I'll continue to use them. And, in this case and based on existing English entries, I suppose that other contexts indicating when and where to use specific words ({{archaic}}, {{colloquial}}, etc.) are also desirable, so I will use them too. Daniel. 14:40, 9 November 2008 (UTC)[reply]

Wiktionary favicon change

Currently Wiktionary has the same favicon as Wikipedia. This is not ideal: Wiktionary needs to be recognised as a separate project and gain recognition, and the current situation may only strengthen the false impression that Wiktionary is a part of Wikipedia. Apart from that, it is simply inconvenient to have a browser with many open tabs and be unable to tell which come from Wikipedia and which from Wiktionary.

We could use the tile icon, already used in some situations to represent Wiktionary (like on German Wikipedia or on French Wiktionary). There's a general agreement about such a change on Polish Wiktionary, but we don't want to take unilateral steps, as we would like to see Wiktionaries having a consistent look. I have also written on the German Wiktionary beer parlour, and if we decide to go for it, we could write a request on Bugzilla for our three projects together. --Derbeth talk 12:18, 7 November 2008 (UTC)[reply]

This has been discussed extensively before. There is no consensus to use the tiles here, although I personally prefer it. My opinion is that Wikipedia should change theirs to the mini globe - but trying to change them would be amusing. Conrad.Irwin 23:42, 7 November 2008 (UTC)[reply]
I would personally prefer the tiles to the existing W to differentiate us from Wikipedia. Thryduulf 14:43, 8 November 2008 (UTC)[reply]
I would prefer that Wikipedia switched their favicon to a little globe (their standard icon), leaving the "W" for Wiktionaries. The tiles are icky. --EncycloPetey 19:32, 8 November 2008 (UTC)[reply]
I like the tile, but also think that WP should change to the globe... -- IrishDragon 02:31, 19 November 2008 (UTC)[reply]

See https://bugzilla.wikimedia.org/show_bug.cgi?id=16315 for the bugzilla request for that. --- Best regards, Melancholie 00:21, 21 November 2008 (UTC)[reply]

User:Conrad.Irwin/creation.js "Form of entries at the touch of a button"

Hello everyone! I had a good idea (tm). The idea behind this bit of javascript is to create "form-of" entries semi-automatically. The basic workflow is that users who have this ticked in WT:PREFS (it's not there yet, because there are some technical details to sort out; it won't be on by default - at least not for many moons) will see green links instead of red links in places where the contents of the non-extant entry can be worked out automatically. They then click on the green link, and click "Save", instead of having to type out a form-of entry. This is intended for use in situations where running a bot is not desirable, but where a little work could be saved anyway. This is ideal for templates like {{en-noun}}, as the creator of a noun page can create its plural in two clicks and no typing. It is less good for words with large inflection sets, but these are probably worth creating a bot for anyway.

Feel free to move this paragraph to WT:GP, I just wanted to group it all together. In order to make this work, I would like to add some markup to the inflection templates that will aid the javascript in working out the format of the entry. Each potentially creatable link will be wrapped in a span with three class names (these are negotiable, just the ideas I first came up with atm): class="form-of plural-form-of lang-en" (I envisage adding a few more parameters, such as gender). The "form-of" allows a quick lookup of all potentially creatable entries, the "plural-form-of" tells us which form we are to create, and the "lang-en" (although optional for English) tells us which language we are working in. From that it then does a lookup to find an entry-creation template (example for plurals at User:Conrad.Irwin/test) and invokes mw:Extension:AutoEdit to substitute the values of the parameters. I know there are several problems with the code as it is at the moment, and most of them I have commented in there - but if you can see other issues, or a better way of doing this, please let me know. The main reason for this post is to ask permission to add this meta-data to some of our inflection templates. Would anyone mind if I merged {{en-noun/test}} back into {{en-noun}}? Conrad.Irwin 03:22, 8 November 2008 (UTC)[reply]
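The three-class convention described above can be sketched as follows. This is only an illustration in Python of how the class names might be decoded, not the code in creation.js. The function name and the returned dictionary shape are hypothetical; the class names themselves ("form-of", "plural-form-of", "lang-en") and the rule that the language class is optional for English come from the post.

```python
from typing import Dict, Optional


def parse_form_of_classes(class_attr: str) -> Optional[Dict[str, str]]:
    """Decode a span's class attribute into the form and language to create.

    Returns None when the span is not marked as a creatable form-of link.
    """
    classes = class_attr.split()
    if "form-of" not in classes:
        return None

    form = None
    lang = "en"  # the post says the lang-xx class is optional for English
    for name in classes:
        if name.startswith("lang-"):
            lang = name[len("lang-"):]
        elif name.endswith("-form-of"):
            # "plural-form-of" -> "plural"
            form = name[: -len("-form-of")]

    if form is None:
        return None
    return {"form": form, "lang": lang}


print(parse_form_of_classes("form-of plural-form-of lang-en"))
```

The result could then be used to pick an entry-creation template for that form and language, which is the lookup step the post describes.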

I'm all for changes that make editing easier and more efficient. Polyglot 11:17, 8 November 2008 (UTC)[reply]

Sounds good to me. Thryduulf 14:37, 8 November 2008 (UTC)[reply]

I have put this into WT:PREFS, simply tick "Make red-links to some form-ofs fill out entries automatically." and visit a page with {{en-noun}} and no plural form. A list of such missing plurals can be found at User:Conrad.Irwin/good plurals. I'll update that list to link to the singulars too :). I've tested this under IE6, Konqueror, Opera, Firefox3. Conrad.Irwin 19:38, 8 November 2008 (UTC)[reply]
Now works for {{en-verb}} too. Conrad.Irwin 01:54, 9 November 2008 (UTC)[reply]
Thanks for this. It appears to work in Firefox 2/Kubuntu in addition to the browsers you've tested in. Thryduulf 04:27, 9 November 2008 (UTC)[reply]
This is great, thanks! -- Visviva 06:46, 9 November 2008 (UTC)[reply]

When the target page exists is there a way this could detect whether it has the template it would add if it didn't exist? For example, skins is an English plural, English third person singular and Dutch plural. Until I added it earlier today, the skins entry did not have a Dutch section, but unlike the diminutive skinnetje I could not use the "accelerated" method. Thryduulf 23:55, 11 November 2008 (UTC)[reply]

That looks like a job for parsing the dump. RJFJR 00:45, 12 November 2008 (UTC)[reply]
In theory it is possible, but it would require a lot of effort, both in programming, and also when running (as it would have to load each potential "form-of" page and then parse it to determine which forms were present). Parsing the XML dump would also be quite tricky, though possibly slightly easier, and should probably be the task for a bot. (If someone else wants to implement it, I'm happy for them to integrate it into the script, but I don't have the inclination to implement it myself). Conrad.Irwin 00:56, 12 November 2008 (UTC)[reply]

Vanity pages

We have quite a few User pages that either describe the person concerned, or link to an external site (e.g. Facebook). In cases where this is the only edit by the user, is it OK to delete the user page (after an interval to allow for slow editing)? SemperBlotto 08:36, 8 November 2008 (UTC)[reply]

I have no problem with deleting userpages which link to personal websites from users whose sole edit is that userpage (such as the user Gauss pointed out). However, to take some examples, I have absolutely no problem with Ruakh noting his educational background nor Robert Ullmann noting his history with RFC's. Perhaps more controversially, I don't have a problem with SemperBlotto's and Alifshinobi's links to their personal webpages, nor User:ArielGlenn/Personal, nor Connel putting his personal email on his userpage. While I adamantly support our editors' rights to anonymity, those who choose to disclose their real names and whatever else could be considered to be doing Wiktionary a service in providing relevant information. People might rightfully wonder who is writing the dictionary they're reading. The key difference between the first user and the latter group is that every editor I've noted by name is an important editor here, with a significant contribution to this project. I think that if we simply said that users without substantial contributions were prohibited from links to personal webpages, our admins generally have the good sense to make judgment calls on this sort of thing. If we wanted a more concrete rule, we could perhaps set a requirement of 100 edits to the main namespace which do not get reverted. I rather doubt that folks looking to use us as a Myspace would go to the trouble of fulfilling that requirement. -Atelaes λάλει ἐμοί 08:59, 8 November 2008 (UTC)[reply]
OK - If someone knows how to generate a list of such pages (let's say zero other edits apart from User and initial User talk page, and over a month old (as a start)) then I'll see about pruning them. SemperBlotto 11:30, 8 November 2008 (UTC)[reply]
In this case I concur completely with User:Atelaes that Wiktionary should not imitate w:Myspace - for many reasons, one of which is that the latter is a regional network from Septentrional America and Wiktionary is meant to have a world-wide audience. Therefore I support the proposal about 100 edits in the main (and Citations ? Why not?) space or a bit more. Bogorm 13:35, 8 November 2008 (UTC)[reply]
I'd say about 100 constructive edits to the content (main, citations, rhymes, Appendix, Wikisaurus, concordance*, transwiki*) or Wiktionary namespaces (or their talks) would be a useful guide. I wouldn't be completely rigid on it though, for example if a user starts with their userpage and then steadily (but not necessarily quickly) makes good edits then I would let it stand. I'm not completely certain about those namespaces I've asterisked though.
I think that all users (except obvious vandals) should be given a month's grace, and after that time I think it should be easy to classify them into three groups that we should handle differently:
  1. Good contributors - those who've reached the 100 edit mark or look set to shortly. These users should be afforded the leeway given to the established users namechecked above.
  2. Non-contributors - those who've clearly displayed a lack of interest in significant contribution. The proposal by SB should be applied to these users.
  3. Others - those who don't fit into either category. An individual approach is probably best with these users, maybe giving them more time to see if it becomes clearer or talk to them. Thryduulf 14:35, 8 November 2008 (UTC)[reply]
While I can follow and agree with the reasoning given, there is one point about which I'm hesitant. I've seen some situations where an editor goes a month or more between bouts of editing. In some of these cases, it is useful to attempt to contact the editor regarding some bit of information to be verified. For some of these editors, the best way to resolve the situation is to have an alternative means to contact them, which these outside links do provide. While this situation doesn't happen all that often, it has come up on at least three occasions for me, and there are others where it would have been really nice to have a user page with contact information. So, I'm not sure I would set the number of edits as high as 100 in cases where the editor has added new entries. --EncycloPetey 19:31, 8 November 2008 (UTC)[reply]
If they haven't contributed at all, they should be deleted. If they've contributed, even only a bit, let them keep some reward. Conrad.Irwin 19:57, 8 November 2008 (UTC)[reply]
I agree, except where the userpage is excessive.—msh210 08:11, 14 November 2008 (UTC)[reply]
I haven't had any particular issue noting whether or not a page is essentially SPAM or someone contributing. Seems rather obvious most of the time. If you want to count contributions, either count globally, or look at their "home" project; they may be very active elsewhere, and simply have a copy of their user page here.
Rather than deleting the page, just add the wikimagicword __NOINDEX__ to it; Google and friends will ignore it, and not count links from it. Robert Ullmann 10:49, 9 November 2008 (UTC)[reply]
I have made a start by creating {{vanitypage}} and added it to User:Magwizshiz. It really needs to add the page to some sort of category so that we can easily keep track of them. (p.s. That list of pages would be nice) SemperBlotto 13:02, 12 November 2008 (UTC)[reply]
I suggest giving users fair warning before deleting any User pages. Maybe give the warning in their discussion pages? If someone deleted my user page without warning, I would be offended. --AZard 02:50, 22 November 2008 (UTC)[reply]

Separate derived terms from derived phrases?

I'd like to propose separating derived terms (i.e. terms that vary the spelling of the original words) from derived phrases (i.e. set phrases and/or idioms that include the original word). By way of example, the current listing of derived terms in beauty is:

Everything up to beautify is a different word, a word that takes part of beauty and adds a different suffix or suffixes to it. Everything after that is a phrase, "beauty X" or "X beauty". I think these should get different treatment in definitions because the process of forming a true derivation is so different from the process of making a phrase. bd2412 T 09:46, 8 November 2008 (UTC)[reply]

I'd keep them together. Separating the lion cub from the lioness would be too cruel. And, very often, both spellings exist, with and without space. There is not so much of a difference... However, complete sentences (beauty is in the eye of the beholder, beauty is only skin deep) are a different case, and should be elsewhere, in my opinion. Lmaltier 09:53, 8 November 2008 (UTC)
What is the proposal?
  1. to add a new "Derived phrases" header at the same level as "Derived terms"?
  2. to add a new "Derived phrases" subhead below "Derived terms"?
  3. to group the terms under existing headers, possibly using {{rel-top}}?
The last costs the least amount of vertical screen space and requires no Vote AFAICT. There are sometimes principles other than what you mention that afford useful bases for grouping related and derived terms, many of which could be (are now!!!) accommodated under option 3. Why isn't option 3 sufficient? DCDuring TALK 10:45, 8 November 2008 (UTC)[reply]
There was a time when I would have agreed with this proposal, but since that time I've seen too many odd problems and cases like what Lmaltier has noted. Look at the derived terms (first section) for time, where a single term may be variously spelled with a space, without a space, or with a hyphen. This proposal would put timescale and time scale on separate lists, which makes no sense lexically. --EncycloPetey 19:26, 8 November 2008 (UTC)[reply]
A solution to the requirement of putting timescale and time scale on one list is to define the second list as those terms that contain at least one additional stem. Thus, both timescale and time scale end up in the second list, while timely and timeless go in the first list. This rule still separates lion cub from lioness, though. A good heading title for the second list is unclear to me; what about "Compound terms"? (If the headword is already a compound term, the heading is a bit inexact.) --Dan Polansky 21:54, 8 November 2008 (UTC)[reply]
(after edit conflict; mostly a dup of Dan's comment) Is it possible to separate {words derived using affixes and whatnot} from {phrases and compounds that come from other words as well as this one}? I think that should give "timely", "timeliness", "untimely", "betimes", "timing", etc. pride of place, while appropriately demoting "time[ ]scale", "lunch[-]time", etc. —RuakhTALK 21:56, 8 November 2008 (UTC)[reply]
But that's only a partial solution, because it doesn't consider all the other ways in which derivations happen, such as shortening by means of abbreviations, contractions, elision, etc. Please look at the whole gamut of possibilities. --EncycloPetey 22:03, 8 November 2008 (UTC)[reply]
So a different take: group A: all derived terms; group B: group A minus group C; group C: all the terms that can be obtained by appending or prepending a word or more words to the headword, regardless whether the separation sign between the headword and the newly affixed word or words is (i) absent, (ii) hyphen, or (iii) space. I do admit that I am unaware of the varienty of spectrum of derivation possibilities, so this may possibly be a rather naive proposal. Still, AFAICS one way of derivation is assigned to the group C, and all the rest to the group B. --Dan Polansky 22:27, 8 November 2008 (UTC)[reply]
But you've only modified your proposal to accomodate the one specific problem I metioned without looking for any others. Here is another situation your proposal does not deal with: how Related terms would be affected. Consider replacement of one affix with another. On the entry for (deprecated template usage) timely, where would (deprecated template usage) timeless go related to (deprecated template usage) timeliness. And would (deprecated template usage) time be in a separate list of its own, since it would be the only entry related strictly be removal of a suffix? I used this Related terms example mostly because I can't offhand think of a Derived terms example, but I know this situation exists for Derived terms as well. And this isn't the only additional problem. Before making a sweeping proposal for formatting Derived terms, I'd want to know that the proposed solution has been thought through first, which your proposal has not. Otherwise, we end up having to make many further revisions to work already done. --EncycloPetey 00:39, 9 November 2008 (UTC)[reply]
Re: "But you've only modified your proposal to accommodate the one specific problem I mentioned without looking for any others." I am afraid that is correct. I do not understand the problem with related terms that you have just mentioned, but I guess I had better stay out of the discussion at this point, as my knowledge of word derivation is too limited to allow me to show that all the not-yet-mentioned problems that could arise have been considered. --Dan Polansky 10:46, 9 November 2008 (UTC)
Let me add: If your objection was that this proposal artificially creates a dedicated heading for one class of derivations while leaving all the other classes of derivation unmentioned, assigning them the implicit class "miscellaneous", then my answer was off the track, and I do not have any reply to this objection. Just like some other people, my experience is that the class labeled by me as C typically gets much longer than the class B, the miscellaneous derivations, so its separation could be worth it. --Dan Polansky 22:46, 8 November 2008 (UTC)[reply]
How about creating two lists: one for the longer "sentence-like" phrases, and another for the rest. It's easy to find what we are looking for if they are sorted alphabetically. --Panda10 22:56, 8 November 2008 (UTC)[reply]
Well, certainly it is a stretch to say that beauty is in the eye of the beholder and beauty is only skin deep are "derived terms". I would agree with an additional level of subheaders below a "Derivations" header, separating (a) true derivations, (b) derived phrases including compound words which may or may not use a space, and (c) idioms which necessarily include the headword. bd2412 T 04:06, 9 November 2008 (UTC)
I think this conversation would be helped by someone experimenting slightly with some entries that have many "Derived/Related Terms". New subheadings and stuffing previously-visible terms in hidden boxes would be too much, but be bold otherwise. Tinker with table column headers or create separate (vertically stacked) {{top2}} tables for the different proposed groupings. --Bequw¢τ 10:21, 10 November 2008 (UTC)[reply]
Keep it simple! Lmaltier 21:11, 14 November 2008 (UTC)

Alphagrams

There are several words I've seen recently with alphagrams, correctly placed in the anagrams section. As there is no standard presentation, however, they are shown in several different ways.

I've just created {{alphagram}} to try to resolve this. The formatting is simple, e.g. for word:

* {{alphagram|dorw}} 

gives

The ELE says that alphagrams should not be linked unless the alphagram is also a word, e.g. the alphagram of tar is "art". In this case just wikilink the parameter:

* {{alphagram|[[art]]}} 

gives

Does anyone object to using/mandating this formatting?

Also, I think that for consistency, the alphagram should always be placed at the end of the anagrams section. If the template is used, this should be a trivial task for AF. Does anyone have any comments? Thryduulf 15:30, 8 November 2008 (UTC)[reply]
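For illustration, the alphagram and the template line described above could be generated along these lines. This is a minimal sketch, not part of the proposal: the helper names and the known_words set are hypothetical, and it assumes headwords made of plain letters only.

```python
def alphagram(word):
    """Return the letters of `word` sorted alphabetically (lowercased)."""
    return "".join(sorted(word.lower()))

def alphagram_line(word, known_words=frozenset()):
    """Build the wikitext list item for the anagrams section.

    `known_words` is a hypothetical set of existing entries; the alphagram
    is wikilinked only when it is itself a word, as the ELE asks.
    """
    gram = alphagram(word)
    param = f"[[{gram}]]" if gram in known_words else gram
    return "* {{alphagram|" + param + "}}"
```

Here alphagram("word") yields "dorw", and alphagram_line("tar", {"art"}) produces the linked form shown in the second example above.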

I like everything, but think the alphagram should be placed first. It will, after all, always be first alphabetically by its very nature, and in some cases will be a word itself. --EncycloPetey 19:21, 8 November 2008 (UTC)
I wondered about that, but thought that real words should be given preference: the alphagram is only a real word in a tiny minority of cases. What do others think? Thryduulf 22:44, 8 November 2008 (UTC)
If the method is to place it at the end, then you have the case where it is a word, and placed first, and the case where it isn't, and is placed at the end. It should just be first. (However, the whole thing seems rather pointless to me.) Robert Ullmann 10:42, 9 November 2008 (UTC)
i like the alphagram placed first since that was the format in the ELE. either way is fine. confession: i changed the alphagram example to the template format before i realized that a vote was needed to change the ELE. sorry about that. --AZard 04:25, 13 November 2008 (UTC)[reply]
what is AF? --AZard 15:18, 14 November 2008 (UTC) (signing after the fact.)[reply]
User:AutoFormat.—msh210 08:04, 14 November 2008 (UTC)
so, after a decision is made on the location of the alphagrams, an AF bot will make the changes and avoid human manual effort. am i understanding correctly? --AZard 15:18, 14 November 2008 (UTC)[reply]

When Editing Pages

When editing pages, we should be able to go to another language's analogue, as well as to the discussion pages and history pages. This goes for all wiki projects (Wikimedia projects). 96.53.149.117 22:52, 8 November 2008 (UTC)

I don't understand what you're saying. What situation prompted this comment? --EncycloPetey 23:01, 8 November 2008 (UTC)
My guess is that you are asking for interwiki links to be displayed in the sidebar when editing a page as well as when viewing it (for example, if you are editing elastic then you should see the interwikis to fr:elastic, de:elastic, pl:elastic, etc.)? If this is the case, then there is nothing we can do about this here, and you will need to make a feature request at https://bugzilla.wikimedia.org/. If you want your request to have any chance of being acted on, then I recommend that you explain clearly what it is you want and why you want it (if English isn't your first language, then I suggest asking someone to translate it for you). Even this, though, will be no guarantee that it will get done, as what is acted upon and what isn't is seemingly entirely down to the developers' whim. Thryduulf 04:02, 9 November 2008 (UTC)

{{nrm}} (a language template) is currently set as "Norman." However, nrm is the SIL code for Narom. Now, we have a Norman Wikipedia (with the prefix nrm, note). So, I'm assuming that the Wikimedia language council designated nrm as the code for Norman, as it did/does not have an official such code. My assumption is that we would give SIL's codes precedence over WM's internal code-set, and so {{nrm}} should be changed to Narom, and Norman be made an orphan language (i.e. not having a code). We do have {{xno}}, but that's not the same thing. Thoughts? -Atelaes λάλει ἐμοί 06:55, 9 November 2008 (UTC)[reply]

The "language committee" is screwed up. I don't know why. They set up Swiss-Deutsch as als, which is Tosk Albanian, instead of gsw. They (and in this case specifically Gerard) refused a Jèrriais request, saying it didn't have any ISO code, yet allowed Aromanian as roa-rup in spite of it having a perfectly good ISO code, rup (which they must have known; why else would they come up with "rup" as a code?!). And now they create Norman as nrm, which as noted is an already allocated code, instead of roa-nrm, which would make sense. I don't follow this at all; short of assuming actual brain-damage, what can be going on?
We should use nrm=Narom, and ignore them. If they create a wikt, it would get a little more complicated, but we already compensate for a few other idiocies. (yue->zh-yue, etc) If there is a wikt, and we have Narom entries, it will take a little care. In the mean time, anyone want to ask WMF WTF? Robert Ullmann 10:35, 9 November 2008 (UTC)[reply]
I agree. The Wikipedia article for Norman language used to indicate nrm as the ISO code, until I pointed out that this was not, in fact, the case. I don't know where the idea came from but we should definitely not be perpetuating it. (I just wish SIL would actually assign Norman a code.) Ƿidsiþ 10:41, 9 November 2008 (UTC)[reply]
I'm afraid that fr.wiktionary is perpetuating nrm as the code for Norman. Something should be done. For roa-rup, I think I can answer: at the time, rup was proposed as the ISO code, but was not an official ISO code yet. This is why they chose roa-rup. Lmaltier 09:13, 10 November 2008 (UTC)[reply]
Similarly, als.wp was NEVER "Swiss German"; at first it was Alsatian, and it was later enlarged to all Alemannic dialects. However, the dialects in question are currently spread across more than one code by ISO-639-3 (gsw is Swiss German, wae is Walser German and swg is Swabian; it's not clear if gsw alone can represent them as a whole). A Swabian Wikipedia request exists, but is not on the requests page, for some reason. Circeus 22:34, 15 November 2008 (UTC)

Redundant links in etymology and headword

Empire State Building, New Brunswick, purple gas and many other compound terms often have their components linked in either or both the etymology (Empire State + building) and the declension line's headword (Empire State Building).

I suggest that linking only the etymology is preferred:

  1. Redundant links reduce the clarity of a web page, so it is better to choose one preferred linking site
  2. The headword in the declension line is the heart of the entry, and should draw the eye by remaining black (it's the reader's destination, not a jumping-off point)
  3. The nature of simple and compound links is clearer when they are separated by unlinked punctuation (+), which is done in the etymology
  4. Link targets are clearer when they can be linked without pipes, which is done in the etymology, e.g., lowercase “building” in the above example
  5. Affixes, etc, can be linked in the etymology, e.g., “New Brunswick + -er” (in New Brunswicker)

Linking components of the headword is a poor substitute for even the most rudimentary etymology.

Whatever we decide, it should be incorporated into WT:ELE and WT:LINKS. Michael Z. 2008-11-09 18:20 z

Both. If links don't cost much in performance terms, then why not have them both? In the case of an entry like [[Empire State]] [[building|Building]], I would not think that we gain much from having a separate "etymology", but certainly wouldn't object and might add it.
  1. Redundancy. Users may have the cursor or their attention near one or the other. Users simply may develop the habit of not looking at Etymology (or alt spellings, or pronunciation, or translations, or, even, definitions) because the section doesn't meet their needs. I don't personally experience the lack of clarity, but would welcome evidence or authority on the question.
  2. Bold, black heart of entry; destination vs. jumping-off place. After a user has landed on an entry and confirmed that it is right place for which purpose bold must help, the entry stands on its own. I would argue that much bold is then irrelevant or distracting. But then, as the user generates further questions, it becomes a jumping-off place for further answers. Then the bluelinks or redlinks provide good information and proximity to attention or cursor helps.
  3. Clarity of links. True.
  4. Pipes bad. We aren't going to be forbidding them. They just take a few keystrokes. What harm? Etymologies have a problem too, on occasion. The ability to link to lemmas in etymologies is sometimes only achieved by having two terms, one for the headword, another for the stem. (eg record < Template:.... < cord-, stem of heart DCDuring TALK 19:39, 9 November 2008 (UTC)[reply]
  5. Affixes. I would put in an Etymology every time if the affix is at all interesting. But in cases where it is just the common senses of -s, -es, -ed, -ing, -er, we don't miss much by having them "lost" in pipes, afaict.
Both. As a general rule I agree with you that it's a bad idea to have redundant links, but that is somewhat mitigated by our page naming scheme and linking style: a link that says "book" is a link to our entry for book, etc. (I think the big problem with redundant links on other Web sites is that it's not obvious that they're redundant.) It's also mitigated by our use of distinct colors for visited and unvisited links (which all sites should have, but many don't). I realize that "somewhat mitigated" isn't a strong argument for something, but I think it's good to be somewhat consistent. We basically always linkify components of multiword headwords, and I like that we do. —RuakhTALK 21:00, 9 November 2008 (UTC)
1. “...why not have them both?” Why not link every word? When an interface does everything, any added redundancy makes it a bit worse. “Choice is good” leads to cluttered interfaces which overwhelm and confuse the reader (e.g.); take that too far, and you get Microsoft Word (good functionality, but how many users feel they comfortably understand its interface?). Avoiding redundancy is a good design principle in general, and also specifically in web page/software interface design.
4. I didn't necessarily mean that pipes are bad for editors, but that they make the link target less clear to the reader, especially when e.g. building and Building are absolutely different. The etymology already clarifies the roots overtly, so why not rely on those links going to exactly the same terms?
New Brunswicker derives directly from New Brunswick + -er, Empire State Building from Empire State + building: the etymological expression makes that clear, and you know exactly what the links lead to. So why would we link New, Brunswicker, and Empire in another context? Is this an alternate etymology? If the reader has developed the habit of not looking at the etymology, do we consider this an adequate substitute?
What exactly is the function of links in the headword? If we insist that it is a standard element of the entry, then we should be able to clearly define it. Michael Z. 2008-11-10 03:26 z
The function is to provide a convenient way for a user to link to components of the headword (if they exist) that is near the definitions (which I assume to be a focal point for users) whether or not there is a proper etymology header. If there is no meaningful etymology other than the components, then it enables about a three-line reduction in the vertical space taken by the etymology in the precious screen real estate above the definitions, which space may make more usable information visible on the initial screen for the entry.
The "New Brunswicker" instance is the one that I have no good way to handle with bluelinks. The link to "Brunswicker" in the inflection line seems undesirable. DCDuring TALK 03:48, 10 November 2008 (UTC)
Are you saying that the etymology section should be left out for most compound words? If so, then that should be clearly spelled out in WT:ELE, but I think a clear etymology would be preferable, as it is likely to be added eventually for the sake of other details like attestation date, etc.
Anyway, this aspect of the interface needs some focus. I'd rather see links preferred in the etymology, and only present in the headword as a temporary expedient if the etymology isn't present. Michael Z. 2008-11-10 06:58 z
I agree with this. IMO headword links are at best a necessary evil; they are distracting and opaque to most users (particularly since only a small fraction of entries will have them). As I see it, headword links should be used only a) when there is no Etymology (or plausible basis for one), and perhaps also b) when it is constructive to link the constituents in a way other than is done in the etymology. -- Visviva 09:24, 10 November 2008 (UTC)[reply]
Just the headword, in general. However there may be some value in including an etymology section with entries like Empire State Building to make clear that it's {Empire State} Building and not Empire {State Building}. Ƿidsiþ 07:37, 10 November 2008 (UTC)[reply]
Both - In addition to the redundancy points already raised, I'll note that in languages with inflections, the etymology section typically links to the lemma form of the etymological origin, while the inflection line links to the specific forms used to construct the word. I suspect something similar happens in some Asian languages. Additionally, the etymology often includes additional text or other information. Where it doesn't (yet), it probably should have that information added. The inflection line is much cleaner visually, so it is easier to see and follow component links. --EncycloPetey 04:10, 12 November 2008 (UTC)[reply]
Can you link some examples? Michael Z. 2008-11-12 06:31 z
Multi-"word" entries are a minority to begin with, so the only example I can find at the moment is homo nulli coloris, which doesn't have the best formatting, but which may serve as an example. --EncycloPetey 05:42, 13 November 2008 (UTC)
Yikes, this is a good example (of bad page usability). Redundant links are best avoided, because a reader can click a link, go back, then click a different link and momentarily wonder why he's not at a different place. Piped links can make this worse, because the same text linking to different pages, or different text linking to the same page, can be downright confusing. This example has it all, including links to nullus, nulli, and “nulli” > nullus.
I'd still suggest that the best way to avoid this situation is to discourage inflection-line links when there is an etymology (where one can write inflections or lemmas, and so there is less temptation to pipe links). A less desirable alternative might be to discourage pipes except where they avoid red links.
Page editing happens in a piecemeal fashion, so it is easy for situations like this to arise. We need simple guidelines to help prevent this. Michael Z. 2008-11-13 17:50 z
But de-linking the inflection line hurts all the inflected forms, which usually do not have an etymology section, and even for the singular the etymology is often so trivial that it isn't added: see number line and number lines for examples of this. I'm not sure whether simple guidelines can be drawn up that will cover more than just English. The needs of various languages are so disparate. --EncycloPetey 18:46, 13 November 2008 (UTC)
But the inflected forms have a prominent link to the lemma, which is where the actual etymology lives. Linking the inflection-line components here actually distracts the reader from this and other detailed information in the lemma entry. Instead of a cogently-written and relevant etymology, the reader is being presented with more opportunities for random dictionary surfing. (E.g. number lines has exactly one valuable link to the lemma with full definition, and potentially an etymology—clicking number or, take your pick based on an editor's whim of, line or lines here skips the useful information.)
Apart from this, I'm glad to recommend linking the inflection-line components in entries lacking an etymology. However, I think in every case an etymology which is written by an editor will be superior to an “etymology” composed of links added by default.
The homo nulli coloris example also shows how the writer of an etymology tends to explicitly name the relevant forms, and routinely link them. In this case the inflection-line links are not only completely redundant, but also less clear; they simply detract from the entry's quality. Michael Z. 2008-11-13 19:09 z
You're entitled to your opinion, but at least three of us have explicitly disagreed with you on that point. It's all very well to claim that it's simply distracting, but that's merely an unsupported opinion. I find such circumventing links positively helpful for some situations, for exactly the same reasons that you dislike them. Ultimately then, this comes down merely to preference. --EncycloPetey 19:31, 13 November 2008 (UTC)
That we disagree doesn't mean that both options are equally good, and it certainly doesn't mean that we should just keep doing it both ways in random entries, instead of agreeing on some rationale for this.
What it does mean is that the form of the central element of every Wiktionary entry remains unresolved as a part of the page design. Michael Z. 2008-11-14 19:33 z
That is why I pointed out that the majority opinion disagrees with you. --EncycloPetey 19:43, 14 November 2008 (UTC)
We make decisions by consensus, no? I pointed out a number of concrete strengths of linking in the etymology. Several counter arguments seem to be along the line of “I agree that redundant links and pipes can be bad, but I like these links, and in this case they might be good.” I don't see a cogent argument in favour of this alternative, or a well-stated rationale for it, so I'd like to continue the discussion until we can agree on something. I'll try to examine the various examples systematically to help formulate a realistic picture of the pros and cons of each. Michael Z. 2008-11-14 20:49 z
Here is an example of a Latin proverb entry, where the links are positively better done in the inflection line than attempting to do so in the Etymology section: tantum religio potuit suadere malorum --EncycloPetey 02:14, 14 November 2008 (UTC)[reply]
This “etymology” is only an English translation of the Latin proverb, and it is somewhat redundant with both the translation in the definition, and the identical gloss in the quotation. A more comprehensive etymology would source and gloss each word, making the inflection-line links constitute still more redundancy in this entry. Michael Z. 2008-11-14 19:33 z
Look again, the etymology is giving the literal meaning, and because this is the English Wiktionary, that is most easily done with English. Explaining the discrete meaning and grammar of each word is not the etymology of a proverb; the etymology is the context and literary origin. Also, you did not read carefully, as the translation of the quotation is not the same as the literal translation in the etymology. A literal translation is not necessarily appropriate for translating passages of text. Please re-examine the entry. --EncycloPetey 19:42, 14 November 2008 (UTC)[reply]
But wouldn't an etymology of a foreign proverb at least have to explain the component terms? Does this apply to other phrases or expressions? Are there any guidelines or external references about how to formulate etymologies for proverbs? Not sure about proverbs, but in such a case the inflection-line links would seem to simply emphasize a sum-of-parts picture of a phrase, rather than a technical etymology. Michael Z. 2008-11-14 20:49 z
Why not show me how you would write the etymology for let the cat out of the bag? This isn't a proverb, so the proverb issue won't confound the question, and it currently has no etymology section to bias the discussion. --EncycloPetey 21:01, 14 November 2008 (UTC)[reply]
Not sure that would make sense for such a clear phrase or not, so I made an initial attempt for die Katze aus dem Sack lassen (compare). I wouldn't exactly call it elegant, but with the glosses it is significantly clearer than clicking on each inflection-line link and then trying to locate the correct sense. Probably best demonstrated with a language one can't read at all (while the German is fairly obvious to me on its own). Michael Z. 2008-11-14 21:51 z
Where would an explication of the idiom go? (Not that we consistently have such explications.) DCDuring TALK 21:59, 14 November 2008 (UTC)[reply]
Mzajac, the etymology you've set up tells the meanings of the individual words, but doesn't give any etymology for the idiom. --EncycloPetey 22:24, 14 November 2008 (UTC)[reply]
Well, then I don't know the etymology of this idiomatic phrase, and I'm not familiar with how this is done. But isn't this a better way to explain the components, and wouldn't it be a useful supplement to an overall etymology of the phrase? Michael Z. 2008-11-14 23:12 z
Now you're beginning to see: the meanings of the component words aren't actually etymological information in many cases, but are instead supplementary information. There is no compelling reason to put such non-etymological information in the Etymology section. This is one of the unstated reasons why I don't like trying to shoehorn this information into the etymology section. Yes, the etymology section has room to explain grammar, translations, and to list both inflected and lemma forms, but this makes it visually hard to follow. Links from the inflection line directly to lemmata cut through all that. Additionally, explaining the component terms as a substitute for the actual etymology may discourage editors from adding the Etymology, since they will see that a complicated section of text already exists. --EncycloPetey 23:19, 14 November 2008 (UTC)
I am starting to better understand etymologies of longer expressions.
But the notion that some useful information should be left out of the etymology, because it will discourage editors from improving it, seems highly speculative. If an editor doesn't have a better notion of the expression's etymology, then he won't add it, regardless of what's there.
And the nature of the wiki is such that it's all going to be added sooner or later, next week, or next year, or long after we're gone. Isn't it better to work out how to add this information gracefully, rather than ignore the possibility? Michael Z. 2008-11-15 21:38 z

The citations pages are stealing all of my examples!

So, I know this conversation's been had before, but I can't for the life of me find it. If I missed the conversation where we came to a definite conclusion on this topic, please direct me to it, and I'll shut up. Otherwise, here goes: A number of editors are taking citations and dumping them all on the citations pages of entries. I am very much opposed to this, and I know that a number of others share this view. I would like to change our policy, something to the effect of:

"In addition to providing evidence of usage and existence through time, quotations also provide the ideal example sentences. In general, the ideal state is that each sense of a word be followed by a quotation which illustrates the sense in question. Additional quotations should be placed on the citations page, in order to maintain a focused entry. Note that quotations used as example sentences can be duplicated in the citations page. Pages which are very simple and would not be bogged down by additional quotes (e.g. only a single sense, few or no translations, derived terms, etc.) may have up to five citations listed in the entry itself. This is especially true of words which are rare, archaic, new, or otherwise can benefit from their existence being proven by quotations."

Additionally, I think the example given (mauve) should be made to conform to this. Of course it goes without saying that I'm not set on the details of the above proposal, but would like to see something that retains some quotes within the entries (but then, why did I say it?). Additionally, there's an interesting little convo on the talk page of WT:CITE concerning inflected forms that interested parties might care to peruse (whichever sense of the word works for you, the hypercorrect one, or the other :-)) -Atelaes λάλει ἐμοί 09:42, 10 November 2008 (UTC)[reply]

I am in general agreement with this. I think there is a useful distinction to be made between illustrative citations and probative citations. I expect just about everyone would agree that illustrative citations -- those which are chosen as representative of actual usage in context -- should be kept on the entry page, unless perhaps they are being replaced by even better citations. On the other hand, probative citations -- those needed to prove that an entry meets CFI, or perhaps to test some hypothesis about a word's history and range of use -- are a bit trickier, particularly if the cites in question are long and messy and add little new information. Personally I would still prefer to keep probative cites in the entry unless they are really obstructive (too long, too numerous), for the simple reason that our definitions of senses are always in flux, and Citations pages for non-monosemous words will inevitably get out of sync with their entries over time. -- Visviva 10:35, 10 November 2008 (UTC)[reply]
I find that adding too many quotes to the entry pages makes them harder to read, as there is that much more irrelevant text. I don't mind a quote per definition, to illustrate usage, but any more than that should be moved to the cites page where those who are interested can find it. If people aren't looking there, then maybe we should reword {{seeCites}}. Conrad.Irwin 10:55, 10 November 2008 (UTC)
We do still want to serve ordinary users, don't we? From that perspective, it seems to me that the best probative citations are those that are also good usage examples. Unfortunately most probative citations are not especially good as usage examples. However, all but the very worst (e.g., some from Usenet) are better than no usage example at all. Often, the at-least-two-lines-long nature of citations (in contrast to the one-line usage examples) pushes important content off the initial screen. Also, citations often miss specific problems that users have that can be better addressed with constructed examples. Perhaps what we need is a link format that goes directly to sense-specific citations-page sections from the sense. (I know. I know. Synchronisation issues. No technology magic for that?) DCDuring TALK 13:19, 10 November 2008 (UTC)
I too would like to emphasise the distinction between usage examples from ordinary sources and literary, and hence longer, citations (especially those stemming from poetic works). That is why I advocate keeping the concise usage examples in the main entry and placing the citations on the appropriate page, as in mauve (although in this entry there are no ordinary non-literary usage examples). I find the structure given (mauve) exemplary and support the current Wiktionary:Citations policy. Bogorm 13:46, 10 November 2008 (UTC)
I tried to clarify what I thought of the situation at Help:Citations, Quotations, References - does that ring true to anyone else, if not could it be fixed? Conrad.Irwin 16:10, 10 November 2008 (UTC)[reply]
  1. One idea not expressed on that page is that, if an entry is short enough so that no (English) definition is pushed off the page thereby, attestation-type quotations could (should!) be on the main page, not the citation page.
  2. Another is that no citation of reasonable quality should be removed from a sense if doing so would leave the sense without any usage example. The requests for usage examples persist on talk pages, feedback, and elsewhere. DCDuring TALK 16:52, 10 November 2008 (UTC)[reply]
That page makes it sound like {citations} ∩ {quotations} = Ø, but I always took it as {citations} = {quotations} or perhaps {citations} = {quotations} ∪ {references}. —RuakhTALK 17:53, 10 November 2008 (UTC)[reply]
Yes, Atelaes, the removal of quotations to those damned citations pages has become a major peeve of mine as well, and more than once I've considered leaving the project because of it. I've added thousands of quotations and am careful to pick quotations which illuminate the meaning. It really rankles me to see good quotations exiled to those gulag subpages (which are sloppily maintained and which require redundant maintenance of the senses when there are multiple senses, and which are never going to stay in sync with their main pages). The real solution is to do it the way the online OED does it and to implement collapsible quotations boxes which work between senses on the main entry page (the code for which was developed by the ever-capable Ruakh about a year ago). -- WikiPedant 17:01, 10 November 2008 (UTC)[reply]
Hear, hear. Ƿidsiþ 05:55, 13 November 2008 (UTC)
Why not? DCDuring TALK 09:13, 13 November 2008 (UTC)[reply]
Agreed. It is inexplicable that those quotation boxes still haven’t been instituted; their use would:
  1. Remove the problem of Citations-page and Quotations-section synchronisation;
  2. Create clearly-visible grey bars between the definitions, demarcating each sense and drawing the reader’s eye to them;
  3. Cut down the amount of space taken up (or, as their detractors would say, wasted) by our citations to about one line per sense; and,
  4. Allow the simple categorisation of all pages with citations, and those without.
In light of the above, why is there so little enthusiasm for their use?  (u):Raifʻhār (t):Doremítzwr﴿ 20:46, 16 November 2008 (UTC)[reply]
In the online OED, clicking the "quotations" button at the top of the entry toggles the visibility of all quotations. IMO the ideal system for us would be similar: a JS button on every page (or at least every page with citations) that toggles the visibility of all interlinear citations on the page. The default setting could be partial visibility -- showing the quotation but not the source, so that the default user gets examples without a lot of ISBN numbers and whatnot; the user could then choose to either show all info or hide everything. I was fiddling with this a while back, but didn't get far (AFAIR, I could make it work in Firefox but nowhere else). -- Visviva 00:18, 11 November 2008 (UTC)[reply]
Right, Visviva, this is very much how I would envision it too. The current "Citations" button could be the toggle switch. -- WikiPedant 00:39, 11 November 2008 (UTC)[reply]
Further on this line, I'm delighted to report that my show/hide citations button is finally working in IE6, IE7, Opera and FF. Yay! It currently assumes that any unordered list nested in an ordered list is a citation -- I haven't been able to think of a counterexample in mainspace. Anyway, if you'd care to take it for a spin, copy the first section of User:Visviva/monobook.js to Special:Mypage/monobook.js, and put {{cites-button}} on a suitable test page. Discussion of if/how to fix this up for general use should probably go to the WT:GP; I just wanted to mention it here. -- Visviva 04:47, 12 November 2008 (UTC)[reply]
Useful, illustrative quotations should absolutely stay in the entry. The citations page is for everything else. (Note that "quotation" and "citation" are not the same thing.) Anyone moving a good quotation to the citations page "because it is supposed to be there" should be immediately trouted. To the contrary, the citations page is a useful resource for quotations that might be added to the entry. (And no, we don't need more collapsible box magic; senses should have 1-2, maybe 3 useful examples and/or quotations that illustrate use. If there are more interesting things, they can go on the citations page.) Robert Ullmann 17:16, 10 November 2008 (UTC)[reply]
How many per sense? What makes them useful? DCDuring TALK 17:31, 10 November 2008 (UTC)[reply]
That drives me crazy, too, as you know. Though I think 5 quotations under a sense is kind of a lot, even if there's just one sense, because then it starts to put a bit too much distance between a sense-line and the various attached onyms, translations, etc. (The OED gets away with it because it doesn't have any of that stuff, but we do, and we should be proud of it.) —RuakhTALK 17:53, 10 November 2008 (UTC)[reply]
How many per sense? Depends. For straightforward terms, 1 or 2 good quotations suffice. For slightly harder terms, I like to find one telling quotation from each of the 19th-, 20th-, and 21st-centuries. For difficult terms, I like to find one telling quotation from each century, going back as far as I can. For terms that have been around a long time, this can produce 5 or more. For ambiguous terms, I like to find enough quotations to give a clear sense of the range of usages. For idioms, lengthier lists can also be appropriate, since the defn of an idiom is greatly enriched by seeing the idiom used in context. -- WikiPedant 19:23, 10 November 2008 (UTC)[reply]
The question is how many should appear in the main entry page, as opposed to citations. Are you saying that there ought be no guidelines for this?
Idioms are actually an easy case, because the entire English section (the literal definition plus the figurative, plus the first few citations [unfortunately, the oldest]) appears on the first screen. This is because there are no long etymology or pronunciation sections yet. We are not pushing more valuable content (except more recent quotes) off the first screen by inserting another quote.
Following such a practice at an entry like head or set will almost certainly drive users to seek dictionaries with more straightforward layouts for their most common dictionary needs. I already use OneLook in this way, because it provides a few definitions on the landing page as well as providing links to various dictionaries and other references, including WP and Wikt, each with its own characteristic strengths and weaknesses.
Perhaps Wikt should serve the neglected function of offering filtered usage examples. This would serve certain scholars, writers, language learners, and many others sometimes not well served by the various other tools. DCDuring TALK 19:50, 10 November 2008 (UTC)[reply]
I had always thought that around 3 was the maximum number wanted per sense of a word, but some people here seem to be clamoring for a high max. Would it be fine to move to Citations: some of the 7 serial quotations under the first sense of verb#Verb? --Bequw¢τ 10:23, 29 November 2008 (UTC)[reply]
verb#Verb is a good test case. I would hope that only one or two of the citations would remain on the mainspace page. Perhaps all of the citations should be moved to citation space to allow a simple view of the attestation and usage history with just one or two left behind. DCDuring TALK 12:34, 29 November 2008 (UTC)[reply]
I would vote for keeping the 1981 (anon), 1997 (Griffiths), and 2005 (Mattison) cites. 1981 because it is the first we have, and therefore significant; the others because they are the simplest, and where possible a good citation should also be a good example, free of unnecessary complications. As an added bonus this would give us one cite per decade.
While we're pet-peeving here, I would like to point out that all of the cites for verb#Verb are borderline worthless at present, since the citer has not provided any links to the source material (there are no URLs, DOIs, or ISBNs). For all the user can tell, these examples could all have been made up out of whole cloth.-- Visviva 15:28, 29 November 2008 (UTC)[reply]
My ears are burning. I have hardly ever inserted the links in my citations. At least any individual cite can be googled. I've only recently started using the wonderful quote templates, which make citation easier. But, you are right and I will henceforth insert the url, though the greater work may make me cite less. The citations definitely look like attestation cites rather than good usage examples. Sometimes actual usage falls short of one's aesthetic standards. Of course a show/hide for citations obviates the space problem so selectivity is a little less critical. Perhaps good-only-for-attestation citations should be commented out, so as to make our entries more uplifting. DCDuring TALK 16:30, 29 November 2008 (UTC)[reply]
I agree with DCDuring that at most two of them must remain. The more citations are moved to the Citation space, the less encumbered the main entry is. Bogorm 15:41, 29 November 2008 (UTC)[reply]

Indented subsenses?

There has, for many years, been discussion about whether to indent subsenses in long lists of definitions (cf. generator, death grip, ward et al). Is this something we like or hate? As far as I can see it improves the clarity of the entry, as well as the logical flow, at the possible expense of breaking the {{quote-book}}-like templates that have the indentation hard-wired in.

Please, for the love of God, yes. Subsenses absolutely make the definitions more meaningful and easier to understand. This is standard practice in almost every professional dictionary I've ever seen. I think that this will require more rethinking than just quotebook, and it will certainly make for some long, tedious, and excruciating Tea room discussions, but I absolutely think it is worth it. Yay for subsenses! -Atelaes λάλει ἐμοί 00:03, 11 November 2008 (UTC)[reply]
I'm down with it. The quote templates can easily be retooled. Some code should be added to MediaWiki:Monobook.css so that the subsenses display properly; I guess the question is whether they should be numeric ("1.2") or alphabetical ("1.b"). -- Visviva 00:41, 11 November 2008 (UTC)[reply]
To be honest, my first preference would be for senses to be numbered with Arabic numerals, and senses grouped under Latin numerals (I. 2). However, I would still prefer entries with non-grouped senses to simply be Arabic numerals, as we currently have it. Can the software be made smart enough to do that? My second preference would be for alphabetical (1. b.). -Atelaes λάλει ἐμοί 05:54, 11 November 2008 (UTC)[reply]
A good direction for improving the quality of the big, complex entries. Will we need to have explicit umbrella senses for the subsenses? Not all dictionaries seem to find that necessary, eg MW3, MW Online. DCDuring TALK 01:18, 11 November 2008 (UTC)[reply]
That is an excellent question. I wonder if we might want to consider starting out with an option for either or. I realize this will introduce a certain lack of consistency, but I think it would be good to try them both out, and see which one is more practical. My intuition is that generally it will work best to simply have senses grouped together, without a "super sense", but I wonder if sometimes an explanatory note about how and why such senses are grouped might be nice.
A further question is: How many levels we should be prepared for? I think MW has 4 or 5, but no less than 3. This might affect the implementation and the level labeling. A kludge that got us two levels but couldn't go beyond that might not be desirable. Would it be easier to have labels like "1.2.4.2."? DCDuring TALK 10:20, 11 November 2008 (UTC)[reply]
I have serious reservations with indented subdefinitions. These reservations stem from the way we code and edit entries, rather than from the concept itself. If a simple, friendly, and easily maintainable structure could be devised, I might support its implementation. However, the three examples presented in support of such a structure don't seem particularly suited to such a complex structure. I see such a structure as useful only for really messy entries like head or set. A list of only three to six definitions doesn't pay off with the added complexity to format it and ensure that additional edits don't destroy the structure. --EncycloPetey 04:01, 12 November 2008 (UTC)[reply]
  • I love them. It was me that worked on ward, and I still miss my early version of "mark", before Connel nixed it. I think subsenses are very intuitive, and I think they allow for a good compromise between whether our definitions are ordered on historical principles or in terms of related definitions. This allows us to show the order in which various broad senses have emerged, while still keeping similar definitions in close proximity. Ƿidsiþ 05:53, 13 November 2008 (UTC)[reply]

The wiki format for lists encourages entering main senses—it just seems weird to add sub-senses under an empty bullet point (octothorp point).

Changing the numbering format is easy, and you can already do it in your own user style sheet. Example from my style sheet (User:Mzajac/monobook.css), which accounts for four levels of nested lists:

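/* nested ordered lists: a different numbering style at each depth */
ol {
  margin-left: 1.6em;
  list-style-type: decimal;      /* level 1: 1, 2, 3 */
}
ol ol {
  list-style-type: lower-alpha;  /* level 2: a, b, c */
}
ol ol ol {
  list-style-type: lower-roman;  /* level 3: i, ii, iii */
}
ol ol ol ol {
  list-style-type: decimal;      /* level 4: 1, 2, 3 again */
}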

I think OED uses something like A. 2. c. IV., but sometimes skips a level. Unfortunately, we can't currently set the numbers' font style or weight using CSS.

The style sheet should account for 4 or 5 labels, for those cases where someone goes overboard. I suggest we should discourage more than a level of nesting, perhaps by setting an unattractive style for all lower levels. Michael Z. 2008-11-13 18:09 z

  • The primary objection to subsense formats was that all initial examples given violated copyright. Many more entries that use that subsense format remain suspect. The second objection (and more important, IMHO) is that derivative uses are confounded by subsense syntax. Only the most elaborate parsers have even the slightest chance of interpreting such entries. The third objection is aesthetic: splitters already have far too much latitude here - encouraging this syntax encourages more specious splitting of definitions. The resulting definitions themselves are more difficult to read, so we alienate readers. Split definitions also make translation tables more numerous, less accurate and more complex. If the goal is to devise an unusable, incomprehensible system, then subsenses are attractive. But if the goal is to provide a reusable, extensible dictionary to the world, artificial subsenses add unnecessary complication. --Connel MacKenzie 14:23, 15 November 2008 (UTC)[reply]
    Can you give an example of the parsing problem? I would have thought that nested ordered lists are well structured, and relatively easy to parse (unlike the unfortunate hash the wiki parser makes of nested headings, which makes it impossible to select a section in CSS). Michael Z. 2008-11-15 21:08 z
  • In point of fact, subsenses have nothing to do with "splitting definitions". It does not change the number of definitions, only how they are organised on the page. The number of translation tables is exactly the same. It also seems bizarre to describe this as "an unusable, incomprehensible system" when all major print dictionaries do it. Ƿidsiþ 21:06, 16 November 2008 (UTC)[reply]
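
As a rough illustration of how the nested-list markup discussed in this section might be read by a reuser, the sketch below labels each sense line of a wikitext entry with either a numeric ("1.2") or an alphabetical ("1.b") subsense label. It assumes that senses are lines beginning with one or more # characters and that lines such as #* or #: hold quotations and examples. The names label_senses and format_label are hypothetical, and this is not how MediaWiki or any existing Wiktionary parser works.

```python
import string


def format_label(counters, style="numeric"):
    """Turn per-level counters, e.g. [1, 2], into a label such as '1.2' or '1.b'."""
    parts = []
    for level, n in enumerate(counters, start=1):
        # In the "alpha" style only the second level uses letters (a-z);
        # every other level, and anything past 'z', stays numeric.
        if style == "alpha" and level == 2 and n <= 26:
            parts.append(string.ascii_lowercase[n - 1])
        else:
            parts.append(str(n))
    return ".".join(parts)


def label_senses(wikitext, style="numeric"):
    """Return (label, text) pairs for the sense lines of a wikitext entry."""
    counters = []   # counters[i] is the current number at depth i + 1
    labelled = []
    for line in wikitext.splitlines():
        stripped = line.lstrip()
        depth = len(stripped) - len(stripped.lstrip("#"))
        if depth == 0:
            continue                    # not an ordered-list line
        rest = stripped[depth:]
        if rest[:1] in ("*", ":"):
            continue                    # quotation or example, not a sense
        depth = min(depth, len(counters) + 1)   # tolerate a skipped level
        del counters[depth:]            # leaving a deeper level resets it
        if len(counters) < depth:
            counters.append(0)
        counters[depth - 1] += 1
        labelled.append((format_label(counters, style), rest.strip()))
    return labelled


if __name__ == "__main__":
    # Hypothetical entry text, used only to exercise the function.
    sample = "\n".join([
        "# A large division.",
        "## A subdivision.",
        "#* 1999, an example quotation",
        "## Another subdivision.",
        "# A second sense.",
    ])
    for label, text in label_senses(sample, style="alpha"):
        print(label, text)
```

Under these assumptions the numbering style is purely a display choice: the same nested structure yields either labelling, and the number of senses is unchanged.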

Standard for entries taking inflected objects

Seeking feedback to come up with a good method for entries that take inflected objects.

  • For English entries, this problem appears in the form of prepositions: think about, against, of, on, out, to, up - what is the best way to highlight all these prepositions so it draws the eye to the one the user is looking for?
  • For FL entries, the same may be a case ending: gondol -ra/-re. Where should we list the required case endings - in the beginning of the definition line? Bolded, so it's easy to see?
  • Sometimes the object can only be a person or a non-living entity, other times both. What words should indicate this: somebody, something, or an abbreviated form of these?

Your thoughts, please. --Panda10 00:12, 11 November 2008 (UTC)[reply]

An example can be seen at tartozik. The abbreviations indicate the case ending: vmvel = valamivel - with something. The labels may be generated by templates that include the entry in categories, for example Category:Hungarian words taking "valamivel". --Panda10 23:43, 11 November 2008 (UTC)[reply]

I created a page User:Panda10/Sandbox to compare options. Would you please take a look and let me know if any of them are acceptable? --Panda10 15:12, 15 November 2008 (UTC)[reply]

Replacement words in descendant languages?

How do we handle words that are replaced in a descendant language?

If a term evolves into an etymologically related term, we put it in “Descendant terms” (and it is linked back in the “Etymology” of the descendant term), but if it is replaced, there doesn’t seem a natural place to put it – it can sorta fit in “Synonyms” as the terms likely coexisted for a time, but this doesn’t capture the relationship. (The following examples are “as far as I can tell”.)

For instance, in Classical Latin, the term for “cheese” was cāseus, which was replaced in Vulgar Latin by formaticum, leading to, for instance, French fromage. Currently there’s no standard way to see terms in descendant languages, not just terms etymologically descendant from a given term.

A more familiar example is perhaps Middle English they, where Old Norse þeir displaced þæt. (Here they are cognate, which confuses explanation a bit further.)

This seems a common language process, and worthy of some systematic treatment – any suggestions?

Perhaps:

  • a note in the “Etymology” section if a term replaced an older term, and
  • a “Replacement terms” section, similar to “Descendant terms”?

Nils von Barth (nbarth) (talk) 17:23, 12 November 2008 (UTC)[reply]

A section in the Usage notes would also be a good idea. This would allow a description of when the word was used, and when it began to be replaced. For modern languages, this could also be expressed on the definition line with a context label like {{archaic}} or {{dated}}. Note also that your Latin cheese example is a bit simplistic, since some Latin descendant languages retained caseus and have modern words descended from it rather than from formaticum. In other words, sometimes replacement is incomplete, or is limited geographically, such as replacement in Galician of poñer with pór, which is another incomplete and geographically-determined case of change. In this case, both forms exist in modern Galician, but each form has regions where it is the norm. --EncycloPetey 05:39, 13 November 2008 (UTC)[reply]

w:Birthday wishes in other languages is up for deletion at Wikipedia (and rightly so). However, this resource is available to us for a few days (longer, if really necessary — I can transwikify it upon request.). If there are any translations there that are missing from happy birthday, we could do worse than incorporate them (as unchecked translations if necessary, or on the talk page). I've encouraged the article's author to come here to do that sort of work. Uncle G 13:03, 13 November 2008 (UTC)[reply]

For the meanwhile, I've transwikied it to transwiki:Birthday wishes in other languages, so it can be deleted from WP.—msh210 04:50, 16 November 2008 (UTC)[reply]

{{suffix}}

Can I initiate some serious bikeshedding by proposing that we should edit the {{suffix}} template so that it does not include quotation marks in the name of the category. I believe this would be better because it is then possible for people to type the category name manually easily, it prevents the nested quotation marks in Entries in category “English words suffixed with “-ness””. It will be necessary to do some category renaming whatever we decide as some languages (e.g. Hungarian) use the category name without the quotes. Would people object if I got User:Conrad.Bot to move the categories that have already been created, and then update the template, then find any manual categorisations left over, and then (possibly) delete the old categories? Conrad.Irwin 01:05, 15 November 2008 (UTC)[reply]

OK with me. DCDuring TALK 01:32, 15 November 2008 (UTC)[reply]
sounds good to me. —RuakhTALK 02:40, 15 November 2008 (UTC)[reply]
Sounds very good to me. :) --Panda10 02:44, 15 November 2008 (UTC)[reply]
I've created all the respective categories without ", and Daniel. has updated the {{suffix}} to use them. I'll now slowly delete all the empty categories containing quotemarks. Conrad.Irwin 14:25, 15 November 2008 (UTC)[reply]
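
As a minimal sketch of the renaming described above, the following builds the quoted and unquoted category names for a suffix and pairs each old name with its replacement. The exact wording of the category name ("words suffixed with") is taken from the example in this thread; the use of straight quotation marks in the old names, and the function names suffix_category and rename_plan, are assumptions made for illustration and do not reflect how {{suffix}} or User:Conrad.Bot is actually implemented.

```python
def suffix_category(language, suffix, quoted=False):
    """Build the category name for words formed with a given suffix."""
    if not suffix.startswith("-"):
        suffix = "-" + suffix           # normalise "ness" to "-ness"
    if quoted:
        suffix = '"' + suffix + '"'     # old style: quotes inside the name
    return f"Category:{language} words suffixed with {suffix}"


def rename_plan(language, suffixes):
    """Map each old (quoted) category name to its new (unquoted) name."""
    return {
        suffix_category(language, s, quoted=True): suffix_category(language, s)
        for s in suffixes
    }


if __name__ == "__main__":
    for old, new in rename_plan("English", ["-ness", "-ly"]).items():
        print(old, "->", new)
```

A mapping of this kind could also be used to spot manual categorisations that still point at an old name once the template has been updated.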

Fair use images on Wiktionary

I have been unable to find any Wiktionary policy statement on whether fair use images are permitted on Wiktionary. As local image uploads are restricted to sysops, and Commons does not allow fair use images, such pictures are de facto not permitted here. Should we make this explicit?

It should be noted that this Wikimedia Foundation board resolution requires all projects that allow fair use content to have an "Exemption Doctrine Policy" (EDP) (see link for definition, I don't know how to make interwiki links to the foundation wiki). (I do. ☺ Uncle G 12:03, 20 November 2008 (UTC))[reply]

There appears to be no requirement for projects not hosting fair use content to have a policy about or otherwise explicitly state this (presumably as it is the default position). Making an explicit statement (or not) about fair use images is therefore a purely local decision. Thryduulf 01:56, 16 November 2008 (UTC)[reply]

  • I have no qualms about using images under fair-use, but as we don't have many pages that are about proprietary things I'm not sure we need them (and it sounds as though saying no is simpler, as we can ignore our Image: space completely). I certainly would like to keep upload sysop-restricted, which probably limits us to using commons. Conrad.Irwin 02:01, 16 November 2008 (UTC)[reply]

There has been some discussion at Wiktionary:Requests for verification#Cheerios about allowing fair use images. I personally do not think that it is worth all the hassle for the handful of entries that would benefit. IANAL, but going by what I know from Wikipedia it is worth bearing in mind the following:

  • If a copyrighted image can be replaced by a free content one, then using the copyrighted image is not fair use. Even if no free image currently exists, only if one cannot exist is it fair use.
  • Every fair use image must be accompanied by a detailed fair use rationale, explaining why every use of it on Wiktionary is fair use (i.e. if it is used on three pages, it must be accompanied by three fair use rationales).
  • If a fair use image is no longer used, it must be deleted. I believe that Wikipedia uses a bot to flag unused fair use images.
  • Using such images outside of the main namespace is unlikely to be fair use.
  • If we as a community choose to allow fair use, we will need to work with the foundation to create an "Exemption Doctrine Policy". AIUI this needs to be in place before we accept fair use images. Thryduulf 19:25, 16 November 2008 (UTC)[reply]
  • Unlike many conventional encyclopedias, most conventional dictionaries are typically not full of pictures. -- Gauss 19:37, 16 November 2008 (UTC)[reply]
    • I have several dictionaries with pictures in them. However, I cannot recall ever seeing one (that wasn't trying to be an encyclopaedia as well) that employed a copyrighted work for which the dictionary didn't have publication rights. Translated into Wiktionary terms, this is equivalent to not having any images in Wiktionary that are not free content. Uncle G 12:03, 20 November 2008 (UTC)[reply]
    • The Wikipedia EDP at w:Wikipedia:Non-free content is long. We could copy it or shorten it. In either case, someone who has some familiarity with our likely usage pattern and the IP law field should read it. I also don't think we can do fair use casually. As Thryduulf suggests it may not be worth the effort to do so at all. DCDuring TALK 19:31, 16 November 2008 (UTC)[reply]
      • If we do decide to go down the route of fair use, the foundation site (see link above) makes it clear that they will help projects create an EDP, and presumably they will need to approve it if we go it alone. It doesn't say how to request that help, but presumably there is somewhere on Meta to contact the relevant people. Thryduulf 22:21, 16 November 2008 (UTC)[reply]
        • I see no problem with that. Even if we don't end up using any fair use imagery, there is no reason for us to foreclose ourselves from doing so if there is any potential it might be useful to illustrate a definition. bd2412 T 04:47, 17 November 2008 (UTC)[reply]
  • Per Thryduulf 19:25, 16 November 2008 (UTC) and Gauss 19:37, 16 November 2008 (UTC), I think allowing fair-use images will be more bother than it'll be worth.—msh210 06:09, 17 November 2008 (UTC)[reply]
    • Amen to that. This is trouble we don't need. Wikipedia wouldn't have fair-use either, if they/we had dealt with the issue before the project was cursed with popularity. -- Visviva 03:52, 19 November 2008 (UTC)[reply]

Should I create a vote on this issue to settle it? Thryduulf 12:19, 19 November 2008 (UTC)[reply]

  • Go for it. --EncycloPetey 07:11, 20 November 2008 (UTC)[reply]
    • Since you are interwiki linking, here's another one for you: Don't vote on everything. ☺ Wait until someone can come up with a cogent reason that Wiktionary would ever need a non-free image, and only then put forward a proposal.

      Here's a thought to ponder upon: The "preamble purposes" in the fair use doctrine are "criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research". We are not Wikinews, Wikipedia, Wikibooks, or Wikiversity. Teaching, criticism, comment, and news reporting are not our domain. And you'll be hard pressed to explain how a dictionary needs to copy copyright-protected images in order to perform lexicographic scholarship or research.

      Here's another thought to ponder upon: Our quoting short excerpts of copyrighted works for the purposes of illustrating their use is something that we do under the fair use doctrine. We don't use images and media under fair use, but we do use text quite a lot. But we already have that covered in our copyright policy. Uncle G 12:03, 20 November 2008 (UTC)[reply]

  • Well, Cheerios has been mentioned as an entry that might require a fair-use image for illustration. The same reasoning would probably apply to many other terms derived from trade names. While I don't think we should have them, I think the case for fair-use images on Wiktionary is roughly as good (or as bad) as the case on Wikipedia; in both cases, the primary purpose of fair use is "teaching," i.e. illustrating the concept in question. Our need for this type of illustration is not really any less than Wikipedia's; indeed it is arguably greater since our format does not allow lengthy verbal descriptions. But again, I don't think we should have them and for my part I am happy to continue with our Vote-free, EDP-free status quo. -- Visviva 12:13, 20 November 2008 (UTC)[reply]
  • I can't think of any case where an image is really needed, but a free use image is not possible. I'd prefer to say illustrate with images at commons (easy and practical way to illustrate). Can someone give me more examples of where a copyrighted image is needed? RJFJR 14:31, 20 November 2008 (UTC)[reply]
  • Darth Vader could really use one - it would enhance the explanation of why, exactly, this name has come to stand for a person attributed with that sort of brooding malevolence. There are probably a few other examples of that type. bd2412 T 04:56, 25 November 2008 (UTC)[reply]
    Would it not be possible to use a Commons self-made image of a person in a self-made Darth Vader costume? There are some such images there currently, from conventions, but unfortunately Vader is blurry in all of them. --EncycloPetey 20:01, 25 November 2008 (UTC)[reply]
    Would anything short of an actual image of Darth Vader from the movies really capture the look? And in any event, Lucasfilm undoubtedly owns whatever copyrightable elements exist in the costume (although costumes are generally not copyrightable, masks generally are). The point is, we would be on utterly safe fair use grounds to use a 250px cropped scene from the film, and I don't think there is a substitute that would have the authenticity of the real thing. Same thing, perhaps, with Death Star, hobbit, Ringwraith, Klingon, Hogwarts. I'd also reiterate that to the extent we allow brand names of packaged goods like Rice Krispies (which we include because they may be used without context in writing), we ought to have pictures of the packaging as well. bd2412 T 06:49, 26 November 2008 (UTC)[reply]
    Since packaging changes periodically, I definitely disagree with you on that point. A bowl filled with Rice Krispies would be more effective than a picture of any package, and could be uploaded on Commons with no copyright problems. --EncycloPetey 06:23, 26 November 2008 (UTC)[reply]
    Interesting point. But in many such products there are certain elements of the packaging that are familiar across generations, such that a picture of a box of cereal from, say, the mid-80s would be instantly recognizable as that cereal. bd2412 T 06:49, 26 November 2008 (UTC)[reply]

Good news: linking to Wiktionary from Wikimedia projects using "d:" prefix works now. --Dan Polansky 08:58, 16 November 2008 (UTC)[reply]

Great. Thanks. DCDuring TALK 19:31, 16 November 2008 (UTC)[reply]

"d:"?! Why "d:"!?? Ƿidsiþ 21:00, 16 November 2008 (UTC)[reply]

"Dictionary". Seems to be the best anyone could come up with... \Mike 21:06, 16 November 2008 (UTC)[reply]
Ah, I see. So they picked the one letter from dictionary which "Wiktionary" doesn't use.... Ƿidsiþ 21:08, 16 November 2008 (UTC)[reply]
"C" isn't in "Wiktionary" either, and doesn't have any relevance at all. "W" and "N" are in use for Wikipedia and Wikinews respectively. None of the others have any special meaning for "Wiktionary", and we are a dictionary. Makes sense to me. Thryduulf 22:38, 16 November 2008 (UTC)[reply]
The only other option would have been "T" (as the other projects strip the wiki prefix) but it was suggested that that could be confused with "Template:" - and is counter intuitive for beginners. meta:Talk:Interwiki map/Archives/2008-08#Wiktionary has the discussion. Conrad.Irwin 02:03, 17 November 2008 (UTC)[reply]
Thanks! Conrad.Irwin 02:03, 17 November 2008 (UTC)[reply]
Thanks, Conrad, for taking the issue of creating a single-letter prefix for Wiktionary to Meta, thus getting it done in the first place. --Dan Polansky 06:05, 17 November 2008 (UTC)[reply]
Oddly, d: links to English Wiktionary, not Wiktionary in general. Contrast http://he.wikipedia.org/wiki/d:foo with http://he.wikipedia.org/wiki/wikt:foo .—msh210 07:06, 17 November 2008 (UTC)[reply]
Hmm, that seems like a bug to me... Conrad.Irwin 09:07, 17 November 2008 (UTC)[reply]
I suspect rather that it means that d:foo was added as an interwiki link like any other (like doi:foo, e.g.), and was added to link to enwikt (as any other interwiki link links to some specific URL), rather than having been added as the counterpart to w:, n:, q:, et al., which are, somehow, treated specially by the MW software or the WM configuration thereof. Not a bug, just a misunderstanding by the implementer of what was to be implemented.—msh210 17:28, 17 November 2008 (UTC)[reply]
I take that back. http://en.wiktionary.org/wiki/w:foo works but http://en.wiktionary.org/wiki/doi:foo does not, but http://en.wikipedia.org/wiki/d:foo does, so d: is sorta like the w: and n: prefixes.—msh210 06:41, 18 November 2008 (UTC)[reply]

Fine-grained categorization of nouns

I'd like to ask about the views on how to best categorize nouns, and whether at all. Recently, the following categories have been created:

It seems to me that creating further categories along these lines would lead to a creation of the following categories:

  • Czech feminine nouns
  • Czech masculine nouns
  • Czech neuter nouns
  • Czech animate nouns
  • Czech inanimate nouns
  • Czech nouns with declension pattern pán
  • Czech nouns with declension pattern hrad
  • Czech nouns with declension pattern ...

I'd think these categories are unneeded. --Dan Polansky 08:29, 17 November 2008 (UTC)[reply]

Grammatical categories are tricky, as they can't really be decided for all languages, like topical categories can (and even that's just a bit....sticky. How about dividing up Ancient Greek words between Category:Greece and Category:Ancient Greece :-)). We have a number of subcats inside Category:Ancient Greek nouns, and I'm somewhat apathetic about them. They add little....but they subtract little as well. Ultimately, this is something which I think should be decided on a case by case basis by people who are involved with the language in question. Editors from closely related languages may have useful input, as their languages probably have similar issues. Thus, I would suggest that you and Romanb have a discussion about the merits of such categories, and perhaps get other Slavic folks involved (e.g. Stephen, Ivan, etc.) if a resolution cannot be reached. Ultimately, bring the issue back here, with each side's primary arguments for larger community assessment if that all fails. Personally, I see no merits to the gender cats (although "Czech nouns with declension pattern pán" might have some merit). However, as currently presented, I think that I (and most of the other folks reading this) am utterly unqualified to judge such things, as I don't know Czech, and thus don't really understand how people might want to sort Czech nouns. I hope this does not come off as me simply brushing you off. -Atelaes λάλει ἐμοί 08:53, 17 November 2008 (UTC)[reply]
Useless categories, should all be deleted. We shouldn't follow Wikipedia's over-categorizing mentality "if it can't hurt, leave it". I cannot imagine in what circumstances one should be interested in inspecting an ever-growing category populated with thousands of entries whose connection is based on some trivial property such as animacy or gender. Much more interesting would be closed categories, e.g. n-stems in Russian (only a handful of PIE consonant-stem nouns have been preserved in all Slavic languages), nouns whose meanings can be both animate and inanimate (and thus have dual forms in some cases), or have defective inflection, or represent an exception from normal gender assignment based on a suffix, or are in any other way "interesting". --Ivan Štambuk 13:40, 18 November 2008 (UTC)[reply]
They seem useful and interesting to me (that is, I would expect to find them of interest if I were learning Czech, which I currently am not). Certainly if a particular entry leaves the user guessing about some aspect of inflection, checking other entries with the same inflectional pattern can be helpful. Also, fine-grained categories can help with maintenance. As Wiktionary develops, there will be approximately n occasions (where n is an arbitrarily large number) when we realize that all entries for Foovian words with certain properties have inaccurate or missing information. Suppose we have entries for 10,000 Foovian nouns, among which there are 500 5th-declension Foovian nouns... to clean up a problem specific to 5th-declension nouns, it would be vastly easier to go through the category for 5th-declension nouns than to sift through the entire "Foovian nouns" category by hand. If you see what I mean... :-) -- Visviva 03:45, 19 November 2008 (UTC)[reply]
The gender-based subcategories of nouns have some merit, but as far as I'm concerned, any further subcategorization is a horde of hobgoblins waiting to eat up valuable time that should be spent on more productive pursuits. --EncycloPetey 07:08, 20 November 2008 (UTC)[reply]
Like the creator of these gender-noun categories, I think they are useful for learners of the languages, who can see the similarities between such nouns. I was considering whether or not to create a category for animate and inanimate nouns, and probably I will later. Comparing the Czech nouns categories and all the different sorts of nouns in Category:English nouns, I see no reason not to include these. However, I'm not so fussed either way. --Romanb 18:24, 24 November 2008 (UTC)[reply]

Template needed for pluralia tantum

I want to bring attention to this request: http://en.wiktionary.org/wiki/Template_talk:en-noun#Pluralia_tantum. I'd like to see this happen too. - dougher 05:53, 18 November 2008 (UTC)[reply]

I just created {{en-plural-noun}} for this, as {{en-noun}} is complicated enough already. Conrad.Irwin 09:27, 18 November 2008 (UTC)[reply]
How does this differ from {{plurale tantum}} and {{pluralonly}}? Thryduulf 11:57, 18 November 2008 (UTC)[reply]
According to {{en-noun}}, we ought to enter a singular for the word "entrails", that is the problem. Circeus 18:15, 18 November 2008 (UTC)[reply]
  1. There is a {now-obsolete} singular word [[entrail]] that appeared in Webster's 1913.
  2. There are 500+ raw b.g.c. hits for "entrail", often with "one" or "an" (indicating countability).
This is not an unusual pattern of usage (or [[abusage]]). DCDuring TALK 18:38, 18 November 2008 (UTC)[reply]
I was just citing this particular example because it was the first fairly clear-cut that came to mind, how about bagpipes, physics, acoustics, brass knuckles, Y-fronts, feces, longjohns, memorabilia, northern lights, smithereens? Clearly using the "uncountable" option of the template is inappropriate (it's explicitly intended for mass nouns). Circeus 19:02, 18 November 2008 (UTC)[reply]
My sole interest is in indicating the complexities not yet addressed. The template does not accommodate some real entries. Also it is not obvious from the label "plurale tantum" what the practical implications are for normal users or language learners. "Bagpipes" are (almost always), but "physics" (or "acoustics") is. "Memorabilia" seems to accept both. Presenting a label without presenting how to properly use the word doesn't seem adequate. DCDuring TALK 19:21, 18 November 2008 (UTC)[reply]
Personally, I prefer to use {{infl|en|noun}} when {{en-noun}} is being uncooperative. (Of course, this doesn't accommodate pluralization at all). -- Visviva 03:48, 19 November 2008 (UTC)[reply]

ISO code for Serbo-Croatian

The Wikipedia article w:Serbo-Croatian language says that the two-letter ISO 639-1 code "sh" for Serbo-Croatian is deprecated, while the three-letter ISO 639-3 code "hbs" is not.

Wiktionary currently only supports the "sh" code, should we continue to use the deprecated code or should we switch to the active code? Thryduulf 13:15, 18 November 2008 (UTC)[reply]

hbs is for what Ethnologue calls "macrolanguage" (their own neologism), not individual language code. SC is not used on Wiktionary, so {sh}/{lang:sh} and {hbs}/{lang:hbs} should not be normally transcluded anywhere unless in special circumstances. --Ivan Štambuk 13:31, 18 November 2008 (UTC)[reply]
sh should and must be available in Wiktionary for every person from the former SFRY who does not deny the existence of this language (there are such contributors) just because of political (separatist) convictions, unless he speaks the Slovenian language or the Bulgarian language in Macedonia. And this must apply not because I am an adherent of the language, but because of the existence of the ISO code. Those who dislike it, either make no use of it, or write sh-0, am I right? Bogorm 13:43, 18 November 2008 (UTC)[reply]
Bogorm, I thought I explained this to you a while ago. There was never such a thing as "Serbo-Croatian language" or "Serbo-Croatian languages", before it was invented by Communists in SFRJ. Croats and Serbs have had separate literatures for centuries (some more than the others..), and the fact that in the 19th century the same dialect (stylised Neoštokavian, but different subdialects) was chosen for a literary language of both Serbs and Croats does not "prove" anything. Dialects of this "Serbo-Croatian" area do not form a genetic clade (there was never "Proto-Serbo-Croatian"; their last common reconstructable ancestor was Proto-West-South-Slavic). --Ivan Štambuk 13:49, 18 November 2008 (UTC)[reply]
And I too explained that there are exactly three South-Slavic languages - Slovenian, Serbo-Croatian and Bulgarian. I suggest we stop here since our argument is of no avail for people outside the Balkan peninsula to decide on the ISO matter. You also explained that there was "Czechoslovakian" also invented by Communists, when in the whole БСЭ there is not a single word about it. Until the political radicalisation in the Balkans since the 1990s this term was tranquilly in circulation and prevailing, and that is why people still feel a predilection for it - because it is traditional linguistics that endorses it, while the modern is not politically impartial.
Enough with the most widespread South-Slavic language, let's concentrate on its code. So, either preserve sh, or introduce hbs? In my opinion this ought to be decided in Meta, so that scores of native speakers be enabled to partake, ok? (I mean, discussions in Meta are regularly announced on local Wikipedias, whereas these here are not) Bogorm 13:59, 18 November 2008 (UTC)[reply]
All who are interested in the dispute between me, User:Dijan (whom I ardently consented with) and User:Ivan Štambuk and in the arguable Czechoslovakian language, may espy here.Bogorm 14:06, 18 November 2008 (UTC)[reply]
Look Bogorom, I personally don't give a flying f*** what some Communist encyclopedia says about "Czecho-Slovak", or "Serbo-Croatian", or "Serbo-Croato-Slovene" language (did you hear about this latter one? It was supposed to be the official language of SHS kingdom, but it failed to be codified). No "traditional linguistics" endorse it but that of ignorance and laziness. Croats have called their language Croatian centuries before Communists decided to sanction that practice (entire book editions were burned just because they had "Croatian" and not "Serbo-Croatian" in the title), and utilize it to systematically Serbify Croatian speech in all semantic spheres. If you think of Communist linguistics which imprisoned people for using the "wrong words" or published dictionaries stripped of very-much-alive words which were not "acceptable" just because they were Croat-only as "politically impartial", and modern democratic peer-reviewed-journal-published views as a result of 1990s radicalisation in the Balkans, then you are very much deluded. I know it's simple for outsiders to "simplify" things, imagining that "SC" somehow "disintegrated" in parallel with SFRJ, a view which is still cherished by some Serbs (because it gives them the right to claim Croatian cultural heritage), and some Communist-sympathising Yugonostalgics, but the issues are far, far more delicate than that, and please don't raise them here because every time you mention it I feel like being called to explain why you are wrong.
As for the code - this has nothing to do with Meta, but with a set of templates such as {{sh}} which are used by other templates to convert ISO code to language name. Wiktionary already uses different sets of codes than those "intelligently" chosen by Meta Language Committee. --Ivan Štambuk 14:25, 18 November 2008 (UTC)[reply]
Do not distort/deride my name, ok? Bogorm 17:42, 18 November 2008 (UTC)[reply]

Now you've had your rants at each other, please can we get back to the matter in hand. There are 48 words in four languages that are included in Category:Serbo-Croatian derivations or a subcategory thereof, either manually or through one of {{SH.}} or {{etyl|sh}}. It would not surprise me to find other words so derived but which are not categorised as such. The only question is do we use the deprecated code "sh" or the active code "hbs" to denote these. {{SH.}}, like all dotted etymology templates, is already deprecated in favour of {{etyl|xx}}; if we choose to continue to use the deprecated code then xx will continue to equal sh in the case of Serbo-Croatian derivations; if we choose to use the active code then xx will equal hbs. It will be trivial to convert existing {{etyl|sh}} entries to use {{etyl|hbs}} instead. If I read special:whatlinkshere/template:sh correctly, there are precisely 4 pages that would need amending in this manner. 1 page in the Wiktionary namespace and 1 page in Robert Ullmann's userspace may also need changing.

This is not about politics, what native speakers call their language, or what $encyclopaedia calls the language. Nor is it a proposal to alter the status of the language on Wiktionary, or to do anything at all to any entry in or referring to Croatian, Serbian, Bosnian or any other language. Please can we keep this civilised and rational. Thryduulf 15:13, 18 November 2008 (UTC)[reply]
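
For readers wondering what the "trivial" conversion mentioned above might look like in practice, here is a minimal sketch in Python. It assumes the page wikitext is already available as a string; the function name `retarget_etyl` and the default target code are illustrative assumptions, not an existing bot or an agreed decision. It rewrites only the first parameter of {{etyl}} when that parameter is exactly "sh" and leaves any further parameters untouched.

```python
import re

# Matches the opening of an {{etyl}} call whose first parameter is exactly "sh".
# The lookahead requires "|" or "}" next, so codes such as "shn" are not touched.
ETYL_SH = re.compile(r"\{\{\s*etyl\s*\|\s*sh\s*(?=[|}])")


def retarget_etyl(wikitext: str, new_code: str = "hbs") -> str:
    """Return wikitext with {{etyl|sh...}} rewritten to use new_code.

    Hypothetical helper for illustration only; it does not fetch or save pages.
    """
    # A function replacement avoids any backslash handling in new_code.
    return ETYL_SH.sub(lambda match: "{{etyl|" + new_code, wikitext)


if __name__ == "__main__":
    sample = "From {{etyl|sh}} and {{etyl|sr}}."
    print(retarget_etyl(sample))
```

The same helper could point the other way (for example `new_code="sr"` or `new_code="hr"`) if the outcome of the discussion is to use individual language codes instead.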

Let's try to follow published technical standards, rather than formulating our own through political and ethnic discussions. Michael Z. 2008-11-18 17:31 z
Unfortunately, hbs is a macrolanguage code (as Ivan Štambuk (talkcontribs) points out), so by my understanding, we can't use {{etyl|sh}} (because it's deprecated) or {{etyl|hbs}} (because it's not a language code). The question is then what attitude we want to take toward {{SH.}}: do we consider it mandatory, desirable, optional, undesirable, or forbidden for it to be replaced with {{etyl|sr}}/{{etyl|hr}}/etc.? My vote is for "desirable": {{SH.}} is acceptable but not ideal. —RuakhTALK 20:03, 18 November 2008 (UTC)[reply]
Agree completely with Ruakh, every word. -Atelaes λάλει ἐμοί 21:26, 18 November 2008 (UTC)[reply]
Ack, it's never as simple as we'd like. I'd like to stick with standards.
But I need to know what to type into an etymology when my source says “Serbo-Croatian”. Isn't hbs a synonym for sh? Michael Z. 2008-11-19 06:33 z
That's easy. The best practice for now is, when your source says "Serbo-Croatian", use {{SH.}}. The procedure for macrolanguages/language families is, as yet, undecided. Serbo-Croatian is in the same limbo as Germanic ({{Ger.}}). These will undoubtedly get sorted out in time. -Atelaes λάλει ἐμοί 06:56, 19 November 2008 (UTC)[reply]

{{SH.}} must be obsoleted for individual language codes. From a cursory look of the etymons that use it in their respective etymologies, all of them are either Croatian or Serbian-specific, are shared with Slovenian and/or Bulgaro-Macedonian, or are even Common Slavic (like slava). The act of borrowing of all of them predates the conception of "Serbo-Croatian" by centuries. Defective and unreliable sources are no excuse to push the usage of {{SH.}}. --Ivan Štambuk 15:23, 19 November 2008 (UTC)[reply]

Okay, but I do have access to etymological dictionaries which give “Serbo-Croatian” (or “Serbian-Croatian”). How would I now enter the lang attribute in a template, where I would previously have typed {{term|...|lang=sh}}? I'd prefer to keep doing it the old way and keep the data structured, than to enter it without the template and then not be able to locate and update it later in the wikitext soup. When the standard is updated, we're not obligated to adopt it overnight before we can work out a way to deal with it. Michael Z. 2008-11-19 16:01 z
You can't use {term} with lang=sh, because there are no L2 SC entries you can link to. If you prefer the no-brainer path, feel free to use it that way (or {SH.} when in etymologies), and it will be taken care of sooner or later. --Ivan Štambuk 16:12, 19 November 2008 (UTC)[reply]

Guidelines and Policies

As a newcomer, I want to suggest the following. Using this main "Project page" for an actual list of guidelines and policies...moving all discussions to the discussion page. Which should pave the way for grouping items and then moving whole groups onto separate pages...? -- IrishDragon 03:33, 19 November 2008 (UTC)[reply]

The Beer Parlour is a general discussion forum, comparable to the Village Pump on Wikipedia. I think you are looking for something like Wiktionary:Policies and guidelines. -- Visviva 04:47, 19 November 2008 (UTC)[reply]
Yes, but why is it so hard to find...and why isn't it a "real" page? (This redirects to the "Talk" page??) -- IrishDragon 03:25, 20 November 2008 (UTC)[reply]
This is not Wikipedia. You are coming here with Wikipedia-based expectations. We don't use our policies or guidelines in the same ways that Wikipedia does. We have only a very few core policy pages, not a library-full of them with supplementary guidelines, essays, and ramblings the way that Wikipedia does. --EncycloPetey 07:02, 20 November 2008 (UTC)[reply]


MW 1913 and 1828

What has happened to Webster's Unabridged Revised Dictionary - yesterday I could not open any entry (in two browsers) and neither can I today. For example this does not open and here an innumerable amount of entries are dependent on it. Is anyone seeing something meaningful in the link? Any information? Bogorm 15:55, 23 November 2008 (UTC)[reply]

I have same problem, but this link just now worked for 1828 and this one for 1913. Onelook.com is a useful gateway to multiple dictionaries as well. DCDuring TALK 18:34, 23 November 2008 (UTC)[reply]
It seems to have slipped by. Bogorm 12:31, 28 November 2008 (UTC)[reply]

belter a term used in reference to the individual characteristics displayed commonly found in dundonian female personality. in a collective format the reference to being a belter implies that a person is lacking in social skills and intelligence, often associated with monday books and social loan repeat applicants. — This unsigned comment was added by Tangerine queen (talkcontribs) at 17:00, 23 November 2008.

He/she posted this on many irrelevant pages. I moved it to Requests for entries at the time. Equinox 19:44, 29 November 2008 (UTC)[reply]
Strike as well-covered elsewhere. DCDuring TALK 19:59, 29 November 2008 (UTC)[reply]


Structure of given name categories

Category:fa:Male given names has been deleted and replaced by Category:Persian male given names. This creates a problem insomuch as Category:fa:Male given names did contain quite rightly the Arabic names in Persian, yet now Arabic names such as علی Ali and جمیل Jamil are in a wrongly-named category: 'Category:Persian male given names'. 'Persian given names' are those such as کوروش Kurosh/Cyrus and داریوش Dariush/Darius. How can this be resolved? Kaixinguo 06:09, 24 November 2008 (UTC)[reply]

This is just the kind of misunderstanding I was afraid of. Only the name of the category has changed, it still applies to the same names: any male given name in the Persian language, given by Persian speaking parents to their son, including those derived from Arabic. Just like Cyrus is an English male given name, though it derives from Persian. You may create a subcategory if you wish, by the template {given name|male|from=Arabic|lang=fa}.
A real problem is Category:English male given names from Persian. Besides Cyrus, it contains names like Behrouz and Ehsan, which are not given by English-speaking parents to their children. They should be classified as "English transliterations of Persian male given names", or "Persian male given names transliterated into English", or... I cannot think of a good name. But as long as this isn't resolved, it's better to keep them in the wrong category than to leave them outside all categories.
By the way, your entries on Persian given names were exciting. I learned a lot:)--Makaokalani 14:23, 24 November 2008 (UTC)[reply]
How about "Persian male/female given names in Roman script"? These are at least partially independent of language (that's what makes them so headache-inducing), but the script is a straightforward criterion. We could then have a matching category "English given names in Persian script" and so forth.
But then again, are we sure that names like "Behrouz" are never given to the children of English-speaking parents? Even if they are of Iranian ancestry? Seems difficult to prove either way. -- Visviva 15:01, 24 November 2008 (UTC)[reply]
On the one hand, our main taxonomy is by language, but on the other classifying these names by transliterated script would greatly limit the proliferation of categories, and category membership for many names. But it will clump together foreign-sounding names transcribed for French, German, Slovak, Hungarian, Filipino, etc. Maybe start with script categories, then subdivide them into languages if/when their membership grows large.
Is “given to children of anglophone parents” a useful criterion at all? Many people's romanized names are assigned by passport-issuing authorities or immigration authorities, including children's, and including many people who speak English or are learning it, or are becoming citizens of an English-speaking country.
Some countries have official romanization schemes used for passports, and it may be useful to note this in etymologies. Michael Z. 2008-11-24 17:05 z
"Persian male/female given names in Roman script" sounds fine, and is suitable to the Template:given name. The language statement is a problem, I would so much like to call such names "Translingual", even when they don't appear in all languages. But the language can be changed later, and subcategories added.
"Given to children of (several) native speaker parents" is an essential criterion for the language statement of a given name. People immigrate and intermarry so much that if the names of first and second generation immigrants were counted, all names would occur in all languages. Another good criterion is the pronunciation. There is no standard English way of pronouncing Behrouz or Ehsan. You'd have to refer to the Persian pronunciation, in the original entry. --Makaokalani 14:55, 25 November 2008 (UTC)[reply]
I'm skeptical about the native-speaker parents criterion. 1. some “foreign” names are used in countries where English is used as an official language or by a large number of second-language speakers, like Pakistan or the Philippines; 2. if someone immigrates to an English-speaking country and makes the news, or they start a company, etc., then their name may be widely used in English. Whether standard or not, their name is pronounced somehow in English, and this can be determined by research. I think names in English should simply be included if they are attested, like any other term. Michael Z. 2008-11-25 18:42 z
I think if Behrouz is attested in English (e.g. (not i.e.), if there are three independent people (i.e., not named after one another) whose names appear on official papers in English-speaking countries), it's an English name; otherwise it's a transliteration, and AFAIK our policy is not to include transliterations.—msh210 19:18, 25 November 2008 (UTC)[reply]
Thanks to User:Makaokalani for spending so much time to make the name entries in Wiktionary work.
I want to clarify something for myself : what is the status quo on names not of English origin in Roman script which are not that commonly seen in English-speaking countries? Also, am I correct in thinking that, as things stand, a name such as 'Michael' could have an entry for numerous languages using Roman script? How about a name such as 'Mohammad' which has entered numerous languages? Thanks. Kaixinguo 22:58, 25 November 2008 (UTC)[reply]
Also, I think it is a given that almost every single Persian name would be able to be attested in English.Kaixinguo 23:02, 25 November 2008 (UTC)[reply]
Maybe we should create "Wiktionary:About given names"? Wiktionary rules are made for words that mean something, and some rules make no sense when applied to names. An example: Wolfgang is the first name of Mozart and Goethe, and of thousands of other German speakers. If the CFI is "three citations in three years" - and about different persons, as msh210 suggests - Wolfgang is a word in practically all languages that use Roman script. How many such languages are there - a thousand, certainly? A thousand identical explanations for Wolfgang, and for every name that is reasonably common in any major language? ( And for many place names.)
If Behrouz is called "English", then it should also have an entry in hundreds of other languages using Roman script. That's why I want to call it "Translingual". You could add a list of the languages where this particular transliteration is used.
Surely we could think of some special rules for given names? For names used in India, Pakistan, Philippines, we could decide every country separately. Mohammad should naturally be translated into every language. The present entries for "Michael" mean only that it's a common name in those languages - a very common name in Denmark, for example.--Makaokalani 13:59, 26 November 2008 (UTC)[reply]
I agree that an "About given names" page would be an excellent idea (as would an "About surnames" page). The argument that I have seen in favor of not using Translingual for proper nouns is because the inflection and pronunciation will differ from one language to another ... for example, if I'm not mistaken, the genitive of Wolfgang in Finnish would be "Wolfgangin". I think we could really do without pronunciation info for given names in languages where they are not "native" -- there are at least three different ways that Anglophones pronounce "Wolfgang", but only the German pronunciation is "correct" in any meaningful sense. IMO we could do without inflectional information too, though it does have some use.
This has been an area of contention for some time, so a WT:VOTE will probably be necessary to end our current (bad) practice. -- Visviva 12:57, 29 November 2008 (UTC)[reply]

Formatting of glosses of non-English entries

I wonder about the preferred formatting of disambiguating glosses at non-English entries. I have seen the following variants, and have been entering all the Czech entries using the first one.

  1. car (nonpowered unit in a railroad train) -- using {{i}}
  2. car (nonpowered unit in a railroad train)
  3. car (nonpowered unit in a railroad train) -- using (''gloss'')

User:Tbot uses the second variant when creating non-English entries.

The reasoning behind my choice of italic was that glosses that refer to a sense in the synonyms section of English entries are typeset in italic by {{sense}}, and that the text entered using the template {{sense}} serves a similar role as the glosses accompanying the translation at non-English entries.

Is there any mention in WT:ELE to that direction that I have overlooked?

Thanks for your input.

--Dan Polansky 16:59, 24 November 2008 (UTC)[reply]

My reflex is to avoid applying any formatting or other design element unless it serves a clear purpose. I don't see the italics adding any meaning, or meaningfully distinguishing here, so I would simply omit them. And if a bot is making hundreds of these, we may as well remain consistent. Michael Z. 2008-11-24 17:11 z
I like templates like {{gloss}}, they make it easier (slightly) to solve the otherwise impossible linking definitions problem. Whether it should be italics or not is just user-preference, and can be toggled at WT:PREFS. Conrad.Irwin 17:27, 24 November 2008 (UTC)[reply]
Sounds good to me, but AFAIK we currently have no template serving the purpose. {{gloss}} is now a redirect to {{gloss-stub}}, which says that gloss is missing.
What about turning {{gloss}} from a redirect to a template serving exactly that purpose of marking up disambiguating glosses in non-English entries? That should be practicable, as currently less than 30 entries are referring directly to {{gloss}}; I could change them manually to refer to {{gloss-stub}}. --Dan Polansky 21:32, 24 November 2008 (UTC)[reply]
I agree with your last comment Dan. I don't think we should be using {{i}} (or {{i-c}}, etc) if there is something more specific available to allow the greatest amount of customisability. Thryduulf 22:19, 24 November 2008 (UTC)[reply]

Just want to pipe in that for the "final" version, we would want the entry to be formatted as an entry without the parenthesized gloss. I'd format this as "A railroad car." myself. Circeus 23:41, 24 November 2008 (UTC)[reply]

I agree, but I don't think we're ever going to convince everyone else. —RuakhTALK 01:09, 25 November 2008 (UTC)[reply]
I don't mind any specific format for the bot-created entries (except to say that the {{i}} classes should NOT be used). But I do have issues with parentheses used as disambiguation inside definitions, which is why I believe such should be automatically considered non-final entries (in addition to the various more obvious issues, e.g. automated légal or pal would be incapable of accounting for these words' inflections). Not to mention the definitions most often require some proper fine-tuning, or the parentheses may be superfluous (cf. lire entre les lignes). Circeus 01:35, 25 November 2008 (UTC)[reply]
I think this gets to the distinction between definitions and translations. Stephen was noting this distinction elsewhere recently (on RFD I think), and while I was busy disagreeing with him at the time, I think it is an important point. There are good reasons why bilingual dictionaries generally list translations rather than giving "full" definitions (except for those words where no simple translation is possible). If we want to break from that tradition, we can, but it merits some serious thought. ... At any rate, in the example in question, I think "railroad car" works neatly as both a translation and a definition. That is the ideal solution. -- Visviva 03:07, 25 November 2008 (UTC)[reply]
This turns the discussion from formatting of glosses to whether the glosses should be there at all, and about the overall preferable format of non-English entries. I find it preferable, and have understood it to be the common practice, to indicate a single word or phrase serving as a translation, or a list of these. That is, to find such a word that a translator could actually use when translating text. In the context of railroad, "car" mostly disambiguously refers to a railroad car (unless it is a car carrying automobiles), and the translation would sound weird if it achieved formal disambiguity by invoking "railroad car" all the time. That is, from the following two options, I prefer the first one (disregarding now the question about the preferred formatting of the gloss).
  1. car (railroad car)
  2. A railroad car.
Sometimes, disambiguity can be achieved by listing synonymous translations instead of using a gloss, as used in vůz:
  1. car, automobile
  2. car, train car
Another example that could serve as a test case is výraz, currently formatted in the first sense as:
  1. expression (facial appearance usually associated with an emotion)
The entry clearly indicates the most useful translation in the given sense, accompanied by a gloss indicating the sense. The following alternative makes the "výraz" entry almost non-distinguishable from an English entry, practically destroying the rule that non-English entries should avoid definitions.
  1. An expression; facial appearance usually associated with an emotion.
--Dan Polansky 09:40, 25 November 2008 (UTC)[reply]
re "destroying the rule that non-English entries should avoid definitions" um, what? I've heard other people say this, but there is no such "rule". Quite the reverse: WT:ELE says unequivocally that all entries have definitions. It is often convenient to simply use the appropriate English term (disambiguated as appropriate), but it is still required to be a definition. And not limited to being only a "translation". Our mission statement (main page) says: "This is the English Wiktionary: it aims to describe all words of all languages using definitions and descriptions in English." (and has always said something similar)
I don't mean to suggest that (say) jicho be defined as "An organ that is sensitive to light, which it converts to electrical signals passed to the brain, by which means animals see." (someone did that for some language, I don't recall ;-). Defining it as "eye (organ of vision)" (in whatever detailed syntax) is sufficient, and provides the one-word translation. But it is still, as required, a definition, not just a translation. Robert Ullmann 10:21, 25 November 2008 (UTC)[reply]
I agree. (I might prefer "An eye: an organ of vision in humans and many animals.", but it's the same idea.) Put another way, let's pretend that jicho were a rare English word meaning "eye". Obviously we wouldn't just copy and paste the definition of "eye"; the clearest and simplest definition would be something like "(rare) An eye: an organ of vision in humans and many animals." I think the same logic applies to foreign words. —RuakhTALK 19:16, 26 November 2008 (UTC)[reply]
The example of jicho says "eye (organ of vision)". The definition of eye says: "An organ that is sensitive to light, which it converts to electrical signals passed to the brain, by which means animals see." So I would think that "eye" is the target term, "organ of vision" is a gloss, and the long text that I have just quoted is a definition. Put differently, while "organ of vision" would probably be unsatisfactory as a definition in the English entry, it is perfectly okay as a gloss. So AFAICS the common practice is to avoid definitions and aim at the format shown at jicho. This can change, if we agree to do so, but that has not been the practice so far.
Quoting Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English, boldface mine:
"Entries for terms in other languages should follow the standard format as closely as possible regardless of the language of the word. However, a translation into English should normally be given instead of a definition, including a gloss to indicate which meaning of the English translation is intended."
--Dan Polansky 10:47, 25 November 2008 (UTC)[reply]
normally. normally. normally. In other words, "eye (organ of vision)" (a translation) is normally adequate as the definition, one doesn't write out "An organ that is sensitive ...". As I said. It (the ELE text) does not prohibit providing an adequate definition, and many, many, many terms require more than a one-word "translation" (1-1 translations between languages being a mostly mythical concept anyway). (Can you tell I am heartily sick of people insisting that entries be severely dumbed-down in one way or another because of whinging that "we aren't supposed to do that"? By all means include anything useful for the "translation", even though someone will claim it is a "definition" and therefore must be fixed and thus rendered useless. Argh.) Robert Ullmann 14:19, 25 November 2008 (UTC)[reply]
Okay, agreed. But then, let that additional, "useful" information be in the gloss, and let the best available translation be clearly indicated as standing in the first place at the entry. --Dan Polansky 16:12, 25 November 2008 (UTC)[reply]
On location of this topic in policy documents: The formatting of glosses in brackets and italics is mentioned at
and was added to that document on 14 February 2007. The mentioned document is just a help document; I can't find the relevant policy document. I assume that whether to format in italics or in roman has been considered a matter of taste so far.
The use of glosses in brackets has been codified in the mentioned document at least since 18 October 2006, using the example "[[man|Man]] (adult male)".
Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English has one paragraph devoted to the topic of formatting of non-English entries.
--Dan Polansky 10:47, 25 November 2008 (UTC)[reply]
To assume that you can go for a single-word translation is at best simplistic 90%. Anybody who's ever done translation knows that for most purposes bilingual dictionaries are actually a Bad Thing. Words have connotations, collocations, they are used with specific referents that other languages don't have. They refer to things other languages don't name. There is no such thing as a centre local de services communautaires or a cégep in English, and there is in fact no proper way to translate "U.S. Route X" in other languages. soulier and chaussure have different connotations in Quebec and France (the former is literary in France, usual in Quebec). breuvage does not readily translate to any English concept (AND has differing usages regionally) etc. etc. Circeus 14:29, 25 November 2008 (UTC)[reply]
I do understand that single-word translations are approximations. But so are many definitions. I am not arguing against glosses and against usage notes. I am arguing in favor of finding the best available short terms that can serve as a translation. The claim that bilingual dictionaries are a bad thing is an overstatement, to say the least; they are hugely useful as compared to not having them at all.
The examples you give are exceptions to the general rule that direct translations that are good enough mostly can be found. When we give up the effort of finding the best direct translation, we may end up with more definitions than needed, because it is so much easier to describe the meaning around than to search for the single word that does the job.
Taking the second sense in breuvage that you mention--"Any liquid that can be drunk, especially nonalcoholic ones.", the translation that I would have entered is
  1. drink; beverage (especially non-alcoholic one)
The current breuvage entry does not tell me that the straightforward, even if ambiguous, translation is "drink".
Taking quite a different example, I find it desirable to remove any additional information that someone could want to add to:
  1. cat (animal)
in an entry for Katze. So the contention is not about one big yes or big no for definitions, but about whether definitions are tolerated in those cases in which they are superfluous.
--Dan Polansky 16:12, 25 November 2008 (UTC)[reply]
Perhaps it would be constructive to assemble short lists of example entries, where the various proposed formats are used and where we (or most of us) agree that the particular choice of format is appropriate. I know that for some Latin entries, I have used single word translations on the definition line, sometimes using just one word but other times using more than one translation when the shade of meaning is not quite the same or the English translation is slightly ambiguous. There are also times where no suitable English translation existed, and I gave a full definition. If we assemble a collection of illustrative examples, we can then write text to accompany them and have a guide for editors into the bargain. --EncycloPetey 19:34, 25 November 2008 (UTC)[reply]

Brief comment, only to the first question: I'm using invariably variant #2, like Tbot. I've never pondered about it, probably for the same reason Mzajac mentioned (to avoid applying any formatting or other design element unless it serves a clear purpose; the formatting in this bracket is used to indicate a quotation). -- Gauss 00:07, 27 November 2008 (UTC)[reply]

Re Ruakh's comment several paragraphs above, from 19:16, 26 November 2008: AFAICS the use of long format without brackets versus the use of list of terms followed by a gloss in brackets is exactly the point of contention, not something for a sidenote. The mentioned (a) "An eye: an organ of vision in humans and many animals." is AFAICS not compatible with what it says at Wiktionary:Entry_layout_explained#Variations_for_languages_other_than_English, unlike the alternative (b) "eye (organ of vision)". In the example (a), there is no gloss; there is a translation + colon + definition + period.
In this discussion, the word "definition" is used ambiguously. There is such a thing as definition by synonym, but that is not what is meant by "definition" in the contrast set consisting of translation, gloss and definition. Sure the combination of translation and gloss effectively provides a definition, but that is not what is meant by "definition" in this distinction. To say that translation and gloss should be there instead of definition is to say that the variant (b) is preferred to (a), not that (b) does not effectively define the foreign-language term.
I'd like to add that I am not necessarily in favor of one formatting or the other. It is just that I have understood the single paragraph devoted to non-English entries in WT:ELE in a certain way, and applied it that way, and have seen it applied the same way in many non-English entries. This discussion shows that there is a need for clarification.
I'd think that this discussion must have been held before, but I do not know where and when, and with what results and opposing parties. --Dan Polansky 07:07, 27 November 2008 (UTC)[reply]

Latin verbs

Is it normal practice to have pages for every form of a Latin (or any other language) verb? Cos I've been creating them and somehow it feels wrong. Examples: detegis, detegit. Thanks. LGF1992UK 18:20, 24 November 2008 (UTC)[reply]

Yes it is. Ultimately we would like for those to be generated automatically, though, because their creation is tedious. Circeus 20:25, 24 November 2008 (UTC)[reply]
I have begun creating verb form pages by bot. A very good reason to do it this way is uniformity of the content. It is nice to see someone else interested in helping with this, but the Latin verb form entries you have created lack some of the desirable formatting for such pages. --EncycloPetey 19:29, 25 November 2008 (UTC)[reply]

Wikipedia is offering some etymology. See w:Wikipedia:Articles for deletion/Pay through the nose. Uncle G 16:07, 25 November 2008 (UTC)[reply]

Using template:gloss in non-English entries

This is a follow-up on a recent discussion about formatting of translations, glosses, and definitions in non-English entries.

Would anyone object if I repurpose {{gloss}} for the formatting of disambiguating glosses of definition lines in non-English entries? The {{gloss}} template is currently a redirect to {{gloss-stub}}, which is used to indicate that a gloss is missing.

The proposed default formatting of the template: "(gloss in roman)", based on the choice made by Tbot. The formatting can be changed in the template later, when we decide to do so.--Dan Polansky 07:31, 26 November 2008 (UTC)[reply]

It could also be made customizable, which would keep everyone happy regardless of the formatting they'd like to see. --EncycloPetey 17:09, 26 November 2008 (UTC)[reply]
I have set up the {{gloss}} template for the task, and placed it in jicho, and some other entries. --Dan Polansky 09:14, 28 November 2008 (UTC)[reply]

"Wiktionary sucks"

I've just come in at the tail end of a conversation on a talker I'm active on that included the words "Oh, btw, Wiktionary sucks". I eventually got out of them why they thought this, and they came up with two reasons -

  1. "It includes things that aren't proper words" - by which they meant words that aren't in the OED.
  2. "It doesn't mark words that aren't universal as US only, etc" - addicting (adjective sense) and nihilarian (noun sense) were the two examples they gave (both now on the tearoom).

Adding more regional context labels where appropriate is obviously the way to deal with the second of these, but is there any way we could do better at knowing which words should be so labelled?

Regarding the first point, again I think more regional labels would help. However I think the main point here is that our descriptivist ethos of recording all words as used, not recording which words an "authority" says should be used, is not getting through.

Discuss. Thryduulf 13:43, 26 November 2008 (UTC)[reply]

Having just engaged in a long discussion related to these very points, I'm not really sure what our "descriptivist ethos" should be. Languages evolve. They always have and always will. But in times past, the evolution made sense. New words were introduced to describe new things, like "radar", which began as a military acronym and came into common usage. But now, the "Slangists" (if I may coin a word) seem to be in some sort of contest to see how much garble they can create. They make up new words, create new and often vulgar usages for old words, and just generally make a mess! How can we decide, (indeed, are we even qualified to decide), what constitutes valid language? I have no answer, but I will watch this discussion carefully to see if someone else does! -- Pinkfud 14:08, 26 November 2008 (UTC)[reply]
Personally, I'm not inclined to waste my time arguing with anyone about #1. It's not like there is some central authority on what "real" English words are. The only meaningful authority comes from the language itself. To put it another way, I don't think it's that the person doesn't realize Wiktionary is descriptive, but that they don't consider descriptivism to be a valid approach. Whatever, their loss.
On #2, if someone can find a general list of words that should be regionally tagged, it would be fairly trivial to go through them and check that this has been done. But I'm not sure where we would find such a list, and I expect most obvious cases have been tagged already. The ones that haven't are the really non-obvious cases like "addicting". Ultimately all entries need to be audited against their coverage in other dictionaries, which will turn up many such gaps, but coverage auditing is a very slow process and it would take a very long time to get to the kind of long-tail words that this person seems to be interested in.
The ideal course of action is for the person in question to join us and fix the problems themself.  :-) -- Visviva 14:21, 26 November 2008 (UTC)[reply]
Would we let them? I suspect there would be a few arguments to endure. By the way, did you realize you can take the Latin excīdō (to cut out) and cornu (horn) and create a definition for a certain "word" no one wants to see? LOL! (Now there's another "word" to ponder). Where does it all end? My point is, there's simply no way to keep up anymore. Is it real, or is it tosh? Only your hairdresser knows for sure. -- Pinkfud 14:44, 26 November 2008 (UTC)[reply]
Nihilarian is an old (c. 1708) Bishop Berkeley coinage. I'm not sure about any valid current use. Addicting is interesting. I had engaged a user on its talk page. The discussant definitely took a prescriptivist stance. It hadn't occurred to me that it might be regional. We do need more tags to live up to being truly descriptive. DCDuring TALK 16:04, 26 November 2008 (UTC)[reply]
More use of context tags is definitely part of the solution, but encouraging the use of supporting citations is another key component. An anon user can't come in and successfully argue "This word doesn't exist" when there is a list of durably archived citations supporting its existence. Also, do we have a place where we explain clearly that we are descriptivist and what that entails? That would be a GoodThing to have. --EncycloPetey 17:08, 26 November 2008 (UTC)[reply]

How about adding a dictionary checklist to entries? We can choose a list of comprehensive (OED, Webster's 3rd), regional (CanOD), and specialized dictionaries (slang, jargon, etc). Any of our terms can get tagged as to whether it is present or absent in each of the dictionaries.

This would help comfort dictionary users who want a sense of authority. It would also serve as a simple reference citation for the term. It may even help weed out plagiarism which is entered here.

Is there any disadvantage? The only thing I can envision is that if this can be used to generate complete wordlists from copyrighted dictionaries, then it might constitute infringement. Michael Z. 2008-11-26 23:50 z
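A minimal sketch of what such a checklist might record, written in Python purely for illustration. The class name, its methods, the placeholder term and the presence flags are all hypothetical; nothing here reflects an existing Wiktionary template or the actual contents of any dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryChecklist:
    """Hypothetical per-entry record of which dictionaries list a term."""
    term: str
    # Dictionary name -> True (present) or False (absent).
    # Dictionaries that nobody has checked yet are simply not in the map.
    found: dict[str, bool] = field(default_factory=dict)

    def record(self, dictionary: str, present: bool) -> None:
        """Note whether the term was found in the given dictionary."""
        self.found[dictionary] = present

    def absent_everywhere(self) -> bool:
        """True only if at least one dictionary was checked and none lists the term."""
        return bool(self.found) and not any(self.found.values())


# Made-up flags for a placeholder term, only to show the shape of the data.
checklist = DictionaryChecklist("exampleword")
checklist.record("OED", False)
checklist.record("Webster's 3rd", False)
print(checklist.absent_everywhere())
```

Recording the non-mentions explicitly, rather than only the hits, is what would let a reader tell "checked and absent" apart from "not checked yet".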

Well, it also places emphasis on authority over citations. From time to time non-words do appear in dictionaries, such as through hoaxes. There are also phobia "dictionaries" which contain words that (apparently) have never been used outside of those dictionaries. Many people wrongly believe that a word must be in a dictionary before it is "real". When these people proclaim that a word is "not in the dictionary", they mean it is somehow invalid, inferior, or to be avoided (even if the word actually is in some major dictionaries). I therefore feel uncomfortable about feeding into this erroneous way of thinking. --EncycloPetey 02:57, 28 November 2008 (UTC)[reply]
But we already do include references to dictionaries (see category:Reference templates), as well as occasional “Dictionary notes” sections. One advantage of doing this in a systematic way and citing non-mentions would be that such non-words would be identified as missing from all the other dictionaries (rather than being accepted as real because they have a reference). I don't see how adding more factual information can be bad. Michael Z. 2008-11-28 15:59 z
Information good. But it should be noted that many non-words appear in multiple dictionaries, because the lexicographers simply copied from each other instead of doing their own research (perhaps an understandable failing in the pre-computer days). -- Visviva 16:11, 28 November 2008 (UTC)[reply]
Cool. So that means some of them may be suitably attested for inclusion in Wiktionary! Is there a name for such words? Has anyone written an article about them? Michael Z. 2008-11-28 17:04 z
They're called ghost words. See for example dord. Equinox 17:17, 28 November 2008 (UTC)[reply]
Aren't we essentially doing that a lot of the time by avoiding "inferior" words that aren't in Google Books or Usenet? Equinox 15:01, 28 November 2008 (UTC)[reply]
No, that is examination of actual usage, and not through appeal to authority of other dictionaries. --EncycloPetey 15:44, 28 November 2008 (UTC)[reply]
Nope. It's true that we omit such words, but we don't take the view that omission is a mark of inferiority. (Selective description is not prescription.) —RuakhTALK 16:00, 28 November 2008 (UTC)[reply]

I attempted to write an essay at Wiktionary:Descriptivism about this, but it really needs someone who understands it better than I (and, more importantly, can write better than I) to redraft it. Our CFI are arbitrarily chosen to reflect the set of words we are interested in; we could change them to be stricter - perhaps to books only, or looser - and include the entire internet, without a change to our basic philosophy of describing what we see. However, in my opinion, and no doubt in other people's, the dictionary would be less useful with such changed CFI. The problem I can foresee with a "dictionary checklist" type of idea is that it encourages competition - either we try and force in as many words as possible so that we beat other dictionaries, or we start removing entries for the fear that we might not be as right as we had assumed. We could have a set of links automatically added to every page that look up the word in other online dictionaries - this would make us more useful as people will be able to find information that we don't have, and increase people's faith in the definitions we give - at the expense of encouraging fewer people to add new information here, and possibly those who do will just copy from the other dictionaries. I don't see the absence of dialectical labels as a huge problem compared to that of missing definitions and words - but again presence of accurate ones in many entries would make Wiktionary yet more useful. Conrad.Irwin 16:39, 28 November 2008 (UTC)[reply]

I do appreciate both sides of the coin, and I'm not sure which side's advantages outweigh the other's. I guess we're mostly speculating here. I do find that after I've created an entry, I do like to compare it to as many references as is practical, to ensure that I haven't made any flubs, and that I haven't inadvertently created an entry which appears to plagiarize one of them. Michael Z. 2008-11-28 17:13 z
That approach is perfectly valid, in my opinion. Making comparisons with published sources for the purposes of refinement is good. It is in using solely the authority of other dictionaries to determine relative merit of an entry where I have a problem. True, inclusion of a word in a major dictionary is persuasive for including an entry for which we might have trouble finding citations, but this is not a justification in and of itself. Rather, it is a stop-gap argument until supporting evidence can be found. Neither is absence from major dictionaries an argument for omitting an entry. "Absence of evidence is not evidence of absence." It is the argument from authority where I take issue. Ideally, entries should be evaluated on the basis of their own merits or flaws, independently of their publication history in dictionaries, and that's the point I think many detractors fail to realize. --EncycloPetey 19:05, 28 November 2008 (UTC)[reply]
This year's Christmas Competition is announced and is open to all contributors!
--EncycloPetey 02:52, 28 November 2008 (UTC)[reply]

Sister Projects

Hi. I just registered myself here on en.wikt and got a nice welcome message which led me to this: Template talk:wikipedia. This is of course an extreme case, but all the "wikipedia" links in the left-hand menu bar called "in other projects" are not very intuitive in my opinion. What do you think about using my proposal on w:Wikipedia:Village_pump_(proposals)#Sister Projects here on Wiktionary? Prillen 10:55, 28 November 2008 (UTC)[reply]

If you have the javascript that implements this, I see no reason not to, but I feel that as we shouldn't be linking to more than one wikipedia article anyway (they have the disambiguation pages), that it's not that relevant. It also would bring problems for pages with long titles. Conrad.Irwin 14:53, 28 November 2008 (UTC)[reply]
Well, a page could link to more than one Wikipedia article legitimately, but it would be linking to articles on different language editions of Wikipedia. For example, if pruba is a Spanish word meaning "foobar" and is also a Polish word meaning "doo-jigger", and if each has an article on their respective Wikipedias, then both of those WP articles would be linked from that page. It's uncommon (relatively speaking), but it does happen. --EncycloPetey 15:42, 28 November 2008 (UTC)[reply]

The template's talk page is full of tests, so this is not a fair demonstration of a problem needing to be solved. Can we see a real example where there is a need for additional icons?

Do we have a guideline which recommends single or multiple project links? Michael Z. 2008-11-28 19:40 z

Wiktionary:Links is the page where linking issues are covered. --EncycloPetey 19:49, 28 November 2008 (UTC)[reply]

Category for city, town, village...

I'd like to create a subcategory in Category:Geography to collect words like city, city state, village, town, suburb, megacity, metropolis, megalopolis, country. Would Category:Settlements be a good name? --Panda10 14:27, 28 November 2008 (UTC)[reply]

I'd say they are types of settlement. Actual settlements could be geographical places like London. Equinox 14:41, 28 November 2008 (UTC)[reply]
How about Category:Communities? BTW, Roget's has a city concept under Geography. --Panda10 18:48, 28 November 2008 (UTC)[reply]
Will it include suburbs, rural municipalities, counties, shires, districts, provinces, states, countries, federations? Only political subdivisions, or other kinds of regions too?
Wikipedia has a whole category tree for this: see w:category:Settlements and w:category:Country subdivisions. Perhaps we can start with a simple subset of it. Michael Z. 2008-11-28 19:11 z
The Wikipedia categories contain proper nouns in the end. The proposed category would not contain proper nouns. It is for the actual common words where people may live (but not buildings, we already have a category for that). I can actually just put them in Category:Geography. --Panda10 20:57, 28 November 2008 (UTC)[reply]

A related question: is there a word in English for words such as road, street, avenue, lane, crescent... i.e. kinds of odonyms? This would make a very useful category. odotype would be a good word but, unfortunately, it means something else. Lmaltier 21:57, 29 November 2008 (UTC)[reply]

These words are currently in Category:Roads. --Panda10 22:06, 29 November 2008 (UTC)[reply]

November 2008

Etymology sections are very concise

I think we really could use some more wording in etymology sections. Cryptic stuff like ‘short + cut’ really isn’t very helpful. I also do not like the use of ‘<’ (or was it ‘>’?) to indicate inheritance and so on. What do other people think of this and should we work out a consensus here?

The impetus is {{suffix}} and related templates which indirectly promote this terseness. If including some more wording in these templates is not wanted, then at least we should update their usage to instruct people to put some verbiage around it, but I fear that will make the templates less usable. H. (talk) 16:23, 3 November 2008 (UTC)[reply]

I find the terseness to make the etymologies easier to read. The use of + and < allows the etyma to stand out more clearly. Quite frankly, what more do you want to put down than "short + cut"? Seems to me like unnecessary fluff. If you can show me a wordier ety that I like, I might change my mind. -Atelaes λάλει ἐμοί 18:25, 3 November 2008 (UTC)[reply]
Brevity is only good if it is unambiguous and gives all the necessary information. As we don't have that much information to give, these abbreviated forms work - I'm sure examples can be found where too much information has been packed too tightly. However, if we persist in having the Etymology bit before the useful</troll> parts of the entry, then they will be kept brief. Conrad.Irwin 18:38, 3 November 2008 (UTC)[reply]
I always put "From", e.g. From {{term|short}} + {{term|cut}}., but when another editor removes it, I don't revert. I do think the "From" is important, because not all of our readers know what "etymology" means. Similarly, I don't use <, because I don't think the casual reader will recognize it, and while in some cases I think the idea comes across anyway, in some cases I think it does not. (I view it as analogous to the various abbreviations, F. and so on, that are found in other dictionaries but that we don't use.) However, I'm fine with +, as it seems crystal clear to me. —RuakhTALK 20:05, 3 November 2008 (UTC)[reply]
"not all of our readers know what "etymology" means" Which editors do you mean? Etymology is a loanword from Greek present in almost every Indo-European language (and other languages who are not so reluctant to accept borrowings - エチモロジー ), therefore virtually all editors from Europe, South, Central and North America must know what etymology is, must not they? Bogorm 20:39, 3 November 2008 (UTC)[reply]
Just because a person is a native speaker of a language does not mean they know every word in that language, and certainly not that they understand every concept described by that language. To know what "etymology" means requires at least a rudimentary understanding of language evolution, etc. Those of us who are interested in a discipline should not take for granted such knowledge, as plain as it may seem to us. Up until a few days ago, I had no idea what the term "liquidity" meant (and to say that I have an exceptionally solid grasp of the term now would be deceptive), as I have almost no background in economics. I would not consider myself stupid nor generally uneducated (though others are free to disagree :)). -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)[reply]
Starting with “From” is good form. It may help a new reader who has never heard the term etymology understand what he is looking at, on his very first page view of Wiktionary. It in no way detracts. Michael Z. 2008-11-04 17:09 z
I agree that starting with "From" is good form. --EncycloPetey 17:15, 4 November 2008 (UTC)[reply]
Etymologies for compound words don't warrant much more than we have, unless we decide to include dates for first attested usage of one or more senses. I wonder if we might not expose ourselves to an endless supply of folk etymologies if we make wordy etymologies, especially for compounds. Long, discursive, or disputed etymologies should certainly not consume too much space, especially if they force definitions off the first screen. Such etymologies and two or more lines of cognates especially should normally only appear under a show-hide bar, if cognates be retained at all. DCDuring TALK 20:48, 3 November 2008 (UTC)[reply]
I am very adamant that cognates remain, when appropriate. While I share your concern about ten page theses blocking out the heart of the entry, I don't think it prudent to trim the preceding content to a minimum. The answer lies, rather, in altering our formatting/presentation. -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)[reply]
What about using show/hide bars for etymological material such as cognate lists or discussions of disputed etymologies that, in total, take more than 3 lines and push definitions off the initial screen (with right-hand Toc)? As you know I like etymologies, including long chains through Middle and Old English and French; Anglo-Norman; Vulgar, Medieval, Late, and New Latin, the loss of visibility of which would greatly sadden me. DCDuring TALK 23:26, 3 November 2008 (UTC)[reply]
Since I too like etymological chains through Old Norse, Gothic and Sanskrit, it would sadden me as well. (That was facetious, I support every etymological information) I do not embrace the proposal for the hide bars, since I am firmly convinced that etymology is one of the most important parts of the articles, and however disputed it is, expounding the diverse linguistic theories without concealing any of them is indispensable for a thorough comprehension of the entry's meaning. Bogorm 16:36, 4 November 2008 (UTC)[reply]
If we were just running this for our own benefit, I could agree. I'm looking for ways to make this site more useful for ordinary (unregistered, non-contributing, non-linguist) users, by getting onto the initial screen more of the info that is, I think, most commonly sought: 1. definitions and 2. a guide to the definitions and other material that don't fit on the first screen (the Table of contents). Registered users ought to be given the power to have show/hide bars in the sections that they select be expanded by default (a feasible option, BTW). DCDuring TALK 20:08, 4 November 2008 (UTC)[reply]
Ever since we introduced Show/Hide bars, my experience has been that average users aren't aware of their function, and overlook their contents entirely. Time and again, I have seen comments made from ordinary users who were surprised when they finally discovered them, and that's just the fraction who discover them. I've been following the addition of Translations to WOTD entries since before the introduction of Show/Hide bars. When these were introduced to the Translations sections, addition of translations by average users plummeted, and this drop has never recovered to its former levels. In short, your position that these tables benefit average users isn't supportable. If we do choose to use them, then they should be expanded by default, and collapsible only as a customization feature available to registered users. --EncycloPetey 20:17, 4 November 2008 (UTC)[reply]
On the use of "<" and conciseness: I like the use of "<" to mean "from". Century 1911 uses "<" and "+" in its etymology markup. Unlike Wiktionary, the etymology in Century 1911 is not introduced by an "Etymology" heading, and is placed in "[" and "]" instead; and yet the readers of Century 1911 must have managed to learn to read its entries. A new reader of Wiktionary sees content under an "Etymology" heading, so can quickly look up the word "etymology" in Wiktionary to find out what it means. --Dan Polansky 21:09, 6 November 2008 (UTC)[reply]
Headers (of all kinds) take up more space on enwikt than on some other wiktionaries, and other online dictionaries, let alone print dictionaries. The space taken by "from" is almost negligible by comparison. DCDuring TALK 22:20, 6 November 2008 (UTC)[reply]
It's not the space taken by "from" for which I prefer "<". I prefer it for its faster showing me the derivation chain: my eye locates the individual elements separated by "<" faster than when they are separated using "from". I understand that one of the reasons why printed dictionaries use terse markup are the space constraints, which are absent in an electronic dictionary such as Wiktionary.
The issue seems to me to be at least remotely similar to the mathematician's preference of symbols to wordy sentences. Mathematical formulas can be phrased in the words of natural language, but when it is done, patterns and structures are obscured. --Dan Polansky 12:38, 7 November 2008 (UTC)[reply]
Good observation. Let me stress that I also do not like page-long etymologies, since most of the time, they contain garbage (that is just an observation, there might be occasional words where the information deserves its place). I think, however, that, probably out of fear for that, we have shot through to the other extreme of being so terse it is barely understandable without exercise. That was my initial concern. I plead for a compromise. For example, for compound words, I would say “Compound of X + Y” is a good compromise: novices will hopefully understand this, or at least have the possibility to look up compound (whether to link it by default or not can be argued for later, it is done for e.g. blend (is that in {{portmanteau}}? Yip.)). As for >: I think it is unclear because I never understand which way it is pointing to or what it is supposed to mean. (I see you talk about < above, which just illustrates my point: if even a regular contributor doesn’t get the meaning, a novice will only be confused. I therefore replace the < wherever I see it.)
So please, let’s focus on finding a good compromise. H. (talk) 16:18, 13 December 2008 (UTC)[reply]
I find replacing "<" with "from" to be problematic, because in a full-length etymology it leads to a long series of "from ... from ... from ..." statements, which I actually find more disorienting and harder to read than the arrow notation. I also don't see how "<" could be interpreted as pointing in more than one direction, but I agree it's not very satisfactory. When I'm really feeling verbose, I break the etymology down into normal human sentences, something like: "From Middle English N, derived from Old French O. This in turn was derived from the Ancient Greek Q via the Latin P." But I'm not sure that approach really makes the historical sequence any easier to understand.
Myself, I don't really see a problem with terse etymologies. I don't mind verbose etymologies either, as long as they are accurate and stay on-topic. Still, for many words, a simple {{prefix}} or {{compound}} is a fairly complete etymology in itself, and comparable to what other dictionaries provide. If someone wants to expand that template into a complete sentence, or add supplemental information, so much the better. But a minimal etymology is vastly better than none, and helps to lay the groundwork for a more complete treatment in the future. -- Visviva 16:56, 13 December 2008 (UTC)
But note that the viewer of the page does not see the difference between {{suffix}} and {{compound}} and the like: both simply produce a plus, the only difference is a minimalistic - at the relevant place. I think that’s unfortunate, we could at least mention that it is a compound or a word with a suffix, as is e.g. done in {{blend}}. I propose to at least change the aforementioned templates to a wording like the latter. Unfortunately, that will probably break a lot of entries. H. (talk) 14:38, 4 January 2009 (UTC)
I can see your point... I don't think this is worth breaking thousands of entries over, but it would be nice to have a verbose option, or a set of verbose variants, for {{suffix}} et al. Simply forcing verbose behavior would be bad IMO, because there are times when the sentence needs to be formatted in some non-standard way. I am tempted to suggest that we just add an option like "verbose=yes", but as Robert recently noted, that is not usually the best approach... My brain is not working particularly well today, but there has to be some elegant solution that will allow verbosity without breaking any current uses. Maybe a specific control character that could be inserted in any position, something like {{compound|green|house|.}}. Shall we take this to WT:GP? -- Visviva 09:02, 8 January 2009 (UTC)
What about simply beginning the etymology with 'from'? While I would expect that most editors would know the word 'etymology', I don't think it's fair to make the same assumption for the average casual non-registered, non-editing user (feel free to throw more adjectives in there, if you wish). I know a number of A-level English students who have never heard the word 'etymology' in their life, but would nonetheless have an appreciation of how words evolve and "where words come from". Thus, 'from' at the beginning would be intuitive enough for those not immediately familiar with 'etymology', but without the verbosity of substituting the angle brackets for it. — Sasuke Sarutobi 02:43, 21 February 2009 (UTC)
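For illustration only, the control-character idea could behave roughly like the sketch below. It is written in Python, not in MediaWiki template syntax, and the function name render_compound and the exact verbose wording are assumptions made for the example; the terse output mirrors the existing "X + Y" rendering so that current uses would be unaffected.

```python
def render_compound(*args):
    """Render a compound etymology from its parts.

    Hypothetical behaviour: a trailing "." argument acts as the control
    character that switches on the verbose, full-sentence form.
    """
    verbose = bool(args) and args[-1] == "."
    parts = args[:-1] if verbose else args
    joined = " + ".join(parts)
    # Terse form stays exactly as before, so existing calls do not change.
    return f"Compound of {joined}." if verbose else joined


print(render_compound("green", "house"))       # terse form
print(render_compound("green", "house", "."))  # verbose form
```

In this sketch the control character is only recognised in the final position; supporting it "in any position" would need the parts and the marker to be told apart in some other way.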

Seeking final comment on Hangul syllable entries

OK, I am now the proud owner of a 2.3-megabyte text file containing basic entries for all Unicode Hangul syllables. For examples of the output, see and . Once created, I do not intend to edit these entries again, ever (excepting the handful that are real words), and I would sincerely hope that no one else has to edit them either. With that in mind, are there any final thoughts about the layout of these entries? -- Visviva 04:44, 19 November 2008 (UTC) P.S. If one of our resident wizards could find a way to make Template:ko-symbol-nav a bit less squirrely, that would be wonderful; however, since it's templated it's not urgent.

Looks good. I like that all the elements are in flexible templates. Quibbling details:
Can we link to Revised and Yale transliteration info? (for applications like these, it would be nice to have unobtrusive links, like the context labels in pl.wiktionary—cf. “geogr.” in pl:Korea).
ko-symbol-nav seems cluttered by all of the hyphen separators. Dots would be less obtrusive if you insist on character separators, but I think the table arrangement and spacing is sufficient. I would also link both the arrow and character for previous and next. I'd be glad to rework the template.
Wording: does ko-usage-keystroke need the word standard?—(to differentiate it from a common non-standard dubeolsik keyboard?) Can we link dubeolsik keyboard to an explanation or Wikipedia? ko-usage-unicode: Unicode standard notation is U+AD6B, with no need to explain that it's hex. You could reduce the wordiness to “Unicode representation U+AD6B.” Michael Z. 2008-11-24 17:36 z
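As a rough illustration of the two pieces of information those usage templates carry, the Python sketch below formats a code point in the U+XXXX notation and splits a precomposed syllable into its component jamo using the arithmetic decomposition defined in the Unicode Standard. The function names unicode_notation and decompose are made up for the example and are not part of any template here.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables


def unicode_notation(ch):
    """Return the standard notation for a character, e.g. U+AD6B."""
    return f"U+{ord(ch):04X}"


def decompose(ch):
    """Split a precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(s_index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail index 0 means there is no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


syllable = chr(0xAD6B)
print(unicode_notation(syllable))
# The arithmetic result should agree with canonical decomposition (NFD).
print(decompose(syllable) == unicodedata.normalize("NFD", syllable))
```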
Thanks for this. I think I have implemented all of your suggestions above (except for the "unobtrusive links" part; I'm not sure of the current state of consensus on that). Please feel free to edit the templates further if you are so inclined. -- Visviva 03:11, 25 November 2008 (UTC)
Neat! Let's do it! bd2412 T 04:52, 25 November 2008 (UTC)
The only thing that bothers me is that the "Usage notes" aren't actually notes about usage. Rather, they're notes about typing and encoding. Does anyone know of a better header for this section? --EncycloPetey 03:07, 28 November 2008 (UTC)
I don't see the problem there (taking "usage" with a very broad definition). Maybe just "notes" would avoid any such problem, but I don't think readers will be confused or misled by the header as it is. bd2412 T 05:50, 28 November 2008 (UTC)
Good point. I would have just called them "Notes" if that weren't proscribed. How about "Technical notes"? That might come in handy for many of our Translingual entries as well. -- Visviva 05:53, 28 November 2008 (UTC)
It is of a technical nature. Whatever answer is chosen, the same method should be used on the Chinese/Han/CJK(V) entries and the Korean syllable entries, and from there on any entries about letters, symbols, or characters which include information of a technical nature.
An example of an actual usage note for Korean syllables might be to note those which don't actually occur in Korean writing. If I'm not mistaken I believe I've heard or read that Unicode includes Korean syllables which are technically possible but linguistically impossible. Is this correct? — hippietrail 06:40, 28 November 2008 (UTC)
It's a bit difficult to prove a negative. Syllables that don't exist in standard Korean may still turn up in eye dialect and internet 외계어 ("Martian", the language of the PC-bang generation). There are also some syllabic blocks that can never represent a syllable, but which are nonetheless common in written Korean (in fact that would apply to any syllable with an aspirated or compound batchim, such as 읊 in 읊다 or 없 in 없다). If someone wants to compile this information, it would do no harm, but I ain't volunteering. :-) -- Visviva 07:40, 28 November 2008 (UTC)
As a comparison, here is a typical Han character entry:
(radical 187 馬+2, 12 strokes, cangjie input 戈一尸手火 (IMSQF) four-corner 31127)
References
KangXi: page 1433, character 11
Dai Kanwa Jiten: character 44579
Dae Jaweon: page 1958, character 5
Hanyu Da Zidian: volume 7, page 4540, character 3
Unihan data for U+99AE
A lot of the technical stuff is in the "inflection line". Skipping definitions, which syllables don't have, we then have a "References" heading which tells us where to find this character in several well-known character dictionaries, followed by a link to the Unicode site, which is where the Unicode codepoint is given. — hippietrail 06:49, 28 November 2008 (UTC)
I don't think the situations are really comparable. The CJK characters are real units of meaning, with real entries in real dictionaries; Hangul syllabic blocks, for the most part, have no independent meaning, and no existence outside of the realm of digital possibility. (This is why I tried to have them deleted, but failing in that I figure the next-best thing is to create a complete set of consistent entries for them.)
But yes, I could see putting the keyboard input and composition in the inflection line -- though I have to say that seems a bit odd, even for the CJK entries -- and putting the Unicode data under "References". Anyone else have thoughts on this? -- Visviva 07:40, 28 November 2008 (UTC)
Yes! I think the Korean entries above are far superior to this Han entry. For me, the Han entry is just incomprehensible babbling, even though I already know a little bit about it and have some ideas what some of it might mean. For a layman, this is totally unusable. With the Korean entries, this is not the case: it is clearly explained what is what. (Ok, discussion about that is possible, but at least more clearly.) So please keep the current format. I hope the people who create these Han entries like that will not be insulted; it’s just that I think a lot could be done to make them more helpful, like linking to the relevant Wikipedia (or even an internal) page for all of the terminology (‘radical’, ‘stroke’, ‘cangjie’, ‘w:IMSQF’, ‘w:four-corner’ and all of the dictionaries). H. (talk) 16:45, 13 December 2008 (UTC)
One remark would be that the navigation templates are very concise. A simple caption telling what they are for or what they do would be great. H. (talk) 16:45, 13 December 2008 (UTC)
How about something like {{ko-symbol-nav-ga}}? This would require some more work, since the template would need to be hand-coded to a great extent, but it would provide some of the additional information people seem to want. -- Visviva 18:09, 13 December 2008 (UTC)
I do not really understand what you’re getting at, but I would just add some more explanation to the navigation templates: words like ‘next’, ‘previous’, probably explaining what kind of next (i.e. in what sense) etc. Unfortunately, I do not understand Korean well enough to do that myself. I think instead of 괴 ← I’d like to see Next symbol in Unicode sequence: 괴; or whatever relevant. A link explaining would do as well. Just think about usability for someone who has no idea of the concepts Unicode and Hangul. H. (talk) 18:14, 4 January 2009 (UTC)
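To make the captioning suggestion concrete, here is a small Python sketch of the kind of text such a navigation box could show. It assumes that "previous" and "next" simply mean the neighbouring code points within the Hangul Syllables block (U+AC00 to U+D7A3); the function name nav_captions and the caption wording are hypothetical and do not describe how {{ko-symbol-nav}} actually works.

```python
FIRST, LAST = 0xAC00, 0xD7A3  # bounds of the Hangul Syllables block


def nav_captions(ch):
    """Return plain-language captions for the neighbouring syllables."""
    cp = ord(ch)
    if not FIRST <= cp <= LAST:
        raise ValueError("not a precomposed Hangul syllable")
    captions = []
    if cp > FIRST:  # the first syllable has no predecessor in the block
        captions.append(f"Previous symbol in Unicode sequence: {chr(cp - 1)}")
    if cp < LAST:  # the last syllable has no successor in the block
        captions.append(f"Next symbol in Unicode sequence: {chr(cp + 1)}")
    return captions


for line in nav_captions("괴"):
    print(line)
```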